-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make noqa parsing consistent and more robust #16483
Conversation
CodSpeed Performance ReportMerging #16483 will not alter performanceComparing Summary
|
Oh jeez! Well that performance regression is no good. I'd still welcome thoughts on the "design spec" and I'll work on improving the implementation in the meantime. Converting to draft for now... |
|
code | total | + violation | - violation | + fix | - fix |
---|---|---|---|---|---|
F405 | 48 | 0 | 48 | 0 | 0 |
F811 | 9 | 0 | 9 | 0 | 0 |
F403 | 9 | 0 | 9 | 0 | 0 |
Linter (preview)
ℹ️ ecosystem check detected linter changes. (+0 -66 violations, +0 -0 fixes in 1 projects; 54 projects unchanged)
qdrant/qdrant-client (+0 -66 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview
- qdrant_client/http/api/aliases_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports - qdrant_client/http/api/aliases_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4 - qdrant_client/http/api/aliases_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names - qdrant_client/http/api/beta_api.py:51:54: F405 `AsyncApiClient` may be undefined, or defined from star imports - qdrant_client/http/api/beta_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4 - qdrant_client/http/api/beta_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names - qdrant_client/http/api/collections_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports - qdrant_client/http/api/collections_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4 - qdrant_client/http/api/collections_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names - qdrant_client/http/api/distributed_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports - qdrant_client/http/api/distributed_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4 - qdrant_client/http/api/distributed_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names - qdrant_client/http/api/indexes_api.py:126:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/indexes_api.py:144:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/indexes_api.py:162:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/indexes_api.py:180:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/indexes_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports - qdrant_client/http/api/indexes_api.py:59:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/indexes_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4 - qdrant_client/http/api/indexes_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names - qdrant_client/http/api/indexes_api.py:94:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/points_api.py:158:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/points_api.py:192:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/points_api.py:226:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/points_api.py:356:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/points_api.py:424:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/points_api.py:458:19: F405 `WriteOrdering` may be undefined, or defined from star imports ... 32 additional changes omitted for rule F405 - qdrant_client/http/api/points_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4 - qdrant_client/http/api/points_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names - qdrant_client/http/api/search_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4 - qdrant_client/http/api/search_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names - qdrant_client/http/api/service_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4 - qdrant_client/http/api/service_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names - qdrant_client/http/api/snapshots_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4 - qdrant_client/http/api/snapshots_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names ... 31 additional changes omitted for project
Changes by rule (3 rules affected)
code | total | + violation | - violation | + fix | - fix |
---|---|---|---|---|---|
F405 | 48 | 0 | 48 | 0 | 0 |
F811 | 9 | 0 | 9 | 0 | 0 |
F403 | 9 | 0 | 9 | 0 | 0 |
@dylwil3 Additionally, the original implementation uses |
Yeah, I tried removing disallowed whitespace as well and I don't think it makes a difference. I think the regex is too heavy to compete with the manual approach for this simple of a pattern. I sort of agree regarding ascii whitespace, and I do like that the regex is a little more self documenting... I'm torn. I'll try re-implementing the manual approach tomorrow and see if it's worth it - I can always mention the regex we're implementing in the doc-comment even if it's not used directly. |
Okay, performance fixed by reverting to manually implementing regex (now at least the file exemption and in-line cases are handled in visibly the same way). I think the ecosystem check is actually correct, and is one of the changes in behavior consistent with the design: We are now allowing comments after a file-level exemption like: As an aside: It'd be nice if there were a not-too-complex way to alert users to the footgun |
Looks like ecosystem check revealed a few small issues - again reverting to draft briefly Update: Fixed! |
Hmm, I don't understand the ecosystem results. Are we now ignoring the code for file level suppressions? |
There is a missing colon, it should be - # flake8: noqa E501
+ # flake8: noqa: E501 See the comment above #16483 (comment) Note this is now consistent with what we have always done for inline We could instead allow this "syntax error" and merge in the logic used in the rule blanket-noqa (PGH004), but that seemed like a larger change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this.
I've some smaller nit comments that I think will improve readability (and fewer offset calculations!).
My main design question are:
- I find it hard to reason about the 3 different lexing methods and how they interact together. I wonder if we should have a single
NoqaLexer
that does the lexing for the entire comment instead. It may have helper methods similar tolex_code
orlex_codes
to split the logic but we avoid constructing multipleCursor
s. - whether it would make sense to parse all suppression comments eagerly and collect them into a
Vec<SuppressionComment>
similar to what we do in Red Knot.
@@ -263,13 +123,23 @@ pub(crate) fn rule_is_ignored( | |||
code: Rule, | |||
offset: TextSize, | |||
noqa_line_for: &NoqaMapping, | |||
comment_ranges: &CommentRanges, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't aware that Ruff sometimes tests if a rule is ignored at a specific location similar to how it is done in Red Knot. I assumed it only filters out diagnostics at the very end.
I don't know if this would be a large change but it seems we now have multiple places where we lex noqa comments. Would it make more sense to analyze all comments once and extract all suppression comments (see linked Red Knot issues). Looking up if a rule is ignored would then be simplified to searching for a suppression comment for that line (or range), which should require fewer binary search steps than searching all comments. Emitting errors should then be as simple as iterating over all suppression comments and emitting errors for the incorrect once.
continue; | ||
|
||
for warning in warnings { | ||
warn!("Missing or joined rule code(s) at {path_display}:{line}: {warning}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaning towards moving the message into LexicalWarning
. It's otherwise very easy to forget changing the message here when adding a new variant.
Thanks Micha, for the detailed review and correcting my rookie I agree that incorporating the prefix stripping into the lexer and keeping everything going from left to right would be easier to reason about - I hadn't done this because I thought there was some performance reason for the approach that was already present in the code. But I'll give it a try! I like the idea of parsing the suppression comments eagerly, but I wonder if that should be a follow-up PR. I would have to think about when we parse them and where we store them; I'll have to check out what you've set up in Red Knot and see if the same thing works here. |
I'm okay with that. Also considering that this should go into 0.10 |
For the changelog: Could you write a short summary detailing which comments are no longer accepted that previously were and how the difference will manifest for users? |
Same here. Feel free to merge this PR into the ruff-0.10 feature branch once it's ready |
44eec2b
to
e4e1c58
Compare
For the changelog (let me know if you want me to actually put it in, say,
|
# Summary The goal of this PR is to address various issues around parsing suppression comments by 1. Unifying the logic used to parse in-line (`# noqa`) and file-level (`# ruff: noqa`) noqa comments 2. Recovering from certain errors and surfacing warnings in these cases Closes #15682 Supersedes #12811 Addresses #14229 (comment) Related: #14229 , #12809
# Summary The goal of this PR is to address various issues around parsing suppression comments by 1. Unifying the logic used to parse in-line (`# noqa`) and file-level (`# ruff: noqa`) noqa comments 2. Recovering from certain errors and surfacing warnings in these cases Closes #15682 Supersedes #12811 Addresses #14229 (comment) Related: #14229 , #12809
## Summary Adds @dylwil3's new `noqa` specification to the linter `Error suppression` page instead of the release blog post. Originally taken from his PR comment [here](#16483 (comment)). ## Test Plan None
Summary
The goal of this PR is to address various issues around parsing suppression comments by
# noqa
) and file-level (# ruff: noqa
) noqa commentsCloses #15682
Supersedes #12811
Addresses #14229 (comment)
Related: #14229 , #12809
Current suppression behavior
The implementations for parsing in-line and file-level noqa comments are entirely separate, so a number of discrepancies have arisen. You can unfurl some examples of this below.
Examples of discrepancies
Leading Hashes
This works:
This does not, and emits no warning:
Missing colon
This silently ignores all rules on the line, even though
the user intended a single rule to be ignored:
This prints a warning and does not suppress anything:
Missing Delimiter
This works to suppress both violations:
This warns and suppresses neither:
Multiple Commas
This works to suppress both violations:
This suppresses the first but not the second.
It does not emit a warning:
Missing whitespace before additional comments
This works:
This warns and does nothing:
Additional comments after noqa
This works:
And so does this:
And so does this:
But this emits a warning and suppresses nothing:
Design
The specification for the unified parsing logic is as follows:
#\s*(?:ruff|flake8)\s*:\s*(?i)noqa
#\s*(?i)noqa
noqa
is'#'
, whitespace, or if there are no characters following, then the suppression comment suppressed all rules. If the character following is":"
, proceed to (3). Otherwise, warn the user forInvalidSuffix
.MissingCodes
.[A-Z]+[0-9]+
. While it is expected that there is one rule code per item, parsing continues even if that is not true, (e.g.,F401F841
orF401,,F841
), and the user is warned of aMissingDelimiter
orMissingItem
. If an item does not consist entirely of codes (e.g.F401abc
), parsing aborts with an error and the user is warned of anInvalidCodeSuffix
- the exception is if we encounter#
, in which case parsing stops and returns all codes collected to that point (e.g.# noqa: F401#comment
).Implementation
After trimming the prefix for a file-level or inline noqa, the remaining text is passed to a lexer in the style of
SimpleTokenizer
. There is some small logic to "recover" in the case of missing delimiters or missing items.Other changes and affected rules/behavior
The only rules that appear to be affected are blanket-noqa (PGH004)) and unused-noqa (RUF100) (but we'll see if others come up in the ecosystem check). For example, we no longer parse the first code in
# noqa F401.F841
and instead emit a warning, which changes one of the snapshots forRUF100
.Internally, we no longer distinguish between a
Directive
and aParsedFileExemption
(keeping just the former), since these were used almost identically. They are carried around inside ofNoqaDirectiveLine
andFileNoqaDirectedLine
, respectively, so there is little risk in misunderstanding the context if we just useDirective
in both cases.Review
After responding to comments it may no longer be helpful to proceed commit by commit. However, to see changes in behavior it may be best to view the diff against the first commit rather than main, since it contains a few additional snapshots.
Miscellaneous comments
# noqa
codes #14229 (comment)?