-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider 2-character EOL before line continuation #12035
Conversation
|
match ch { | ||
'\n' => { | ||
current_position -= ch.text_len(); | ||
if let Some(carriage_return) = reverse_chars.next_if_eq(&'\r') { | ||
current_position -= carriage_return.text_len(); | ||
} | ||
current_position -= TextSize::new('\\'.text_len().to_u32() * backslash_count); | ||
} | ||
'\r' => { | ||
current_position -= ch.text_len(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main change is to split the match with \r
and \n
into it's own branch and then a common check for line continuation character.
0ed0107
to
61d7288
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. But I agree with you that this is all backwards lexing again. Could we use the previous token kind (e.g. by storing it on the lexer) and test if it is a NonLogicalNewline
to avoid all the complexity of guessing if something's a newline or not?
Or is this not sufficient if there are multiple newlines (in which case storing just one previous token isn't enough?)
I guess there's also:
[a,
# comment
# more comment
def foo
where there's more in between def
and ,
match ch { | ||
'\n' => { | ||
current_position -= ch.text_len(); | ||
if let Some(carriage_return) = reverse_chars.next_if_eq(&'\r') { | ||
current_position -= carriage_return.text_len(); | ||
} | ||
current_position -= TextSize::new('\\'.text_len().to_u32() * backslash_count); | ||
} | ||
'\r' => { | ||
current_position -= ch.text_len(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can simplify this logic a bit. It doesn't seem to matter how many new line characters there are, we just want to skip all of them
if matches!(c, '\n' | '\r') {
while Some(newline) = reverse_chars.next_if(|c| matches!(c, '\n' | '\r')) {
current_position -= newline.text_len();
}
// after any new line, handle line continuation
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will work because we want the position of the last unescaped newline character. We would still need to keep track of that position, so we might as well be explicit about this.
Yeah, that's correct. There can be multiple trivia tokens in between which makes it hard to just look at the last token. Although we could split the logic and move "finding the non-logical newline" part to the |
Summary
This PR fixes a bug introduced in #12008 which didn't consider the two character newline after the line continuation character.
For example, consider the following code highlighted with whitespaces:
The lexer is at
def
when it's running the re-lexing logic and trying to move back to a newline character. It encounters\n
and it's being escaped (incorrect) but\r
is being escaped, so it moves the lexer to\n
character. This creates an overlap in token ranges which causes the panic.fixes: #12028
Test Plan
Add a test case with line continuation and windows style newline character.