-
Notifications
You must be signed in to change notification settings - Fork 563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't crash on invalid UTF-8 byte sequences. #664
Don't crash on invalid UTF-8 byte sequences. #664
Conversation
If a line of code has an invalid byte sequence in UTF-8, count it as a relevant line rather than crashing. Useful for projects that use `track_files "**/*.rb"` and have the builder gem in a subdir such as vendor. Builder has a line of code with an invalid UTF-8 byte sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, build fails on too long method... I can quickly handle this myself. Either ignore or more likely extract :)
RELEVANT | ||
end | ||
rescue ArgumentError | ||
# E.g., line contains an invalid byte sequence in UTF-8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good explanatory why comment!
Ok, so I won't fix this now... maybe later. Problem is the Problem is also that the same logic is applied elsewhere - best we'd put that logic inside the lines classifier so there's no need to duplicate. Question is how to best do that. Currently not coming up with a nice lightweight abstraction to suit both cases. I guess we could use blocks. But wanna check the performance of that, one of the reasons I got #665 - that helps us evaluate if it creates a performance problem or not. |
Good point that the same logic is applied elsewhere. I'm motivated to get this fixed (it's a must-have for us so I need to either fix it here or fork the project for Airbnb, which I'd rather not do). I'm trying out different solutions so you don't have to spend too much time on this. I know open-source maintenance is time-consuming. ... later ... I thought about using I also attempted to refactor the two places that use this shared logic into one but I don't see any way to do it. There are too many logic differences:
I've got an alternate solution in another branch that Rubocop is ok with but does not attempt to share logic between the two methods since they are too different. I'll make a separate PR. |
super seeded by #669 |
If a line of code has an invalid byte sequence in UTF-8, count it as a relevant line rather than crashing.
Please don't dismiss this as a useless thing to do. It turns out the Builder gem has a line of code with an invalid UTF-8 byte sequence. If a project uses
track_files "**/*.rb"
and has the builder gem in a subdir such as vendor, SimpleCov will fail even though the project code is perfectly valid.This fixes a regression that was introduced in v0.15.0 when
add_not_loaded_files
was changed to callLinesClassifier#classify
, which runs a RegExp on each line--and raises ArgumentError if the line had an invalid byte sequence.