Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't crash on invalid UTF-8 byte sequences. #664

Conversation

BMorearty
Copy link
Contributor

If a line of code has an invalid byte sequence in UTF-8, count it as a relevant line rather than crashing.

Please don't dismiss this as a useless thing to do. It turns out the Builder gem has a line of code with an invalid UTF-8 byte sequence. If a project uses track_files "**/*.rb" and has the builder gem in a subdir such as vendor, SimpleCov will fail even though the project code is perfectly valid.

This fixes a regression that was introduced in v0.15.0 when add_not_loaded_files was changed to call LinesClassifier#classify, which runs a RegExp on each line--and raises ArgumentError if the line had an invalid byte sequence.

If a line of code has an invalid byte sequence in UTF-8, count it
as a relevant line rather than crashing.

Useful for projects that use `track_files "**/*.rb"` and have the
builder gem in a subdir such as vendor. Builder has a line of code with
an invalid UTF-8 byte sequence.
Copy link
Collaborator

@PragTob PragTob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, build fails on too long method... I can quickly handle this myself. Either ignore or more likely extract :)

RELEVANT
end
rescue ArgumentError
# E.g., line contains an invalid byte sequence in UTF-8
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good explanatory why comment!

@PragTob
Copy link
Collaborator

PragTob commented Mar 10, 2018

Ok, so I won't fix this now... maybe later. Problem is the skipping variable that relies on the local scope. I guess we could make it an instance variable. Currently evaluating options.

Problem is also that the same logic is applied elsewhere - best we'd put that logic inside the lines classifier so there's no need to duplicate. Question is how to best do that. Currently not coming up with a nice lightweight abstraction to suit both cases. I guess we could use blocks. But wanna check the performance of that, one of the reasons I got #665 - that helps us evaluate if it creates a performance problem or not.

@BMorearty
Copy link
Contributor Author

Good point that the same logic is applied elsewhere.

I'm motivated to get this fixed (it's a must-have for us so I need to either fix it here or fork the project for Airbnb, which I'd rather not do). I'm trying out different solutions so you don't have to spend too much time on this. I know open-source maintenance is time-consuming.

... later ...

I thought about using if (line =~ self.class.no_cov_line rescue true) but it's less clear and still doesn't make Rubocop happy.

I also attempted to refactor the two places that use this shared logic into one but I don't see any way to do it. There are too many logic differences:

  1. One checks for WHITESPACE_OR_COMMENT_LINE and one does not.
  2. One has a final else clause and one does not.

I've got an alternate solution in another branch that Rubocop is ok with but does not attempt to share logic between the two methods since they are too different. I'll make a separate PR.

@PragTob
Copy link
Collaborator

PragTob commented Mar 10, 2018

super seeded by #669

@PragTob PragTob closed this Mar 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants