-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String::remove_matches O(n^2) -> O(n) #83515
Conversation
Clever. Not sure why I didn't think of that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than the mentioned comments it looks good to me but not sure if it improves the performance. Maybe we should do a perf run?
0e052e2
to
ba528fb
Compare
Not sure what a perf run would do - does this function have good benchmark coverage? Seems like no and since it's new it's unlikely to show up in any other perf paths. |
No, it's not about benchmark. Although it does have benchmarks but perf run is to check if this affects compile time and runtime of generated code. |
r? @m-ou-se |
After this change, all bytes are still copied/moved several times (exactly the number of matches before them). With 100 matches, the last segment gets moved 100 times. You cannot efficiently do this in reverse: The last segment cannot be put into the right place right away, because its new place still holds data that needs to be moved. Doing this in order starting at the first match means that every segment can directly be moved into the right position. To do so, you'd have to look at the start of the next match to find the end of the segment to move. |
@m-ou-se there's a kernel of truth to what you say, but the optimization you describe requires bookkeeping which does not exist before this change. Consider this string:
When iterating in forward-order, all bytes after |
@tamird Yes, exactly. The current implementation and your implementation are both O(n²). Iterating forwards allows for an O(n) implementation, by only copying the part until the next match.
The current implementation already finds all the matches and puts them all in a Vec before it moves anything, so all the information is already there. |
☔ The latest upstream changes (presumably #84266) made this pull request unmergeable. Please resolve the merge conflicts. |
ba528fb
to
4055e71
Compare
@m-ou-se how does this look? |
4055e71
to
38e8408
Compare
Very nice! It'd be good to narrow the scope of the |
38e8408
to
65adf91
Compare
Done. Replaced |
Copy only non-matching bytes.
bc17b9c
to
977903b
Compare
@bors r+ |
📌 Commit 977903b has been approved by |
Should be perf-sensitive, @bors rollup=never |
⌛ Testing commit 977903b with merge 5943cc133f74e8b9065e6c8c767c2506971a415b... |
💔 Test failed - checks-actions |
This function isn't used anywhere in the standard library or compiler. |
@bors retry |
⌛ Testing commit 977903b with merge cec1b46a23d8b88c4afe74607d7a1d799b3eb025... |
Oh sorry, my bad! |
The job Click to see the possible cause of the failure (guessed by this bot)
|
💔 Test failed - checks-actions |
☀️ Test successful - checks-actions |
Copy only non-matching bytes. Replace collection of matches into a
vector with iteration over rejections, exploiting the guarantee that we
mutate parts of the haystack that have already been searched over.
r? @joshtriplett