Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change regex source generator to emit returns rather than jumps to NoMatch #65599

Merged
merged 1 commit into from
Feb 27, 2022

Conversation

stephentoub
Copy link
Member

Fixes #62657
This makes the code a bit cleaner to read, e.g. for the pattern a{5,10}b", instead of:

                protected override void Go()
                {
                    global::System.ReadOnlySpan<char> inputSpan = base.runtext;
                    int pos = base.runtextpos, end = base.runtextend;
                    int original_pos = pos;
                    global::System.ReadOnlySpan<char> slice = inputSpan.Slice(pos, end - pos);
                    
                    // Match 'a' atomically at least 5 and at most 10 times.
                    {
                        int iteration = 0;
                        while (iteration < 10 && (uint)iteration < (uint)slice.Length && slice[iteration] == 'a')
                        {
                            iteration++;
                        }
                        
                        if (iteration < 5)
                        {
                            goto NoMatch;
                        }
                        
                        slice = slice.Slice(iteration);
                        pos += iteration;
                    }
                    
                    // Match 'b'.
                    {
                        if (slice.IsEmpty || slice[0] != 'b')
                        {
                            goto NoMatch;
                        }
                    }
                    
                    // The input matched.
                    pos++;
                    base.runtextpos = pos;
                    base.Capture(0, original_pos, pos);
                    return;
                    
                    // The input didn't match.
                    NoMatch:;
                }

we get:

                protected override void Go()
                {
                    global::System.ReadOnlySpan<char> inputSpan = base.runtext;
                    int pos = base.runtextpos, end = base.runtextend;
                    int original_pos = pos;
                    global::System.ReadOnlySpan<char> slice = inputSpan.Slice(pos, end - pos);
                    
                    // Match 'a' atomically at least 5 and at most 10 times.
                    {
                        int iteration = 0;
                        while (iteration < 10 && (uint)iteration < (uint)slice.Length && slice[iteration] == 'a')
                        {
                            iteration++;
                        }
                        
                        if (iteration < 5)
                        {
                            return; // The input didn't match.
                        }
                        
                        slice = slice.Slice(iteration);
                        pos += iteration;
                    }
                    
                    // Match 'b'.
                    {
                        if (slice.IsEmpty || slice[0] != 'b')
                        {
                            return; // The input didn't match.
                        }
                    }
                    
                    // The input matched.
                    pos++;
                    base.runtextpos = pos;
                    base.Capture(0, original_pos, pos);
                }

@Elfocrash, I suspect this is also going to address the small perf difference you saw between RegexOptions.Compiled and the regex source generator for an input that failed early in a very large, complicated pattern.

@joperezr, assuming your span changes go in first, I'll fix up the merge conflicts this causes (e.g. I'll need to change the return;s to return false;s).

@ghost
Copy link

ghost commented Feb 18, 2022

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #62657
This makes the code a bit cleaner to read, e.g. for the pattern a{5,10}b", instead of:

                protected override void Go()
                {
                    global::System.ReadOnlySpan<char> inputSpan = base.runtext;
                    int pos = base.runtextpos, end = base.runtextend;
                    int original_pos = pos;
                    global::System.ReadOnlySpan<char> slice = inputSpan.Slice(pos, end - pos);
                    
                    // Match 'a' atomically at least 5 and at most 10 times.
                    {
                        int iteration = 0;
                        while (iteration < 10 && (uint)iteration < (uint)slice.Length && slice[iteration] == 'a')
                        {
                            iteration++;
                        }
                        
                        if (iteration < 5)
                        {
                            goto NoMatch;
                        }
                        
                        slice = slice.Slice(iteration);
                        pos += iteration;
                    }
                    
                    // Match 'b'.
                    {
                        if (slice.IsEmpty || slice[0] != 'b')
                        {
                            goto NoMatch;
                        }
                    }
                    
                    // The input matched.
                    pos++;
                    base.runtextpos = pos;
                    base.Capture(0, original_pos, pos);
                    return;
                    
                    // The input didn't match.
                    NoMatch:;
                }

we get:

                protected override void Go()
                {
                    global::System.ReadOnlySpan<char> inputSpan = base.runtext;
                    int pos = base.runtextpos, end = base.runtextend;
                    int original_pos = pos;
                    global::System.ReadOnlySpan<char> slice = inputSpan.Slice(pos, end - pos);
                    
                    // Match 'a' atomically at least 5 and at most 10 times.
                    {
                        int iteration = 0;
                        while (iteration < 10 && (uint)iteration < (uint)slice.Length && slice[iteration] == 'a')
                        {
                            iteration++;
                        }
                        
                        if (iteration < 5)
                        {
                            return; // The input didn't match.
                        }
                        
                        slice = slice.Slice(iteration);
                        pos += iteration;
                    }
                    
                    // Match 'b'.
                    {
                        if (slice.IsEmpty || slice[0] != 'b')
                        {
                            return; // The input didn't match.
                        }
                    }
                    
                    // The input matched.
                    pos++;
                    base.runtextpos = pos;
                    base.Capture(0, original_pos, pos);
                }

@Elfocrash, I suspect this is also going to address the small perf difference you saw between RegexOptions.Compiled and the regex source generator for an input that failed early in a very large, complicated pattern.

@joperezr, assuming your span changes go in first, I'll fix up the merge conflicts this causes (e.g. I'll need to change the return;s to return false;s).

Author: stephentoub
Assignees: -
Labels:

area-System.Text.RegularExpressions

Milestone: 7.0.0

@joperezr
Copy link
Member

@stephentoub is this ready to go or are you waiting on @Elfocrash to confirm that this fixes his regression?

@stephentoub
Copy link
Member Author

I'm waiting for your span PR to be merged as the code logically conflicts with that.

@stephentoub stephentoub merged commit ad3847c into dotnet:main Feb 27, 2022
@stephentoub stephentoub deleted the emitternomatch branch February 27, 2022 03:20
@ghost ghost locked as resolved and limited conversation to collaborators Mar 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider removing goto statements on Regex source-generated code
2 participants