Experiment with inverting Regex scan loop #62443

Closed
stephentoub opened this issue Dec 6, 2021 · 1 comment

Comments

stephentoub (Member) commented Dec 6, 2021

The Regex scan loop is currently along the lines of:

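```C#
while (true)
{
    if (!FindFirstChar()) return false; // no remaining candidate start position
    Go();                               // run the matcher at that position
    if (matched) return true;           // `matched`: whether Go found a match
}
```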

This is good for cases where matches don't start at the beginning of the string. However, for cases where the match would likely be at the beginning of the input, a form like:

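```C#
do
{
    Go();                     // try a match at the current position first
    if (matched) return true; // `matched`: whether Go found a match
}
while (FindFirstChar());      // only search for the next candidate after a failure
return false;
```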

would likely be better, since it avoids an extra FindFirstChar call that only validates something Go is about to validate anyway.
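To make the trade-off concrete, here is a toy model of both loop shapes. It is not the real RegexRunner: the pattern is assumed to be a case-sensitive literal, and `Scan`, `FindFirstChar`, and `Go` below are hypothetical stand-ins that simply count how often each step runs. The bump after a failed `Go` is also an assumption of the sketch.

```C#
using System;

// Toy model of the two scan-loop shapes (NOT the real RegexRunner).
// The pattern is a case-sensitive literal; FindFirstChar/Go are hypothetical stand-ins.
static (bool Matched, int FindFirstCharCalls, int GoCalls) Scan(string pattern, string input, bool inverted)
{
    int pos = 0, ffcCalls = 0, goCalls = 0;

    // Move pos to the next position where the literal occurs, if any.
    bool FindFirstChar()
    {
        ffcCalls++;
        if (pos > input.Length) return false;
        int i = input.IndexOf(pattern, pos, StringComparison.Ordinal);
        if (i < 0) return false;
        pos = i;
        return true;
    }

    // Try to match at pos, checking the literal itself rather than trusting FindFirstChar.
    bool Go()
    {
        goCalls++;
        return input.AsSpan(pos).StartsWith(pattern.AsSpan());
    }

    if (!inverted)
    {
        // Current shape: always locate a candidate before running Go.
        while (true)
        {
            if (!FindFirstChar()) return (false, ffcCalls, goCalls);
            if (Go()) return (true, ffcCalls, goCalls);
            pos++; // bump past the failed candidate
        }
    }

    // Inverted shape: run Go at the starting position first, then search.
    do
    {
        if (Go()) return (true, ffcCalls, goCalls);
        pos++; // bump past the failed candidate
    }
    while (FindFirstChar());
    return (false, ffcCalls, goCalls);
}

foreach (var input in new[] { "hello world", "say hello", "abc" })
{
    Console.WriteLine($"{input}: current {Scan("hello", input, false)}, inverted {Scan("hello", input, true)}");
}
// hello world: current (True, 1, 1), inverted (True, 0, 1)
// say hello: current (True, 1, 1), inverted (True, 1, 2)
// abc: current (False, 1, 0), inverted (False, 1, 1)
```

In this toy model the inverted loop skips FindFirstChar entirely when the input starts with a match, but pays an extra Go call when the match starts later or when there is no match at all, which is the trade-off the issue proposes to measure.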

We should evaluate how much this helps one set of cases vs. how much it penalizes the other, and decide whether it's worth switching. (If we do this, note that certain code paths in Go currently assume FindFirstChar has already performed the match, e.g. when the entire expression is a case-sensitive string; those assumptions would need to be removed.)
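The caveat about Go's assumptions can be illustrated with a hypothetical toy, again not the real RegexRunner: the pattern is a case-sensitive literal, and `ScanInverted` and the `goTrustsFindFirstChar` flag are invented for the sketch. When Go skips verification because it expects FindFirstChar to have already found the literal, the inverted loop reports a false match at position 0, where FindFirstChar never ran.

```C#
using System;

// Hypothetical toy of the inverted loop. With goTrustsFindFirstChar = true, Go mimics
// the shortcut of assuming FindFirstChar already matched the whole literal.
static bool ScanInverted(string pattern, string input, bool goTrustsFindFirstChar)
{
    int pos = 0;

    // Move pos to the next position where the literal occurs, if any.
    bool FindFirstChar()
    {
        if (pos > input.Length) return false;
        int i = input.IndexOf(pattern, pos, StringComparison.Ordinal);
        if (i < 0) return false;
        pos = i;
        return true;
    }

    bool Go()
    {
        // Shortcut: only valid if FindFirstChar ran before Go.
        if (goTrustsFindFirstChar) return true;
        // Safe version: verify the literal at pos.
        return input.AsSpan(pos).StartsWith(pattern.AsSpan());
    }

    do
    {
        if (Go()) return true;
        pos++; // bump past the failed candidate
    }
    while (FindFirstChar());
    return false;
}

Console.WriteLine(ScanInverted("hello", "goodbye", goTrustsFindFirstChar: true));  // True (wrong)
Console.WriteLine(ScanInverted("hello", "goodbye", goTrustsFindFirstChar: false)); // False
```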

@stephentoub stephentoub added this to the 7.0.0 milestone Dec 6, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Dec 6, 2021
ghost commented Dec 6, 2021

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: stephentoub
Assignees: -
Labels: area-System.Text.RegularExpressions, tenet-performance, untriaged

Milestone: 7.0.0

@joperezr joperezr removed the untriaged New issue has not been triaged by the area owner label Dec 15, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jul 29, 2022