[API Proposal]: Regex.Count #61425
Comments
Tagging subscribers to this area: @eerhardt, @dotnet/area-system-text-regularexpressions
Issue Details
Background and motivation
With regexes, you often want to know how many matches there are in a given piece of text. There are various ways you can do that today, including:
    public static int Count(Regex r, string input)
    {
        int count = 0;
        Match m = r.Match(input);
        while (m.Success)
        {
            count++;
            m = m.NextMatch();
        }
        return count;
    }
or
    public static int Count(Regex r, string input) => r.Matches(input).Count();
but these have a variety of downsides. In addition to being non-obvious, they're more costly than they need to be. Both of them force Match objects and all the state they wrap (captures, etc.) into existence, when for counting that's not necessary, and the latter also forces the MatchCollection into existence and holds on to all of the Match objects until after the operation has completed.
We should add Count methods to Regex for this relatively common case.
API Proposal
namespace System.Text.RegularExpressions
{
    public class Regex
    {
+       public int Count(string input);
+       public static int Count(string input, string pattern);
+       public static int Count(string input, string pattern, RegexOptions options);
+       public static int Count(string input, string pattern, RegexOptions options, TimeSpan matchTimeout);
        // And once https://github.com/dotnet/runtime/issues/59629 is approved and we support spans
+       public int Count(ReadOnlySpan<char> input);
+       public static int Count(ReadOnlySpan<char> input, string pattern);
+       public static int Count(ReadOnlySpan<char> input, string pattern, RegexOptions options);
+       public static int Count(ReadOnlySpan<char> input, string pattern, RegexOptions options, TimeSpan matchTimeout);
    }
}
API Usage
int numWords = Regex.Count(text, @"\b\w+\b");
Alternative Designs
Risks
Just additional surface area and all the typical concerns that come with that.
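To make the comparison in the motivation concrete, here is a minimal sketch that counts words three ways: the two workarounds quoted in the issue and the proposed instance method. It assumes a runtime where the proposal has shipped (.NET 7 or later); the class and method names (`CountComparison`, `CountWithNextMatch`, and so on) are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CountComparison
{
    // Workaround 1 from the issue: walk the matches one at a time.
    // Every iteration allocates a Match object that is thrown away.
    public static int CountWithNextMatch(Regex r, string input)
    {
        int count = 0;
        for (Match m = r.Match(input); m.Success; m = m.NextMatch())
        {
            count++;
        }
        return count;
    }

    // Workaround 2 from the issue: build the whole MatchCollection,
    // which keeps every Match alive until the count has been read.
    public static int CountWithMatches(Regex r, string input) => r.Matches(input).Count;

    // The proposed API: only the number of matches is returned to the caller.
    public static int CountWithCount(Regex r, string input) => r.Count(input);

    public static void Main()
    {
        var words = new Regex(@"\b\w+\b");
        const string text = "How many words are in this document?";

        Console.WriteLine(CountWithNextMatch(words, text)); // 7
        Console.WriteLine(CountWithMatches(words, text));   // 7
        Console.WriteLine(CountWithCount(words, text));     // 7
    }
}
```

All three return the same number; the difference the issue cares about is how many intermediate objects each one creates along the way.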
|
What is a use case where you want the count of matches and not just Regex.IsMatch()? |
How many words are in this document? |
I'm definitely not against this API, and I for sure see the use cases for it, my only question is: Are there any big advantages of having this API over the current one-line workaround ( |
Never mind, I see you cover this in the proposal and I missed it. I think this proposal is ready to review. |
Thanks, @joperezr. I'm curious if you have any thoughts on the Alternative Designs section? |
Regarding the static methods, I think it is fine to have them for consistency with the rest of the Regex API as well as for convenience. I feel like people more often use our static methods as opposed to creating instances, or at least I’ve noticed that they are just as popular. Regarding using a Count on Enumerate instead, I haven’t given too much thought on that one part of the proposal yet, but at least to me it feels like discoverability of the functionality would hurt if we decide to do that instead, and perhaps more people would be inclined to find different solutions (like calling Count() on Matches result). Making this an API directly on Regex type makes it discoverable and preferable to write over the other alternatives. That is my 2 cents from your alternative designs, but as always I’m open for discussion. |
Ok, thanks. Let's stick with the original proposal then.
Yeah, they're just a potential pit of performance failure, so I get a little sad each time I see them used in any larger app. But they are nicely convenient. |
namespace System.Text.RegularExpressions
{
    public partial class Regex
    {
        public int Count(string input);
        public static int Count(string input, string pattern);
        public static int Count(string input, string pattern, RegexOptions options);
        public static int Count(string input, string pattern, RegexOptions options, TimeSpan matchTimeout);
        // And once https://github.com/dotnet/runtime/issues/59629 is approved and we support spans
        public int Count(ReadOnlySpan<char> input);
        public static int Count(ReadOnlySpan<char> input, string pattern);
        public static int Count(ReadOnlySpan<char> input, string pattern, RegexOptions options);
        public static int Count(ReadOnlySpan<char> input, string pattern, RegexOptions options, TimeSpan matchTimeout);
    }
} |
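As a rough illustration of how the overloads listed above differ, the sketch below calls several of them on one sample string. It assumes .NET 7 or later, where both the string and span overloads are available; `CountOverloadsDemo`, `Sample`, and `CountAll` are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CountOverloadsDemo
{
    public const string Sample = "Red green RED blue red";

    // Returns the result of each overload in the order it is called.
    public static int[] CountAll()
    {
        var caseSensitive = new Regex("red");

        return new[]
        {
            // Instance overload: only the final lowercase "red" matches.
            caseSensitive.Count(Sample),
            // Static overload with the same pattern and default options.
            Regex.Count(Sample, "red"),
            // Static overload with options: "Red", "RED" and "red" all match.
            Regex.Count(Sample, "red", RegexOptions.IgnoreCase),
            // Static overload with options and a match timeout.
            Regex.Count(Sample, "red", RegexOptions.IgnoreCase, TimeSpan.FromSeconds(1)),
            // Instance span overload over the slice "Red green": no lowercase "red".
            caseSensitive.Count(Sample.AsSpan(0, 9)),
            // Static span overload over the whole input.
            Regex.Count(Sample.AsSpan(), "red", RegexOptions.IgnoreCase),
        };
    }

    public static void Main() => Console.WriteLine(string.Join(", ", CountAll()));
}
```

The span overloads are mostly useful when the text to search is a slice of a larger buffer, since they avoid allocating a substring first.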
Alternative Designs
Count() method to that ref struct, and a consumer would write r.Enumerate(input).Count(). This would mean we wouldn't add any of the APIs outlined above and instead just add a public int Count(); method to that type.
Risks
Just additional surface area and all the typical concerns that come with that.
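For comparison, the enumerator-based alternative discussed under Alternative Designs can be approximated by hand. The sketch below is not the proposed `Count()` member on the ref struct; it is a hypothetical helper (`CountViaEnumerator`) that loops over `Regex.EnumerateMatches`, the span-based enumeration method available in .NET 7 and later, and it assumes such a runtime.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EnumeratorCounting
{
    // Hypothetical helper: counts matches by walking the ref struct
    // enumerator, so no Match or MatchCollection objects are allocated.
    public static int CountViaEnumerator(Regex r, ReadOnlySpan<char> input)
    {
        int count = 0;
        foreach (ValueMatch match in r.EnumerateMatches(input))
        {
            count++;
        }
        return count;
    }

    public static void Main()
    {
        var words = new Regex(@"\b\w+\b");
        Console.WriteLine(CountViaEnumerator(words, "one two three")); // 3
    }
}
```

This shows the discoverability concern raised in the thread: the caller has to know about the enumerator and write the loop, whereas a `Count` method directly on `Regex` is a single obvious call.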