Incorrect DFA example in documentation #8

Closed
PhilipHazel opened this issue Aug 23, 2021 · 1 comment
@PhilipHazel
Collaborator

This is #2756 in the old Bugzilla, submitted by S. Shuck.

The DFA example in the docs demonstrating finding every match does not work as expected (details omitted).

PH: This is not a bug, but a misunderstanding. You used pcre2_match_data_create_from_pattern() to set up a match data block. As your pattern contains no capturing parentheses, this creates a block with a very small ovector (enough to hold just the whole match, with no captured groups). However, the DFA matcher uses the ovector in a different way, as explained in the pcre2api page:

"On success, the yield of the function is a number greater than zero, which is
the number of matched substrings. The offsets of the substrings are returned in
the ovector, and can be extracted by number in the same way as for
pcre2_match(), but the numbers bear no relation to any capture groups
that may exist in the pattern, because DFA matching does not support capturing."

As your example should yield 3 matches, the ovector is not big enough, and therefore the yield is zero. If you change the match data creation to create a match data block with at least 3 ovector pairs, your example should return 3.

SS: Thanks for the insight. I'm unblocked for the moment.

The docs for pcre2_match_data_create_from_pattern() say "The ovector is created to be exactly the right size to hold all the substrings a pattern might capture." I guess I could have figured out that this number is not computable in the general case for DFA matching. Nevertheless, this sentence is false without a disclaimer about this case.

PH: Yes, I've noted that the documentation needs clarification, but it's too late for 10.37, which has been released today. I'll update the doc in due course - I suspect that DFA matching is in practice not used very much.

@PhilipHazel PhilipHazel added the documentation Improvements or additions to documentation label Aug 23, 2021
@PhilipHazel PhilipHazel self-assigned this Aug 23, 2021
@PhilipHazel
Collaborator Author

I have done a number of documentation updates to clarify this.
