Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Case insensitivity option (#41) #43

Merged
merged 2 commits into from
Dec 22, 2017
Merged

Case insensitivity option (#41) #43

merged 2 commits into from
Dec 22, 2017

Conversation

bernhof
Copy link

@bernhof bernhof commented Dec 15, 2017

Fix for #41. Implemented as CaseInsensitive property on GlobParseOptions - although I'd say it's more an evaluator/matching option. But that's just naming, I'll let you decide what you think is best here.

In the end, only the bool caseInsensitive itself is passed to Evaluator constructors where applicable. They use ToLowerInvariant on the fly before comparing, if case insensitivity is switched on.

CharacterListTokens, however, are a special case where, from a performance perspective, it would make sense to cache a list of tokens in their lower (invariant) form, since lookups are performed on this list for each character in the input string. Unsure whether caching vs no caching really makes a difference in most cases, though.

A note about "insensitivity" vs "sensitivity": it felt more natural to switch on insensitivity, rather than switching off sensitivity, due to the current default behaviour as well as how e.g. regular expressions work (where an insensitivity switch is required).

Removed IsCurrentCharEqualTo from GlobStringReader: it did case-insensitive (ToLowerInvariant) comparison of chars, which was unnecessary (only compared to path seperators) and confusing. Removed it and replaced with regular == comparison.

Added regression tests (IsMatchCaseInsensitive and added inline data to Does_Not_Match) - all tests passing. Don't currently have time to create benchmark tests that take case insensitivity into account. Hope you'll be able to assist here.

Let me know what you think.

@dazinator
Copy link
Owner

Thank you very much for the PR. I will take a look at this soon. From your explanation above, that all makes sense to me, so I don't forsee any issues getting this merged.


if (_caseInsensitive)
{
start = char.ToLowerInvariant(start);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_token.Start and _token.End don't change, so rather than calling ToLowerInvariant on them for every Match, performance wise it might be better to call ToLowerInvariant on them once in the constructor and store that for comparison.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'll look at this.

@@ -20,6 +22,12 @@ public bool IsMatch(string allChars, int currentPosition, out int newPosition)
var compareChar = _token.Value[counter];
var currentChar = allChars[newPosition];

if (_caseInsensitive)
{
compareChar = char.ToLowerInvariant(compareChar);
Copy link
Owner

@dazinator dazinator Dec 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking _token.Value[] should probably be stored in a LowerInvariant() ready for comparison, at parse time, rather than calling it every time during an IsMatch.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. ToLowerInvariant is a sufficiently expensive operation to try to avoid it as much as possible.

@@ -20,6 +22,12 @@ public bool IsMatch(string allChars, int currentPosition, out int newPosition)
var compareChar = _token.Value[counter];
var currentChar = allChars[newPosition];

if (_caseInsensitive)
Copy link
Owner

@dazinator dazinator Dec 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that adding this additional if statement in, shouldn't have any noteable performance implications, because, as the _caseInsensitive flag will either stay as true or false (and shouldn't be changing), CPU branch prediction should kick in to optimise it away.

If we wanted to be super careful, we could create CaseInsensitive versions of each token, i.e CaseInsensitiveLiteralTokenEvaluator - and then use those token evaluators when in case-insensitive mode, instead of passing a flag to the existing evaluators to mutate their behaviour. Not sure whether I like that design or not though. What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that lightning performance one of this library's main goals, but the performance impliciations of adding/avoiding a boolean check (even if invoked once per char in a literal token) is practically non-existent. Really not a fan of premature optimizations. I'd be surprised if you'd notice any difference in benchmarks (haven't checked your benchmarks, though! 😉)

@bernhof
Copy link
Author

bernhof commented Dec 17, 2017

Thanks for the feedback 👍 I've added comments and will look at making the changes one of the coming days.

@dazinator
Copy link
Owner

Thanks. Ive pulled your changes into a new local branch, if you dont mind I am going to make a few modifications and then i'll merge this to develop, so I would hold of makimg any further changee yourself for now!

@bernhof
Copy link
Author

bernhof commented Dec 18, 2017

Alright, sounds good. Look forward to seeing the result.

@dazinator dazinator merged commit 355f748 into dazinator:develop Dec 22, 2017
@dazinator
Copy link
Owner

dazinator commented Dec 22, 2017

Thank you for the PR. I refactored it slightly based on some performance tests, and settled on something I was happy with. Thanks for the additional tests, these were useful to make sure I hadn't broken anything! I have updated the README on the develop branch to show the new caseinsensitive option! 🎆

@dazinator
Copy link
Owner

..and the package is published as a pre-release version here if you would like to give it
a go: https://www.nuget.org/packages/DotNet.Glob/2.0.0-alpha0115

@bernhof
Copy link
Author

bernhof commented Dec 22, 2017

That's great! Glad to help. I see you went the route of different implementations for case insensitive, which does look and feel better once done across the board. Look forward to trying it out. 👍

@dazinator
Copy link
Owner

Yeah.. I went that way in the end. Not 100% on it though. It means if there are any more evaluation options added in future there are now two flavours of evaluators and both may need changing to implement that new option.. However i'm willing to cross that bridge if we come to it - for now this is ok.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants