Case insensitivity option (#41) #43

bernhof · 2017-12-15T11:05:22Z

Fix for #41. Implemented as CaseInsensitive property on GlobParseOptions - although I'd say it's more an evaluator/matching option. But that's just naming, I'll let you decide what you think is best here.

In the end, only the bool caseInsensitive itself is passed to Evaluator constructors where applicable. They use ToLowerInvariant on the fly before comparing, if case insensitivity is switched on.

CharacterListTokens, however, are a special case where, from a performance perspective, it would make sense to cache a list of tokens in their lower (invariant) form, since lookups are performed on this list for each character in the input string. Unsure whether caching vs no caching really makes a difference in most cases, though.
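For illustration, the caching idea for character lists could look roughly like the sketch below. `CharacterListMatcher` is a hypothetical stand-in, not the library's actual evaluator class, and the use of a `HashSet<char>` is an assumption; the point is only that the pattern's characters are lowered once, so each lookup lowers just the input character.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a character-list evaluator (not DotNet.Glob's real class).
public sealed class CharacterListMatcher
{
    private readonly HashSet<char> _chars = new HashSet<char>();
    private readonly bool _caseInsensitive;

    public CharacterListMatcher(IEnumerable<char> characters, bool caseInsensitive)
    {
        _caseInsensitive = caseInsensitive;
        foreach (var c in characters)
        {
            // Lower the pattern's characters once, at construction time.
            _chars.Add(caseInsensitive ? char.ToLowerInvariant(c) : c);
        }
    }

    public bool Contains(char input)
    {
        // Only the input character still needs lowering on each lookup.
        if (_caseInsensitive)
        {
            input = char.ToLowerInvariant(input);
        }
        return _chars.Contains(input);
    }
}
```

Whether this beats lowering on the fly would need to be confirmed by a benchmark; for short character lists the difference may be negligible.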

A note about "insensitivity" vs "sensitivity": it felt more natural to switch on insensitivity, rather than switching off sensitivity, due to the current default behaviour as well as how e.g. regular expressions work (where an insensitivity switch is required).

Removed IsCurrentCharEqualTo from GlobStringReader: it did a case-insensitive (ToLowerInvariant) comparison of chars, which was unnecessary (it was only used to compare against path separators) and confusing. Replaced it with a regular == comparison.

Added regression tests (IsMatchCaseInsensitive and added inline data to Does_Not_Match) - all tests passing. Don't currently have time to create benchmark tests that take case insensitivity into account. Hope you'll be able to assist here.

Let me know what you think.

dazinator · 2017-12-16T17:36:07Z

Thank you very much for the PR. I will take a look at this soon. From your explanation above, that all makes sense to me, so I don't foresee any issues getting this merged.

dazinator · 2017-12-16T17:52:57Z

src/DotNet.Glob/Evaluation/LetterRangeTokenEvaluator.cs

+
+            if (_caseInsensitive)
+            {
+                start = char.ToLowerInvariant(start);


_token.Start and _token.End don't change, so rather than calling ToLowerInvariant on them for every Match, performance wise it might be better to call ToLowerInvariant on them once in the constructor and store that for comparison.

bernhof · 2017-12-17

Agreed. I'll look at this.
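A minimal sketch of the suggestion above, assuming a hypothetical `LetterRangeMatcher` class that takes the range bounds directly (the real evaluator reads them from `_token.Start` and `_token.End`). The bounds are lowered once in the constructor, so each match only has to lower the current character.

```csharp
using System;

// Hypothetical simplification of a letter-range evaluator (not the library's real class).
public sealed class LetterRangeMatcher
{
    private readonly char _start;
    private readonly char _end;
    private readonly bool _caseInsensitive;

    public LetterRangeMatcher(char start, char end, bool caseInsensitive)
    {
        _caseInsensitive = caseInsensitive;
        // The range bounds never change, so normalise them once here.
        _start = caseInsensitive ? char.ToLowerInvariant(start) : start;
        _end = caseInsensitive ? char.ToLowerInvariant(end) : end;
    }

    public bool IsMatch(char current)
    {
        if (_caseInsensitive)
        {
            current = char.ToLowerInvariant(current);
        }
        return current >= _start && current <= _end;
    }
}
```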

dazinator · 2017-12-16T17:57:36Z

src/DotNet.Glob/Evaluation/LiteralTokenEvaluator.cs

@@ -20,6 +22,12 @@ public bool IsMatch(string allChars, int currentPosition, out int newPosition)
                var compareChar = _token.Value[counter];
                var currentChar = allChars[newPosition];

+                if (_caseInsensitive)
+                {
+                    compareChar = char.ToLowerInvariant(compareChar);


I am thinking _token.Value should probably be stored in its lower-invariant form at parse time, ready for comparison, rather than calling ToLowerInvariant every time during an IsMatch.

bernhof · 2017-12-17

That makes sense. ToLowerInvariant is expensive enough that it's worth avoiding wherever possible.
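As a rough sketch of this idea, assuming a hypothetical `LiteralMatcher` class whose `IsMatch` mirrors the signature in the diff hunk above: the literal is lowered once when the matcher is built, and only the input character is lowered during matching. The class name and the choice to reset `newPosition` on failure are assumptions for illustration.

```csharp
using System;

// Hypothetical simplification of a literal evaluator (not the library's real class).
public sealed class LiteralMatcher
{
    private readonly char[] _value;
    private readonly bool _caseInsensitive;

    public LiteralMatcher(string value, bool caseInsensitive)
    {
        _caseInsensitive = caseInsensitive;
        _value = value.ToCharArray();
        if (caseInsensitive)
        {
            // Lower the literal once, up front, instead of on every IsMatch call.
            for (var i = 0; i < _value.Length; i++)
            {
                _value[i] = char.ToLowerInvariant(_value[i]);
            }
        }
    }

    public bool IsMatch(string allChars, int currentPosition, out int newPosition)
    {
        newPosition = currentPosition;
        var position = currentPosition;
        for (var counter = 0; counter < _value.Length; counter++, position++)
        {
            if (position >= allChars.Length)
            {
                return false; // Input ended before the literal did.
            }
            var currentChar = allChars[position];
            if (_caseInsensitive)
            {
                currentChar = char.ToLowerInvariant(currentChar);
            }
            if (currentChar != _value[counter])
            {
                return false;
            }
        }
        newPosition = position; // Advance only on a full match.
        return true;
    }
}
```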

dazinator · 2017-12-17T14:07:45Z

src/DotNet.Glob/Evaluation/LiteralTokenEvaluator.cs

@@ -20,6 +22,12 @@ public bool IsMatch(string allChars, int currentPosition, out int newPosition)
                var compareChar = _token.Value[counter];
                var currentChar = allChars[newPosition];

+                if (_caseInsensitive)


I think that adding this additional if statement shouldn't have any notable performance implications: the _caseInsensitive flag will stay either true or false for the lifetime of the evaluator, so CPU branch prediction should make the check close to free.

If we wanted to be super careful, we could create CaseInsensitive versions of each token evaluator, e.g. CaseInsensitiveLiteralTokenEvaluator, and then use those evaluators when in case-insensitive mode, instead of passing a flag to the existing evaluators to change their behaviour. Not sure whether I like that design or not though. What do you think?

bernhof · 2017-12-17

I know that lightning performance is one of this library's main goals, but the performance implications of adding/avoiding a boolean check (even if invoked once per char in a literal token) are practically non-existent. Really not a fan of premature optimizations. I'd be surprised if you'd notice any difference in benchmarks (haven't checked your benchmarks, though! 😉)
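To make the two designs concrete, here is a sketch of the "separate evaluator" alternative, under simplified assumptions. `ILiteralEvaluator`, `LiteralEvaluator`, and `LiteralEvaluatorFactory` are hypothetical names, and `CaseInsensitiveLiteralEvaluator` is a simplified take on the name suggested above. The flag is checked once, when the evaluator is chosen, rather than once per character.

```csharp
using System;

// Hypothetical interface; the library's real evaluator contract differs.
public interface ILiteralEvaluator
{
    bool IsMatch(string input, int position);
}

public sealed class LiteralEvaluator : ILiteralEvaluator
{
    private readonly string _literal;

    public LiteralEvaluator(string literal) { _literal = literal; }

    public bool IsMatch(string input, int position)
    {
        if (position < 0 || position + _literal.Length > input.Length) return false;
        return string.CompareOrdinal(input, position, _literal, 0, _literal.Length) == 0;
    }
}

public sealed class CaseInsensitiveLiteralEvaluator : ILiteralEvaluator
{
    private readonly char[] _lowered;

    public CaseInsensitiveLiteralEvaluator(string literal)
    {
        // Lower the literal once; no flag is needed inside IsMatch.
        _lowered = new char[literal.Length];
        for (var i = 0; i < literal.Length; i++)
        {
            _lowered[i] = char.ToLowerInvariant(literal[i]);
        }
    }

    public bool IsMatch(string input, int position)
    {
        if (position < 0 || position + _lowered.Length > input.Length) return false;
        for (var i = 0; i < _lowered.Length; i++)
        {
            if (char.ToLowerInvariant(input[position + i]) != _lowered[i]) return false;
        }
        return true;
    }
}

public static class LiteralEvaluatorFactory
{
    // The only place the flag is inspected.
    public static ILiteralEvaluator Create(string literal, bool caseInsensitive)
    {
        return caseInsensitive
            ? (ILiteralEvaluator)new CaseInsensitiveLiteralEvaluator(literal)
            : new LiteralEvaluator(literal);
    }
}
```

The trade-off is the one raised in this thread: the per-character branch disappears, but every evaluator now exists in two flavours that have to be kept in sync.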

bernhof · 2017-12-17T23:07:41Z

Thanks for the feedback 👍 I've added comments and will look at making the changes in the coming days.

dazinator · 2017-12-18T00:10:19Z

Thanks. I've pulled your changes into a new local branch. If you don't mind, I am going to make a few modifications and then I'll merge this to develop, so I would hold off making any further changes yourself for now!

bernhof · 2017-12-18T07:24:12Z

Alright, sounds good. Look forward to seeing the result.

closes #41

dazinator · 2017-12-22T01:50:45Z

Thank you for the PR. I refactored it slightly based on some performance tests, and settled on something I was happy with. Thanks for the additional tests, these were useful to make sure I hadn't broken anything! I have updated the README on the develop branch to show the new case-insensitive option! 🎆

dazinator · 2017-12-22T01:59:06Z

..and the package is published as a pre-release version here if you would like to give it a go: https://www.nuget.org/packages/DotNet.Glob/2.0.0-alpha0115

bernhof · 2017-12-22T10:01:37Z

That's great! Glad to help. I see you went the route of different implementations for case insensitive, which does look and feel better once done across the board. Look forward to trying it out. 👍

dazinator · 2017-12-22T10:32:18Z

Yeah, I went that way in the end. Not 100% on it though. It means that if any more evaluation options are added in future, there are now two flavours of evaluators and both may need changing to implement the new option. However, I'm willing to cross that bridge if we come to it. For now this is ok.

Case insensitivity option (#41)

592266f

dazinator added the enhancement label Dec 16, 2017

dazinator reviewed Dec 16, 2017

dazinator reviewed Dec 17, 2017

+semver: breaking - Case Insensitivity Option

fc8a6a7

closes #41

dazinator merged commit 355f748 into dazinator:develop Dec 22, 2017
