-
-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Case insensitivity option (#41) #43
Conversation
Thank you very much for the PR. I will take a look at this soon. From your explanation above, that all makes sense to me, so I don't forsee any issues getting this merged. |
|
||
if (_caseInsensitive) | ||
{ | ||
start = char.ToLowerInvariant(start); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_token.Start and _token.End don't change, so rather than calling ToLowerInvariant on them for every Match
, performance wise it might be better to call ToLowerInvariant
on them once in the constructor and store that for comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I'll look at this.
@@ -20,6 +22,12 @@ public bool IsMatch(string allChars, int currentPosition, out int newPosition) | |||
var compareChar = _token.Value[counter]; | |||
var currentChar = allChars[newPosition]; | |||
|
|||
if (_caseInsensitive) | |||
{ | |||
compareChar = char.ToLowerInvariant(compareChar); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking _token.Value[] should probably be stored in a LowerInvariant() ready for comparison, at parse time, rather than calling it every time during an IsMatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. ToLowerInvariant is a sufficiently expensive operation to try to avoid it as much as possible.
@@ -20,6 +22,12 @@ public bool IsMatch(string allChars, int currentPosition, out int newPosition) | |||
var compareChar = _token.Value[counter]; | |||
var currentChar = allChars[newPosition]; | |||
|
|||
if (_caseInsensitive) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that adding this additional if statement
in, shouldn't have any noteable performance implications, because, as the _caseInsensitive flag will either stay as true
or false
(and shouldn't be changing), CPU branch prediction should kick in to optimise it away.
If we wanted to be super careful, we could create CaseInsensitive
versions of each token, i.e CaseInsensitiveLiteralTokenEvaluator
- and then use those token evaluators when in case-insensitive mode, instead of passing a flag to the existing evaluators to mutate their behaviour. Not sure whether I like that design or not though. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that lightning performance one of this library's main goals, but the performance impliciations of adding/avoiding a boolean check (even if invoked once per char in a literal token) is practically non-existent. Really not a fan of premature optimizations. I'd be surprised if you'd notice any difference in benchmarks (haven't checked your benchmarks, though! 😉)
Thanks for the feedback 👍 I've added comments and will look at making the changes one of the coming days. |
Thanks. Ive pulled your changes into a new local branch, if you dont mind I am going to make a few modifications and then i'll merge this to develop, so I would hold of makimg any further changee yourself for now! |
Alright, sounds good. Look forward to seeing the result. |
Thank you for the PR. I refactored it slightly based on some performance tests, and settled on something I was happy with. Thanks for the additional tests, these were useful to make sure I hadn't broken anything! I have updated the README on the develop branch to show the new caseinsensitive option! 🎆 |
..and the package is published as a pre-release version here if you would like to give it |
That's great! Glad to help. I see you went the route of different implementations for case insensitive, which does look and feel better once done across the board. Look forward to trying it out. 👍 |
Yeah.. I went that way in the end. Not 100% on it though. It means if there are any more evaluation options added in future there are now two flavours of evaluators and both may need changing to implement that new option.. However i'm willing to cross that bridge if we come to it - for now this is ok. |
Fix for #41. Implemented as
CaseInsensitive
property on GlobParseOptions - although I'd say it's more an evaluator/matching option. But that's just naming, I'll let you decide what you think is best here.In the end, only the
bool caseInsensitive
itself is passed to Evaluator constructors where applicable. They useToLowerInvariant
on the fly before comparing, if case insensitivity is switched on.CharacterListTokens, however, are a special case where, from a performance perspective, it would make sense to cache a list of tokens in their lower (invariant) form, since lookups are performed on this list for each character in the input string. Unsure whether caching vs no caching really makes a difference in most cases, though.
A note about "insensitivity" vs "sensitivity": it felt more natural to switch on insensitivity, rather than switching off sensitivity, due to the current default behaviour as well as how e.g. regular expressions work (where an insensitivity switch is required).
Removed
IsCurrentCharEqualTo
from GlobStringReader: it did case-insensitive (ToLowerInvariant) comparison of chars, which was unnecessary (only compared to path seperators) and confusing. Removed it and replaced with regular==
comparison.Added regression tests (IsMatchCaseInsensitive and added inline data to Does_Not_Match) - all tests passing. Don't currently have time to create benchmark tests that take case insensitivity into account. Hope you'll be able to assist here.
Let me know what you think.