grep_notes

# Practice regexes
http://refiddle.com/


[[:alnum:]] [a-zA-Z0-9]
[[:alpha:]] [a-zA-Z]
[[:digit:]] [0-9]
[[:blank:]] [ \t]
grep '[[:digit:]]{3,6}' # match between 3 and 6 digits consecutive digits
[[:punct:]] [!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~]
[[:graph:]] [[:alnum:]] + [[:punct:]]
[[:print:]] '[[:alnum:]]', '[[:punct:]]', and space
[[:xdigit:]] [0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f]
[[:lower:]] [a-z]
[[:upper:]] [A-Z]
\bword\b  word delimiters
For example, ‘\brat\b’ matches the separate word ‘rat’, ‘\Brat\B’│matches ‘crate’ but not ‘furry rat’


# grep 
. character
[abc].e matches either a b c and a character and e
grep '[[:digit:]]\+' # one or more digits 
grep '[[:digit:]]\{3,6\}' # between 3 and 6 digits
grep 'ciao\|hello' # ciao or hello
grep '\(ciao\)\{3\}' # match ciaociaociao
grep 'c\?a ' # c is optional
$ ending
^ starting


## egrep 
basically in egrep we don't escape ? () {} | +

. character
[abc].e matches either a b c and a character and e
grep '[[:digit:]]+' # one or more digits 
grep '[[:digit:]]{3,6}' # between 3 and 6 digits
grep 'ciao|hello' # ciao or hello
grep '(ciao|hello) Luca' # ciao or hello and then Luca
grep '(ciao){3}' # match ciaociaociao
grep 'c?a ' # c is optional
$ ending
^ starting


sed can use basic regex or extended with -E


Examples of advanced matching:
grep -oP 'group="16".*>\K.+(?=</fct>)'

Note how the `\K` resets the expression starting to match from the
`\K` combination until the end.  The `\K` is very useful when we want
to extract substrings, or in general match after a specific regex.
Then the `?=` means to match only if the string after `.+` is continued
by the combination of characters `</fct>` note that these will not be
resulting in the matching string.  Here we can see how a combination of
`\K` (or a lookbehind) and a lookahead like `(?=)` is helpful to only
match something which is after a certain regex and before another regex.
NOTE that using a lookbehind like (?<=) is somewhat equivalent to using
`\K` with the only difference being that `\K` accepts regex while
the lookbehind cannot accept regex which generate varying size strings.

The above example  will match all the contents of a file done like this:
```txt
<ciao> cawda wwd wda </ciao>
<ciao> cawda wwd wda </ciao>
<ciao> cawda wwd wda </ciao>
<lal> cawda wwd wda </lal>
<lal group="16"> cawda wwd wda </lal>
<lal group="16" subgroup="lall"> cawda wwd wda </lal>
<lal group="16" thirf="dwdwa" > cawda wwd wda </lal>
<lal group="16" thirf="dwdwa" >wda </fct>
<lal group="16">cawda wwd wda 2222 </fct>
```
Resulting in:
```txt
wda
cawda wwd wda 2222 
```

Another example is if we would like to display all lines that contain a
sequence of four digits that is itself not part of any longer sequence
of digits, one way is:
```sh
grep -P '(?<!\d)\d{4}(?!\d)' file
```

## Repetitions

This would look for 2 or more occurences of the same character:
```sh
grep -E '(.)\1+' file
```
To print each match on a new line:
```sh
grep -Eo '(.)\1+' file
```

To find matches with exactly 3 matches:
```sh
grep -E '(.)\1{2}' file
```

Or 3 or more:
```sh
grep -E '(.)\1{2,}' file
```
Note tha these work with BSD and GNU grep, but for other versions
of grep they may not work, and there are workarounds for that.
Look here [1]

- [1]: https://unix.stackexchange.com/questions/70933/regular-expression-for-finding-double-characters-in-bash

We can look for repetitions of both patterns or specific elements, for example:

This means to match a specific digit e.g., '3' which is repeated 3 times:
```sh
# with the perl flag on
grep -P '(\d)\1{2}' file.txt
# with the extended regex flag on
grep -E '([[:digit:]])\1{2}' file.txt
```

This means to match whatever sequence of 2 digits e.g., '84':
```sh
# with the perl flag on
grep -P '\d{2}' file.txt
# with the extended regex flag on
grep -E '[[:digit:]]{2}' file.txt
```