# Practice regexes
http://refiddle.com/
[[:alnum:]] [a-zA-Z0-9]
[[:alpha:]] [a-zA-Z]
[[:digit:]] [0-9]
[[:blank:]] [ \t]
grep -E '[[:digit:]]{3,6}' # match between 3 and 6 consecutive digits (unescaped {} needs -E)
[[:punct:]] [!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~]
[[:graph:]] [[:alnum:]] + [[:punct:]]
[[:print:]] '[[:alnum:]]', '[[:punct:]]', and space
[[:xdigit:]] [0-9A-Fa-f]
[[:lower:]] [a-z]
[[:upper:]] [A-Z]
\bword\b word boundaries: match 'word' only as a whole word
For example, ‘\brat\b’ matches the separate word ‘rat’, while ‘\Brat\B’ matches ‘crate’ but not ‘furry rat’
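A quick way to check the word-boundary behaviour described above. This is only a sketch: it assumes GNU grep (where `\b` and `\B` are extensions), and the sample words and variable names are made up for illustration.

```sh
# Sample input (made up for this sketch): one entry per line.
words=$(printf '%s\n' 'furry rat' 'crate' 'rats')

# \b requires a word boundary on both sides of "rat".
whole_word=$(printf '%s\n' "$words" | grep '\brat\b')

# \B requires the opposite: "rat" must sit between other word characters.
inside_word=$(printf '%s\n' "$words" | grep '\Brat\B')

printf 'whole word:  %s\n' "$whole_word"    # furry rat
printf 'inside word: %s\n' "$inside_word"   # crate
```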
# grep
. any single character
[abc].e matches a, b or c, then any character, then e
grep '[[:digit:]]\+' # one or more digits
grep '[[:digit:]]\{3,6\}' # between 3 and 6 digits
grep 'ciao\|hello' # ciao or hello
grep '\(ciao\)\{3\}' # match ciaociaociao
grep 'c\?a ' # c is optional
$ end of line
^ start of line
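The basic-regex operators above can be tried on a few throwaway lines. This is a sketch that assumes GNU grep (`\|` is a GNU extension in basic mode); the sample lines and variable names are invented for the example.

```sh
# Sample lines (made up for this sketch).
sample=$(printf '%s\n' 'ciao' 'hello' 'ciaociaociao' 'abc 12345' 'x 7')

# \| is alternation: count the lines containing ciao or hello.
bre_alt=$(printf '%s\n' "$sample" | grep -c 'ciao\|hello')

# \{3,6\} is an interval: a run of 3 to 6 digits.
bre_interval=$(printf '%s\n' "$sample" | grep '[[:digit:]]\{3,6\}')

# ^ and $ anchor the pattern to the whole line.
bre_anchored=$(printf '%s\n' "$sample" | grep '^\(ciao\)\{3\}$')

echo "$bre_alt"        # 3
echo "$bre_interval"   # abc 12345
echo "$bre_anchored"   # ciaociaociao
```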
## egrep
basically in egrep (the same as `grep -E`) we don't escape ? () {} | +
. any single character
[abc].e matches a, b or c, then any character, then e
grep -E '[[:digit:]]+' # one or more digits
grep -E '[[:digit:]]{3,6}' # between 3 and 6 digits
grep -E 'ciao|hello' # ciao or hello
grep -E '(ciao|hello) Luca' # ciao or hello and then Luca
grep -E '(ciao){3}' # match ciaociaociao
grep -E 'c?a ' # c is optional
$ end of line
^ start of line
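A small sketch of why the extended mode matters: the same unescaped `+` is a quantifier with `-E` but a literal plus sign in basic mode. GNU grep is assumed, and the sample lines and variable names are made up.

```sh
# Sample lines (made up for this sketch).
sample=$(printf '%s\n' 'abc 123' '1+1' 'ciao Luca' 'hello Luca' 'bye Luca')

# Extended mode: + means "one or more", so any line with a digit matches.
ere_plus=$(printf '%s\n' "$sample" | grep -E '[[:digit:]]+')

# Basic mode: + is a literal character, so only "digit followed by +" matches.
bre_plus=$(printf '%s\n' "$sample" | grep '[[:digit:]]+')

# Extended mode: grouping and alternation without backslashes.
ere_group=$(printf '%s\n' "$sample" | grep -E '(ciao|hello) Luca')

echo "$ere_plus"    # abc 123 and 1+1
echo "$bre_plus"    # 1+1
echo "$ere_group"   # ciao Luca and hello Luca
```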
sed can use basic regex or extended with -E
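As a rough illustration of the two sed modes, the sketch below replaces a run of digits once with basic syntax and once with `-E`. The input line and variable names are made up; the basic form uses the interval `\{1,\}` rather than `\+`, on the assumption that it is the more portable spelling.

```sh
# Sample line (made up for this sketch).
line='order 12345 shipped'

# Basic regex: the interval braces must be escaped.
bre_out=$(printf '%s\n' "$line" | sed 's/[[:digit:]]\{1,\}/N/')

# Extended regex with -E: + works unescaped.
ere_out=$(printf '%s\n' "$line" | sed -E 's/[[:digit:]]+/N/')

echo "$bre_out"   # order N shipped
echo "$ere_out"   # order N shipped
```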
Examples of advanced matching:
grep -oP 'group="16".*>\K.+(?=</fct>)'
Note how `\K` discards everything matched so far, so that only the text
matched after the `\K` is reported. The `\K` is very useful when we want
to extract substrings, or in general match after a specific regex.
The lookahead `(?=</fct>)` means to match only if the string matched by
`.+` is followed by the characters `</fct>`; note that these characters
are not part of the reported match. Here we can see how a combination of
`\K` (or a lookbehind) and a lookahead like `(?=)` is helpful to only
match something which is after a certain regex and before another regex.
NOTE that using a lookbehind like `(?<=)` is roughly equivalent to using
`\K`, the only difference being that `\K` accepts any regex before it,
while a lookbehind cannot contain a regex that matches strings of
varying length.
For example, take a file like this:
```txt
<ciao> cawda wwd wda </ciao>
<ciao> cawda wwd wda </ciao>
<ciao> cawda wwd wda </ciao>
<lal> cawda wwd wda </lal>
<lal group="16"> cawda wwd wda </lal>
<lal group="16" subgroup="lall"> cawda wwd wda </lal>
<lal group="16" thirf="dwdwa" > cawda wwd wda </lal>
<lal group="16" thirf="dwdwa" >wda </fct>
<lal group="16">cawda wwd wda 2222 </fct>
```
The command above prints:
```txt
wda
cawda wwd wda 2222
```
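The relation between `\K` and a lookbehind can be sketched on a single line of input. This assumes GNU grep built with PCRE support (`-P`); the `id=... name=...` line and the variable names are invented for the example.

```sh
# Sample line (made up for this sketch).
line='id=42 name=luca'

# \K drops "name=" from the reported match.
with_k=$(printf '%s\n' "$line" | grep -oP 'name=\K\w+')

# A fixed-length lookbehind gives the same result.
with_lookbehind=$(printf '%s\n' "$line" | grep -oP '(?<=name=)\w+')

# The prefix before \K may have variable length (\d+ here).
var_prefix=$(printf '%s\n' "$line" | grep -oP 'id=\d+ name=\K\w+')

echo "$with_k"            # luca
echo "$with_lookbehind"   # luca
echo "$var_prefix"        # luca
```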
Another example: to display all lines that contain a sequence of four
digits that is not itself part of a longer sequence of digits, one way
is:
```sh
grep -P '(?<!\d)\d{4}(?!\d)' file
```
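To see which lines the two negative lookarounds accept, the pattern can be run on a few made-up lines. This is a sketch assuming GNU grep with `-P` support; the sample data and variable names are hypothetical.

```sh
# Sample lines (made up for this sketch).
sample=$(printf '%s\n' 'pin 1234' 'id 12345' 'x9876y' 'no digits')

# (?<!\d) rejects a digit just before the run, (?!\d) rejects one just after.
four_digits=$(printf '%s\n' "$sample" | grep -P '(?<!\d)\d{4}(?!\d)')

echo "$four_digits"
# pin 1234
# x9876y
```

The line `id 12345` is skipped because every 4-digit window inside it touches another digit.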
## Repetitions
This looks for 2 or more consecutive occurrences of the same character:
```sh
grep -E '(.)\1+' file
```
To print each match on a new line:
```sh
grep -Eo '(.)\1+' file
```
To find a character repeated 3 times in a row (a longer run also matches, since it contains 3 in a row):
```sh
grep -E '(.)\1{2}' file
```
Or 3 or more:
```sh
grep -E '(.)\1{2,}' file
```
Note that these work with BSD and GNU grep, but they may not work with
other versions of grep; there are workarounds for that, see [1].
- [1]: https://unix.stackexchange.com/questions/70933/regular-expression-for-finding-double-characters-in-bash
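The back-reference patterns above can be compared on one made-up string. The sketch assumes a grep that supports back-references with `-E` (GNU or BSD); the input and variable names are invented.

```sh
# Sample line (made up for this sketch).
line='aabbbcdddd'

# Runs of 2 or more identical characters, one match per line.
runs=$(printf '%s\n' "$line" | grep -Eo '(.)\1+')

# Three identical characters in a row; a run of four still yields a match.
triples=$(printf '%s\n' "$line" | grep -Eo '(.)\1{2}')

echo "$runs"      # aa, bbb, dddd (one per line)
echo "$triples"   # bbb, ddd (one per line)
```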
We can look for repetitions of a specific element or of a whole pattern.
For example, this matches a single digit, e.g. '3', repeated 3 times in a row:
```sh
# with the perl flag on
grep -P '(\d)\1{2}' file.txt
# with the extended regex flag on
grep -E '([[:digit:]])\1{2}' file.txt
```
And this matches any sequence of 2 digits, e.g. '84':
```sh
# with the perl flag on
grep -P '\d{2}' file.txt
# with the extended regex flag on
grep -E '[[:digit:]]{2}' file.txt
```
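The difference between repeating one captured digit and repeating the digit class can be sketched on two made-up lines. A grep with back-reference support under `-E` is assumed; the sample data and variable names are hypothetical.

```sh
# Sample lines (made up for this sketch).
sample=$(printf '%s\n' 'a333b' 'a345b')

# Back-reference: the SAME digit three times in a row.
same_digit=$(printf '%s\n' "$sample" | grep -E '([[:digit:]])\1{2}')

# Plain interval: ANY two digits in a row, printed one match per line.
any_pair=$(printf '%s\n' "$sample" | grep -Eo '[[:digit:]]{2}')

echo "$same_digit"   # a333b
echo "$any_pair"     # 33, 34 (one per line)
```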