Commit 35e9cac

FOLLOW should control traversal of symlinks, not return of them (#223)
* `FOLLOW` should control traversal of symlinks, not return of them

  Some fixes to globmatch symlinks are also included
* Add more symlink cases for globmatch
* Windows fix
* Bump version to 10 as symlink change may be surprising
1 parent 2939d2a commit 35e9cac

7 files changed: +56 -30 lines changed

docs/src/markdown/about/changelog.md

+7
@@ -1,5 +1,12 @@
 # Changelog

+## 10.0
+
+- **NEW**: Symlinks should not be traversed when `GLOBSTAR` is enabled unless `FOLLOW` is also enabled, but they
+    should still be matched. Prior to this change, symlinks were not traversed _and_ they were ignored from matching
+    which contradicts how Bash works, which is our general target.
+- **FIX**: Fix some inconsistencies with `globmatch` and symlink handling when `REALPATH` is enabled.
+
 ## 9.0

 - **NEW**: Remove deprecated function `glob.raw_escape`.
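
The distinction the 10.0 entry draws between returning a symlink and traversing it can be illustrated with a small standalone sketch. This is not wcmatch's implementation: `walk_globstar` and `demo` are hypothetical names, and the walk uses only `os.scandir` from the standard library. It assumes a platform where `os.symlink` is permitted and does no cycle detection, so it should not be pointed at trees with circular links when `follow=True`.

```python
import os
import tempfile


def walk_globstar(root, follow=False):
    """Yield every path under root, roughly as a `**` pattern would.

    Symlinked directories are always yielded (returned), but they are only
    descended into (traversed) when follow is true.
    """
    stack = [""]
    while stack:
        rel = stack.pop()
        with os.scandir(os.path.join(root, rel)) as it:
            entries = sorted(it, key=lambda e: e.name)
        for entry in entries:
            path = os.path.join(rel, entry.name)
            # The entry itself is always part of the results.
            yield path
            # Traversal is the only thing the follow switch controls.
            if entry.is_dir() and (follow or not entry.is_symlink()):
                stack.append(path)


def demo():
    """Build `a/b.txt` plus a link `sym -> a` and walk it both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "a"))
        open(os.path.join(tmp, "a", "b.txt"), "w").close()
        os.symlink(os.path.join(tmp, "a"), os.path.join(tmp, "sym"), target_is_directory=True)
        return sorted(walk_globstar(tmp)), sorted(walk_globstar(tmp, follow=True))
```

In this sketch `sym` appears in both result lists, while `sym/b.txt` appears only in the `follow=True` list, which mirrors the behavior the changelog describes for `GLOBSTAR` with and without `FOLLOW`.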

docs/src/markdown/glob.md

+6-6
@@ -432,7 +432,7 @@ False
 If you would like for `globmatch` (or [`globfilter`](#globfilter)) to operate on your current filesystem directly,
 simply pass in the [`REALPATH`](#realpath) flag. When enabled, the path under consideration will be analyzed and
 will use that context to determine if the file exists, if it is a directory, does its context make sense compared to
-what the pattern is looking vs the current working directory, or if it has symlinks that should not be matched by
+what the pattern is looking vs the current working directory, or if it has symlinks that should not be traversed by
 [`GLOBSTAR`](#globstar).

 Here we use [`REALPATH`](#realpath) and can see that `globmatch` now knows that `doc` is a directory.
@@ -529,7 +529,7 @@ Path-like object input support is only available in Python 3.6+ as the path-like
 Like [`globmatch`](#globmatch), `globfilter` does not operate directly on the file system, with all the caveats
 associated. But you can enable the [`REALPATH`](#realpath) flag and `globfilter` will use the filesystem to gain
 context such as: whether the file exists, whether it is a directory or not, or whether it has symlinks that should not
-be matched by `GLOBSTAR`. See [`globmatch`](#globmatch) for examples.
+be traversed by `GLOBSTAR`. See [`globmatch`](#globmatch) for examples.

 /// new | New 5.1
 - `root_dir` was added in 5.1.0.
@@ -754,8 +754,8 @@ file matches the excluded pattern. Essentially, it means if you use a pattern su
 patterns were given: `**` and `!*.md`, where `!*.md` is applied to the results of `**`, and `**` is specifically treated
 as if [`GLOBSTAR`](#globstar) was enabled.

-Dot files will not be returned unless [`DOTGLOB`](#dotglob) is enabled. Symlinks will also be ignored in the return
-unless [`FOLLOW`](#follow) is enabled.
+Dot files will not be returned unless [`DOTGLOB`](#dotglob) is enabled. Symlinks will also not be traversed unless
+[`FOLLOW`](#follow) is enabled.

 #### `glob.MINUSNEGATE, glob.M` {: #minusnegate}

@@ -768,7 +768,7 @@ When `MINUSNEGATE` is used with [`NEGATE`](#negate), exclusion patterns are reco

 #### `glob.FOLLOW, glob.L` {: #follow}

-`FOLLOW` will cause [`GLOBSTAR`](#globstar) patterns (`**`) to match and traverse symlink directories.
+`FOLLOW` will cause [`GLOBSTAR`](#globstar) patterns (`**`) to traverse symlink directories.

 #### `glob.REALPATH, glob.P` {: #realpath}

@@ -784,7 +784,7 @@ file path for the given system it is running on. It will augment the patterns us
 logic so that the path must meet the following in order to match:

 - Path must exist.
-- Directories that are symlinks will not be matched by [`GLOBSTAR`](#globstar) patterns (`**`) unless the
+- Directories that are symlinks will not be traversed by [`GLOBSTAR`](#globstar) patterns (`**`) unless the
     [`FOLLOW`](#follow) flag is enabled.
 - When presented with a pattern where the match must be a directory, but the file path being compared doesn't indicate
     the file is a directory with a trailing slash, the command will look at the filesystem to determine if it is a

tests/test_glob.py

+2-2
@@ -510,7 +510,7 @@ class Testglob(_TestGlob):
             ] if not can_symlink() else [
                 ('',), ('aab',), ('aab', 'F'), ('a',), ('a', 'bcd'), ('a', 'bcd', 'EF'), ('a', 'bcd', 'efg'),
                 ('a', 'bcd', 'efg', 'ha'), ('a', 'D'), ('aaa',), ('aaa', 'zzzF'), ('EF',), ('ZZZ',),
-                ('sym1',), ('sym2',)
+                ('sym1',), ('sym2',), ('sym3',)
             ],
             glob.L
         ],
@@ -553,7 +553,7 @@ class Testglob(_TestGlob):
             [
                 ('EF',), ('ZZZ',), ('',)
             ] if not can_symlink() else [
-                ('EF',), ('ZZZ',), ('',), ('sym1',), ('sym2',)
+                ('EF',), ('ZZZ',), ('',), ('sym1',), ('sym2',), ('sym3',)
             ],
             glob.N | glob.L
         ],

tests/test_globmatch.py

+2
@@ -1606,12 +1606,14 @@ def test_globmatch_symlink(self):

         self.assertFalse(glob.globmatch(self.tempdir + '/sym1/a.txt', '**/*.txt}', flags=self.default_flags))
         self.assertTrue(glob.globmatch(self.tempdir + '/a.txt', '**/*.txt', flags=self.default_flags))
+        self.assertTrue(glob.globmatch(self.tempdir + '/sym1/', '**', flags=self.default_flags))

     def test_globmatch_follow_symlink(self):
         """Test `globmatch` with symlinks that we follow."""

         self.assertTrue(glob.globmatch(self.tempdir + '/sym1/a.txt', '**/*.txt', flags=self.default_flags | glob.L))
         self.assertTrue(glob.globmatch(self.tempdir + '/a.txt', '**/*.txt', flags=self.default_flags | glob.L))
+        self.assertTrue(glob.globmatch(self.tempdir + '/sym1/', '**', flags=self.default_flags))

     def test_globmatch_trigger_symlink_cache(self):
         """Use a pattern that exercises the symlink cache."""

wcmatch/__meta__.py

+1-1
@@ -193,5 +193,5 @@ def parse_version(ver: str) -> Version:
     return Version(major, minor, micro, release, pre, post, dev)


-__version_info__ = Version(9, 0, 0, "final")
+__version_info__ = Version(10, 0, 0, "final")
 __version__ = __version_info__._get_canonical()

wcmatch/_wcmatch.py

+37-20
@@ -12,15 +12,30 @@
 # Right half can return an empty set if not supported
 SUPPORT_DIR_FD = {os.open, os.stat} <= os.supports_dir_fd and os.scandir in os.supports_fd

-
 RE_WIN_MOUNT = (
-    re.compile(r'\\|[a-z]:(?:\\|$)', re.I),
-    re.compile(br'\\|[a-z]:(?:\\|$)', re.I)
+    re.compile(r'\\|/|[a-z]:(?:\\|/|$)', re.I),
+    re.compile(br'\\|/|[a-z]:(?:\\|/|$)', re.I)
 )
 RE_MOUNT = (
     re.compile(r'/'),
     re.compile(br'/')
 )
+RE_WIN_SPLIT = (
+    re.compile(r'\\|/'),
+    re.compile(br'\\|/')
+)
+RE_SPLIT = (
+    re.compile(r'/'),
+    re.compile(br'/')
+)
+RE_WIN_STRIP = (
+    r'\\/',
+    br'\\/'
+)
+RE_STRIP = (
+    r'/',
+    br'/'
+)


 class _Match(Generic[AnyStr]):
class _Match(Generic[AnyStr]):
@@ -49,8 +64,7 @@ def _fs_match(
         self,
         pattern: Pattern[AnyStr],
         filename: AnyStr,
-        is_dir: bool,
-        sep: AnyStr,
+        is_win: bool,
         follow: bool,
         symlinks: dict[tuple[int | None, AnyStr], bool],
         root: AnyStr,
@@ -65,36 +79,37 @@ def _fs_match(
         We only check for the symlink if we know we are looking at a directory.
         And we only call `lstat` if we can't find it in the cache.

-        We know it's a directory if:
+        We know we need to check the directory if:

-        1. If the base is a directory, all parts are directories.
-        2. If we are not the last part of the `globstar`, the part is a directory.
-        3. If the base is a file, but the part is not at the end, it is a directory.
+        1. If the match has not reached the end of the path and directory is in `globstar` match.
+        2. Or the match is at the end of the path and the directory is not the last part of `globstar` match.

         """

         matched = False
+        split = (RE_WIN_SPLIT if is_win else RE_SPLIT)[self.ptype]  # type: Any
+        strip = (RE_WIN_STRIP if is_win else RE_STRIP)[self.ptype]  # type: Any

-        end = len(filename)
+        end = len(filename) - 1
         base = None
         m = pattern.fullmatch(filename)
         if m:
             matched = True
             # Lets look at the captured `globstar` groups and see if that part of the path
             # contains symlinks.
             if not follow:
-                last = len(m.groups())
                 try:
                     for i, star in enumerate(m.groups(), 1):
                         if star:
                             at_end = m.end(i) == end
-                            parts = star.strip(sep).split(sep)
+                            parts = split.split(star.strip(strip))
                             if base is None:
                                 base = os.path.join(root, filename[:m.start(i)])
-                            for part in parts:
+                            last_part = len(parts)
+                            for j, part in enumerate(parts, 1):
                                 base = os.path.join(base, part)
                                 key = (dir_fd, base)
-                                if is_dir or i != last or not at_end:
+                                if not at_end or (at_end and j != last_part):
                                     is_link = symlinks.get(key, None)
                                     if is_link is None:
                                         if dir_fd is None:
@@ -125,13 +140,15 @@ def _match_real(
     ) -> bool:
         """Match real filename includes and excludes."""

-        temp = '\\' if util.platform() == "windows" else '/'
+        is_win = util.platform() == "windows"
+
         if isinstance(self.filename, bytes):
-            sep = os.fsencode(temp)
+            sep = b'/'
+            is_dir = (RE_WIN_SPLIT if is_win else RE_SPLIT)[1].match(self.filename[-1:]) is not None
         else:
-            sep = temp
+            sep = '/'
+            is_dir = (RE_WIN_SPLIT if is_win else RE_SPLIT)[0].match(self.filename[-1:]) is not None

-        is_dir = self.filename.endswith(sep)
         try:
             if dir_fd is None:
                 is_file_dir = os.path.isdir(os.path.join(root, self.filename))
@@ -153,14 +170,14 @@ def _match_real(

         matched = False
         for pattern in self.include:
-            if self._fs_match(pattern, filename, is_dir, sep, self.follow, symlinks, root, dir_fd):
+            if self._fs_match(pattern, filename, is_win, self.follow, symlinks, root, dir_fd):
                 matched = True
                 break

         if matched:
             if self.exclude:
                 for pattern in self.exclude:
-                    if self._fs_match(pattern, filename, is_dir, sep, True, symlinks, root, dir_fd):
+                    if self._fs_match(pattern, filename, is_win, True, symlinks, root, dir_fd):
                         matched = False
                         break

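
The rule stated in the revised `_fs_match` docstring (check every directory inside a `**` capture, except the final component when the capture reaches the end of the path) can be sketched in isolation. This is a simplified illustration, not the code from this commit: `SPLIT`, `STRIP`, `parts_to_check`, `crosses_symlink`, and `demo` are hypothetical names, only POSIX `str` paths are handled, there is no symlink cache or `dir_fd` support, and the caller supplies `at_end` rather than deriving it from a regex match.

```python
import os
import re
import tempfile

# Hypothetical stand-ins for the POSIX `str` variants of the split/strip helpers.
SPLIT = re.compile(r"/")
STRIP = "/"


def parts_to_check(star, at_end):
    """Return the components of a `**` capture that must not be symlinks."""
    parts = SPLIT.split(star.strip(STRIP))
    # When the capture reaches the end of the path, the final component is the
    # match itself. It is returned rather than traversed, so it may be a link.
    return parts[:-1] if at_end else parts


def crosses_symlink(root, prefix, star, at_end):
    """Return True if the capture would require traversing a symlink."""
    base = os.path.join(root, prefix)
    for part in parts_to_check(star, at_end):
        base = os.path.join(base, part)
        if os.path.islink(base):
            return True
    return False


def demo():
    """Create directory `a` and link `sym1 -> a`, then probe three captures."""
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "a"))
        os.symlink(os.path.join(tmp, "a"), os.path.join(tmp, "sym1"), target_is_directory=True)
        return (
            crosses_symlink(tmp, "", "sym1/", at_end=True),   # like `**` against `sym1/`
            crosses_symlink(tmp, "", "sym1/", at_end=False),  # like `**/*.txt` against `sym1/a.txt`
            crosses_symlink(tmp, "", "a/", at_end=False),     # like `**/*.txt` against `a/a.txt`
        )
```

Under these assumptions only the second probe reports a crossed symlink, which lines up with the new `test_globmatch_symlink` assertion that `sym1/` itself matches `**` while `sym1/a.txt` does not match `**/*.txt` without `FOLLOW`.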

wcmatch/glob.py

+1-1
@@ -665,7 +665,7 @@ def _glob_dir(

             path = os.path.join(curdir, file)
             follow = not is_link or self.follow_links
-            if (matcher is None and not hidden and (follow or not deep)) or (matcher and matcher(file)):
+            if (matcher is None and not hidden) or (matcher and matcher(file)):
                 yield path, is_dir

             if deep and not hidden and is_dir and follow:
