-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
overlay differ: Do file comparison in some cases. #2480
Conversation
util/overlay/overlay_linux.go
Outdated
if err2 != nil && err2 != io.EOF { | ||
return false, err2 | ||
} | ||
if n1 != n2 || !bytes.Equal(b1[:n1], b2[:n2]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is the n1 != n2
comparison correct in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well n1 != n2
if one hit EOF before the other (in which case they are different sizes) or if a read got cut short. AFAIK the go runtime is supposed to transparently handle reads that were interrupted by signals and would otherwise return a non-nil error, but I don't see that 100% guaranteed in writing anywhere. I guess in the very obscure situation where something went wrong and a read got cut short before filling the buffer or hitting EOF
but err
is nonetheless nil
, it's probably fine to just say the files aren't equal and leave here.
That being said, it's redundant in that bytes.Equal
inherently checks this anyways, so we can remove it if you prefer.
Also, while thinking about this I realized that the only thing stopping our code from trying to read non-regular files is the check in sameDirent
for f1.Size() == 0
, which should always return true for fifos, sockets, chardevs, etc. That is a very fragile check though and more just implied by the linux manpages rather than fully guaranteed, so I'm also updating this function to verify f1
and f2
are regular files before trying to read them. Trying to compare contents of anything else could lead to pretty nasty bugs, so it's worth guarding against it from future changes to this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no guarantee that Read
returns a specific number of bytes. It just returns whatever single syscall returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and the read
syscall is made with count
parameter set to the len
of the provided buffer, which is the same for both files, so the situations where n1 != n2
are either:
- The files aren't the same size (in which case they aren't equal)
- They aren't regular files (which we guard against)
- The syscall was interrupted (which should be handled by the go runtime or otherwise result in a non-nil
err
being returned) - The underlying filesystem is behaving extremely strangely and returning less than
count
bytes for a regular file even thoughEOF
hasn't been reached and nothing interrupted it.
We can probably turn 4
into an error condition rather than just returning false, nil
, but I'd prefer to do that along with updating continuity
to have the same behavior so the differs are consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
behaving extremely strangely
I don't think this case is extremely strange. It's part of the contract. Network-based filesystems definitely do it. I have not looked at the internals of every possible linux filesystem and even when some of them guarantee it atm they can change it any time. There is no guarantee for it even in the vfs-to-syscall layer iiuc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we use io.ReadFull here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I always thought the rule of thumb was that regular files only return short reads on EOF, signal interrupt or conditions where errno
is set, but you're right that it's not technically in violation of the spec to do otherwise and network filesystems or FUSE filesystems are much more likely to do things outside the norm, so we should handle this under the assumption that eventually someone will do something very weird (like you've said previously).
Can't we use io.ReadFull here?
Yeah good call, that handles the complication for us. Updated with that. I will send out a PR to continuity to have the double-walking differ do the same thing since that's where this code was originally from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, I left in n1 != n2
in the update because it should be correct now that ReadFull
is used. It's technically also asserted by byte.Equal
but I'm not sure how efficiently so I figured it didn't hurt to leave it here.
Does this have performance drawback? |
d8ba7b0
to
3484ad5
Compare
I go into detail about this in the commit message, but the TL;DR is that the only possible performance drawback is in very obscure cases, and even in those cases it's more of a tradeoff rather than a performance hit. The use cases described in the issues linked to in the PR that added the overlay differ should not be impacted AFAICT (#1704 and #1192). An example of an obscure case where this changes the performance is say you have a large file This causes the overlay differ to create the same blobs as the double-walking differ, which is my main priority here. But I think it's also good in that it prevents that layers from including unneeded files, which saves time+bandwidth pushing and pulling the layers. And in the vast majority of the cases, the overlay differ retains its performance benefits over the double-walking differ anyways. |
It sounds reasonable to me that overlayfs-differ should avoid the additional redundancy that isn't created by walking-differ.
I am a bit worried about whether we can guarantee that every differ always creates the exact same blob from the same set of snapshots. e.g. What happens if the tar library is fixed to emit bits different from the older version? |
3484ad5
to
a4bbbf2
Compare
Ah, I should have said "differences between the diffs in the blobs" rather than "differences between the blobs". It's totally fine for the bits of the blobs to be different, we just want to ensure the contents resulting from unpacking the blobs are the same between the two differs. Different versions of tar, different compressions, etc. are fine. |
util/overlay/overlay_linux.go
Outdated
return false, errors.Errorf("%s is not a regular file", p2) | ||
} | ||
|
||
b1 := make([]byte, compareChunkSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good if this used sync.Pool
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
This change results in the overlay differ comparing files to determine if they are actually part of the diff. This is needed to resolve differences between the blobs created by the overlay differ and the double-walking differ. Before this change, the overlay differ always just assumed that if a file was in the upperdir it must be part of the diff and included it as an add or a modify change. However, there are situations in which files can appear in the upperdir without having been modified or even opened. For example, if "foo" is a file or dir present in the lowerdirs of an overlay mount and you run "mv foo footmp; mv footmp foo", then the upperdir will contain foo (in addition to any files found under foo if it's a dir). In this situation, the double-walking differ would not include foo as part of the diff, but the overlay differ would. This meant that the overlay differ would potentially include extra files in each blob for such diffs relative to the double-walking differ. As of now, while this does increase image size, it doesn't result in any inconsistencies in terms of the contents of images because it just results in files/dirs getting duplicated on top of their equivalents. However, for the upcoming DiffOp support, this inconsistency could actually result in the same operation producing mounts with different contents depending on which differ is used. This change is therefore necessary in order to enforce DiffOp consistency (on top of the possible improvements to exported image size). The main concern here is that this could undo the performance benefits that the overlay differ was intended to fix. However, in practice the situations where this has worse performance are quite obscure and the benefits should still be present. First, consider the case where foo is a directory and the user does the equivalent of "mv foo footmp; mv footmp foo". Even before this change, the overlay differ would see that foo is marked as opaque and thus fall back to using the double-walking differ. So there's no performance regression in this case as the double-walking differ does the same file comparisons as were added in this commit. For the case where the user shuffles a file back and forth, there will potentially be a slow file content based comparison if the underlying file has a truncated nanosecond timestamp (i.e. it was unpacked from a tar file). However, the situations in which you shuffle an individual file without changing it (or open it for writing but then write nothing) that is large enough in size for content comparisons to be slow are obscure. Additionally, while the content comparison may be slow, there will be time saved during export because the file won't be included unnecessarily in the exported blob, so it's a tradeoff rather than a pure loss. In situations where the user actually did change a file and it shows up in the upperdir, it should be extremely rare that the content comparison code path is followed. It would require that the user changed no other metadata of the file, including size, and both mod timestamps were the same (which could only really happen if their underlying filesystem lacked support for nanosecond precision and they modified the file within 1 second of its modification in the lowerdir or they manually changed the modtime with chtimes). Signed-off-by: Erik Sipsma <erik@sipsma.dev>
a4bbbf2
to
1829291
Compare
This change results in the overlay differ comparing files to determine
if they are actually part of the diff. This is needed to resolve
differences between the blobs created by the overlay differ and the
double-walking differ.
Before this change, the overlay differ always just assumed that if a
file was in the upperdir it must be part of the diff and included it as
an add or a modify change. However, there are situations in which files
can appear in the upperdir without having been modified or even opened.
For example, if "foo" is a file or dir present in the lowerdirs of an
overlay mount and you run "mv foo footmp; mv footmp foo", then the
upperdir will contain foo (in addition to any files found under foo if
it's a dir). In this situation, the double-walking differ would not
include foo as part of the diff, but the overlay differ would.
This meant that the overlay differ would potentially include extra files
in each blob for such diffs relative to the double-walking differ. As of
now, while this does increase image size, it doesn't result in any
inconsistencies in terms of the contents of images because it just
results in files/dirs getting duplicated on top of their equivalents.
However, for the upcoming DiffOp support, this inconsistency could
actually result in the same operation producing mounts with different
contents depending on which differ is used. This change is therefore
necessary in order to enforce DiffOp consistency (on top of the possible
improvements to exported image size).
The main concern here is that this could undo the performance benefits
that the overlay differ was intended to fix. However, in practice the
situations where this has worse performance are quite obscure and the
benefits should still be present.
First, consider the case where foo is a directory and the user does the
equivalent of "mv foo footmp; mv footmp foo". Even before this change,
the overlay differ would see that foo is marked as opaque and thus fall
back to using the double-walking differ. So there's no performance
regression in this case as the double-walking differ does the same
file comparisons as were added in this commit.
For the case where the user shuffles a file back and forth, there will
potentially be a slow file content based comparison if the underlying
file has a truncated nanosecond timestamp (i.e. it was unpacked from a
tar file). However, the situations in which you shuffle an individual
file without changing it (or open it for writing but then write nothing)
and it's large enough in size for content comparisons to be slow are
obscure. Additionally, while the content comparison may be slow, there
will be time saved during export because the file won't be included
unnecessarily in the exported blob, so it's a tradeoff rather than a
pure loss.
In situations where the user actually did change a file and it shows up
in the upperdir, it should be extremely rare that the content comparison
code path is followed. It would require that the user changed no other
metadata of the file, including size, and both mod timestamps were the
same (which could only really happen if their underlying filesystem
lacked support for nanosecond precision and they modified the file
within 1 second of its modification in the lowerdir or they manually
changed the modtime with chtimes).
Signed-off-by: Erik Sipsma erik@sipsma.dev
@tonistiigi @ktock This is mostly just preparation for DiffOp, but may be desirable anyways. I decided to split it out from the DiffOp PR in case there's any discussion needed on the possible performance impact, but like I explain in the commit message I don't think that should be a major concern. Let me know your thoughts.