-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add workaround for Bazel committed_size check when uploading compressed blobs #1493
Conversation
streamState, err = s.initStreamState(ctx, req) | ||
if status.IsAlreadyExistsError(err) { | ||
return handleAlreadyExists(stream, req) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this jive well with the uploadTracker stuff below or does this mean we're missing data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For compressed uploads requiring more than 1 message to upload, this does mean that now we are potentially letting the client upload a significant amount of data that is not tracked as an upload ("cache write" in the UI).
Fixed by creating a separate UploadTracker in handleAlreadyExists, but only in the case where we have to download the remaining stream.
Note that it's still tracking uncompressed size, but we have an existing issue to track the raw number of bytes sent by the client as opposed to uncompressed bytes.
Side note (probably not worth worrying about too much, since these short-circuiting cases should be relatively rare): I think we can do better in the case where the client sends only a single message that contains the entire payload, but then we short-circuit because the object already exists in cache. Currently (and also before this PR) we don't track those payloads at all, so we might present an inaccurate view of upload rate because it excludes these payloads from the total uploaded bytes. We also probably need to decide whether to present those as "cache writes" to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also probably need to decide whether to present those as "cache writes" to the user
If the user uploaded, even if it's a duplicate, I feel we should show it. It's confusing to see the number lower than what they could see with the gRPC log. Similar to when it's a read-only key, I feel we should show the number of items uploaded. Maybe we split those in two in the UI though? Uploaded versus written to cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for separating upload vs. written -- I will file an issue for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think showing a warning when they upload but it's not written because of a read-only key can remove a lot of confusion as well.
return stream.SendAndClose(&bspb.WriteResponse{CommittedSize: committedSize}) | ||
} | ||
|
||
func remainingUploadSize(stream bspb.ByteStream_WriteServer) (int64, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: comment what this function does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, and renamed to make it more clear that it's doing IO
// | ||
// There are two cases where we can short-circuit, though: | ||
// | ||
// - If this is an uncompressed stream, we assume that the committed_size will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably worth comment what's happening in the biggest part of the function below: if this is a compressed stream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reworded and shortened -- hopefully it's clearer now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
nice job coming up with a workaround for this on the spot :)
Context: bazelbuild/bazel#14654
Version bump: Patch