-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unexpected/wasteful handling of capacity by vec.split_off(0)
#119913
Comments
This behaviour seems to have been introduced by #76682, as an explicit optimization of the That PR notes that the new behaviour is compatible with the existing documentation, and notes some concerns about wastefully reallocating lots of capacity due to an outlier in how long the list gets, but I don't see any discussion of the capacity of the returned vector specifically. At the very least, this behaviour probably deserves some warning in the docs, since the actual behaviour wastes more capacity than the “obvious” implementation would. |
@rustbot label +T-libs |
Looking again at the actual documentation, it says this:
If the returned vector is “newly allocated”, it's natural to assume that the new allocation would be reasonably tight, if not exact. But in the 0 case, the returned vector isn't newly allocated at all, since it steals its allocation from the existing vector. |
When rust/library/alloc/src/vec/mod.rs Line 2200 in 1825374
it's indeed a issue, but I haven't got a better solution, this change will fix the issue, but will also introduce extra copy: if at == 0 {
// the new vector can take over the original buffer and avoid the copy
- return mem::replace(
+ let mut other = mem::replace(
self,
Vec::with_capacity_in(self.capacity(), self.allocator().clone()),
);
+ other.shrink_to(self.len());
+ return other;
} |
split_off in rust/library/alloc/src/collections/vec_deque/mod.rs Lines 1878 to 1906 in 1825374
|
I think the realistic options are:
In cases where In cases where the user specifically wants the current |
Remove special-case handling of `vec.split_off(0)` rust-lang#76682 added special handling to `Vec::split_off` for the case where `at == 0`. Instead of copying the vector's contents into a freshly-allocated vector and returning it, the special-case code steals the old vector's allocation, and replaces it with a new (empty) buffer with the same capacity. That eliminates the need to copy the existing elements, but comes at a surprising cost, as seen in rust-lang#119913. The returned vector's capacity is no longer determined by the size of its contents (as would be expected for a freshly-allocated vector), and instead uses the full capacity of the old vector. In cases where the capacity is large but the size is small, that results in a much larger capacity than would be expected from reading the documentation of `split_off`. This is especially bad when `split_off` is called in a loop (to recycle a buffer), and the returned vectors have a wide variety of lengths. I believe it's better to remove the special-case code, and treat `at == 0` just like any other value: - The current documentation states that `split_off` returns a “newly allocated vector”, which is not actually true in the current implementation when `at == 0`. - If the value of `at` could be non-zero at runtime, then the caller has already agreed to the cost of a full memcpy of the taken elements in the general case. Avoiding that copy would be nice if it were close to free, but the different handling of capacity means that it is not. - If the caller specifically wants to avoid copying in the case where `at == 0`, they can easily implement that behaviour themselves using `mem::replace`. Fixes rust-lang#119913.
Rollup merge of rust-lang#119917 - Zalathar:split-off, r=cuviper Remove special-case handling of `vec.split_off(0)` rust-lang#76682 added special handling to `Vec::split_off` for the case where `at == 0`. Instead of copying the vector's contents into a freshly-allocated vector and returning it, the special-case code steals the old vector's allocation, and replaces it with a new (empty) buffer with the same capacity. That eliminates the need to copy the existing elements, but comes at a surprising cost, as seen in rust-lang#119913. The returned vector's capacity is no longer determined by the size of its contents (as would be expected for a freshly-allocated vector), and instead uses the full capacity of the old vector. In cases where the capacity is large but the size is small, that results in a much larger capacity than would be expected from reading the documentation of `split_off`. This is especially bad when `split_off` is called in a loop (to recycle a buffer), and the returned vectors have a wide variety of lengths. I believe it's better to remove the special-case code, and treat `at == 0` just like any other value: - The current documentation states that `split_off` returns a “newly allocated vector”, which is not actually true in the current implementation when `at == 0`. - If the value of `at` could be non-zero at runtime, then the caller has already agreed to the cost of a full memcpy of the taken elements in the general case. Avoiding that copy would be nice if it were close to free, but the different handling of capacity means that it is not. - If the caller specifically wants to avoid copying in the case where `at == 0`, they can easily implement that behaviour themselves using `mem::replace`. Fixes rust-lang#119913.
I was reviewing some code that uses
split_off(0)
in a loop to take the contents of a buffer, while leaving the buffer's capacity intact to be reused by subsequent iterations. When I double-checked the actual behaviour ofsplit_off(0)
, I was surprised to find that the vector returned bysplit_off(0)
had a much larger capacity than was necessary to hold its contents.Consider this example program: (playground link)
I expected that the
split_off_1
andsplit_off_0
vectors would have similar capacities, since they each only hold one value.Instead, this happened:
The single-element vector produced by
split_off(1)
has a tight capacity of 1, but the single-element vector produced bysplit_off(0)
has a wasteful capacity of 100, which is the same as the original vector's capacity.(The method's documentation doesn't make any explicit promises about the capacity of the returned vector, so this isn't necessarily a bug per se, but it is a surprising behaviour.)
The text was updated successfully, but these errors were encountered: