Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete files in multiple ranges at once #3431

Closed
wants to merge 4 commits into from
Closed

Delete files in multiple ranges at once #3431

wants to merge 4 commits into from

Conversation

huachaohuang
Copy link
Contributor

Using DeleteFilesInRange to delete files in a lot of ranges can be slow, because
VersionSet::LogAndApply is expensive.

This PR adds a new DeleteFilesInRange function to delete files in multiple
ranges at once.

Close #2951

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good change, thanks :). couple more questions:

  • do you mind updating HISTORY.md to mention this new public API?
  • do you have any numbers to share about how long deleting files in ranges took before/after this change?

@@ -100,6 +100,14 @@ struct Range {
Range(const Slice& s, const Slice& l) : start(s), limit(l) { }
};

struct RangePtr {
const Slice* start; // Included in the range
const Slice* limit; // Not included in the range
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding of GetCleanInputsWithinInterval is that it'll include the upper endpoint -- do you mind double checking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, I just copy it from struct Range. But we are actually intent on supporting exclusive range end in DeleteFilesInRange/DeleteFilesInRanges. Do you think it is OK to add that support in this PR, and make the range end exclusive by default in the new API DeleteFilesInRanges?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add support in this PR sounds fine. But I think it'd be more consistent to make it inclusive-inclusive, with an option to make the limit exclusive. This PR (#2863) did a good job of adding the option for DeleteFilesInRange, but unfortunately we didn't merge it due to an unsolved test failure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's delete // Not included in the range since it's optional

@@ -330,6 +330,8 @@ void CancelAllBackgroundWork(DB* db, bool wait = false);
// Snapshots before the delete might not see the data in the given range.
Status DeleteFilesInRange(DB* db, ColumnFamilyHandle* column_family,
const Slice* begin, const Slice* end);
Status DeleteFilesInRange(DB* db, ColumnFamilyHandle* column_family,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we call the new API DeleteFilesInRanges?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Copy link
Contributor Author

@huachaohuang huachaohuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll share some numbers later :)

@@ -330,6 +330,8 @@ void CancelAllBackgroundWork(DB* db, bool wait = false);
// Snapshots before the delete might not see the data in the given range.
Status DeleteFilesInRange(DB* db, ColumnFamilyHandle* column_family,
const Slice* begin, const Slice* end);
Status DeleteFilesInRange(DB* db, ColumnFamilyHandle* column_family,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

@@ -100,6 +100,14 @@ struct Range {
Range(const Slice& s, const Slice& l) : start(s), limit(l) { }
};

struct RangePtr {
const Slice* start; // Included in the range
const Slice* limit; // Not included in the range
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, I just copy it from struct Range. But we are actually intent on supporting exclusive range end in DeleteFilesInRange/DeleteFilesInRanges. Do you think it is OK to add that support in this PR, and make the range end exclusive by default in the new API DeleteFilesInRanges?

Using DeleteFilesInRange to delete files in a lot of ranges can be slow, because
VersionSet::LogAndApply is expensive.

This PR adds a new DeleteFilesInRange function to delete files in multiple
ranges at once.
@facebook-github-bot
Copy link
Contributor

@huachaohuang has updated the pull request.

@facebook-github-bot
Copy link
Contributor

@huachaohuang has updated the pull request. View: changes

@huachaohuang
Copy link
Contributor Author

Calling DeleteFilesInRange 7356 times which actually deletes 4530 files in 2117 ranges takes Duration { secs: 126, nanos: 856424397 }
Calling DeleteFilesInRanges with 8421 ranges at once which actually deletes 4448 files takes Duration { secs: 2, nanos: 467631117 }
This is a rough comparison, but I think it should be enough to show the benefit to deleting files in multiple ranges at once.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@@ -100,6 +100,14 @@ struct Range {
Range(const Slice& s, const Slice& l) : start(s), limit(l) { }
};

struct RangePtr {
const Slice* start; // Included in the range
const Slice* limit; // Not included in the range
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's delete // Not included in the range since it's optional

remove comments about range inclusion vs exclusion since it should be specified in the APIs.
@facebook-github-bot
Copy link
Contributor

@huachaohuang has updated the pull request. View: changes, changes since last import

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajkr is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

huachaohuang added a commit to tikv/rocksdb that referenced this pull request Jan 31, 2018
Summary:
Using `DeleteFilesInRange` to delete files in a lot of ranges can be slow, because
`VersionSet::LogAndApply` is expensive.

This PR adds a new `DeleteFilesInRange` function to delete files in multiple
ranges at once.

Close facebook#2951
Closes facebook#3431

Differential Revision: D6849228

Pulled By: ajkr

fbshipit-source-id: daeedcabd8def4b1d9ee95a58266dee77b5d68cb
@huachaohuang huachaohuang deleted the delete-files-in-multi-ranges branch February 2, 2018 11:36
amytai pushed a commit to amytai/rocksdb that referenced this pull request Mar 9, 2018
Summary:
Using `DeleteFilesInRange` to delete files in a lot of ranges can be slow, because
`VersionSet::LogAndApply` is expensive.

This PR adds a new `DeleteFilesInRange` function to delete files in multiple
ranges at once.

Close facebook#2951
Closes facebook#3431

Differential Revision: D6849228

Pulled By: ajkr

fbshipit-source-id: daeedcabd8def4b1d9ee95a58266dee77b5d68cb
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants