Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for STR bulk packing #67

Merged
merged 4 commits into from
Mar 6, 2017

Conversation

ambling
Copy link
Contributor

@ambling ambling commented Mar 3, 2017

For Issue #45

add support of Rtree creation through STR packing, according to the paper:
STR: a simple and efficient algorithm for R-tree packing

add a recursive function in RTree.Builder and set the default filling factor as 0.7 (1.0 is better for readonly RTree while 0.7 reserves space for insertion).

note that this implementation is only for 2D RTree.

@ambling
Copy link
Contributor Author

ambling commented Mar 3, 2017

The CI checks failed with a buffer overflow on Java6

However, all tests passed after a re-run...

@ambling ambling closed this Mar 3, 2017
@ambling ambling reopened this Mar 3, 2017
@codecov-io
Copy link

codecov-io commented Mar 3, 2017

Codecov Report

Merging #67 into master will increase coverage by 0.42%.
The diff coverage is 95.83%.

@@             Coverage Diff              @@
##             master      #67      +/-   ##
============================================
+ Coverage      77.9%   78.33%   +0.42%     
- Complexity      640      641       +1     
============================================
  Files            57       57              
  Lines          1824     1869      +45     
  Branches        241      248       +7     
============================================
+ Hits           1421     1464      +43     
- Misses          355      357       +2     
  Partials         48       48
Impacted Files Coverage Δ Complexity Δ
...c/main/java/com/github/davidmoten/rtree/RTree.java 97.95% <95.83%> (-0.72%) 65 <1> (+1)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 23fad39...e2e170c. Read the comment docs.

@davidmoten
Copy link
Owner

Thanks for the PR, looks interesting! I'll review this weekend.

@ambling
Copy link
Contributor Author

ambling commented Mar 4, 2017

I fixed a bug but the CI failed again with openjdk6.

After trying the updated workaround on travis-ci/travis-ci#5227 it seems to be working.

@davidmoten
Copy link
Owner

davidmoten commented Mar 4, 2017 via email

@ambling
Copy link
Contributor Author

ambling commented Mar 4, 2017

@davidmoten
the results on index creation time in the benchmark on my laptop:

Benchmark                                                                               Mode  Cnt        Score       Error  Units
BenchmarksRTree.bulkLoadingFullRTreeCreation010                                        thrpt   10       73.747 ±     3.650  ops/s
BenchmarksRTree.bulkLoadingRTreeCreation010                                            thrpt   10       86.520 ±     1.905  ops/s
BenchmarksRTree.defaultRTreeCreation010                                                thrpt   10        6.506 ±     0.062  ops/s
BenchmarksRTree.flatBufferRTreeCreation010                                             thrpt   10     3818.918 ±    65.382  ops/s
BenchmarksRTree.starRTreeCreation010                                                   thrpt   10        3.141 ±     0.047  ops/s

The bulk loading is about 10x faster than the default RTree creation and about 20x faster than the R*-tree creation.

The FlatBuffers-based RTree is much faster is because the creation process just loads the basic context info and one root node.

Besides, it is interesting in the results that bulk packing with 70% full nodes is a little faster than that with 100% full nodes, while in fact there should be less nodes created if the nodes are 100% full. I may dig this deeper for the reason.

As for the insertion/deletion/search performance on the STR-RTree, it would be reasonably worse than that of the R*-tree, but I haven't check this yet.

And for geometry types other than the point, the performance of STR-RTree maybe even more worse, because the packing is based on just the central position, leading to more overlapping among nodes.

So actually I still recommend the R*-tree implementation in most cases, but for very large dataset indexing or readonly applications the STR-Rtree can be offered as a good option.

return packingSTR(nodes, false, size, context);
}

private class MidComparator implements Comparator<HasGeometry> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make static final

}

private class MidComparator implements Comparator<HasGeometry> {
private short dimension; // leave space for multiple dimensions, 0 for x, 1 for y, ...
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make final

/**
* Create an RTree by bulk loading, using the STR method.
* STR: a simple and efficient algorithm for R-tree packing
* http://ieeexplore.ieee.org/abstract/document/582015/
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add javadoc comment about the fact that the method mutates the input list

@@ -713,25 +755,33 @@ public void testStarTreeReturnsSameAsStandardRTree() {
rectangle(1, 0.252, 50, 69.23), rectangle(13.12, 23.123, 50.45, 80.9),
rectangle(10, 10, 50, 50) };

List<Entry<Integer, Geometry>> entris = new ArrayList<Entry<Integer, Geometry>>(10000);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

entries

}

@Override
public boolean equals(Object obj) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you implementing equals? I'd say remove.

@ambling
Copy link
Contributor Author

ambling commented Mar 6, 2017

Thanks for the review!

I have made the changes, please have a look at that.

@davidmoten
Copy link
Owner

Thanks @ambling. So I suppose the remaining task is to come up with some reasonably comprehensive advice about the pros and cons of using the STR bulk load. Would you be able to do a bit more perf testing so that we have something concrete to advise users?

@ambling
Copy link
Contributor Author

ambling commented Mar 6, 2017

Yes @davidmoten . I think maybe I can add one or more test datasets and more benchmarking cases to verify the performance. I may finish the task in several days.

@davidmoten davidmoten merged commit f0609d5 into davidmoten:master Mar 6, 2017
@davidmoten
Copy link
Owner

Thanks @ambling. Let's merge this now and when you have more detail we'll add it the docs. Thanks very much for your work!

@ambling ambling deleted the str-bulk-loading branch March 7, 2017 03:12
@davidmoten davidmoten mentioned this pull request Jul 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants