Aggregation duplicates points at tile borders #889

jgoizueta · 2018-03-08T19:11:08Z

Our aggregation queries select the points in the tile with this filter:

   WHERE the_geom_webmercator && _cdb_params.bbox

So, points at tile borders can be selected in multiple tiles.

Since the aggregation expression uses Floor functions, it should never assign a point to the wrong cluster, so those border points will appear in clusters that lie outside the tile in all tiles except one.
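
To make the effect concrete, here is a minimal sketch in Python rather than SQL. The tile coordinates, the resolution and the helper names are made up for illustration; the closed-interval test only models how `&&` treats a point against a box.

```python
import math

def bbox_overlaps_point(x, y, bbox):
    """Closed-interval test: a point on the box edge still matches, as with `&&`."""
    xmin, ymin, xmax, ymax = bbox
    return xmin <= x <= xmax and ymin <= y <= ymax

def cell(x, y, res):
    """Aggregation cell index, as in Floor(ST_X(...)/res), Floor(ST_Y(...)/res)."""
    return (math.floor(x / res), math.floor(y / res))

# Two hypothetical adjacent tiles sharing the edge x = 256, aggregated with res = 32.
left_tile = (0.0, 0.0, 256.0, 256.0)
right_tile = (256.0, 0.0, 512.0, 256.0)
res = 32.0
border_point = (256.0, 100.0)

selected_by = [name
               for name, bbox in (("left", left_tile), ("right", right_tile))
               if bbox_overlaps_point(*border_point, bbox)]
print(selected_by)               # ['left', 'right']
print(cell(*border_point, res))  # (8, 3)
```

Cell 8 starts at x = 256, so the cluster belongs to the right tile only, yet the left tile also returns it.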

We should fix the queries to avoid returning clusters that don't belong to the tile.

This can be done either by properly filtering points to be aggregated or by filtering after aggregation.

In the first case we could replace:

   WHERE the_geom_webmercator && _cdb_params.bbox

by:

        WHERE ST_X(the_geom_webmercator) >= ST_XMIN(_cdb_params.bbox)
          AND ST_X(the_geom_webmercator) < ST_XMAX(_cdb_params.bbox)
          AND ST_Y(the_geom_webmercator) >= ST_YMIN(_cdb_params.bbox)
          AND ST_Y(the_geom_webmercator) < ST_YMAX(_cdb_params.bbox)

But this could degrade performance (or not, because the the_geom_webmercator && _cdb_params.bbox condition is added by Mapnik anyway in its query wrapping).
A solution could be to add both conditions (we'd need to check).
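
A quick way to check the half-open filter (min inclusive, max exclusive) is to see that every point is picked by exactly one tile. The sketch below is Python with a hypothetical 2x2 grid of 256-unit tiles, not the real query:

```python
def in_tile_half_open(x, y, bbox):
    """Half-open test: includes the min edges, excludes the max edges."""
    xmin, ymin, xmax, ymax = bbox
    return xmin <= x < xmax and ymin <= y < ymax

# Hypothetical 2x2 grid of tiles, keyed by (column, row).
tiles = {(i, j): (i * 256.0, j * 256.0, (i + 1) * 256.0, (j + 1) * 256.0)
         for i in range(2) for j in range(2)}

# Points on a vertical edge, a horizontal edge, the shared corner, and an interior point.
points = [(256.0, 100.0), (100.0, 256.0), (256.0, 256.0), (10.0, 10.0)]

owners = {p: [t for t, bbox in tiles.items() if in_tile_half_open(*p, bbox)]
          for p in points}
for p, ts in owners.items():
    print(p, ts)
# (256.0, 100.0) [(1, 0)]
# (100.0, 256.0) [(0, 1)]
# (256.0, 256.0) [(1, 1)]
# (10.0, 10.0) [(0, 0)]
```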

The other option would be to filter the aggregation coordinates, for example:

        GROUP BY
            Floor(ST_X(_cdb_query.the_geom_webmercator)/_cdb_params.res),
            Floor(ST_Y(_cdb_query.the_geom_webmercator)/_cdb_params.res)
         HAVING 
             Floor(ST_X(_cdb_query.the_geom_webmercator)/_cdb_params.res) >= 0
             AND Floor(ST_X(_cdb_query.the_geom_webmercator)/_cdb_params.res) < 256/resolution
             AND etc...

This would not use indices, but it would operate on relatively few rows.
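
As a rough model of this second option (Python, not the actual query), the sketch below aggregates first and then drops the cells that fall outside the tile. It assumes the tile origin is aligned with the aggregation grid and compares cell indices relative to the tile's first cell; the names and numbers are hypothetical.

```python
import math
from collections import Counter

def aggregate(points, res):
    """Count points per cell, keyed by (Floor(x/res), Floor(y/res))."""
    return Counter((math.floor(x / res), math.floor(y / res)) for x, y in points)

def keep_cells_of_tile(counts, bbox, res):
    """Post-aggregation filter: keep cells whose index, relative to the
    tile's first cell, falls in [0, cells per side)."""
    xmin, ymin, xmax, ymax = bbox
    i0, j0 = math.floor(xmin / res), math.floor(ymin / res)
    nx, ny = round((xmax - xmin) / res), round((ymax - ymin) / res)
    return {(i, j): n for (i, j), n in counts.items()
            if 0 <= i - i0 < nx and 0 <= j - j0 < ny}

tile = (0.0, 0.0, 256.0, 256.0)
res = 32.0
# Points a closed-bbox filter would select, including two on the max edges.
selected = [(10.0, 10.0), (20.0, 20.0), (256.0, 100.0), (100.0, 256.0)]

counts = aggregate(selected, res)
kept = keep_cells_of_tile(counts, tile, res)
print(kept)  # {(0, 0): 2}
```

The two border points still get aggregated, into cells (8, 3) and (3, 8), but those cells are discarded before the tile is returned.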


Algunenano · 2018-03-09T12:06:13Z

You could do

WHERE the_geom_webmercator && _cdb_params.bbox
    AND ST_X(the_geom_webmercator) < ST_XMAX(_cdb_params.bbox) -- better if you already have the numeric value
    AND ST_Y(the_geom_webmercator) < ST_YMAX(_cdb_params.bbox) -- better if you already have the numeric value

This approach a) only works for points; b) can produce some artefacts if the right border of the rightmost tile falls exactly on the border of the screen (and likewise for the bottom), since the tile covering the point will not be requested.
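
The artefact in b) can be sketched like this (Python, hypothetical tile and point; the first test only models `&&`):

```python
def selected(x, y, bbox):
    """Models `&&` (closed overlap) plus the two extra upper-bound conditions."""
    xmin, ymin, xmax, ymax = bbox
    overlaps = xmin <= x <= xmax and ymin <= y <= ymax
    return overlaps and x < xmax and y < ymax

# Assume only this tile is on screen, so its right-hand neighbour is never requested.
requested_tiles = [(0.0, 0.0, 256.0, 256.0)]
point = (256.0, 100.0)  # exactly on the right border

visible = any(selected(*point, bbox) for bbox in requested_tiles)
print(visible)  # False
```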

Keeping in mind I don't know anything at all about the renderer, could it be easier to return the cartodb_id with each geometry and avoid rendering it again if duplicated? Please disregard it if I'm saying something stupid.


jgoizueta · 2018-06-11T13:37:37Z

I'm closing this for now because we can easily handle the duplicated border points in the clients.
If we find this to be a problem in some use case in the future, let's reopen it and apply @Algunenano's proposed solution.

jgoizueta · 2018-07-11T09:45:49Z

I'm reopening this because I think we should avoid the duplicate points: it would avoid potential confusion in the use of the tiles and, as commented above, it shouldn't have a performance impact (if we add the filtering condition in addition to the existing one, which uses indices).

As stated above, border points are assigned to the correct cluster, so the resulting cluster belongs to the tile in only one of the tiles. In the other tiles the cluster is not only outside the tile, but can also be an incomplete aggregation (only the border points are aggregated). This is not a problem if the client filters out-of-tile clusters geometrically, but it could be a problem if duplicates are avoided by cartodb_id instead. In any case, it seems inconsistent to include those partially aggregated clusters in the tile.
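
A small sketch of the partial aggregation (Python, with made-up tiles, resolution and points): three points share one cell of the right tile, but the left tile only sees the one lying on the shared border.

```python
import math
from collections import Counter

def select_closed(points, bbox):
    """Closed-interval selection, modelling the `&&` filter."""
    xmin, ymin, xmax, ymax = bbox
    return [(x, y) for x, y in points if xmin <= x <= xmax and ymin <= y <= ymax]

def aggregate(points, res):
    """Count points per cell, keyed by (Floor(x/res), Floor(y/res))."""
    return Counter((math.floor(x / res), math.floor(y / res)) for x, y in points)

res = 32.0
left_tile = (0.0, 0.0, 256.0, 256.0)
right_tile = (256.0, 0.0, 512.0, 256.0)
points = [(256.0, 100.0), (260.0, 100.0), (270.0, 110.0)]  # all in cell (8, 3)

left_counts = aggregate(select_closed(points, left_tile), res)
right_counts = aggregate(select_closed(points, right_tile), res)
print(left_counts[(8, 3)], right_counts[(8, 3)])  # 1 3
```

A client that deduplicates by cartodb_id could keep the count of 1 from the left tile instead of the complete count of 3.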

jgoizueta · 2018-07-15T19:23:33Z

I've realised that the simple filtering described above is only valid with no tile extent/buffer, or when the MVT extent is a multiple of the aggregation resolution.

To exclude all aggregation cells that are only partially included in the bbox, we must first compute the limits of the cells fully included in the bbox:

CEIL(ST_XMIN(bbox)/res)*res AS xmin,
FLOOR(ST_XMAX(bbox)/res)*res AS xmax,
CEIL(ST_YMIN(bbox)/res)*res AS ymin,
FLOOR(ST_YMAX(bbox)/res)*res AS ymax

And then filter with:

(the_geom_webmercator && bbox) AND
ST_X(the_geom_webmercator) >= xmin AND
ST_X(the_geom_webmercator) < xmax AND
ST_Y(the_geom_webmercator) >= ymin AND
ST_Y(the_geom_webmercator) < ymax
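
The same computation as a Python sketch, assuming a hypothetical bbox with a 10-unit buffer that is not a multiple of res = 32:

```python
import math

def aligned_limits(bbox, res):
    """Limits of the area covered by cells lying fully inside the bbox."""
    xmin, ymin, xmax, ymax = bbox
    return (math.ceil(xmin / res) * res, math.ceil(ymin / res) * res,
            math.floor(xmax / res) * res, math.floor(ymax / res) * res)

def keep(point, limits):
    """Half-open filter against the aligned limits."""
    x, y = point
    xmin, ymin, xmax, ymax = limits
    return xmin <= x < xmax and ymin <= y < ymax

bbox = (-10.0, -10.0, 266.0, 266.0)  # 256-unit tile plus a 10-unit buffer
res = 32.0
limits = aligned_limits(bbox, res)
print(limits)  # (0.0, 0.0, 256.0, 256.0)

points = [(-5.0, 10.0), (0.0, 0.0), (255.9, 10.0), (260.0, 10.0)]
kept_points = [p for p in points if keep(p, limits)]
print(kept_points)  # [(0.0, 0.0), (255.9, 10.0)]
```

The points at x = -5 and x = 260 are inside the bbox, but their cells are only partially covered by it, so they are dropped.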

I've been trying this and I've found a problem when the zoom level is high and the tile is far from the origin: in such cases, the inaccuracies in the bbox provided by Mapnik can make us drop cells on the border of the area.

This wouldn't happen if bbox was computed with full double accuracy, as CDB_XYZ_Extent does.

For example, for the tile z=20, x=1000000, y=1000000, here's the difference between the minimum Y of the tile and the correct value -18181044.018313024:

ST_YMIN(CDB_XYZ_Extent(1000000,1000000,20)) + 18181044.018313024 -- => 0 this is good
ST_YMIN(!bbox!) + 18181044.018313024 -- => 3.72529029846191e-09 oh oh

This discrepancy makes the filtering formulas above skip the lower row of cells in the tile.
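
To show the mechanism with round numbers (Python; the value -18181044.0 and res = 0.25 are illustrative assumptions, not the real tile limit or resolution): an error of one double-precision step at this magnitude, 2^-28, which is the 3.72529029846191e-09 seen above, is enough to push the ceil up by a whole cell.

```python
import math

def lower_limit(vmin, res):
    """Lower limit of the first cell fully inside the bbox: CEIL(vmin/res)*res."""
    return math.ceil(vmin / res) * res

res = 0.25                        # hypothetical resolution
ymin_exact = -18181044.0          # hypothetical limit, an exact multiple of res
one_step = 2.0 ** -28             # spacing of doubles between 2**24 and 2**25
ymin_noisy = ymin_exact + one_step

print(one_step)                      # 3.725290298461914e-09
print(lower_limit(ymin_exact, res))  # -18181044.0
print(lower_limit(ymin_noisy, res))  # -18181043.75
```

With the noisy limit, the cells between -18181044.0 and -18181043.75 are treated as partially covered and filtered out.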

Note: I have yet to test this with production Mapnik. Also, @Algunenano, I'll ask you to run some tests I have in your local configuration.

Also, to be able to run the tests I've prepared accurately, we need to fix the problem described in #1001, so I will combine those fixes in a PR.


jgoizueta · 2018-07-16T09:24:55Z

For a moment I thought that we'd better include all the cells intersected by the bbox (taking the floor of the min and the ceil of the max), which would be more robust against the bbox's lack of precision. But the problem is that the spatial filters that Mapnik wraps the queries with would keep some of the necessary data from being aggregated.

jgoizueta · 2018-07-16T09:33:40Z

I've noticed that the error in the example, ST_YMIN(!bbox!) + 18181044.018313024 -- => 3.72529029846191e-09, is exactly 1 ulp (unit in the last place). If this is so, we could manage the problem by adding/subtracting a relative epsilon before the floor/ceil operations.
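
A minimal sketch of that idea in Python. The helper names and the tolerance (4 times the machine epsilon) are assumptions chosen for illustration, not necessarily what the final fix uses:

```python
import math
import sys

# Assumed tolerance: a few ulps in relative terms.
REL_EPS = 4 * sys.float_info.epsilon

def ceil_tolerant(v, rel_eps=REL_EPS):
    """Ceil that ignores an excess of a few ulps above an integer."""
    return math.ceil(v - abs(v) * rel_eps)

def floor_tolerant(v, rel_eps=REL_EPS):
    """Floor that ignores a deficit of a few ulps below an integer."""
    return math.floor(v + abs(v) * rel_eps)

# A cell index that should be exactly -72724176 but is off by one ulp (2**-26 here).
noisy_min = -72724176.0 + 2.0 ** -26
noisy_max = 72724176.0 - 2.0 ** -26

print(math.ceil(noisy_min), ceil_tolerant(noisy_min))     # -72724175 -72724176
print(math.floor(noisy_max), floor_tolerant(noisy_max))   # 72724175 72724176
print(ceil_tolerant(-72724175.5))                         # -72724175
```

Values that are genuinely between two cell boundaries, like the last one, are rounded as before; only values within a few ulps of a boundary snap back to it.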

jgoizueta · 2018-07-16T10:47:11Z

This applies the idea in my last comment and seems to work. Further testing will be needed, though.

jgoizueta mentioned this issue Mar 8, 2018

Features disappearing when zoom changes CartoDB/carto-vl#91

Closed

jgoizueta added a commit that referenced this issue Apr 4, 2018

Use unique cartodb_id in aggregated results

c1da1a8

See #889 For centroid and point-grid the cartodb_id wasn't unique across tiles.

jgoizueta added a commit that referenced this issue Apr 5, 2018

Use unique cartodb_id in aggregated results

ffa3a96

See #889 For centroid and point-grid the cartodb_id wasn't unique across tiles.

jgoizueta mentioned this issue Jun 11, 2018

Avoid duplicated points between tiles CartoDB/carto-vl#541

Closed

jgoizueta closed this as completed Jun 11, 2018

jgoizueta mentioned this issue Jul 11, 2018

Adapt tests for more accurate PROJ #998

Merged

jgoizueta reopened this Jul 11, 2018

ramiroaznar assigned jgoizueta Jul 11, 2018

This was referenced Jul 11, 2018

Test: Invalid aggregation test with PROJ 5.1 #994

Closed

Aggregation innacuracies #1001

Closed

jgoizueta added a commit that referenced this issue Jul 15, 2018

Test that tiles do not contain partially aggregated clusters

e21ab12

This tests #889

jgoizueta mentioned this issue Jul 15, 2018

Aggregation fixes #1002

Merged

jgoizueta closed this as completed in 716f983 Jul 18, 2018
