Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SDR tests are failing with gdal=3.4.2 #905

Closed
emlys opened this issue Mar 16, 2022 · 10 comments · Fixed by #1036
Closed

SDR tests are failing with gdal=3.4.2 #905

emlys opened this issue Mar 16, 2022 · 10 comments · Fixed by #1036
Assignees
Labels
bug Something isn't working in progress This issue is actively being worked on

Comments

@emlys
Copy link
Member

emlys commented Mar 16, 2022

Three SDR tests are failing on github actions:

test_base_regression: AssertionError: these key (actual/expected) errors occured: {'usle_tot': (13.90210914612, 14.25030517578), 'sed_export': (0.55185163021, 0.60502111912), 'sed_dep': (8.80612564087, 9.05251502991)}

test_drainage_regression: AssertionError: these key (actual/expected) errors occured: {'sed_export': (0.97192692757, 1.02959537506), 'usle_tot': (12.68887424469, 12.97211265564)}

test_regression_with_undefined_nodata: AssertionError: these key (actual/expected) errors occured: {'sed_export': (0.55185163021, 0.60502111912), 'usle_tot': (13.90210914612, 14.25030517578)}

These passed when I ran them locally. Possibly an updated dependency?

@emlys emlys added bug Something isn't working critical for issues likely to obstruct the whole dev team (e.g. broken builds) labels Mar 16, 2022
@emlys emlys self-assigned this Mar 22, 2022
@emlys emlys added the in progress This issue is actively being worked on label Mar 30, 2022
@emlys
Copy link
Member Author

emlys commented Apr 1, 2022

Confirmed that the tests fail with GDAL 3.4.2 and pass with 3.4.1. The differences are larger than I'd like to attribute to floating point imprecision.

@emlys
Copy link
Member Author

emlys commented Apr 1, 2022

The difference is coming from a change in the aligned LULC raster. Something in the alignment process must have changed.
Original LULC:
image
Aligned LULC using GDAL 3.4.1:
image
Aligned LULC using GDAL 3.4.2:
image

The 3.4.2 version actually looks a lot more accurate (maybe identical) to the original. In the 3.4.1 version lots of pixels are switched to LULC class 2 (gray).
The other aligned rasters are the same as always. LULC is the only one that's aligned using mode interpolation.

@emlys
Copy link
Member Author

emlys commented Apr 4, 2022

This comes down to this undocumented change released in 3.4.2. From what I can tell, nearest neighbor is being used instead of mode, because of the new setting GDAL_WARP_USE_TRANSLATION_OPTIM that defaults to YES. In this particular SDR case, the warp operation is just translating along pixel boundaries (?). And because of that, it's being "optimized" to use near instead of mode.

The near result is way more accurate to the original. We use near a lot in invest and mode for just a few (SDR: LULC, Urban Cooling: LULC, Urban Flood: LULC and soil groups). I don't know why mode was chosen over near for these, and I don't know if it really matters.

However, near is definitely not what we want for resampling DEMs, etc. It kind of seems like a mistake that GDAL now silently ignores your selecting warping algorithm.

I think next steps are:

  • Set GDAL_WARP_USE_TRANSLATION_OPTIM=NO to restore the old behavior (does this have to be set as an env variable? would be nice to keep it isolated to invest)
  • Consider using near instead of mode, since it gives way more accurate results at least in this one case
  • Possibly get clarification from GDAL developers about the intent of this change

@davemfish
Copy link
Contributor

just translating along pixel boundaries (?)

Yeah I don't know what they mean by this either. And I also see this in the changeset you linked:

const int nResWinSize = m_bIsTranslationOnPixelBoundaries ? 0 : GWKGetFilterRadius(psOptions->eResampleAlg);

So if that option is on, the resize window is 0? Meaning 0 pixels (or 0 other pixels?) are included in the resampling window. So how could it possibly matter what the resampling algorithm is at that point?

I'm doing a lot speculation here, I haven't fully digested the source code.

@davemfish
Copy link
Contributor

It kind of seems like a mistake that GDAL now silently ignores your selecting warping algorithm.

Yes, but I do wonder when exactly this happens. For example I see that in the SDR sample data, the DEM and the LULC rasters are already perfectly aligned as inputs, so perhaps that's a reason why GDAL goes for the optimized options? And if it's actually improving the accuracy of the result, maybe it's a good thing.

If they don't use these options in any other cases, then maybe we don't need to make any changes?

@emlys
Copy link
Member Author

emlys commented Apr 5, 2022

If they don't use these options in any other cases, then maybe we don't need to make any changes?

That would probably be fine if we can confirm that. I'm struggling to understand the source code and see when it applies. I created an issue on osgeo/gdal asking about it.

@dcdenu4
Copy link
Member

dcdenu4 commented Apr 7, 2022

Here's the ticket: OSGeo/gdal#5578!

@emlys
Copy link
Member Author

emlys commented Apr 8, 2022

Pinning GDAL<3.4.2 as a temporary fix until we have a better understanding of what the changes in 3.4.2 do.

@davemfish
Copy link
Contributor

It seems there are DLL import issues with version 3.4.1, so the resolution now is gdal<3.4.1.

And once we have consensus that the changes in gdal 3.4.2 are there to stay, we can update SDR regression data to accommodate them.

@davemfish davemfish removed the critical for issues likely to obstruct the whole dev team (e.g. broken builds) label Apr 12, 2022
@davemfish davemfish changed the title SDR tests are failing on main SDR tests are failing with gdal=3.4.2 Apr 12, 2022
@emlys
Copy link
Member Author

emlys commented Jul 19, 2022

Here's the ticket: OSGeo/gdal#5578!

This got an answer that the results seem correct (and are indeed more accurate than before). I still don't understand when exactly this improvement is applied, but there are many things I don't understand about GDAL 🤷‍♀️

emlys added a commit to emlys/invest that referenced this issue Jul 19, 2022
emlys added a commit to emlys/invest that referenced this issue Jul 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working in progress This issue is actively being worked on
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants