Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NDR runoff proxy is way off if no nodata value is defined #1005

Closed
phargogh opened this issue May 24, 2022 · 4 comments · Fixed by #1367
Closed

NDR runoff proxy is way off if no nodata value is defined #1005

phargogh opened this issue May 24, 2022 · 4 comments · Fixed by #1367
Assignees
Labels
bug Something isn't working in progress This issue is actively being worked on
Milestone

Comments

@phargogh
Copy link
Member

This came up in a recent forums post and has an easy workaround (runoff proxy input should have a nodata value).

To reproduce:

  • Run NDR with a runoff proxy input raster that does not have a nodata value and all pixels have valid, reasonable values.
  • Run NDR with a runoff proxy input raster that does have a defined nodata value.
  • Observe very different n_export values across the 2 runs.

The problem here is that early in the model, we align the runoff proxy input and mask out pixel values outside of the watersheds vector. Because there's no nodata value defined, a pixel value of 0 is assumed for masked-out pixels, and we log a relevant WARNING in the logfile. But this is a serious issue later on because when we're normalizing runoff proxy pixel values, we assume that all of the pixel values that are non-nodata are valid, and when those are all 0, they're skewing the normalization down towards 0.

If we can assume that the runoff proxy raster must be positive values, then we can simply define a negative nodata value and be on our merry way. If not, then maybe we can assume +/- infinity as a nodata value?

It may also be a reasonable solution to add some functionality to pygeoprocessing's align_and_resize_raster_stack to assign default nodata values to specific datasets if one is not defined, but that also gets tricky. Maybe validation is a better solution?

@phargogh phargogh added the bug Something isn't working label May 24, 2022
@phargogh
Copy link
Member Author

Related: #783

@phargogh
Copy link
Member Author

In a slack thread with Rafa and Stacie, they clarified that runoff proxy should always be >0. So a valid option here might be to leave the runoff proxy layer as it is after warping (with 0 values) and to handle this as though it's nodata in subsequent operations.

@phargogh phargogh self-assigned this Apr 7, 2023
@phargogh phargogh added this to the 3.13.1 milestone Apr 7, 2023
@phargogh phargogh assigned emlys and unassigned phargogh Aug 1, 2023
@emlys
Copy link
Member

emlys commented Aug 9, 2023

I see a few problems here -

  1. IMO, pygeoprocessing.mask_raster should error in this case, rather than defaulting to zero, which is often a valid data value
  2. align_and_resize_raster_stack calls mask_raster (via warp_raster) but does not expose the target_mask_value parameter, which would solve the problem here
  3. Since we've been talking about API redesign lately, I'll say that it would feel more intuitive to me if align_and_resize_raster_stack did not include masking functionality at all. Masking is different from aligning, but it gets confusing using them together. And it doesn't look like it provides much of a performance boost - they're still sequential operations.

emlys added a commit to emlys/invest that referenced this issue Aug 10, 2023
@dcdenu4
Copy link
Member

dcdenu4 commented Aug 10, 2023

I certainly agree that it'd be nice to have a consistent method of handling the "no defined" nodata issue in pygeoprocessing.

I'll say that it would feel more intuitive to me if align_and_resize_raster_stack did not include masking functionality at all

I think that's a fair point. I have found it convenient to use vector_mask_options in the past and I think it's been convenient from an InVEST standpoint to mask based on AOI because that's how we visualize the process (though I don't think necessary given how we handle most operations in raster_calculator calls). I do wonder if we do a vector focused API refresh, if it could make sense to purposely separate these.

Thinking about this issue from an InVEST input perspective I think we should have a more thorough validation and action procedure for "no defined" nodata values. We've talked about this case being a-okay with GDAL, so we should also support it. I don't think that's necessary from InVEST (though in my mind makes for a better argument for pygeoprocessing). But, given we want to make it "easier" for users to run InVEST, I can see how handling this common case would be beneficial to that cause. And from InVEST, we know (to a large extent) what the inputs and intermediate value ranges should be, so we can make highly educated decisions about what a proper nodata value could be for given raster data.

So I'd prefer initially the option of:

  • Handle InVEST input rasters with "no defined" nodata values up front by creating a copy of the input and assigning a valid nodata to that raster.

@emlys emlys added the in progress This issue is actively being worked on label Aug 10, 2023
emlys added a commit to emlys/invest that referenced this issue Aug 11, 2023
emlys added a commit to emlys/invest that referenced this issue Aug 17, 2023
emlys added a commit to emlys/invest that referenced this issue Aug 18, 2023
emlys added a commit to emlys/invest that referenced this issue Aug 18, 2023
emlys added a commit to emlys/invest that referenced this issue Aug 18, 2023
emlys added a commit to emlys/invest that referenced this issue Aug 18, 2023
emlys added a commit to emlys/invest that referenced this issue Aug 18, 2023
emlys added a commit to emlys/invest that referenced this issue Aug 18, 2023
emlys added a commit to emlys/invest that referenced this issue Aug 21, 2023
emlys added a commit to emlys/invest that referenced this issue Aug 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working in progress This issue is actively being worked on
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants