Bugfix/ufrm missing lucodes #698

davemfish · 2021-10-25T16:49:49Z

This PR adds error-checking & handling to guard against lucodes that are present in the raster but missing from the biophysical table.

The approach taken here is heavily dictated by the fact that the biophysical table data is stored in a 2d sparse matrix. That seems like the right choice for the context of _lu_to_cn_op, but it does obfuscate things a bit when it comes to guarding against bad data. I'm interested to hear more opinions regarding those tradeoffs.

Fixes #686

Checklist

Updated HISTORY.rst (if these changes are user-facing)
Updated the user's guide (if needed)

…rom the biophysical table. natcap#686.

phargogh

Thanks for this @davemfish ! I think there might be a small bug in the first ValueError check, but I could absolutely be wrong about this. Could you take a look and let me know what you think?

phargogh · 2021-10-25T17:48:56Z

src/natcap/invest/urban_flood_risk_mitigation.py

+        # Find the code that raised the IndexError, and possibly
+        # any others that also would have.
+        lucodes = numpy.unique(valid_lucodes)
+        missing_codes = lucodes[lucodes >= lucode_to_cn_table.shape[0]]


I could be very wrong about this, but to my eye I think this might always retrieve the top n lucodes. When I make the following change to tests/test_ufrm.py (and also PDB'd into the stacktrace):

diff --git a/tests/test_ufrm.py b/tests/test_ufrm.py index 8f8e2ccc1..a6487fd4b 100644 --- a/tests/test_ufrm.py +++ b/tests/test_ufrm.py @@ -182,7 +182,8 @@ class UFRMTests(unittest.TestCase): # These are codes that will raise an IndexError on # indexing into the CN table sparse matrix. The test # LULC raster has values from 0 to 21. - bad_cn_table = cn_table[cn_table['lucode'] < 15] + bad_cn_table = cn_table[ + (cn_table['lucode'] > 3) & (cn_table['lucode'] < 15)] bad_cn_table.to_csv(bad_cn_table_path, index=False) args['curve_number_table_path'] = bad_cn_table_path

I get the top end of the range instead of [0, 2, 3, 16, 17, 18, 21]:

ValueError: The biophysical table is missing a row for lucode(s)[16, 17, 18, 21]

Yes, I think you're right, but also that's how I originally intended it. When it hits the IndexError it's finding only the codes that can raise that IndexError. We could also do the check that appears in the next block, where we check for codes that do not raise the IndexError but are still missing ([0, 2, 3] in your example). What do you think?

The other thing is, in this scope we only ever know the pixel values that appear in the current iterblocks block, right? So in either case (the IndexError or the subsequent check for empty rows) our error message might miss values that would get caught in a subsequent raster block.

Also, there were bugs in the assertRaises blocks of all these tests that were preventing the nested assertions from even being called. I fixed that and made some assertions a bit more explicit to better differentiate the two cases that are being tested in the one new test.

What do you think?

After poking around this a bit more, I think what you have here is actually a complete solution! I constructed a table (Biophysical_water_SF_bad.csv) to obviously be missing required values but have enough and large-enough values in the matrix to not trigger the IndexError. When the CSR matrix is indexed into (which does not raise IndexError), the second check for all-zero rows catches all of the missing internal values as we would hope:

ValueError: The biophysical table is missing a row for lucode(s)[0, 2, 3, 4, 5, 8, 10, 16, 17, 18, 21]

So I was wrong and this test is good to go!

The other thing is, in this scope we only ever know the pixel values that appear in the current iterblocks block, right? So in either case (the IndexError or the subsequent check for empty rows) our error message might miss values that would get caught in a subsequent raster block.

Yep, you're absolutely right ... the only way we'll know if our error message reflects all the missing values is by checking all of the raster values. In my opinion, the value (and relative ease of maintenance) of failing fast will probably outweigh the benefits of a truly complete error message.

Also, there were bugs in the assertRaises blocks of all these tests that were preventing the nested assertions from even being called. I fixed that and made some assertions a bit more explicit to better differentiate the two cases that are being tested in the one new test.

Oh man, thanks for catching that!

phargogh · 2021-10-25T17:53:52Z

src/natcap/invest/urban_flood_risk_mitigation.py

+    # Even without an IndexError, still must guard against
+    # lucodes that can index into the sparse matrix but were
+    # missing from the biophysical table. They have rows of all 0.


This is very helpful.

phargogh · 2021-10-25T17:54:39Z

tests/test_ufrm.py

+        # drop rows with lucodes known to exist in lulc raster
+        # These are codes that will raise an IndexError on
+        # indexing into the CN table sparse matrix. The test
+        # LULC raster has values from 0 to 21.


The summary here of what's in the LULC raster is a very good idea and was helpful in digging into this!

phargogh

I was wrong about my previous assessment of missing values within the bound of the sparse matrix (see #698 (comment)), so this is good to go. The one build failure is pretty clearly a github response error when fetching tags and shouldn't have anything to do with this PR.

davemfish added 2 commits October 25, 2021 12:35

added tests and error-checking to guard against lulc values missing f…

2d9e3ed

…rom the biophysical table. natcap#686.

note for HISTORY. natcap#686.

5185680

davemfish self-assigned this Oct 25, 2021

davemfish added this to the 3.9.2 milestone Oct 25, 2021

phargogh self-requested a review October 25, 2021 17:20

phargogh requested changes Oct 25, 2021

View reviewed changes

fixing bug in tests and making assertions more explicit. natcap#686.

3bcfaec

davemfish requested a review from phargogh October 25, 2021 19:04

phargogh approved these changes Oct 26, 2021

View reviewed changes

phargogh merged commit 82f11b7 into natcap:main Oct 26, 2021

davemfish mentioned this pull request Nov 22, 2022

UFRM: incorrect ValueError raised if biophysical table includes a row of all 0s #1123

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bugfix/ufrm missing lucodes #698

Bugfix/ufrm missing lucodes #698

davemfish commented Oct 25, 2021

phargogh left a comment

phargogh Oct 25, 2021

davemfish Oct 25, 2021

phargogh Oct 26, 2021 •

edited

Loading

phargogh Oct 25, 2021

phargogh Oct 25, 2021

phargogh left a comment

Bugfix/ufrm missing lucodes #698

Bugfix/ufrm missing lucodes #698

Conversation

davemfish commented Oct 25, 2021

Checklist

phargogh left a comment

Choose a reason for hiding this comment

phargogh Oct 25, 2021

Choose a reason for hiding this comment

davemfish Oct 25, 2021

Choose a reason for hiding this comment

phargogh Oct 26, 2021 • edited Loading

Choose a reason for hiding this comment

phargogh Oct 25, 2021

Choose a reason for hiding this comment

phargogh Oct 25, 2021

Choose a reason for hiding this comment

phargogh left a comment

Choose a reason for hiding this comment

phargogh Oct 26, 2021 •

edited

Loading