Bugfix/ufrm missing lucodes #698
Merged: phargogh merged 3 commits into natcap:main from davemfish:bugfix/UFRM-686-lucodes-matrix on Oct 26, 2021
@@ -773,8 +773,30 @@ def _lu_to_cn_op(
     # pixel and the rows are the curve number index for the landcover
     # type under that pixel (0..3 are CN_A..CN_D and 4 is "unknown")
     valid_lucodes = lucode_array[valid_mask].astype(int)
+
+    try:
+        cn_matrix = lucode_to_cn_table[valid_lucodes]
+    except IndexError:
+        # Find the code that raised the IndexError, and possibly
+        # any others that also would have.
+        lucodes = numpy.unique(valid_lucodes)
+        missing_codes = lucodes[lucodes >= lucode_to_cn_table.shape[0]]
+        raise ValueError(
+            f'The biophysical table is missing a row for lucode(s) '
+            f'{missing_codes.tolist()}')
+
+    # Even without an IndexError, still must guard against
+    # lucodes that can index into the sparse matrix but were
+    # missing from the biophysical table. They have rows of all 0.

Comment on lines +788 to +790: This is very helpful.

+    if not cn_matrix.sum(1).all():
+        empty_rows = numpy.where(lucode_to_cn_table.sum(1) == 0)
+        missing_codes = numpy.intersect1d(valid_lucodes, empty_rows)
+        raise ValueError(
+            f'The biophysical table is missing a row for lucode(s) '
+            f'{missing_codes.tolist()}')
+
     per_pixel_cn_array = (
-        lucode_to_cn_table[valid_lucodes].toarray().reshape(
+        cn_matrix.toarray().reshape(
             (-1, 4))).transpose()
 
     # this is the soil type array with values ranging from 0..4 that will
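The two checks in this hunk cover different failure modes, which a small standalone sketch may make easier to see. It is not the InVEST code: `find_missing_lucodes` and the dense list-of-rows `table` are hypothetical stand-ins for the sparse `lucode_to_cn_table`, and the sketch returns the missing codes where the real function raises `ValueError`.

```python
def find_missing_lucodes(cn_table, lucodes):
    """Return sorted lucodes in ``lucodes`` with no usable row in ``cn_table``.

    ``cn_table`` is a hypothetical dense stand-in for the sparse table:
    a list indexed by lucode whose rows hold CN_A..CN_D. Rows for codes
    absent from the biophysical table are all zeros.
    """
    unique_codes = sorted(set(lucodes))
    try:
        rows = [cn_table[code] for code in unique_codes]
    except IndexError:
        # Stage 1: only codes past the end of the table can raise here,
        # so only those codes are reported.
        return [code for code in unique_codes if code >= len(cn_table)]
    # Stage 2: in-range codes with all-zero rows were never in the table.
    return [code for code, row in zip(unique_codes, rows) if sum(row) == 0]


# Table built from a CSV containing lucodes 1, 4 and 5 only.
table = [[0, 0, 0, 0] for _ in range(6)]
for code in (1, 4, 5):
    table[code] = [70, 80, 85, 90]

print(find_missing_lucodes(table, [1, 4, 5]))       # []
print(find_missing_lucodes(table, [0, 1, 2, 4]))    # [0, 2]
print(find_missing_lucodes(table, [0, 2, 16, 21]))  # [16, 21]
```

The last call shows the behaviour raised in review: once any code is out of range, only the out-of-range codes are reported, even though 0 and 2 are also missing.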
@@ -149,10 +149,53 @@ def test_ufrm_value_error_on_bad_soil(self):
 
         with self.assertRaises(ValueError) as cm:
             urban_flood_risk_mitigation.execute(args)
-            actual_message = str(cm.exception)
-            expected_message = (
-                'Check that the Soil Group raster does not contain')
-            self.assertTrue(expected_message in actual_message)
+
+        actual_message = str(cm.exception)
+        expected_message = (
+            'Check that the Soil Group raster does not contain')
+        self.assertTrue(expected_message in actual_message)
+
+    def test_ufrm_value_error_on_bad_lucode(self):
+        """UFRM: assert exception on missing lucodes."""
+        import pandas
+        from natcap.invest import urban_flood_risk_mitigation
+        args = self._make_args()
+
+        bad_cn_table_path = os.path.join(
+            self.workspace_dir, 'bad_cn_table.csv')
+        cn_table = pandas.read_csv(args['curve_number_table_path'])
+
+        # drop a row with an lucode known to exist in lulc raster
+        # This is a code that will successfully index into the
+        # CN table sparse matrix, but will not return valid data.
+        bad_cn_table = cn_table[cn_table['lucode'] != 0]
+        bad_cn_table.to_csv(bad_cn_table_path, index=False)
+        args['curve_number_table_path'] = bad_cn_table_path
+
+        with self.assertRaises(ValueError) as cm:
+            urban_flood_risk_mitigation.execute(args)
+
+        actual_message = str(cm.exception)
+        expected_message = (
+            f'The biophysical table is missing a row for lucode(s) {[0]}')
+        self.assertEqual(expected_message, actual_message)
+
+        # drop rows with lucodes known to exist in lulc raster
+        # These are codes that will raise an IndexError on
+        # indexing into the CN table sparse matrix. The test
+        # LULC raster has values from 0 to 21.

Comment on lines +183 to +186: The summary here of what's in the LULC raster is a very good idea and was helpful in digging into this!

+        bad_cn_table = cn_table[cn_table['lucode'] < 15]
+        bad_cn_table.to_csv(bad_cn_table_path, index=False)
+        args['curve_number_table_path'] = bad_cn_table_path
+
+        with self.assertRaises(ValueError) as cm:
+            urban_flood_risk_mitigation.execute(args)
+
+        actual_message = str(cm.exception)
+        expected_message = (
+            f'The biophysical table is missing a row for lucode(s) '
+            f'{[16, 17, 18, 21]}')
+        self.assertEqual(expected_message, actual_message)
 
     def test_ufrm_string_damage_to_infrastructure(self):
         """UFRM: handle str(int) structure indices.
I could be very wrong about this, but to my eye I think this might always retrieve the top `n` lucodes. When I make the following change to `tests/test_ufrm.py` (and also PDB'd into the stacktrace):

I get the top end of the range instead of `[0, 2, 3, 16, 17, 18, 21]`:
Yes, I think you're right, but also that's how I originally intended it. When it hits the `IndexError`, it's finding only the codes that can raise that `IndexError`. We could also do the check that appears in the next block, where we check for codes that do not raise the `IndexError` but are still missing (`[0, 2, 3]` in your example). What do you think?

The other thing is, in this scope we only ever know the pixel values that appear in the current `iterblocks` block, right? So in either case (the `IndexError` or the subsequent check for empty rows) our error message might miss values that would get caught in a subsequent raster block.

Also, there were bugs in the `assertRaises` blocks of all these tests that were preventing the nested assertions from even being called. I fixed that and made some assertions a bit more explicit to better differentiate the two cases that are being tested in the one new test.
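The `assertRaises` bug mentioned above can be reproduced in isolation. The following is a minimal sketch using only `unittest`; `always_fails`, `AssertRaisesPlacement`, and `run_sketch_tests` are hypothetical names, not part of the InVEST test suite.

```python
import unittest


def always_fails():
    raise ValueError('missing lucode(s) [0]')


class AssertRaisesPlacement(unittest.TestCase):
    def test_assertion_inside_block_never_runs(self):
        reached = []
        with self.assertRaises(ValueError):
            always_fails()
            # Unreachable: the exception above already left the block,
            # so this failing assertion is silently skipped.
            reached.append('inside')
            self.assertTrue('this would fail' in 'if it ever ran')
        self.assertEqual(reached, [])

    def test_assertion_after_block_runs(self):
        with self.assertRaises(ValueError) as cm:
            always_fails()
        # cm.exception is available once the block has exited.
        self.assertEqual(str(cm.exception), 'missing lucode(s) [0]')


def run_sketch_tests():
    """Run both tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        AssertRaisesPlacement)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_sketch_tests()` runs both cases. The first passes even though it contains an assertion that would fail, which is why the message checks in this PR were dedented out of the `with` block.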
After poking around this a bit more, I think what you have here is actually a complete solution! I constructed a table (Biophysical_water_SF_bad.csv) to obviously be missing required values but have enough and large-enough values in the matrix to not trigger the `IndexError`. When the CSR matrix is indexed into (which does not raise `IndexError`), the second check for all-zero rows catches all of the missing internal values as we would hope:

So I was wrong and this test is good to go!
Yep, you're absolutely right ... the only way we'll know if our error message reflects all the missing values is by checking all of the raster values. In my opinion, the value (and relative ease of maintenance) of failing fast will probably outweigh the benefits of a truly complete error message.
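The trade-off described here can be sketched with plain lists standing in for raster blocks. This is an illustration under assumptions, not the InVEST implementation: `check_block`, `fail_fast`, `full_scan`, and the sample `blocks` are hypothetical, and a set of known codes replaces the sparse-table lookup.

```python
def check_block(block, known_codes):
    """Return the sorted codes in one block that are not in ``known_codes``."""
    return sorted(set(block) - known_codes)


def fail_fast(blocks, known_codes):
    """Raise on the first block that contains any unknown code."""
    for block in blocks:
        missing = check_block(block, known_codes)
        if missing:
            raise ValueError(
                f'The biophysical table is missing a row for lucode(s) '
                f'{missing}')


def full_scan(blocks, known_codes):
    """Visit every block first, then report every unknown code at once."""
    missing = set()
    for block in blocks:
        missing.update(check_block(block, known_codes))
    return sorted(missing)


blocks = [[1, 1, 2], [2, 7, 1], [9, 1, 7]]
known = {1, 2}

try:
    fail_fast(blocks, known)
except ValueError as error:
    print(error)  # names only lucode 7, found in the second block

print(full_scan(blocks, known))  # [7, 9]
```

The fail-fast version stops at the second block and never sees lucode 9. The full scan reports both codes but has to read every block before it can say anything.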
Oh man, thanks for catching that!