Remove `build_lookup_from_csv` and consolidate into `read_csv_to_dataframe` #1334

emlys · 2023-06-15T22:51:26Z

Description

Fixes #1327
Consolidated functionality into read_csv_to_dataframe and replaced uses of build_lookup_from_csv(...) with read_csv_to_dataframe(...).to_dict(orient='index').

Split the to_lower functionality into two separate args, cols_to_lower and vals_to_lower. Some of this might be obsoleted by #1328, but I wanted to take it one step at a time.

Checklist

Updated HISTORY.rst and link to any relevant issue (if these changes are user-facing)
Updated the user's guide (if needed)
Tested the affected models' UIs (if relevant)

…sv_to_dataframe natcap#1327

natcap#1327

emlys · 2023-06-15T22:58:19Z

tests/test_coastal_blue_carbon.py

@@ -358,7 +358,6 @@ def test_read_transition_matrix(self):
            transition_csv.write('b,,NCC,accum\n')
            transition_csv.write('c,accum,,NCC\n')
            transition_csv.write(',,,\n')
-            transition_csv.write(',legend,,')  # simulate legend


Here's one minor functional change. This was testing that extra stuff in the table would be ignored. I don't think we allow that in other tables in invest, so it seemed easier to drop it from the test than figure out how to accomplish that with read_csv_to_dataframe. But let me know if there's a reason to keep it!

Yep, you're right that this table legend is unique among the InVEST models! It's also weird in that it breaks from the normal table structure only to provide some extra metadata that's also described in the UG chapter. Personally I'm OK with removing it, but because this is a notable functional change we should also be sure to

remove the legend-generating code from the preprocessor

Note the functional change in the model's HISTORY

remove the legend from the transitions_template.csv in the sample data repo

What do you think?

Ah, I didn't realize that the preprocessor and sample data relied on this. In the interest of self-contained PRs, I'd rather not have to make all those changes here.

How about this for now? I made a small change in coastal_blue_carbon.py so that it will skip over rows that have nothing in the lulc-class column (which should be all the legend rows). That way the functionality is the same, but read_csv_to_dataframe doesn't have to handle it. And we can remove the legend in a separate issue if it's worth it?

Sounds like a fine solution to me! I took the liberty of describing this in #1336

phargogh

Thanks @emlys ! I left a few questions, comments and suggestions. And thanks for having this feature implementation as a nice, atomic PR in the bigger picture of where these table changes are heading :)

phargogh · 2023-06-19T21:36:47Z

src/natcap/invest/utils.py

+        cols_to_lower (bool): if True, convert all column names to lowercase
+        vals_to_lower (bool): if True, convert all table values to lowercase


I see from the docstring that cols_to_lower and vals_to_lower are supposed to be of type bool, but the name of the parameters makes it sound like they are supposed to be lists of columns. Is there a precedent for this sort of argument name in pandas? If not, might there be an alternate parameter name that indicates that this is a boolean?

Renamed these to convert_cols_to_lower and convert_vals_to_lower, hopefully that's clearer.

Yes, much clearer. Thanks!

phargogh · 2023-06-19T21:38:28Z

src/natcap/invest/utils.py

        dataframe = pandas.read_csv(
-            path, sep=sep, engine=engine, encoding=encoding, **kwargs)
+            path, index_col=False, sep=sep, engine=engine, encoding=encoding, **kwargs)
    except UnicodeDecodeError as error:
        LOGGER.error(
            f'{path} must be encoded as UTF-8 or ASCII')


I know this line isn't part of the PR, but is your impression that this error is about the filepath or about the contents of the table? And would it be worth it to clarify this in the error message? Just thinking about this in relation to the encoding parameter we're passing where we're defaulting to utf-8-sig but a different encoding could be used.

I believe this is about the contents of the table. I updated this to say "The file {path} must be..."

phargogh · 2023-06-19T21:46:00Z

src/natcap/invest/utils.py

+    # convert table values to lowercase
+    if vals_to_lower:
+        dataframe = dataframe.applymap(
+            lambda x: x.lower() if isinstance(x, str) else x)
+
+    # expand paths


Is there a case where we would want to lowercase values except for the paths that need to be expanded? As written, it seems like we could find ourselves in a case where filepaths are expanded only after the values have been lowercased.

(Sorry, I can't highlight the expand_path_cols block because it's collapsed in the github UI)

I think there definitely are cases where a table contains a path column that we don't want to lowercase, but other columns that we do want to lowercase. The read_csv_to_dataframe approach has not given us the option to distinguish between columns, so we've had to set to_lower=False on any table that contains paths.

When #1328 is implemented, that will give us the chance to apply different processing to each column. For now I don't think this is an issue, because vals_to_lower should always be False if expand_paths are defined.

phargogh · 2023-06-19T22:09:45Z

tests/test_coastal_blue_carbon.py

@@ -358,7 +358,6 @@ def test_read_transition_matrix(self):
            transition_csv.write('b,,NCC,accum\n')
            transition_csv.write('c,accum,,NCC\n')
            transition_csv.write(',,,\n')
-            transition_csv.write(',legend,,')  # simulate legend


Yep, you're right that this table legend is unique among the InVEST models! It's also weird in that it breaks from the normal table structure only to provide some extra metadata that's also described in the UG chapter. Personally I'm OK with removing it, but because this is a notable functional change we should also be sure to

remove the legend-generating code from the preprocessor

Note the functional change in the model's HISTORY

remove the legend from the transitions_template.csv in the sample data repo

What do you think?

phargogh · 2023-06-19T22:13:31Z

tests/test_utils.py

+
+        with open(csv_file, 'w') as file_obj:
+            file_obj.write(textwrap.dedent(
+                """


I see you have a .strip() there at the end (which is nice!) so this isn't a request to change anything, just an FYI! I recently learned that you can also do this to avoid the leading newline:

"""\ HEADER, A, b """

That is much nicer, I made this change everywhere in test_utils.py

phargogh · 2023-06-19T22:15:51Z

src/natcap/invest/utils.py

+        index_col (str): name of column to use as the dataframe index. If
+            ``cols_to_lower``, this column name and the dataframe column names
+            will be lowercased before they are compared. If ``usecols``
+            is defined, this must be included in ``usecols``.
+        usecols (list(str)): list of column names to subset from the dataframe.
+            If ``cols_to_lower``, these names and the dataframe column names
+            will be lowercased before they are compared.


Since index_col is required to be in usecols, would it make sense to simply add it into usecols if it isn't already there? It seems like this might be a minor bit of de-duplication when writing out a function call, but I'm curious what you think!

That certainly could make sense! I was going for consistency with pandas.read_csv, which will raise an error if you don't include the value of index_col in your usecols list. I think it also parallels nicely with the structure of the MODEL_SPEC - when we implement #1328, we might pipe in the spec columns, which already include the index column.

That all makes sense to me!

phargogh

Thanks @emlys ! I don't have any further comments or changes to request, so I'm adding my approval.

One thing I am seeing though is that the mac binary build is failing when it was passing at the beginning of the PR. Could you just take a peek at that and make sure it isn't failing because of something in the PR here? If it isn't, please feel free to merge away. Thanks!

emlys · 2023-06-22T15:50:20Z

Thanks @phargogh! I'm seeing the same error in the latest build on main (https://github.com/natcap/invest/actions/runs/5317455763/jobs/9628004338), so I think it's unrelated.

emlys added 3 commits June 15, 2023 12:59

move data cleaning functionality of build_lookup_from_csv into read_c…

278bb13

…sv_to_dataframe natcap#1327

update usages of read_csv_to_dataframe to new api natcap#1327

9e07853

remove build_lookup_from_csv and consolidate into read_csv_to_dataframe

2f78045

natcap#1327

emlys commented Jun 15, 2023

View reviewed changes

emlys added 2 commits June 15, 2023 16:09

clean up and add notes to history natcap#1327

8d61c18

verify unique keys when setting index natcap#1327

bbef555

emlys requested a review from phargogh June 16, 2023 00:02

emlys marked this pull request as ready for review June 16, 2023 00:02

emlys self-assigned this Jun 16, 2023

phargogh requested changes Jun 19, 2023

View reviewed changes

emlys added 5 commits June 20, 2023 12:47

escape first newline rather than calling .strip() natcap#1327

246e94d

clarify encoding error natcap#1327

58a2109

rename to_lower arguments to clarify that they are boolean natcap#1327

5d39781

rename to_lower args natcap#1327

f545a6e

skip transition table rows that are part of the legend natcap#1327

b0e7548

emlys requested a review from phargogh June 20, 2023 20:47

missed some to_lower args natcap#1327

8c45efc

phargogh mentioned this pull request Jun 22, 2023

Remove the CBC legend #1336

Open

4 tasks

phargogh approved these changes Jun 22, 2023

View reviewed changes

emlys merged commit 1a6e314 into natcap:main Jun 22, 2023

phargogh mentioned this pull request Dec 5, 2023

HRA criteria table loading fails cryptically when CSV has a BOM #1460

Closed

emlys deleted the feature/1327 branch October 3, 2024 23:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove `build_lookup_from_csv` and consolidate into `read_csv_to_dataframe` #1334

Remove `build_lookup_from_csv` and consolidate into `read_csv_to_dataframe` #1334

emlys commented Jun 15, 2023 •

edited

Loading

emlys Jun 15, 2023

phargogh Jun 19, 2023

emlys Jun 20, 2023

phargogh Jun 22, 2023

phargogh left a comment •

edited

Loading

phargogh Jun 19, 2023

emlys Jun 20, 2023

phargogh Jun 22, 2023

phargogh Jun 19, 2023

emlys Jun 20, 2023

phargogh Jun 19, 2023

emlys Jun 20, 2023

phargogh Jun 19, 2023

phargogh Jun 19, 2023

emlys Jun 20, 2023

phargogh Jun 19, 2023

emlys Jun 20, 2023

phargogh Jun 22, 2023

phargogh left a comment

emlys commented Jun 22, 2023

		cols_to_lower (bool): if True, convert all column names to lowercase
		vals_to_lower (bool): if True, convert all table values to lowercase

Remove build_lookup_from_csv and consolidate into read_csv_to_dataframe #1334

Remove build_lookup_from_csv and consolidate into read_csv_to_dataframe #1334

Conversation

emlys commented Jun 15, 2023 • edited Loading

Description

Checklist

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

phargogh left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

phargogh left a comment

Choose a reason for hiding this comment

emlys commented Jun 22, 2023

Remove `build_lookup_from_csv` and consolidate into `read_csv_to_dataframe` #1334

Remove `build_lookup_from_csv` and consolidate into `read_csv_to_dataframe` #1334

emlys commented Jun 15, 2023 •

edited

Loading

phargogh left a comment •

edited

Loading