Update error message when kedro-datasets is not installed or DataSet spelling is used #3952

Merged
merged 5 commits into from
Jun 19, 2024
1 change: 1 addition & 0 deletions RELEASE.md
@@ -4,6 +4,7 @@

 ## Bug fixes and other changes
 * Updated error message for invalid catalog entries.
+* Updated error message for catalog entries when the dataset class is not found with hints on how to resolve the issue.
 * Fixed a bug in the `DataCatalog` `shallow_copy()` method to ensure it returns the type of the used catalog and doesn't cast it to `DataCatalog`.

 ## Breaking changes to the API
20 changes: 18 additions & 2 deletions kedro/io/core.py
@@ -155,7 +155,7 @@ def from_config(
         except Exception as exc:
             raise DatasetError(
                 f"An exception occurred when parsing config "
-                f"for dataset '{name}':\n{str(exc)}."
+                f"for dataset '{name}':\n{str(exc)}"
             ) from exc

         try:
@@ -406,7 +406,23 @@ def parse_dataset_definition(
                 class_obj = tmp
                 break
         else:
-            raise DatasetError(f"Class '{dataset_type}' not found, is this a typo?")
+            hint = ""
+            if "DataSet" in dataset_type:
+                hint = (  # pragma: no cover  # To remove when we drop support for python 3.8
+                    "Hint: If you are trying to use a dataset from `kedro-datasets`>=2.0.0, "
+                    "make sure that the dataset name uses the `Dataset` spelling instead of `DataSet`."
+                )
+            else:
+                hint = (
+                    "Hint: If you are trying to use a dataset from `kedro-datasets`, "
+                    "make sure that the package is installed in your current environment. "
+                    "You can do so by running `pip install kedro-datasets` or "
+                    "`pip install kedro-datasets[<dataset-group>]` to install `kedro-datasets` along with "
+                    "related dependencies for the specific dataset group."
+                )
+            raise DatasetError(
+                f"Class '{dataset_type}' not found, is this a typo?" f"\n{hint}"
Contributor

I wonder:

  • if we could aid debugging by printing the full importlib class path at this point
  • if we could check whether users are missing an `__init__.py`, a common issue

Member

> if we could check whether users are missing an `__init__.py`, a common issue

Is that Kedro's responsibility though?
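
The first suggestion in this thread (surfacing the class paths that were tried) could look roughly like the sketch below. It is not part of this PR: `describe_lookup` and `CANDIDATE_PREFIXES` are hypothetical names, and the prefix values are assumed for illustration rather than taken from the diff.

```python
import importlib

# Assumed lookup prefixes, for illustration only; not taken from this diff.
CANDIDATE_PREFIXES = ("kedro.io.", "kedro_datasets.", "")


def describe_lookup(dataset_type, prefixes=CANDIDATE_PREFIXES):
    """Return a (class_path, status) pair for every candidate path that is tried."""
    attempts = []
    for prefix in prefixes:
        class_path = prefix + dataset_type
        module_path, _, class_name = class_path.rpartition(".")
        if not module_path:
            # A bare class name has no module to import.
            attempts.append((class_path, "no module path to import"))
            continue
        try:
            module = importlib.import_module(module_path)
        except ImportError as exc:
            # Covers ModuleNotFoundError, e.g. the package is not installed.
            attempts.append((class_path, f"import failed: {exc}"))
            continue
        status = "found" if hasattr(module, class_name) else "class not in module"
        attempts.append((class_path, status))
    return attempts
```

Joining these pairs into the error text would tell the user which fully qualified paths were attempted and whether each one failed at import time or at attribute lookup.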

+            )

     if not class_obj:
         class_obj = dataset_type
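
To see the two messages without installing Kedro, the hint selection from this hunk can be reproduced in isolation. This is a minimal sketch: `build_not_found_message` is a hypothetical helper name, not a Kedro function, and it returns the text instead of raising `DatasetError`.

```python
def build_not_found_message(dataset_type: str) -> str:
    """Hypothetical helper mirroring the hint selection shown in the diff."""
    if "DataSet" in dataset_type:
        # Old spelling: point the user at the renamed classes.
        hint = (
            "Hint: If you are trying to use a dataset from `kedro-datasets`>=2.0.0, "
            "make sure that the dataset name uses the `Dataset` spelling instead of `DataSet`."
        )
    else:
        # Otherwise the most likely cause is a missing package.
        hint = (
            "Hint: If you are trying to use a dataset from `kedro-datasets`, "
            "make sure that the package is installed in your current environment. "
            "You can do so by running `pip install kedro-datasets` or "
            "`pip install kedro-datasets[<dataset-group>]` to install `kedro-datasets` along with "
            "related dependencies for the specific dataset group."
        )
    return f"Class '{dataset_type}' not found, is this a typo?\n{hint}"


if __name__ == "__main__":
    print(build_not_found_message("pandas.CSVDataSet"))
    print(build_not_found_message("pandas.CSVDataset"))
```

The check is a plain substring test, so any type string containing `DataSet` gets the spelling hint, including custom classes that are unrelated to `kedro-datasets`.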
20 changes: 19 additions & 1 deletion tests/io/test_data_catalog.py
@@ -1,5 +1,6 @@
 import logging
 import re
+import sys
 from copy import deepcopy
 from datetime import datetime, timezone
 from pathlib import Path
@@ -503,7 +504,24 @@ def test_config_missing_class(self, sane_config):

         pattern = (
             "An exception occurred when parsing config for dataset 'boats':\n"
-            "Class 'kedro.io.CSVDatasetInvalid' not found"
+            "Class 'kedro.io.CSVDatasetInvalid' not found, is this a typo?"
         )
         with pytest.raises(DatasetError, match=re.escape(pattern)):
             DataCatalog.from_config(**sane_config)
+
+    @pytest.mark.skipif(
+        sys.version_info < (3, 9),
+        reason="for python 3.8 kedro-datasets version 1.8 is used which has the old spelling",
+    )
+    def test_config_incorrect_spelling(self, sane_config):
+        """Check hint if the type uses the old DataSet spelling"""
+        sane_config["catalog"]["boats"]["type"] = "pandas.CSVDataSet"
+
+        pattern = (
+            "An exception occurred when parsing config for dataset 'boats':\n"
+            "Class 'pandas.CSVDataSet' not found, is this a typo?"
+            "\nHint: If you are trying to use a dataset from `kedro-datasets`>=2.0.0,"
+            " make sure that the dataset name uses the `Dataset` spelling instead of `DataSet`."
+        )
+        with pytest.raises(DatasetError, match=re.escape(pattern)):
+            DataCatalog.from_config(**sane_config)
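
Both tests wrap the expected text in `re.escape` because `pytest.raises(match=...)` applies the pattern with `re.search`, and the new message contains regex metacharacters such as `.` and `?`. The sketch below uses only the standard library to show the difference; `matches`, the shortened hint text, and the `lookalike` string are made up for illustration.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Mimic how pytest.raises(match=...) checks a message: via re.search."""
    return re.search(pattern, text) is not None


expected = "Class 'pandas.CSVDataSet' not found, is this a typo?"
# Shortened stand-in for the real error text (hint abbreviated for illustration).
raised = expected + "\nHint: use the `Dataset` spelling."
# A wrong message that differs exactly where the metacharacters sit.
lookalike = "Class 'pandasXCSVDataSet' not found, is this a typ"

# Unescaped, '.' matches any character and '?' makes the preceding 'o' optional,
# so the wrong message is accepted.
print(matches(expected, lookalike))

# Escaped, the pattern is compared literally: the lookalike is rejected
# and the real message is accepted.
print(matches(re.escape(expected), lookalike))
print(matches(re.escape(expected), raised))
```

Without escaping, the assertion would still pass on the correct message, but it would also pass on messages that differ at the `.` and `?` positions, which weakens the test.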