Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop raising RuntimeError when setting download=True (#8637) #8638

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 8 additions & 10 deletions torchvision/datasets/imagenette.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
from PIL import Image

from .folder import find_classes, make_dataset
from .utils import download_and_extract_archive, verify_str_arg
from .utils import check_integrity, download_and_extract_archive, verify_str_arg
from .vision import VisionDataset


Expand Down Expand Up @@ -66,8 +66,8 @@ def __init__(

if download:
self._download()
elif not self._check_exists():
raise RuntimeError("Dataset not found. You can use download=True to download it.")
elif not self._check_integrity():
raise RuntimeError("Dataset not found or corrupted. You can use download=True to download it")

self.wnids, self.wnid_to_idx = find_classes(self._image_root)
self.classes = [self._WNID_TO_CLASS[wnid] for wnid in self.wnids]
Expand All @@ -76,15 +76,13 @@ def __init__(
}
self._samples = make_dataset(self._image_root, self.wnid_to_idx, extensions=".jpeg")

def _check_exists(self) -> bool:
return self._size_root.exists()
def _check_integrity(self) -> bool:
return check_integrity(self._size_root.with_suffix(".tgz"), self._md5)

def _download(self):
if self._check_exists():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @imkasen .

@pmeier do you remember why we're erroring instead of just doing an early return like in other datasets e.g. clevr ?

def _download(self) -> None:
if self._check_exists():
return

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general idea was to raise on download=True if we can't properly verify the integrity. For example, after a folder is extracted, there is no way for us to know if the images inside this folder have not been tempered with.

Contrast this with datasets like MNIST or CIFAR, where we have checksums for each file on record. Here we can properly check the integrity and thus there is no need to raise on download=True.

Now, both CLEVRClassification and Imagenette fall into the first category. I don't know why we didn't follow our own "rule" for the former, but it is likely too late to change it now. However, I would treat "relaxing" the rule for the latter with caution. I don't think anything has changed that warrants walking back on our previous decision. However, if you do, you probably want to do it for all datasets to keep things consistent.

raise RuntimeError(
f"The directory {self._size_root} already exists. "
f"If you want to re-download or re-extract the images, delete the directory."
)
if self._check_integrity():
print("Files already downloaded and verified")
return

download_and_extract_archive(self._url, self.root, md5=self._md5)

Expand Down