generated from datalad/datalad-extension-template
-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft for Issue 183 list collection #342
Closed
christian-monch
wants to merge
30
commits into
datalad:main
from
christian-monch:issue-183-list-collection
Closed
Draft for Issue 183 list collection #342
christian-monch
wants to merge
30
commits into
datalad:main
from
christian-monch:issue-183-list-collection
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
For now just downloads via unauthenticated connections. The included code enables downloads of selected archived content, without having to download the entire archive first. See datalad/datalad#373 For ZIP and TAR archives this is hooked into `AnyUrlOperations` and thereby accessible via the `datalad download` command. Demo: ```sh ❯ datalad download 'zip://datalad-datalad-cddbe22/requirements-devel.txt::https://zenodo.org/record/7497306/files/datalad/datalad-0.18.0.zip?download=1 -' # Theoretically we don't want -e here but ATM pip would puke if just .[full] is provided # Since we use requirements.txt ATM only for development IMHO it is ok but # we need to figure out/complaint to pip folks -e .[devel] ❯ datalad download 'tar://datalad-0.18.0/requirements-devel.txt::https://files.pythonhosted.org/packages/dd/5e/9be11886ef4c3c64e78a8cdc3f9ac3f27d2dac403a6337d5685cd5686770/datalad-0.18.0.tar.gz -' # Theoretically we don't want -e here but ATM pip would puke if just .[full] is provided # Since we use requirements.txt ATM only for development IMHO it is ok but # we need to figure out/complaint to pip folks -e .[devel] ``` As demo'ed in the code, dependening on the capabilities of the particular filesystem abstraction it needs custom handling of the actual download process after `open()` was called.
This includes, but is not limited to, being able to specify whether or not anonymous access should be attempted (first). This change paves the way for endpoint customizations and anything else that FSSPEC exposes. Moreover, when an explicit `credential` identifier is given, the boto-based attempt to locate credentials or try anonymous access first is skipped, and a credential is looked up and provisioned immediately.
This facilitates exploitation of the numerous filesystem specific features provided by FSSPEC.
This changes demos the utility of this feature for the FSSPEC based access to S3 resources. By default anonymous access is attempted, but additional handlers could change this behavior (for particular S3 targets).
Requires particular permissions and comes with a potential performance penalty.
They are mostly needed for the tests (and more were needed), and we want to keep the actually core-dependencies small.
See docs inside. At present it is unclear how relevant this will be in practice. However, different filesystem caching settings impact performance substantially (empricial observation). If caching is turned off entirely, this parameter is the only way to specify chunk sizes (for reading). Hence I consider it a useful thing to have in general -- and it is cheap. This change also flips the default download method from iteration over a file pointer to reading chunks of a specific size. Ths has been found to be more performant in some cases.
Such an explicit option conflicts with processing of S3 URL that specify a version explicitly.
Expand test coverage too
When working with S3 URLs, s3fs employs internal URL parsing to detect whether the S3 file has versioning enabled. Based on the outcome of this evaluation, it sets the version_aware value itself. If a user sets this keys' value to False in fs_kwargs of the URL handler, but then supplies a versioned URL, s3fs's attempt to set this key to True is sabotaged, and the download crashes: datalad_next.url_operations.UrlOperationsRemoteError: UrlOperationsRemoteError for 's3://mslw-datalad-test0-versioned/3versions-allversioned.txt?versionId=Tro_UjqVFJfr32v5tuPfjwtOzeqYCxi2' To leave a trace of this, even if only temporary, this commit adds a short paragraph to the URL handler's docstring to warn about setting this key.
Mostly to minimize the diff for conflict avoidance, after a lot of typos were fixed via datalad#315
Get the updates to -core for the CI
If the props variable does not get populated during _get_fs(), access to the URL has failed for some reason. Further processing of props in download will lead to crashes. This change adds error handling to raise early and more informative
This is an MVP (minimal viable product) which will be extended in the next commits. It currently works on fsspec-URLs, but does not yet support credentials. The type overwrite is also not implemented yet.
13 tasks
Given that #183 is resolved, I am closing this PR. Further work can focus on Thx! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
A draft for issue #183
-[ ] add credential management
-[ ] add tests