Fix cache for dask.dataframe #305

Merged: 7 commits, Aug 24, 2022

hoxbro (Member) commented Aug 24, 2022

When running the bikes example, I got the following error:

2022-08-24 10:12:20,755 Could not cache 'occupancy' to parquet file. Error during saving process: [Errno 21] Failed to open local file '/home/shh/Development/holoviz/repos/lumen/examples/bikes/cache/occupancy.parq'. Detail: [errno 21] Is a directory
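
For context, this class of error can be reproduced without Lumen or dask. The following is a minimal sketch, assuming only the standard library: it creates a directory at the path where a file is about to be written and then attempts the write. The helper `try_write_file` and the temporary directory layout are hypothetical and only mimic the names in the log line; the exact exception subclass and errno depend on the operating system.

```python
import tempfile
from pathlib import Path


def try_write_file(path, payload=b"data"):
    """Try to write bytes to `path`; return the OSError raised, or None on success."""
    try:
        with open(path, "wb") as f:
            f.write(payload)
    except OSError as exc:
        return exc
    return None


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "occupancy.parq"
    # A directory already occupies the name the file is supposed to get.
    target.mkdir()
    (target / "part.0.parquet").touch()

    # Writing a file to that same path fails with an OSError
    # (IsADirectoryError on Linux and macOS).
    error = try_write_file(target)

    # A path that is not taken by a directory can be written normally.
    free_error = try_write_file(Path(tmp) / "other.parq")

print(type(error).__name__, error)
```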

This PR does three things:

  1. While debugging this, I found the mix of os.path and pathlib.Path confusing, so I converted sources/base.py to use Path throughout. I also set root up as a param, so that it is always a Path.

  2. The extension detected for the bikes URL https://api.tfl.gov.uk/Occupancy/BikePoints/@{stations.stations.id}?app_key=a1c692de000b4944af55f59d8e849915 was id}?app_key=a1c692de000b4944af55f59d8e849915 before this PR; with this PR it is None.

  3. dd.to_parquet in _set_cache first created a directory named occupancy.parq containing part.0.parquet files. The final result file occupancy.parq could then not be saved, because a directory with that name already existed. The cache therefore failed with the error message above and all of the cache was removed, which is not desired. I now drop the extension when data is a dask.DataFrame, and when loading the cache in _get_cache I check for directories both with and without the extension.
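
Points 2 and 3 can be sketched in isolation as follows. This is a rough illustration using only the standard library, not the code in sources/base.py: the function names (`naive_extension`, `url_extension`, `cache_target`, `find_cache`), the `KNOWN_EXTENSIONS` set, and the shortened `app_key=KEY` value are all assumptions made for the example, and the dask write itself is replaced by creating the directory by hand.

```python
import tempfile
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

# Assumption: an illustrative set of extensions, not Lumen's actual list.
KNOWN_EXTENSIONS = {"csv", "json", "parq", "parquet", "xlsx"}


def naive_extension(url):
    """Text after the last dot; this swallows template and query parts."""
    return url.rsplit(".", 1)[-1]


def url_extension(url):
    """Return a recognised extension from the URL path, else None."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lstrip(".").lower()
    return suffix if suffix in KNOWN_EXTENSIONS else None


def cache_target(root, name, partitioned):
    """Where to write: a bare directory name for partitioned (dask-style)
    data, a single '.parq' file otherwise."""
    root = Path(root)
    return root / name if partitioned else root / f"{name}.parq"


def find_cache(root, name):
    """Look for a cache entry with the extension first, then without it."""
    root = Path(root)
    for candidate in (root / f"{name}.parq", root / name):
        if candidate.exists():
            return candidate
    return None


url = "https://api.tfl.gov.uk/Occupancy/BikePoints/@{stations.stations.id}?app_key=KEY"
naive = naive_extension(url)    # 'id}?app_key=KEY'
detected = url_extension(url)   # None

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a partitioned write: a directory holding part files.
    target = cache_target(tmp, "occupancy", partitioned=True)
    target.mkdir()
    (target / "part.0.parquet").touch()
    found_name = find_cache(tmp, "occupancy").name  # 'occupancy'
```

Keeping the partitioned cache under a name without the extension means it can never collide with a single-file cache of the same table, and checking both candidates on load lets either layout be picked up.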

codecov-commenter commented Aug 24, 2022

Codecov Report

Merging #305 (c5ab64f) into master (fdfdc6e) will increase coverage by 0.26%.
The diff coverage is 74.35%.

@@            Coverage Diff             @@
##           master     #305      +/-   ##
==========================================
+ Coverage   66.04%   66.31%   +0.26%     
==========================================
  Files          59       59              
  Lines        6147     6189      +42     
==========================================
+ Hits         4060     4104      +44     
+ Misses       2087     2085       -2     
Impacted Files                        Coverage Δ
lumen/sources/base.py                 62.65% <72.22%> (+0.92%) ⬆️
lumen/tests/sources/test_base.py      99.05% <100.00%> (+0.18%) ⬆️
lumen/tests/sources/test_derived.py   100.00% <0.00%> (ø)


@philippjfr philippjfr merged commit f94ee65 into holoviz:master Aug 24, 2022
@hoxbro hoxbro deleted the cache_parquet_dask branch August 24, 2022 12:40
@github-actions

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 11, 2023