Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove DataFrames completely #1035

Merged
merged 1 commit into from
Feb 26, 2025
Merged

Remove DataFrames completely #1035

merged 1 commit into from
Feb 26, 2025

Conversation

abelsiqueira
Copy link
Member

@abelsiqueira abelsiqueira commented Feb 20, 2025

Now that DuckDB is being used almost everywhere, we can finish the job and remove DataFrames.
This mostly means:

  • TulipaVariables (and friends) field indices is now a DuckDB thing. To loop over it we don't need to use for row in eachrow(indices), just for row in indices.
  • Some things are harder to do, since they must use SQL now. Namely, subsets.
  • There should be some memory and speed improvement.

It should also be possible to do the following improvements, which can be considered "nice to have":

  • Don't even store indices, instead store connection and when it is desirable to loop, call the query to create the looping object. I'm not sure this is actually necessary or useful, because I believe the DuckDB indices will never be store materialised. So in practice it should be the same efficiency/memory use.
  • Make loops look nicer by creating an API to loop over the rows and the container/expressions at the same time. For instance, instead of doing
    [... for (row, expr1, expr2) in zip(cons.indices, cons.expressions[:expr1], cons.expressions[:expr2])]
    
    we can create a function to wrap that:
    [... for (row, expr1, expr2) in loop_over(cons, with_expr[:expr1, :expr2])
    
    Some thought is necessary to ensure (some) type stability.

Related issues

Related to #701

Checklist

  • I am following the contributing guidelines
  • Tests are passing
  • Lint workflow is passing
  • Docs were updated and workflow is passing

@abelsiqueira abelsiqueira added the benchmark PR only - Run benchmark on PR label Feb 20, 2025
Copy link
Contributor

github-actions bot commented Feb 20, 2025

Benchmark Results

68c3bcc... f74f0d5... 68c3bcc... / f74f0d5...
energy_problem/create_model 36 ± 2 s 35.7 ± 3.5 s 1.01
energy_problem/input_and_constructor 27.8 ± 0.17 s 27.7 ± 0.068 s 1.01
time_to_load 2.62 ± 0.023 s 2.66 ± 0.035 s 0.987
68c3bcc... f74f0d5... 68c3bcc... / f74f0d5...
energy_problem/create_model 0.245 G allocs: 11.9 GB 0.243 G allocs: 12.1 GB 0.98
energy_problem/input_and_constructor 30.8 M allocs: 1.18 GB 26.8 M allocs: 0.952 GB 1.24
time_to_load 0.159 k allocs: 11.2 kB 0.159 k allocs: 11.2 kB 1

Benchmark Plots

A plot of the benchmark results have been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

Copy link

codecov bot commented Feb 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.86%. Comparing base (68c3bcc) to head (f74f0d5).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1035   +/-   ##
=======================================
  Coverage   97.86%   97.86%           
=======================================
  Files          29       29           
  Lines         982      985    +3     
=======================================
+ Hits          961      964    +3     
  Misses         21       21           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@abelsiqueira abelsiqueira force-pushed the 701-remove-dataframes branch 2 times, most recently from 99dfa62 to de68fd7 Compare February 24, 2025 09:04
@abelsiqueira abelsiqueira marked this pull request as ready for review February 24, 2025 09:04
WHERE asset = '$(row.asset)' AND year = $(row.year) AND rep_period = $(row.rep_period)
",
)
])::Int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is for my understanding: ::Int is here to ensure type stability since we know in advance that the result will be an Int, but Julia can't know it in advance because it comes from a SQL query. Is that correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. The draft PR #1055 will add more of these.

Copy link
Member

@datejada datejada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. The code looks cleaner, and deleting dependencies is good 😄 The code was also initially review on the Tulipa working day. I left a comment for my learning porpuse, so that it is why it is approved now.

@datejada datejada merged commit e8e59d5 into main Feb 26, 2025
7 checks passed
@datejada datejada deleted the 701-remove-dataframes branch February 26, 2025 11:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
benchmark PR only - Run benchmark on PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants