Remove DataFrames completely #1035

abelsiqueira · 2025-02-20T14:40:39Z

Now that DuckDB is being used almost everywhere, we can finish the job and remove DataFrames.
This mostly means:

- The `indices` field of TulipaVariables (and friends) is now a DuckDB object. To loop over it, we no longer need `for row in eachrow(indices)`; `for row in indices` is enough.
- Some things are harder to do because they must now be written in SQL, notably subsets.
- There should be some memory and speed improvements.
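A minimal sketch of the first two points, assuming DuckDB.jl is installed and that its query results iterate row by row, as described above. The `var_flow` table and its columns are hypothetical stand-ins, not TulipaEnergyModel's actual schema. The subset is taken with a SQL `WHERE` clause, and the loop runs directly over the query result without `eachrow`.

```julia
using DuckDB

# In-memory database; `var_flow` is a hypothetical stand-in for an indices table
con = DBInterface.connect(DuckDB.DB, ":memory:")
DBInterface.execute(con, """
    CREATE TABLE var_flow AS
    SELECT * FROM (VALUES (1, 'wind', 2030), (2, 'solar', 2030), (3, 'wind', 2050))
        AS t(id, asset, year)
""")

# The subset is expressed in SQL (WHERE) instead of filtering a DataFrame
indices = DBInterface.execute(
    con,
    "SELECT id, asset, year FROM var_flow WHERE asset = 'wind' ORDER BY id",
)

# Loop directly over the query result: no eachrow needed
wind_ids = Int[]
for row in indices
    push!(wind_ids, row.id)
end
```

Any further subsetting would likewise go into the query text rather than being done on a DataFrame.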

It should also be possible to make the following "nice to have" improvements:

- Don't store `indices` at all; instead, store the connection and run the query to create the looping object whenever a loop is needed. I'm not sure this is necessary or useful, because I believe the DuckDB indices are never stored in materialised form, so in practice the efficiency and memory use should be the same.
- Make loops look nicer by creating an API that loops over the rows and the containers/expressions at the same time. For instance, instead of
```julia
[... for (row, expr1, expr2) in zip(cons.indices, cons.expressions[:expr1], cons.expressions[:expr2])]
```
we can create a function to wrap that:
```julia
[... for (row, expr1, expr2) in loop_over(cons, with_expr[:expr1, :expr2])]
```
Some thought is necessary to ensure (some) type stability.
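A minimal sketch of such a wrapper in plain Julia. `MockConstraint`, its fields, and the `loop_over(cons, names...)` signature are hypothetical; a varargs list of names replaces the proposed `with_expr[...]` syntax, and a `Vector` stands in for the DuckDB-backed indices. Mapping over the tuple of names returns a tuple, so the number of zipped vectors is part of the iterator's type, which should help with type stability as long as the expression vectors share a concrete element type.

```julia
# Hypothetical stand-in for a constraint container: row indices plus
# named expression vectors (one entry per row)
struct MockConstraint{R}
    indices::Vector{R}
    expressions::Dict{Symbol,Vector{Float64}}
end

"""
    loop_over(cons, names...)

Hypothetical helper: iterate over `(row, expr_1, ..., expr_n)` tuples by zipping
the rows of `cons.indices` with the expression vectors named in `names`.
"""
function loop_over(cons::MockConstraint, names::Symbol...)
    # `map` over a tuple returns a tuple, so the number of vectors is known from the call
    exprs = map(name -> cons.expressions[name], names)
    return zip(cons.indices, exprs...)
end

cons = MockConstraint(
    [(asset = "wind", year = 2030), (asset = "solar", year = 2030)],
    Dict(:expr1 => [1.0, 2.0], :expr2 => [10.0, 20.0]),
)

totals = [row.asset => expr1 + expr2 for (row, expr1, expr2) in loop_over(cons, :expr1, :expr2)]
```

Since `zip` accepts any iterable, the same idea should carry over when `cons.indices` is a query result rather than a `Vector`.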

Related issues

Related to #701

Checklist

I am following the contributing guidelines
Tests are passing
Lint workflow is passing
Docs were updated and workflow is passing

github-actions · 2025-02-20T14:41:01Z

Benchmark Results

| Time benchmark | `68c3bcc`... | `f74f0d5`... | `68c3bcc`... / `f74f0d5`... |
|---|---|---|---|
| energy_problem/create_model | 36 ± 2 s | 35.7 ± 3.5 s | 1.01 |
| energy_problem/input_and_constructor | 27.8 ± 0.17 s | 27.7 ± 0.068 s | 1.01 |
| time_to_load | 2.62 ± 0.023 s | 2.66 ± 0.035 s | 0.987 |

| Memory benchmark | `68c3bcc`... | `f74f0d5`... | `68c3bcc`... / `f74f0d5`... |
|---|---|---|---|
| energy_problem/create_model | 0.245 G allocs: 11.9 GB | 0.243 G allocs: 12.1 GB | 0.98 |
| energy_problem/input_and_constructor | 30.8 M allocs: 1.18 GB | 26.8 M allocs: 0.952 GB | 1.24 |
| time_to_load | 0.159 k allocs: 11.2 kB | 0.159 k allocs: 11.2 kB | 1 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

codecov · 2025-02-20T14:48:35Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.86%. Comparing base (68c3bcc) to head (f74f0d5).
Report is 1 commit behind head on main.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main    #1035   +/-   ##
=======================================
  Coverage   97.86%   97.86%
=======================================
  Files          29       29
  Lines         982      985    +3
=======================================
+ Hits          961      964    +3
  Misses         21       21
```


datejada · 2025-02-26T11:22:52Z

src/constraints/storage.jl

```diff
+                                    WHERE asset = '$(row.asset)' AND year = $(row.year) AND rep_period = $(row.rep_period)
+                                    ",
+                                )
+                            ])::Int
```


This comment is for my understanding: `::Int` is here to ensure type stability, since we know in advance that the result will be an `Int`, but Julia can't know that because the value comes from a SQL query. Is that correct?

abelsiqueira

Exactly. The draft PR #1055 will add more of these.
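A minimal, self-contained illustration of the point above. A `Vector{Any}` stands in for the SQL query result, and the names `QUERY_RESULT`, `count_untyped`, and `count_typed` are hypothetical. `Base.return_types` shows that without the assertion the compiler infers `Any`, while with `::Int` it infers `Int64`; if the value were not an `Int`, the assertion would throw a `TypeError` at run time.

```julia
# A Vector{Any} stands in for a value coming out of a SQL query:
# the compiler cannot know the element type in advance
const QUERY_RESULT = Any[42]

# Without an assertion, the inferred return type is Any
count_untyped() = QUERY_RESULT[1]

# With ::Int, callers see a concrete Int (a non-Int value raises a TypeError)
count_typed() = QUERY_RESULT[1]::Int

untyped_rt = Base.return_types(count_untyped, Tuple{})
typed_rt = Base.return_types(count_typed, Tuple{})
```

In an interactive session, `@code_warntype count_untyped()` (from InteractiveUtils) flags the same `Any` return type.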

datejada

Thanks for the PR. The code looks cleaner, and deleting dependencies is good 😄 The code was also initially reviewed on the Tulipa working day. I left a comment for my own learning, which is why it is only approved now.

abelsiqueira added the benchmark PR only - Run benchmark on PR label Feb 20, 2025

abelsiqueira force-pushed the 701-remove-dataframes branch 2 times, most recently from 99dfa62 to de68fd7 February 24, 2025 09:04

abelsiqueira marked this pull request as ready for review February 24, 2025 09:04

Remove DataFrames completely

f74f0d5

abelsiqueira force-pushed the 701-remove-dataframes branch from de68fd7 to f74f0d5 February 26, 2025 08:45

abelsiqueira requested a review from datejada February 26, 2025 09:06

datejada reviewed Feb 26, 2025


datejada approved these changes Feb 26, 2025


datejada merged commit e8e59d5 into main Feb 26, 2025
7 checks passed

datejada deleted the 701-remove-dataframes branch February 26, 2025 11:25
