asv comparison tests pandas dataframe memory creation vs pandas.dataframe read/write and arcticdb_lmdb read/writes #2137
Conversation
LGTM
SYMBOL = "dataframe"
NUMBER_ROWS = 3000000

def setup_cache(self):
I think this should just return the df, given that the df and the dict are really the same thing.
Returning the dict is needed for the dataframe-creation test, which is important for understanding all the later numbers and their dynamics, so it cannot be avoided.
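The point being made can be sketched as follows. This is a hypothetical minimal version (the `make_data` helper and column contents are assumptions, not taken from the PR): `setup_cache` returns the raw dict rather than a ready DataFrame, so the cost of constructing the DataFrame itself can be benchmarked separately.

```python
import numpy as np
import pandas as pd

NUMBER_ROWS = 3_000_000  # matches the constant in the diff above

def make_data(size):
    # Hypothetical generator standing in for the PR's column helpers
    rng = np.random.default_rng(0)
    return {
        "value": rng.random(size),
        "dtype": np.full(size, "x" * 10),
    }

class ComparisonBenchmarks:
    def setup_cache(self):
        # Returning the dict (not the DataFrame) lets the creation
        # benchmark below measure DataFrame construction itself.
        return make_data(NUMBER_ROWS)

    def peakmem_create_dataframe(self, data):
        # asv's peakmem_ prefix records the peak memory of this call
        pd.DataFrame(data)
```

Had `setup_cache` returned the DataFrame, the construction cost would already be paid before any `peakmem_*` method ran, and the creation baseline would be lost.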
"dtype": str_col(10, size),
}

def peakmem_create_dataframe(self, tpl):
I get the idea with this but it's odd for us to have a benchmark that we have no control over - would be very odd for a Pandas regression to block our PRs.
Creation of the dataframe is the basis for comparison of how effective we are on the other graphs. It also shows that although create and write each have certain memory requirements when done separately, the memory requirement of doing both is not equal to the sum of each one (same for read/write etc.).
I would not say block, but warn and require action if the memory usage suddenly grows. Why did it grow? Is it OK? etc. Such questions will pop into our heads, which otherwise would not. So see this as a feedback loop that reminds us to think when something unusual happens.
Overall, seeing all the processes in one test and graph will make a difference over time, as it could improve our knowledge and our management of performance and requirements.
Reference Issues/PRs
These asv tests aim to compare memory usage across the above-mentioned operations. Seeing all of them on one graph can tell us how efficient we are in memory management.
If we consider that with time we will also be able to run the same tests for Polars and Amazon S3, such benchmarks could provide insights into our performance.
[20.00%] ··· comparison_benchmarks.ComparisonBenchmarks.peakmem_create_dataframe 3.16G
[40.00%] ··· comparison_benchmarks.ComparisonBenchmarks.peakmem_read_dataframe_arctic 4.95G
[60.00%] ··· comparison_benchmarks.ComparisonBenchmarks.peakmem_read_dataframe_parquet 5.4G
[80.00%] ··· comparison_benchmarks.ComparisonBenchmarks.peakmem_write_dataframe_arctic 3.45G
[100.00%] ··· comparison_benchmarks.ComparisonBenchmarks.peakmem_write_dataframe_parquet 2.77G
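The figures above can be read against the creation baseline. A small sketch (numbers copied from the run above, in GiB) normalising each peak against `peakmem_create_dataframe`:

```python
# Peak memory figures from the asv run above, in GiB
results = {
    "create_dataframe": 3.16,
    "read_arctic": 4.95,
    "read_parquet": 5.40,
    "write_arctic": 3.45,
    "write_parquet": 2.77,
}

baseline = results["create_dataframe"]
# Ratio of each operation's peak memory to the creation baseline
ratios = {name: round(peak / baseline, 2) for name, peak in results.items()}
```

Tracking these ratios rather than the raw numbers helps separate "the test data got bigger" from "an operation's memory overhead regressed", which is the feedback loop described above.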
What does this implement or fix?
Any other comments?
Checklist
Checklist for code changes...