feat: fuse index runend decoding - take_from #2527

Merged
@gatesn merged 17 commits into develop from ad/take-from-trait on Feb 28, 2025

Conversation

0ax1
Member

@0ax1 0ax1 commented Feb 26, 2025

develop:

run_end_compress    fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ take_indices                   │               │               │               │         │
   ├─ (1000, 4)     48.45 µs      │ 134.2 µs      │ 49.47 µs      │ 50.7 µs       │ 100     │ 100
   ├─ (1000, 16)    12.99 µs      │ 19.66 µs      │ 13.37 µs      │ 13.44 µs      │ 100     │ 100
   ├─ (1000, 256)   1.041 µs      │ 1.864 µs      │ 1.082 µs      │ 1.091 µs      │ 100     │ 400
   ├─ (10000, 4)    13.19 ms      │ 16.67 ms      │ 13.47 ms      │ 13.69 ms      │ 100     │ 100
   ├─ (10000, 16)   1.845 ms      │ 3.709 ms      │ 1.91 ms       │ 1.958 ms      │ 100     │ 100
   ╰─ (10000, 256)  124.3 µs      │ 182.6 µs      │ 127.5 µs      │ 131.9 µs      │ 100     │ 100

new:

run_end_compress    fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ take_indices                   │               │               │               │         │
   ├─ (1000, 4)     457.7 ns      │ 170.5 µs      │ 540.7 ns      │ 2.238 µs      │ 100     │ 100
   ├─ (1000, 16)    405.9 ns      │ 2.192 µs      │ 438.4 ns      │ 492.4 ns      │ 100     │ 1600
   ├─ (1000, 256)   382.5 ns      │ 567.3 ns      │ 403.3 ns      │ 424 ns        │ 100     │ 1600
   ├─ (10000, 4)    1.166 µs      │ 11.7 µs       │ 1.208 µs      │ 1.33 µs       │ 100     │ 100
   ├─ (10000, 16)   582.7 ns      │ 21.66 µs      │ 624.7 ns      │ 848.8 ns      │ 100     │ 100
   ╰─ (10000, 256)  374.7 ns      │ 5.958 µs      │ 457.7 ns      │ 530.5 ns      │ 100     │ 100

@0ax1 0ax1 changed the title from "feat: fuse dict + runend decoding" to "feat: fuse dict + runend decoding - take_from" Feb 26, 2025
@0ax1 0ax1 added the benchmark Run benchmarks on this branch label Feb 26, 2025
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Feb 26, 2025
Contributor

github-actions bot commented Feb 26, 2025

Benchmarks: random_access

Table of Results
name PR 432f061 base c016864 ratio (PR/base) unit
random-access/vortex-tokio-local-disk 2247087 2514950 0.893492 ns
random-access/parquet-tokio-local-disk 284248621 272406148 1.04347 ns

Contributor

github-actions bot commented Feb 26, 2025

Benchmarks: TPC-H on NVME

Table of Results
name PR 432f061 base c016864 ratio (PR/base) unit
tpch_q01/arrow 79473500 72484638 1.09642 ns
tpch_q02/arrow 41419294 41468906 0.998804 ns
tpch_q03/arrow 33793370 33572101 1.00659 ns
tpch_q04/arrow 30717082 29909325 1.02701 ns
tpch_q05/arrow 57032966 56712127 1.00566 ns
tpch_q06/arrow 8246987 8195817 1.00624 ns
tpch_q07/arrow 97812855 98447593 0.993553 ns
tpch_q08/arrow 57208160 57768306 0.990304 ns
tpch_q09/arrow 84109987 82140953 1.02397 ns
tpch_q10/arrow 54886587 53424065 1.02738 ns
tpch_q11/arrow 25348370 25221354 1.00504 ns
tpch_q12/arrow 31555914 31034778 1.01679 ns
tpch_q13/arrow 24967747 24974433 0.999732 ns
tpch_q14/arrow 11683401 11425077 1.02261 ns
tpch_q15/arrow 24329934 24491588 0.9934 ns
tpch_q16/arrow 21521513 21691609 0.992158 ns
tpch_q17/arrow 79868330 79466958 1.00505 ns
tpch_q18/arrow 156541240 157896124 0.991419 ns
tpch_q19/arrow 26057754 26193202 0.994829 ns
tpch_q20/arrow 35981745 36049661 0.998116 ns
tpch_q21/arrow 156842612 157520842 0.995694 ns
tpch_q22/arrow 15700628 16782483 0.935537 ns
tpch_q01/parquet 154387810 152009742 1.01564 ns
tpch_q02/parquet 95449832 93291810 1.02313 ns
tpch_q03/parquet 106058113 111568157 0.950613 ns
tpch_q04/parquet 59693675 60544089 0.985954 ns
tpch_q05/parquet 113488894 113421330 1.0006 ns
tpch_q06/parquet 30044128 28309153 1.06129 ns
tpch_q07/parquet 146960497 146271613 1.00471 ns
tpch_q08/parquet 152580995 150508394 1.01377 ns
tpch_q09/parquet 197111787 201700770 0.977249 ns
tpch_q10/parquet 158968637 160301231 0.991687 ns
tpch_q11/parquet 46174183 45324981 1.01874 ns
tpch_q12/parquet 84271685 83713451 1.00667 ns
tpch_q13/parquet 182126463 177544498 1.02581 ns
tpch_q14/parquet 47955961 48993256 0.978828 ns
tpch_q15/parquet 83138236 84519661 0.983656 ns
tpch_q16/parquet 48093147 45357038 1.06032 ns
tpch_q17/parquet 146082886 140160225 1.04226 ns
tpch_q18/parquet 242315178 244781450 0.989925 ns
tpch_q19/parquet 82815402 83980629 0.986125 ns
tpch_q20/parquet 97060613 103099335 0.941428 ns
tpch_q21/parquet 208525877 208260320 1.00128 ns
tpch_q22/parquet 51030277 50693032 1.00665 ns
tpch_q01/vortex-file-compressed 60813623 60039260 1.0129 ns
tpch_q02/vortex-file-compressed 46299599 46994153 0.98522 ns
tpch_q03/vortex-file-compressed 32302596 33259701 0.971223 ns
tpch_q04/vortex-file-compressed 20761373 21032350 0.987116 ns
tpch_q05/vortex-file-compressed 53220730 53571806 0.993447 ns
tpch_q06/vortex-file-compressed 10199134 11326111 0.900497 ns
tpch_q07/vortex-file-compressed 83197810 88140296 0.943925 ns
tpch_q08/vortex-file-compressed 61595772 62282597 0.988972 ns
tpch_q09/vortex-file-compressed 87082321 88322838 0.985955 ns
tpch_q10/vortex-file-compressed 54879321 56741342 0.967184 ns
tpch_q11/vortex-file-compressed 24397016 24545305 0.993959 ns
tpch_q12/vortex-file-compressed 29097826 29606014 0.982835 ns
tpch_q13/vortex-file-compressed 30423057 31416100 0.968391 ns
tpch_q14/vortex-file-compressed 14953812 14408411 1.03785 ns
tpch_q15/vortex-file-compressed 28010118 27653034 1.01291 ns
tpch_q16/vortex-file-compressed 24888510 24380482 1.02084 ns
tpch_q17/vortex-file-compressed 76976946 76030611 1.01245 ns
tpch_q18/vortex-file-compressed 138224743 140841385 0.981421 ns
tpch_q19/vortex-file-compressed 30832224 30573475 1.00846 ns
tpch_q20/vortex-file-compressed 38518068 39950358 0.964148 ns
tpch_q21/vortex-file-compressed 124199704 124582527 0.996927 ns
tpch_q22/vortex-file-compressed 29429920 27395798 1.07425 ns

Contributor

github-actions bot commented Feb 26, 2025

Benchmarks: TPC-H on S3

Table of Results
name PR c6acd11 base 2b895da ratio (PR/base) unit
tpch_q01/parquet 299165958 303526306 0.985634 ns
tpch_q02/parquet 721448215 757737031 0.952109 ns
tpch_q03/parquet 466482568 466547874 0.99986 ns
tpch_q04/parquet 258431550 253240122 1.0205 ns
tpch_q05/parquet 602614987 628739081 0.95845 ns
tpch_q06/parquet 195699756 196292397 0.996981 ns
tpch_q07/parquet 695451496 692150033 1.00477 ns
tpch_q08/parquet 861276206 921106369 0.935045 ns
tpch_q09/parquet 728958349 771957331 0.944299 ns
tpch_q10/parquet 588091724 598664005 0.98234 ns
tpch_q11/parquet 316666504 304983179 1.03831 ns
tpch_q12/parquet 294670583 287020063 1.02666 ns
tpch_q13/parquet 424608993 429007163 0.989748 ns
tpch_q14/parquet 279466688 285805817 0.97782 ns
tpch_q15/parquet 609976183 516806944 1.18028 ns
tpch_q16/parquet 510951023 278542356 1.83437 ns
tpch_q17/parquet 435972501 448815823 0.971384 ns
tpch_q18/parquet 625128065 630123225 0.992073 ns
tpch_q19/parquet 314835616 324176012 0.971187 ns
tpch_q20/parquet 638799495 580311362 1.10079 ns
tpch_q21/parquet 736581631 714314340 1.03117 ns
tpch_q22/parquet 308458833 309289122 0.997315 ns
tpch_q01/vortex-file-compressed 314059126 336994275 0.931942 ns
tpch_q02/vortex-file-compressed 459386716 427468475 1.07467 ns
tpch_q03/vortex-file-compressed 453349997 475435042 0.953548 ns
tpch_q04/vortex-file-compressed 393057378 385965537 1.01837 ns
tpch_q05/vortex-file-compressed 519080458 521416161 0.99552 ns
tpch_q06/vortex-file-compressed 392871403 391249744 1.00414 ns
tpch_q07/vortex-file-compressed 615157961 625235333 0.983882 ns
tpch_q08/vortex-file-compressed 777812256 756079162 1.02874 ns
tpch_q09/vortex-file-compressed 644176493 658079302 0.978874 ns
tpch_q10/vortex-file-compressed 537967188 529821960 1.01537 ns
tpch_q11/vortex-file-compressed 168990719 174567151 0.968056 ns
tpch_q12/vortex-file-compressed 509603194 522265813 0.975754 ns
tpch_q13/vortex-file-compressed 173432546 156029969 1.11153 ns
tpch_q14/vortex-file-compressed 298196119 294316386 1.01318 ns
tpch_q15/vortex-file-compressed 708870708 734388037 0.965254 ns
tpch_q16/vortex-file-compressed 195641020 218318494 0.896127 ns
tpch_q17/vortex-file-compressed 397992563 390224219 1.01991 ns
tpch_q18/vortex-file-compressed 442962409 448725716 0.987156 ns
tpch_q19/vortex-file-compressed 468988301 456845513 1.02658 ns
tpch_q20/vortex-file-compressed 520765159 560515874 0.929082 ns
tpch_q21/vortex-file-compressed 1134851707 951594105 1.19258 ns
tpch_q22/vortex-file-compressed 176773689 168227044 1.0508 ns

Contributor

github-actions bot commented Feb 26, 2025

Benchmarks: Clickbench on NVME

Table of Results
name PR 432f061 base c016864 ratio (PR/base) unit
clickbench_q00/parquet 2117578 2.1134e+06 1.00198 ns
clickbench_q01/parquet 60138031 6.04364e+07 0.995063 ns
clickbench_q02/parquet 117907831 1.16133e+08 1.01528 ns
clickbench_q03/parquet 86050804 8.49422e+07 1.01305 ns
clickbench_q04/parquet 627749546 6.54593e+08 0.958992 ns
clickbench_q05/parquet 703838547 7.04848e+08 0.998568 ns
clickbench_q06/parquet 2241696 2.2212e+06 1.00923 ns
clickbench_q07/parquet 60375010 6.27454e+07 0.962222 ns
clickbench_q08/parquet 722420365 7.23244e+08 0.998861 ns
clickbench_q09/parquet 1016350736 1.00024e+09 1.01611 ns
clickbench_q10/parquet 266265409 2.60953e+08 1.02036 ns
clickbench_q11/parquet 309636224 3.09383e+08 1.00082 ns
clickbench_q12/parquet 746023076 7.29489e+08 1.02266 ns
clickbench_q13/parquet 1004542365 9.87917e+08 1.01683 ns
clickbench_q14/parquet 746573210 7.20223e+08 1.03659 ns
clickbench_q15/parquet 733641938 7.11853e+08 1.03061 ns
clickbench_q16/parquet 1551677082 1.55618e+09 0.997107 ns
clickbench_q17/parquet 1464558002 1.4227e+09 1.02942 ns
clickbench_q18/parquet 3046386532 3.00869e+09 1.01253 ns
clickbench_q19/parquet 68404707 7.00286e+07 0.976811 ns
clickbench_q20/parquet 1126297259 1.08189e+09 1.04105 ns
clickbench_q21/parquet 1324783779 1.20572e+09 1.09875 ns
clickbench_q22/parquet 1892536985 1.87274e+09 1.01057 ns
clickbench_q23/parquet 7660826524 7.71696e+09 0.992726 ns
clickbench_q24/parquet 445272932 4.48474e+08 0.992862 ns
clickbench_q25/parquet 394506562 3.93771e+08 1.00187 ns
clickbench_q26/parquet 493933782 4.9561e+08 0.996618 ns
clickbench_q27/parquet 1581291703 1.54852e+09 1.02116 ns
clickbench_q28/parquet 11373506751 1.1317e+10 1.00499 ns
clickbench_q29/parquet 443518332 4.45541e+08 0.995459 ns
clickbench_q30/parquet 688441144 6.85583e+08 1.00417 ns
clickbench_q31/parquet 727450569 7.42684e+08 0.979489 ns
clickbench_q32/parquet 2764814770 2.86926e+09 0.963597 ns
clickbench_q33/parquet 2807913206 2.86969e+09 0.978472 ns
clickbench_q34/parquet 2763191635 2.75617e+09 1.00255 ns
clickbench_q35/parquet 852296768 8.72537e+08 0.976803 ns
clickbench_q36/parquet 177087378 1.76614e+08 1.00268 ns
clickbench_q37/parquet 87208203 8.73352e+07 0.998545 ns
clickbench_q38/parquet 107315284 1.1055e+08 0.970743 ns
clickbench_q39/parquet 316275967 3.15536e+08 1.00234 ns
clickbench_q40/parquet 53685899 5.55512e+07 0.966422 ns
clickbench_q41/parquet 52684192 5.2884e+07 0.996221 ns
clickbench_q42/parquet 68446676 7.22164e+07 0.9478 ns
clickbench_q00/vortex-file-compressed 4644791 4.43821e+06 1.04655 ns
clickbench_q01/vortex-file-compressed 24041355 2.31001e+07 1.04075 ns
clickbench_q02/vortex-file-compressed 42356303 4.13302e+07 1.02483 ns
clickbench_q03/vortex-file-compressed 63335167 6.33974e+07 0.999018 ns
clickbench_q04/vortex-file-compressed 573513066 5.76816e+08 0.994275 ns
clickbench_q05/vortex-file-compressed 592605608 6.07165e+08 0.97602 ns
clickbench_q06/vortex-file-compressed 5138989 4.59718e+06 1.11786 ns
clickbench_q07/vortex-file-compressed 28054640 2.93034e+07 0.957386 ns
clickbench_q08/vortex-file-compressed 652714646 6.77967e+08 0.962753 ns
clickbench_q09/vortex-file-compressed 790186687 7.95697e+08 0.993075 ns
clickbench_q10/vortex-file-compressed 113717394 1.12248e+08 1.01309 ns
clickbench_q11/vortex-file-compressed 133813494 1.3096e+08 1.02179 ns
clickbench_q12/vortex-file-compressed 468403088 4.78073e+08 0.979772 ns
clickbench_q13/vortex-file-compressed 691771375 6.96894e+08 0.99265 ns
clickbench_q14/vortex-file-compressed 433844526 4.44107e+08 0.976893 ns
clickbench_q15/vortex-file-compressed 672072724 6.69789e+08 1.00341 ns
clickbench_q16/vortex-file-compressed 1386588143 1.33885e+09 1.03565 ns
clickbench_q17/vortex-file-compressed 1308359312 1.28806e+09 1.01576 ns
clickbench_q18/vortex-file-compressed 2738206774 2.79758e+09 0.978778 ns
clickbench_q19/vortex-file-compressed 34721371 3.36156e+07 1.0329 ns
clickbench_q20/vortex-file-compressed 747648480 7.65372e+08 0.976844 ns
clickbench_q21/vortex-file-compressed 826847974 8.38082e+08 0.986596 ns
clickbench_q22/vortex-file-compressed 1117774786 1.17062e+09 0.954855 ns
clickbench_q23/vortex-file-compressed 1996092401 2.00967e+09 0.993244 ns
clickbench_q24/vortex-file-compressed 175260671 1.78046e+08 0.984357 ns
clickbench_q25/vortex-file-compressed 186347264 1.92877e+08 0.966144 ns
clickbench_q26/vortex-file-compressed 222264939 2.31467e+08 0.960243 ns
clickbench_q27/vortex-file-compressed 1224447233 1.22291e+09 1.00126 ns
clickbench_q28/vortex-file-compressed 10352684749 1.02876e+10 1.00632 ns
clickbench_q29/vortex-file-compressed 717500746 6.76367e+08 1.06082 ns
clickbench_q30/vortex-file-compressed 380416240 3.81214e+08 0.997907 ns
clickbench_q31/vortex-file-compressed 392122805 3.97044e+08 0.987605 ns
clickbench_q32/vortex-file-compressed 2706269597 2.64612e+09 1.02273 ns
clickbench_q33/vortex-file-compressed 2353112413 2.40329e+09 0.979121 ns
clickbench_q34/vortex-file-compressed 2359679799 2.41565e+09 0.976828 ns
clickbench_q35/vortex-file-compressed 898970148 9.25823e+08 0.970996 ns
clickbench_q36/vortex-file-compressed 100862488 1.31992e+08 0.764156 ns
clickbench_q37/vortex-file-compressed 56183211 5.78926e+07 0.970473 ns
clickbench_q38/vortex-file-compressed 41691861 4.22572e+07 0.986621 ns
clickbench_q39/vortex-file-compressed 163749310 1.66718e+08 0.982191 ns
clickbench_q40/vortex-file-compressed 32464197 3.14704e+07 1.03158 ns
clickbench_q41/vortex-file-compressed 30362831 3.08475e+07 0.984289 ns
clickbench_q42/vortex-file-compressed 50408637 5.08298e+07 0.991714 ns

Contributor

github-actions bot commented Feb 26, 2025

Benchmarks: compress

Table of Results
name PR 432f061 base c016864 ratio (PR/base) unit
compress time/taxi throughput 0.224959 0.224955 1.00002 bytes/ns
parquet_rs-zstd compress time/taxi throughput 0.282433 0.283835 0.995061 bytes/ns
decompress time/taxi throughput 1.96117 1.93411 1.01399 bytes/ns
parquet_rs-zstd decompress time/taxi throughput 1.65521 1.66699 0.992936 bytes/ns
compress time/AirlineSentiment throughput 0.00283829 0.00289781 0.979458 bytes/ns
parquet_rs-zstd compress time/AirlineSentiment throughput 0.0534437 0.0514177 1.0394 bytes/ns
decompress time/AirlineSentiment throughput 0.0256187 0.0268374 0.954591 bytes/ns
parquet_rs-zstd decompress time/AirlineSentiment throughput 0.0938506 0.0859248 1.09224 bytes/ns
compress time/Arade throughput 0.162821 0.160827 1.0124 bytes/ns
parquet_rs-zstd compress time/Arade throughput 0.398823 0.395531 1.00832 bytes/ns
decompress time/Arade throughput 1.96142 1.84467 1.06329 bytes/ns
parquet_rs-zstd decompress time/Arade throughput 1.9129 1.9051 1.0041 bytes/ns
compress time/Bimbo throughput 0.37854 0.367722 1.02942 bytes/ns
parquet_rs-zstd compress time/Bimbo throughput 0.335607 0.327557 1.02458 bytes/ns
decompress time/Bimbo throughput 2.23127 2.17802 1.02445 bytes/ns
parquet_rs-zstd decompress time/Bimbo throughput 2.82375 2.79978 1.00856 bytes/ns
compress time/CMSprovider throughput 0.0542185 0.0542794 0.998879 bytes/ns
parquet_rs-zstd compress time/CMSprovider throughput 0.347776 0.349149 0.996068 bytes/ns
decompress time/CMSprovider throughput 3.84994 3.3592 1.14609 bytes/ns
parquet_rs-zstd decompress time/CMSprovider throughput 1.79665 1.70536 1.05353 bytes/ns
compress time/Euro2016 throughput 0.152824 0.146683 1.04186 bytes/ns
parquet_rs-zstd compress time/Euro2016 throughput 0.297919 0.287797 1.03517 bytes/ns
decompress time/Euro2016 throughput 2.41844 2.37816 1.01694 bytes/ns
parquet_rs-zstd decompress time/Euro2016 throughput 0.945826 0.971986 0.973087 bytes/ns
compress time/Food throughput 0.180983 0.175553 1.03094 bytes/ns
parquet_rs-zstd compress time/Food throughput 0.311349 0.310499 1.00274 bytes/ns
decompress time/Food throughput 5.09879 4.74468 1.07463 bytes/ns
parquet_rs-zstd decompress time/Food throughput 1.58587 1.59041 0.997143 bytes/ns
compress time/HashTags throughput 0.191046 0.19328 0.988441 bytes/ns
parquet_rs-zstd compress time/HashTags throughput 0.806885 0.805699 1.00147 bytes/ns
decompress time/HashTags throughput 5.79391 5.98356 0.968306 bytes/ns
parquet_rs-zstd decompress time/HashTags throughput 2.6772 2.7933 0.958433 bytes/ns
compress time/TPC-H l_comment chunked throughput 0.210937 0.217418 0.970189 bytes/ns
parquet_rs-zstd compress time/TPC-H l_comment chunked throughput 0.283576 0.286623 0.989369 bytes/ns
decompress time/TPC-H l_comment chunked throughput 3.07982 3.16058 0.974446 bytes/ns
parquet_rs-zstd decompress time/TPC-H l_comment chunked throughput 1.07796 1.09002 0.988937 bytes/ns
compress time/TPC-H l_comment canonical throughput 0.0287061 0.02904 0.9885 bytes/ns
parquet_rs-zstd compress time/TPC-H l_comment canonical throughput 0.283852 0.286759 0.989859 bytes/ns
decompress time/TPC-H l_comment canonical throughput 3.045 3.15185 0.966102 bytes/ns
parquet_rs-zstd decompress time/TPC-H l_comment canonical throughput 1.07371 1.07144 1.00211 bytes/ns
compress time/wide table cols=10 chunks=1 rows=1000 throughput 0.12886 0.128455 1.00315 bytes/ns
parquet_rs-zstd compress time/wide table cols=10 chunks=1 rows=1000 throughput 0.203963 0.191858 1.0631 bytes/ns
decompress time/wide table cols=10 chunks=1 rows=1000 throughput 0.795982 0.782502 1.01723 bytes/ns
parquet_rs-zstd decompress time/wide table cols=10 chunks=1 rows=1000 throughput 0.513412 0.518143 0.99087 bytes/ns
compress time/wide table cols=100 chunks=1 rows=1000 throughput 0.129906 0.129093 1.0063 bytes/ns
parquet_rs-zstd compress time/wide table cols=100 chunks=1 rows=1000 throughput 0.181841 0.19796 0.918573 bytes/ns
decompress time/wide table cols=100 chunks=1 rows=1000 throughput 1.0705 1.09648 0.976313 bytes/ns
parquet_rs-zstd decompress time/wide table cols=100 chunks=1 rows=1000 throughput 0.512742 0.513492 0.998539 bytes/ns
compress time/wide table cols=1000 chunks=1 rows=1000 throughput 0.115481 0.118129 0.977582 bytes/ns
parquet_rs-zstd compress time/wide table cols=1000 chunks=1 rows=1000 throughput 0.165328 0.167367 0.987814 bytes/ns
decompress time/wide table cols=1000 chunks=1 rows=1000 throughput 0.866671 0.900153 0.962804 bytes/ns
parquet_rs-zstd decompress time/wide table cols=1000 chunks=1 rows=1000 throughput 0.433649 0.487622 0.889314 bytes/ns
compress time/wide table cols=10 chunks=50 rows=1000 throughput 0.0701426 0.0704101 0.9962 bytes/ns
parquet_rs-zstd compress time/wide table cols=10 chunks=50 rows=1000 throughput 0.124414 0.131384 0.946954 bytes/ns
decompress time/wide table cols=10 chunks=50 rows=1000 throughput 0.681819 0.791063 0.861902 bytes/ns
parquet_rs-zstd decompress time/wide table cols=10 chunks=50 rows=1000 throughput 0.527746 0.537489 0.981873 bytes/ns
compress time/wide table cols=100 chunks=50 rows=1000 throughput 0.0628821 0.0686043 0.916591 bytes/ns
parquet_rs-zstd compress time/wide table cols=100 chunks=50 rows=1000 throughput 0.111385 0.131728 0.845571 bytes/ns
decompress time/wide table cols=100 chunks=50 rows=1000 throughput 1.09519 1.14446 0.956942 bytes/ns
parquet_rs-zstd decompress time/wide table cols=100 chunks=50 rows=1000 throughput 0.525361 0.540335 0.972287 bytes/ns
compress time/wide table cols=1000 chunks=50 rows=1000 throughput 0.0557493 0.0589242 0.946119 bytes/ns
parquet_rs-zstd compress time/wide table cols=1000 chunks=50 rows=1000 throughput 0.09066 0.103777 0.873606 bytes/ns
decompress time/wide table cols=1000 chunks=50 rows=1000 throughput 0.876148 0.967643 0.905446 bytes/ns
parquet_rs-zstd decompress time/wide table cols=1000 chunks=50 rows=1000 throughput 0.477251 0.49205 0.969923 bytes/ns
vortex:raw size/taxi 0.117731 0.117731 1
vortex size/taxi 5.8248e+07 5.8248e+07 1
vortex:parquet-zstd size/taxi 1.04086 1.04086 1
vortex:raw size/AirlineSentiment 1.35456 1.35456 1
vortex size/AirlineSentiment 4424 4424 1
vortex:parquet-zstd size/AirlineSentiment 4.57497 4.57497 1
vortex:raw size/Arade 0.255852 0.255852 1
vortex size/Arade 3.03615e+08 3.03615e+08 1
vortex:parquet-zstd size/Arade 0.994177 0.994177 1
vortex:raw size/Bimbo 0.115537 0.115537 1
vortex size/Bimbo 8.25993e+08 8.25993e+08 1
vortex:parquet-zstd size/Bimbo 2.12802 2.12802 1
vortex:raw size/CMSprovider 0.188932 0.188932 1
vortex size/CMSprovider 1.18657e+09 1.18657e+09 1
vortex:parquet-zstd size/CMSprovider 1.54196 1.54196 1
vortex:raw size/Euro2016 0.471342 0.471342 1
vortex size/Euro2016 2.14471e+08 2.14471e+08 1
vortex:parquet-zstd size/Euro2016 1.80395 1.80395 1
vortex:raw size/Food 0.177319 0.177319 1
vortex size/Food 5.97288e+07 5.97288e+07 1
vortex:parquet-zstd size/Food 1.6486 1.6486 1
vortex:raw size/HashTags 0.14274 0.14274 1
vortex size/HashTags 2.73378e+08 2.73378e+08 1
vortex:parquet-zstd size/HashTags 2.03916 2.03916 1
vortex:raw size/TPC-H l_comment chunked 0.417675 0.419071 0.996668
vortex size/TPC-H l_comment chunked 1.04083e+08 1.04431e+08 0.996668
vortex:parquet-zstd size/TPC-H l_comment chunked 1.82802 1.83438 0.996535
vortex:raw size/TPC-H l_comment canonical 0.4255 0.425496 1.00001
vortex size/TPC-H l_comment canonical 1.06031e+08 1.0603e+08 1.00001
vortex:parquet-zstd size/TPC-H l_comment canonical 1.86219 1.86232 0.999935
vortex:raw size/wide table cols=10 chunks=1 rows=1000 0.62503 0.62503 1
vortex size/wide table cols=10 chunks=1 rows=1000 100096 100096 1
vortex:parquet-zstd size/wide table cols=10 chunks=1 rows=1000 1.07073 1.07073 1
vortex:raw size/wide table cols=100 chunks=1 rows=1000 0.622073 0.622073 1
vortex size/wide table cols=100 chunks=1 rows=1000 996136 996136 1
vortex:parquet-zstd size/wide table cols=100 chunks=1 rows=1000 1.06561 1.06561 1
vortex:raw size/wide table cols=1000 chunks=1 rows=1000 0.621778 0.621778 1
vortex size/wide table cols=1000 chunks=1 rows=1000 9.95654e+06 9.95654e+06 1
vortex:parquet-zstd size/wide table cols=1000 chunks=1 rows=1000 1.0651 1.0651 1
vortex:raw size/wide table cols=10 chunks=50 rows=1000 0.599708 0.599708 1
vortex size/wide table cols=10 chunks=50 rows=1000 100096 100096 1
vortex:parquet-zstd size/wide table cols=10 chunks=50 rows=1000 1.07073 1.07073 1
vortex:raw size/wide table cols=100 chunks=50 rows=1000 0.598133 0.598133 1
vortex size/wide table cols=100 chunks=50 rows=1000 996136 996136 1
vortex:parquet-zstd size/wide table cols=100 chunks=50 rows=1000 1.06561 1.06561 1
vortex:raw size/wide table cols=1000 chunks=50 rows=1000 0.597975 0.597975 1
vortex size/wide table cols=1000 chunks=50 rows=1000 9.95654e+06 9.95654e+06 1
vortex:parquet-zstd size/wide table cols=1000 chunks=50 rows=1000 1.0651 1.0651 1

@0ax1 0ax1 force-pushed the ad/take-from-trait branch from 7bb6a09 to 226b73f Compare February 26, 2025 14:29
@0ax1 0ax1 marked this pull request as ready for review February 26, 2025 14:32
@0ax1 0ax1 requested a review from gatesn February 26, 2025 14:33
@0ax1 0ax1 force-pushed the ad/take-from-trait branch from 2958982 to b875888 Compare February 26, 2025 14:40
@0ax1 0ax1 force-pushed the ad/take-from-trait branch from b875888 to 4d6cfda Compare February 26, 2025 14:43
Contributor

@gatesn gatesn left a comment

```rust
fn take_impl(array: &dyn Array, indices: &dyn Array) -> VortexResult<ArrayRef> {
    // First look for a TakeFrom specialized on the indices.
    if let Some(take_from_fn) = indices.vtable().take_from_fn() {
        if let Some(arr) = take_from_fn.take_from(indices, array)? {
            return Ok(arr);
        }
    }

    // If TakeFn defined for the encoding, delegate to TakeFn.
    // If we know from stats that indices are all valid, we can avoid all bounds checks.
    if let Some(take_fn) = array.vtable().take_fn() {
        return take_fn.take(array, indices);
    }
```
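
The dispatch order in the snippet above (try a hook specialized on the indices encoding first, then fall back to the generic take) can be illustrated with a small standalone sketch. Everything here is hypothetical and simplified: the `Indices` enum, its `Constant` variant, and the `take`, `take_from`, and `canonicalize` functions are stand-ins invented for illustration, not Vortex's actual vtable API.

```rust
/// Hypothetical indices encodings; the real code dispatches through a vtable.
enum Indices {
    /// Plain list of positions, with no specialized path.
    Plain(Vec<usize>),
    /// The same position repeated `len` times.
    Constant { index: usize, len: usize },
}

impl Indices {
    /// Specialized hook keyed on the indices encoding.
    /// `Ok(None)` means "no fast path here, use the fallback".
    fn take_from(&self, array: &[i32]) -> Result<Option<Vec<i32>>, String> {
        match self {
            Indices::Plain(_) => Ok(None),
            Indices::Constant { index, len } => {
                // One bounds-checked lookup, then a constant fill.
                let value = *array
                    .get(*index)
                    .ok_or_else(|| format!("index {index} out of bounds"))?;
                Ok(Some(vec![value; *len]))
            }
        }
    }

    /// Materialize the positions; this is the work the fast path avoids.
    fn canonicalize(&self) -> Vec<usize> {
        match self {
            Indices::Plain(positions) => positions.clone(),
            Indices::Constant { index, len } => vec![*index; *len],
        }
    }
}

/// Try the indices-specialized path first, then fall back to a generic take.
fn take(array: &[i32], indices: &Indices) -> Result<Vec<i32>, String> {
    if let Some(taken) = indices.take_from(array)? {
        return Ok(taken);
    }
    indices
        .canonicalize()
        .into_iter()
        .map(|i| {
            array
                .get(i)
                .copied()
                .ok_or_else(|| format!("index {i} out of bounds"))
        })
        .collect()
}
```

Returning `Ok(None)` from the hook keeps the fallback decision in one place: an encoding can decline the fast path for a particular input without raising an error.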

@0ax1 0ax1 requested a review from gatesn February 26, 2025 15:56

codspeed-hq bot commented Feb 26, 2025

CodSpeed Performance Report

Merging #2527 will improve performance by 41.79%

Comparing ad/take-from-trait (78b05e7) with develop (358bdcf)

Summary

⚡ 4 improvements
✅ 765 untouched benchmarks
🆕 6 new benchmarks

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
| --- | --- | --- | --- |
| patched_take_10k_adversarial | 2 ms | 1.4 ms | +41.79% |
| patched_take_10k_contiguous_not_patches | 1,050.3 µs | 741.4 µs | +41.67% |
| patched_take_10k_contiguous_patches | 1.8 ms | 1.4 ms | +23.61% |
| patched_take_10k_random | 2.1 ms | 1.6 ms | +26.01% |
| 🆕 take_indices[(1000, 16)] | N/A | 20.4 µs | N/A |
| 🆕 take_indices[(1000, 256)] | N/A | 20 µs | N/A |
| 🆕 take_indices[(1000, 4)] | N/A | 24.1 µs | N/A |
| 🆕 take_indices[(10000, 16)] | N/A | 26.8 µs | N/A |
| 🆕 take_indices[(10000, 256)] | N/A | 21 µs | N/A |
| 🆕 take_indices[(10000, 4)] | N/A | 43.3 µs | N/A |

@0ax1 0ax1 force-pushed the ad/take-from-trait branch from dca0511 to 153b967 Compare February 26, 2025 17:36
@0ax1 0ax1 changed the title from "feat: fuse dict + runend decoding - take_from" to "feat: fuse index runend decoding - take_from" Feb 27, 2025
@gatesn gatesn added the benchmark Run benchmarks on this branch label Feb 27, 2025
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Feb 27, 2025
@0ax1 0ax1 force-pushed the ad/take-from-trait branch from 5f3c8bf to 87a80a5 Compare February 27, 2025 13:28
@0ax1 0ax1 added the benchmark Run benchmarks on this branch label Feb 27, 2025
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Feb 27, 2025
@0ax1
Member Author

0ax1 commented Feb 27, 2025

Unrelated to this PR, the macro-benchmarks are currently broken. At least for tpch, I'm just doing a git bisect locally. @AdamGS fixed them already

@0ax1 0ax1 added the benchmark Run benchmarks on this branch label Feb 27, 2025
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Feb 27, 2025
@a10y
Contributor

a10y commented Feb 27, 2025

So this doesn't actually vectorize the take or anything, it just defers the canonicalization?

@0ax1
Member Author

0ax1 commented Feb 27, 2025

> So this doesn't actually vectorize the take or anything, it just defers the canonicalization?

It allows doing a take without materialization. For primitive types in our current runend implementation, writes are auto-vectorized, so skipping the intermediate materialization should bring a perf benefit in those cases. Another thing to keep in mind is that canonicalizing runend is mainly memory bound, not compute bound. E.g., this is not faster: https://github.com/spiraldb/vortex/compare/ad/runend-neon-simd
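
As a rough illustration of the point about skipping the intermediate materialization, here is a minimal sketch that contrasts decoding run-end encoded indices into a flat vector before taking with a fused version that does one lookup per run and fills the output directly. The `RunEndIndices` struct and both function names are assumptions made up for this example, and the actual Vortex implementation may differ substantially. Both functions panic on out-of-range indices or non-monotonic run ends, which a real implementation would have to handle.

```rust
/// Hypothetical run-end encoded indices: `values[i]` applies to output
/// positions `ends[i - 1]..ends[i]`, with an implicit start of 0.
struct RunEndIndices {
    ends: Vec<usize>,
    values: Vec<usize>,
}

/// Baseline: decode the indices into a flat vector, then take element-wise.
fn take_materialized(array: &[u32], indices: &RunEndIndices) -> Vec<u32> {
    let mut flat = Vec::new();
    let mut start = 0;
    for (&end, &index) in indices.ends.iter().zip(&indices.values) {
        flat.extend(std::iter::repeat(index).take(end - start));
        start = end;
    }
    flat.iter().map(|&i| array[i]).collect()
}

/// Fused: one lookup per run, then a constant fill of the output slice.
fn take_fused(array: &[u32], indices: &RunEndIndices) -> Vec<u32> {
    let len = indices.ends.last().copied().unwrap_or(0);
    let mut out = vec![0u32; len];
    let mut start = 0;
    for (&end, &index) in indices.ends.iter().zip(&indices.values) {
        // A constant fill over a contiguous slice is the kind of loop
        // that compilers can typically auto-vectorize.
        out[start..end].fill(array[index]);
        start = end;
    }
    out
}
```

The fused version never allocates the flat index vector, so it reads and writes less memory, which matters if the operation is memory bound as described above.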

@gatesn gatesn merged commit c3bd401 into develop Feb 28, 2025
26 checks passed
@gatesn gatesn deleted the ad/take-from-trait branch February 28, 2025 09:46