Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-45201: [C++][Parquet] Improve performance of generating size statistics #45202

Merged
merged 4 commits into from
Jan 9, 2025

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Jan 8, 2025

Rationale for this change

We found out in #45085 that there is a non-trivial overhead when writing size statistics is enabled.

What changes are included in this PR?

Dramatically reduce overhead by speeding up def/rep levels histogram updates.

Performance results on the author's machine:

------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                      Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------------------------------------
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                   8103053 ns      8098569 ns           86 bytes_per_second=1003.26Mi/s items_per_second=129.477M/s output_size=537.472k page_index_size=33
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>            8153499 ns      8148492 ns           86 bytes_per_second=997.117Mi/s items_per_second=128.683M/s output_size=537.488k page_index_size=33
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>     8212560 ns      8207754 ns           83 bytes_per_second=989.918Mi/s items_per_second=127.754M/s output_size=537.502k page_index_size=47

BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::StringType>                 10405020 ns     10400775 ns           67 bytes_per_second=444.142Mi/s items_per_second=100.817M/s output_size=848.305k page_index_size=34
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>          10464784 ns     10460778 ns           66 bytes_per_second=441.594Mi/s items_per_second=100.239M/s output_size=848.325k page_index_size=34
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>   10469832 ns     10465739 ns           67 bytes_per_second=441.385Mi/s items_per_second=100.191M/s output_size=848.344k page_index_size=48

BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                       13004962 ns     12992678 ns           52 bytes_per_second=657.101Mi/s items_per_second=80.7052M/s output_size=617.464k page_index_size=34
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>                13718352 ns     13705599 ns           50 bytes_per_second=622.921Mi/s items_per_second=76.5071M/s output_size=617.486k page_index_size=34
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>         13845553 ns     13832138 ns           52 bytes_per_second=617.222Mi/s items_per_second=75.8072M/s output_size=617.506k page_index_size=54

BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::StringType>                      15715263 ns     15702707 ns           44 bytes_per_second=320.449Mi/s items_per_second=66.7768M/s output_size=927.326k page_index_size=35
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>               16507328 ns     16493800 ns           43 bytes_per_second=305.079Mi/s items_per_second=63.5739M/s output_size=927.352k page_index_size=35
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>        16575359 ns     16561311 ns           42 bytes_per_second=303.836Mi/s items_per_second=63.3148M/s output_size=927.377k page_index_size=55

Performance results without this PR:

------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                      Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------------------------------------
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                   8042576 ns      8037678 ns           87 bytes_per_second=1010.86Mi/s items_per_second=130.458M/s output_size=537.472k page_index_size=33
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>            9576627 ns      9571279 ns           73 bytes_per_second=848.894Mi/s items_per_second=109.554M/s output_size=537.488k page_index_size=33
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>     9570204 ns      9563595 ns           73 bytes_per_second=849.576Mi/s items_per_second=109.642M/s output_size=537.502k page_index_size=47

BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::StringType>                 10165397 ns     10160868 ns           69 bytes_per_second=454.628Mi/s items_per_second=103.197M/s output_size=848.305k page_index_size=34
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>          11662568 ns     11657396 ns           60 bytes_per_second=396.265Mi/s items_per_second=89.9494M/s output_size=848.325k page_index_size=34
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>   11657135 ns     11653063 ns           60 bytes_per_second=396.412Mi/s items_per_second=89.9829M/s output_size=848.344k page_index_size=48

BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                       13182006 ns     13168704 ns           51 bytes_per_second=648.318Mi/s items_per_second=79.6264M/s output_size=617.464k page_index_size=34
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>                16438205 ns     16421762 ns           43 bytes_per_second=519.89Mi/s items_per_second=63.8528M/s output_size=617.486k page_index_size=34
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>         16424615 ns     16409032 ns           42 bytes_per_second=520.293Mi/s items_per_second=63.9024M/s output_size=617.506k page_index_size=54

BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::StringType>                      15387808 ns     15373086 ns           46 bytes_per_second=327.32Mi/s items_per_second=68.2086M/s output_size=927.326k page_index_size=35
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>               18319628 ns     18302938 ns           37 bytes_per_second=274.924Mi/s items_per_second=57.29M/s output_size=927.352k page_index_size=35
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>        18346665 ns     18329336 ns           37 bytes_per_second=274.528Mi/s items_per_second=57.2075M/s output_size=927.377k page_index_size=55

Are these changes tested?

Tested by existing tests, validated by existing benchmarks.

Are there any user-facing changes?

No.

@pitrou
Copy link
Member Author

pitrou commented Jan 8, 2025

cc @wgtmac

@pitrou
Copy link
Member Author

pitrou commented Jan 8, 2025

@ursabot please benchmark lang=C++

Copy link

github-actions bot commented Jan 8, 2025

⚠️ GitHub issue #45201 has been automatically assigned in GitHub to PR creator.

@pitrou pitrou force-pushed the faster-size-statistics branch from e67472b to 8a4ffbc Compare January 8, 2025 11:38
@github-actions github-actions bot added the awaiting review Awaiting review label Jan 8, 2025
@pitrou
Copy link
Member Author

pitrou commented Jan 8, 2025

Oops, looks like the Continuous Benchmarking site is completely broken now.

@pitrou pitrou marked this pull request as ready for review January 8, 2025 16:46
@pitrou pitrou requested a review from wgtmac as a code owner January 8, 2025 16:46
Copy link

Thanks for your patience. Conbench analyzed the 2 benchmarking runs that have been run so far on PR commit e67472b.

There weren't enough matching historic benchmark results to make a call on whether there were regressions.

The full Conbench report has more details.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jan 9, 2025
Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General LGTM!

const int16_t* rep_levels) {
// Update histograms now, to maximize cache efficiency.
Copy link
Member

@mapleFU mapleFU Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't fully understand why scaning a pass of def_levels and rep_levels first would maximize cache efficiency, would WriteDefinitionLevels first also update the L1D cache?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure and I don't see any difference on our mini-benchmark. But WriteDefinitionLevels also writes to a destination buffer, so perhaps some of the levels will be evicted from the CPU cache.

@pitrou
Copy link
Member Author

pitrou commented Jan 9, 2025

Do you want to take a look at the updated code @wgtmac @mapleFU ?

@mapleFU
Copy link
Member

mapleFU commented Jan 9, 2025

Does the benchmark result changed after the new code patch?

@pitrou
Copy link
Member Author

pitrou commented Jan 9, 2025

Yes, it's even better, see the new numbers in the PR description.

Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

void UpdateLevelHistogram(::arrow::util::span<const int16_t> levels,
::arrow::util::span<int64_t> histogram) {
const int64_t num_levels = static_cast<int64_t>(levels.size());
const int16_t max_level = static_cast<int16_t>(histogram.size() - 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Should we DCHECK_GE(histogram.size(), 1)?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could!

@pitrou
Copy link
Member Author

pitrou commented Jan 9, 2025

@github-actions crossbow submit -g cpp

Copy link

github-actions bot commented Jan 9, 2025

Revision: 9f9bc94

Submitted crossbow builds: ursacomputing/crossbow @ actions-f6939022a3

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp-ubuntu-20.04-cuda-11.2.2 GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@pitrou
Copy link
Member Author

pitrou commented Jan 9, 2025

The Valgrind error is unrelated.

@pitrou pitrou merged commit 2b5f56c into apache:main Jan 9, 2025
34 checks passed
@pitrou pitrou removed the awaiting committer review Awaiting committer review label Jan 9, 2025
@pitrou pitrou deleted the faster-size-statistics branch January 9, 2025 15:01
Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 2b5f56c.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them.

amoeba pushed a commit that referenced this pull request Jan 31, 2025
…stics (#45202)

We found out in #45085 that there is a non-trivial overhead when writing size statistics is enabled.

Dramatically reduce overhead by speeding up def/rep levels histogram updates.

Performance results on the author's machine:
```
------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                      Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------------------------------------
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                   8103053 ns      8098569 ns           86 bytes_per_second=1003.26Mi/s items_per_second=129.477M/s output_size=537.472k page_index_size=33
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>            8153499 ns      8148492 ns           86 bytes_per_second=997.117Mi/s items_per_second=128.683M/s output_size=537.488k page_index_size=33
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>     8212560 ns      8207754 ns           83 bytes_per_second=989.918Mi/s items_per_second=127.754M/s output_size=537.502k page_index_size=47

BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::StringType>                 10405020 ns     10400775 ns           67 bytes_per_second=444.142Mi/s items_per_second=100.817M/s output_size=848.305k page_index_size=34
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>          10464784 ns     10460778 ns           66 bytes_per_second=441.594Mi/s items_per_second=100.239M/s output_size=848.325k page_index_size=34
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>   10469832 ns     10465739 ns           67 bytes_per_second=441.385Mi/s items_per_second=100.191M/s output_size=848.344k page_index_size=48

BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                       13004962 ns     12992678 ns           52 bytes_per_second=657.101Mi/s items_per_second=80.7052M/s output_size=617.464k page_index_size=34
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>                13718352 ns     13705599 ns           50 bytes_per_second=622.921Mi/s items_per_second=76.5071M/s output_size=617.486k page_index_size=34
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>         13845553 ns     13832138 ns           52 bytes_per_second=617.222Mi/s items_per_second=75.8072M/s output_size=617.506k page_index_size=54

BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::StringType>                      15715263 ns     15702707 ns           44 bytes_per_second=320.449Mi/s items_per_second=66.7768M/s output_size=927.326k page_index_size=35
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>               16507328 ns     16493800 ns           43 bytes_per_second=305.079Mi/s items_per_second=63.5739M/s output_size=927.352k page_index_size=35
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>        16575359 ns     16561311 ns           42 bytes_per_second=303.836Mi/s items_per_second=63.3148M/s output_size=927.377k page_index_size=55
```

Performance results without this PR:
```
------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                      Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------------------------------------
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                   8042576 ns      8037678 ns           87 bytes_per_second=1010.86Mi/s items_per_second=130.458M/s output_size=537.472k page_index_size=33
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>            9576627 ns      9571279 ns           73 bytes_per_second=848.894Mi/s items_per_second=109.554M/s output_size=537.488k page_index_size=33
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>     9570204 ns      9563595 ns           73 bytes_per_second=849.576Mi/s items_per_second=109.642M/s output_size=537.502k page_index_size=47

BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::StringType>                 10165397 ns     10160868 ns           69 bytes_per_second=454.628Mi/s items_per_second=103.197M/s output_size=848.305k page_index_size=34
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>          11662568 ns     11657396 ns           60 bytes_per_second=396.265Mi/s items_per_second=89.9494M/s output_size=848.325k page_index_size=34
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>   11657135 ns     11653063 ns           60 bytes_per_second=396.412Mi/s items_per_second=89.9829M/s output_size=848.344k page_index_size=48

BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                       13182006 ns     13168704 ns           51 bytes_per_second=648.318Mi/s items_per_second=79.6264M/s output_size=617.464k page_index_size=34
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>                16438205 ns     16421762 ns           43 bytes_per_second=519.89Mi/s items_per_second=63.8528M/s output_size=617.486k page_index_size=34
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>         16424615 ns     16409032 ns           42 bytes_per_second=520.293Mi/s items_per_second=63.9024M/s output_size=617.506k page_index_size=54

BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::StringType>                      15387808 ns     15373086 ns           46 bytes_per_second=327.32Mi/s items_per_second=68.2086M/s output_size=927.326k page_index_size=35
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>               18319628 ns     18302938 ns           37 bytes_per_second=274.924Mi/s items_per_second=57.29M/s output_size=927.352k page_index_size=35
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>        18346665 ns     18329336 ns           37 bytes_per_second=274.528Mi/s items_per_second=57.2075M/s output_size=927.377k page_index_size=55
```

Tested by existing tests, validated by existing benchmarks.

No.

* GitHub Issue: #45201

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants