perf(encoding): add estimate size for value encoding #8692

Merged
merged 5 commits into from
Mar 27, 2023

Conversation

st1page
Contributor

@st1page st1page commented Mar 21, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Add a function to estimate the value-encoding size of a datum.
This prepares for #8683.

Checklist For Contributors

  • All checks passed in ./risedev check (or alias, ./risedev c)

Checklist For Reviewers

  • I have requested macro/micro-benchmarks as this PR can affect performance substantially, and the results are shown.

Documentation

  • My PR DOES NOT contain user-facing changes.
Click here for Documentation

Types of user-facing changes

Please keep the types that apply to your changes, and remove the others.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

Comment on lines 236 to 256
fn estimate_encoded_scalar_size(value: ScalarRefImpl<'_>) -> usize {
    match value {
        ScalarRefImpl::Int16(_) => 2,
        ScalarRefImpl::Int32(_) => 4,
        ScalarRefImpl::Int64(_) => 8,
        ScalarRefImpl::Serial(_) => 8,
        ScalarRefImpl::Float32(_) => 4,
        ScalarRefImpl::Float64(_) => 8,
        ScalarRefImpl::Utf8(v) => v.as_bytes().len(),
        ScalarRefImpl::Bytea(v) => estimate_encoded_str_size(v),
        ScalarRefImpl::Bool(_) => 1,
        ScalarRefImpl::Decimal(_) => estimate_encoded_decimal_size(),
        ScalarRefImpl::Interval(_) => estimate_encoded_interval_size(),
        ScalarRefImpl::NaiveDate(_) => estimate_encoded_naivedate_size(),
        ScalarRefImpl::NaiveDateTime(_) => estimate_encoded_naivedatetime_size(),
        ScalarRefImpl::NaiveTime(_) => estimate_encoded_naivetime_size(),
        ScalarRefImpl::Jsonb(_) => 8,
        ScalarRefImpl::Struct(s) => estimate_encoded_struct_size(s),
        ScalarRefImpl::List(v) => estimate_encoded_list_size(v),
    }
}
Contributor Author

Maybe we can refactor this with a macro or another technique in a future PR.
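
One possible shape for such a macro refactor, as a minimal sketch only. `FixedScalar`, `fixed_size_table!`, and `estimate_fixed_size` are hypothetical names invented for illustration; they cover only fixed-width variants, and the real `ScalarRefImpl` has variable-length cases that would still need their own arms.

```rust
// Hypothetical stand-in for ScalarRefImpl, limited to fixed-width variants.
#[allow(dead_code)]
enum FixedScalar {
    Int16(i16),
    Int32(i32),
    Int64(i64),
    Float32(f32),
    Float64(f64),
    Bool(bool),
}

// Expands a `Variant => size` table into a match expression, so the
// variant/size pairs live in one compact list.
macro_rules! fixed_size_table {
    ($value:expr, { $($variant:ident => $size:expr),* $(,)? }) => {
        match $value {
            $(FixedScalar::$variant(_) => $size,)*
        }
    };
}

// Encoded size in bytes for each fixed-width variant (sizes taken from
// the match arms in the diff above).
fn estimate_fixed_size(value: &FixedScalar) -> usize {
    fixed_size_table!(value, {
        Int16 => 2,
        Int32 => 4,
        Int64 => 8,
        Float32 => 4,
        Float64 => 8,
        Bool => 1,
    })
}
```

Because the macro expands to an ordinary `match`, the compiler still reports a missing variant as a non-exhaustive pattern error, which is the main property worth preserving in any such refactor.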

@codecov

codecov bot commented Mar 21, 2023

Codecov Report

Merging #8692 (75f0c71) into main (16c9708) will increase coverage by 0.01%.
The diff coverage is 97.24%.

@@            Coverage Diff             @@
##             main    #8692      +/-   ##
==========================================
+ Coverage   71.06%   71.07%   +0.01%     
==========================================
  Files        1166     1166              
  Lines      192345   192454     +109     
==========================================
+ Hits       136691   136792     +101     
- Misses      55654    55662       +8     
| Flag | Coverage Δ |
|------|------------|
| rust | 71.07% <97.24%> (+0.01%) ⬆️ |


| Impacted Files | Coverage Δ |
|----------------|------------|
| src/common/src/array/list_array.rs | 91.30% <85.71%> (-0.06%) ⬇️ |
| src/common/src/array/struct_array.rs | 87.19% <85.71%> (-0.02%) ⬇️ |
| src/common/src/util/value_encoding/mod.rs | 93.29% <98.94%> (+2.92%) ⬆️ |

... and 8 files with indirect coverage changes


@st1page st1page added this pull request to the merge queue Mar 21, 2023
@TennyZhuang TennyZhuang removed this pull request from the merge queue due to a manual request Mar 21, 2023
@TennyZhuang
Contributor

  1. Why is it called estimated_size? IIUC it’s exact.
  2. A fuzz test would be better here, but it may be a little complicated. At the very least, you should write a UT for every type to ensure the format is not broken silently.

Contributor

@TennyZhuang TennyZhuang left a comment

Please add tests.

@TennyZhuang
Contributor

You can write some datums, encode them, and compare with your pre-calculated result.
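
The kind of check suggested here could look roughly like the sketch below. Everything in it is an assumption made for illustration: `Scalar`, `Datum`, `encode_datum`, `estimate_datum_size`, and `estimate_matches_encoding` are hypothetical, and the toy format (a 1-byte null flag, little-endian fixed-width integers, a `u32` length prefix for strings) is not claimed to match RisingWave's actual value encoding.

```rust
// Hypothetical, simplified stand-ins; not RisingWave's real types.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int16(i16),
    Int32(i32),
    Int64(i64),
    Bool(bool),
    Utf8(String),
}

type Datum = Option<Scalar>;

// Toy encoding assumed for this sketch: a 1-byte null flag, then either
// a fixed-width little-endian value or a u32 length prefix plus bytes.
fn encode_datum(datum: &Datum, buf: &mut Vec<u8>) {
    match datum {
        None => buf.push(0),
        Some(s) => {
            buf.push(1);
            match s {
                Scalar::Int16(v) => buf.extend_from_slice(&v.to_le_bytes()),
                Scalar::Int32(v) => buf.extend_from_slice(&v.to_le_bytes()),
                Scalar::Int64(v) => buf.extend_from_slice(&v.to_le_bytes()),
                Scalar::Bool(v) => buf.push(*v as u8),
                Scalar::Utf8(v) => {
                    buf.extend_from_slice(&(v.len() as u32).to_le_bytes());
                    buf.extend_from_slice(v.as_bytes());
                }
            }
        }
    }
}

// Size estimate computed without encoding; must mirror encode_datum.
fn estimate_datum_size(datum: &Datum) -> usize {
    match datum {
        None => 1, // null flag only
        Some(s) => {
            1 + match s {
                Scalar::Int16(_) => 2,
                Scalar::Int32(_) => 4,
                Scalar::Int64(_) => 8,
                Scalar::Bool(_) => 1,
                Scalar::Utf8(v) => 4 + v.len(),
            }
        }
    }
}

// Encodes the datum and compares the real length with the estimate.
fn estimate_matches_encoding(datum: &Datum) -> bool {
    let mut buf = Vec::new();
    encode_datum(datum, &mut buf);
    buf.len() == estimate_datum_size(datum)
}
```

A unit test per type would then call `estimate_matches_encoding` on a few representative datums, including `None`, so that a change to the encoder that is not mirrored in the estimator fails loudly.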

@st1page
Contributor Author

st1page commented Mar 21, 2023

  1. Why it’s estimated_size? IIUC it’s exact.

We cannot get the exact size for jsonb 😿

@st1page st1page enabled auto-merge March 27, 2023 07:48
Co-authored-by: TennyZhuang <zty0826@gmail.com>
@st1page st1page added this pull request to the merge queue Mar 27, 2023
Merged via the queue into main with commit ba583cc Mar 27, 2023
@st1page st1page deleted the sts/estimate_encoded_size_for_value_encoding branch March 27, 2023 08:34
if let Some(d) = datum_ref.to_datum_ref() {
    1 + estimate_serialize_scalar_size(d)
} else {
    1
Contributor

@kwannoel kwannoel Mar 30, 2023

I guess the 1 here is for storing the u8 null flag?

Contributor Author

Yes

ScalarRefImpl::NaiveDate(_) => estimate_serialize_naivedate_size(),
ScalarRefImpl::NaiveDateTime(_) => estimate_serialize_naivedatetime_size(),
ScalarRefImpl::NaiveTime(_) => estimate_serialize_naivetime_size(),
ScalarRefImpl::Jsonb(_) => 8,
Member

Is this a mistake? @st1page

Contributor

Pre-computing the size of a jsonb value is expensive, but 8 is too small in any case, and we need a comment here.

Contributor Author

🤔 I will open an issue to benchmark this and weigh whether estimating the jsonb size is worth the cost. @st1page

5 participants