-
Notifications
You must be signed in to change notification settings - Fork 613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: streamline column-aware row encoding impl #20583
base: main
Are you sure you want to change the base?
Conversation
const OFFSET8 = 0b01; | ||
const OFFSET16 = 0b10; | ||
const OFFSET32 = 0b11; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OFFSET32
here overlaps with OFFSET8
and OFFSET16
, for which the bitflag doesn't seem to provide clear semantics. That's why I decide to re-implement with bitfield.
offsets: Vec<u8>, | ||
buf: Vec<u8>, | ||
} | ||
|
||
/// A trait unifying [`ToDatumRef`] and already encoded bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To unify encode
and encode_slice
.
// Use 0 if there's no data. | ||
let max_offset = usize_offsets.last().copied().unwrap_or(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that we support ser/de for empty rows.
.reserve_exact(usize_offsets.len() * offset_width.width()); | ||
for &offset in usize_offsets { | ||
self.offsets | ||
.put_uint_le(offset as u64, offset_width.width()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unifies put_u8
, put_u16_le
, and put_u32_le
.
/// A view of the encoded bytes, which can be iterated over to get the column id and data. | ||
/// Used for deserialization. | ||
// TODO: can we unify this with `RowEncoding`, which is for serialization? | ||
#[derive(Clone)] | ||
struct EncodedBytes<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To unify Deserializer
and try_drop_invalid_columns
.
04f2c7b
to
8acd50f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Overview
This PR refactors the column-aware row encoding implementation to improve clarity, conciseness, and reduce potential errors.
- Replaces the bitflags-based header with a bitfield struct and an enum representing offset width.
- Consolidates encoding logic by introducing a generic Encode trait and refines the serializer/deserializer implementations.
- Updates tests and configuration to match the new encoding scheme.
Reviewed Changes
File | Description |
---|---|
src/common/src/util/value_encoding/column_aware_row_encoding.rs | Refactored row encoding/decoding using a bitfield-based header and generic encoding trait. |
src/storage/src/row_serde/value_serde.rs | Adjusted tests to validate new behavior for handling dropped columns. |
src/common/Cargo.toml | Added the bitfield-struct dependency to support new header implementation. |
src/common/src/util/value_encoding/error.rs | Changed the error message format for invalid flags to use binary representation. |
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (1)
src/common/src/util/value_encoding/error.rs:45
- [nitpick] Using binary formatting for the invalid flag may obscure its numeric value. Consider providing additional context or including a decimal representation for improved clarity.
#[error("Invalid flag: {0:b}")]
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
8acd50f
to
5bc558d
Compare
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Reimplement column-aware row encoding to make the code more clear, concise and less error-prone. See review comments for explanation of changes.
Checklist
Documentation
Release note