feat(stream): lazy emit for HashAggExecutor #7752

Closed
wants to merge 10 commits into from

TennyZhuang
Contributor

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Support append-only output from HashAgg if the first group key has a watermark defined on it.

The PR is the backend implementation.
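As a rough illustration of the idea (not the executor's actual code), lazy emit can be pictured as holding each group's aggregate until a watermark on the group key passes it, then emitting the group exactly once so the output is append-only. The type `LazyCountAgg`, its methods, and the strict "below the watermark" rule are all assumptions made for this sketch.

```rust
use std::collections::BTreeMap;

/// Hypothetical lazy-emit buffer: group key -> running row count.
struct LazyCountAgg {
    held: BTreeMap<i64, u64>,
}

impl LazyCountAgg {
    fn new() -> Self {
        Self { held: BTreeMap::new() }
    }

    /// Apply one input row to its group without emitting anything.
    fn apply(&mut self, group_key: i64) {
        *self.held.entry(group_key).or_insert(0) += 1;
    }

    /// On a watermark, emit and stop holding every group strictly below it.
    /// (Strictness is an assumption of this sketch.)
    fn on_watermark(&mut self, watermark: i64) -> Vec<(i64, u64)> {
        // `split_off` keeps keys < watermark in `self.held` and returns the rest.
        let still_open = self.held.split_off(&watermark);
        let closed = std::mem::replace(&mut self.held, still_open);
        closed.into_iter().collect()
    }
}

fn main() {
    let mut agg = LazyCountAgg::new();
    for key in [0, 0, 10, 10, 10, 20] {
        agg.apply(key);
    }
    // Only group 0 is final; groups 10 and 20 stay held.
    assert_eq!(agg.on_watermark(10), vec![(0, 2)]);
    agg.apply(10);
    assert_eq!(agg.on_watermark(30), vec![(10, 4), (20, 1)]);
    assert!(agg.on_watermark(40).is_empty());
}
```

Each group appears in the output once, with its final value, so downstream never sees an update or retraction for it.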

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

#6042

TennyZhuang and others added 5 commits February 7, 2023 23:05
Signed-off-by: TennyZhuang <zty0826@gmail.com>
// SAFETY: `keys_in_batch` is a subset of `held_keys`, which is a
// `HashMap`, so we can ensure that every `&mut agg_group_cache` is
// unique.
unsafe {
Contributor Author

Similar to https://doc.rust-lang.org/std/primitive.slice.html#method.split_at_mut, this is definitely a safe behavior.
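For readers unfamiliar with the analogy: `split_at_mut` turns one exclusive borrow of a slice into two exclusive borrows that cannot overlap, using `unsafe` internally while exposing a safe API. A minimal sketch of that pattern follows; the function name `bump_both_halves` is illustrative and is not taken from the PR.

```rust
/// Mutate both halves of a slice at the same time.
/// `split_at_mut` guarantees the two `&mut` halves are disjoint.
fn bump_both_halves(counts: &mut [i64], mid: usize) {
    let (left, right) = counts.split_at_mut(mid);
    // Both mutable borrows are alive at once, which the borrow checker
    // accepts because the halves never alias.
    left[0] += 10;
    right[0] += 20;
}

fn main() {
    let mut counts = [1, 2, 3, 4];
    bump_both_halves(&mut counts, 2);
    assert_eq!(counts, [11, 2, 23, 4]);
}
```

The argument in the SAFETY comment has the same shape: distinct keys play the role of the disjoint halves.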

Member

Is this magic just to avoid agg_group_cache.pop and agg_group_cache.put? It looks OK to me but is it worth doing so? I mean in terms of maintainability and the fact that it does add an unsafe block.

Contributor Author

TennyZhuang Feb 8, 2023

The push-pop pattern makes it impossible to change the LRU cache to LFU or other advanced cache algorithms.
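One plausible reading of this point, sketched with a toy cache: a policy that keeps per-entry statistics (such as hit counts for LFU) loses them whenever an entry is popped and re-inserted, while an in-place `get_mut` preserves them. `FreqCache` and its methods are hypothetical and are not the cache type used in the PR.

```rust
use std::collections::HashMap;

/// Hypothetical frequency-tracking cache: key -> (hit count, value).
struct FreqCache {
    entries: HashMap<String, (u64, i64)>,
}

impl FreqCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Insert a fresh entry; its hit count starts at zero.
    fn put(&mut self, key: &str, value: i64) {
        self.entries.insert(key.to_string(), (0, value));
    }

    /// Remove an entry, discarding its hit count.
    fn pop(&mut self, key: &str) -> Option<i64> {
        self.entries.remove(key).map(|(_, value)| value)
    }

    /// Mutate in place; the hit count is bumped and kept.
    fn get_mut(&mut self, key: &str) -> Option<&mut i64> {
        self.entries.get_mut(key).map(|(hits, value)| {
            *hits += 1;
            value
        })
    }

    fn hits(&self, key: &str) -> Option<u64> {
        self.entries.get(key).map(|(hits, _)| *hits)
    }
}

fn main() {
    let mut cache = FreqCache::new();
    cache.put("a", 1);
    cache.put("b", 1);
    for _ in 0..3 {
        // In-place update: access statistics accumulate.
        *cache.get_mut("a").unwrap() += 1;
        // Pop-then-put: the entry is recreated and its statistics reset.
        let value = cache.pop("b").unwrap();
        cache.put("b", value + 1);
    }
    assert_eq!(cache.hits("a"), Some(3));
    assert_eq!(cache.hits("b"), Some(0));
}
```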

Member

Makes sense.

codecov bot commented Feb 8, 2023

Codecov Report

Merging #7752 (b42e591) into main (1246a8b) will increase coverage by 0.03%.
The diff coverage is 87.37%.

@@            Coverage Diff             @@
##             main    #7752      +/-   ##
==========================================
+ Coverage   71.68%   71.72%   +0.03%     
==========================================
  Files        1112     1112              
  Lines      176885   177069     +184     
==========================================
+ Hits       126807   127001     +194     
+ Misses      50078    50068      -10     
Flag Coverage Δ
rust 71.72% <87.37%> (+0.03%) ⬆️

Impacted Files Coverage Δ
src/stream/src/common/table/state_table.rs 81.94% <ø> (+1.73%) ⬆️
src/stream/src/common/table/watermark.rs 86.66% <ø> (+33.33%) ⬆️
src/stream/src/executor/aggregation/mod.rs 88.13% <ø> (ø)
src/stream/src/executor/mod.rs 52.69% <ø> (ø)
src/stream/src/from_proto/agg_common.rs 0.00% <0.00%> (ø)
src/stream/src/from_proto/dynamic_filter.rs 0.00% <0.00%> (ø)
src/stream/src/from_proto/global_simple_agg.rs 0.00% <0.00%> (ø)
src/stream/src/from_proto/group_top_n.rs 0.00% <0.00%> (ø)
...rc/stream/src/from_proto/group_top_n_appendonly.rs 0.00% <0.00%> (ø)
src/stream/src/from_proto/hash_agg.rs 0.00% <0.00%> (ø)
... and 32 more

@@ -654,6 +734,8 @@ mod tests {
pk_indices: PkIndices,
extreme_cache_size: usize,
executor_id: u64,
// TODO: should we use an enum here?
emit_immediate: bool,
Member

minor: What about directly using the generic EmitPolicy here?

Contributor Author

I'd prefer not to; it's an internal structure and is not ready to be used by other executors.
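For context on the TODO in the hunk above, a test-local enum could replace the boolean without exposing the executor's internal `EmitPolicy`. The sketch below is purely illustrative: `EmitMode`, its variants, and `describe` are hypothetical names and do not appear in the PR.

```rust
/// Hypothetical stand-in for the `emit_immediate: bool` test parameter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EmitMode {
    /// Emit changes as soon as each batch is processed.
    Immediate,
    /// Hold results until a watermark finalizes the group.
    OnWatermark,
}

impl From<bool> for EmitMode {
    /// Keeps existing call sites that pass a bool working.
    fn from(emit_immediate: bool) -> Self {
        if emit_immediate {
            EmitMode::Immediate
        } else {
            EmitMode::OnWatermark
        }
    }
}

/// An exhaustive `match` forces every call site to handle new variants.
fn describe(mode: EmitMode) -> &'static str {
    match mode {
        EmitMode::Immediate => "emit immediately",
        EmitMode::OnWatermark => "emit on watermark",
    }
}

fn main() {
    assert_eq!(EmitMode::from(true), EmitMode::Immediate);
    assert_eq!(describe(EmitMode::from(false)), "emit on watermark");
}
```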

Member

stdrc left a comment

LGTM!

// NULL is unexpected in watermark column, however, if it exists, we'll treat it as the largest, so emit it here.
return true;
};
watermark_val <= &cur_watermark.val.as_scalar_ref_impl()
Contributor

Should this be < instead of <=?
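The two comparisons differ only when a group key equals the watermark. The sketch below makes that boundary explicit. It assumes, for illustration, that a watermark `w` only rules out future rows with keys below `w`, so a group at exactly `w` could still receive rows; `should_emit` and its `inclusive` flag are hypothetical and use plain `i64` keys instead of the executor's scalar types.

```rust
/// Hypothetical helper mirroring the snippet under review.
/// `inclusive = true` models `<=`, `inclusive = false` models `<`.
fn should_emit(group_key: Option<i64>, watermark: i64, inclusive: bool) -> bool {
    let Some(key) = group_key else {
        // NULL key: the reviewed snippet emits it unconditionally.
        return true;
    };
    if inclusive {
        key <= watermark
    } else {
        key < watermark
    }
}

fn main() {
    // Keys away from the boundary behave the same either way.
    assert!(should_emit(Some(9), 10, true) && should_emit(Some(9), 10, false));
    assert!(!should_emit(Some(11), 10, true) && !should_emit(Some(11), 10, false));
    // At the boundary, `<=` emits the group while `<` keeps holding it.
    assert!(should_emit(Some(10), 10, true));
    assert!(!should_emit(Some(10), 10, false));
}
```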

Contributor

soundOfDestiny left a comment

How do we recover held_keys after recovery?

TennyZhuang
Contributor Author

How do we recover held_keys after recovery?

Good catch, I’ll do some redesigning.

mergify bot pushed a commit that referenced this pull request Feb 13, 2023
…able (#7869)

Add an option to control the buffering strategy of the watermarks in state table.

Will be used in #7752

Approved-By: st1page
Approved-By: soundOfDestiny
st1page marked this pull request as draft February 13, 2023 13:04
TennyZhuang
Contributor Author

#8750

xxchan deleted the feat/window-agg branch May 14, 2023 09:47