-
Notifications
You must be signed in to change notification settings - Fork 613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(stream): lazy emit for HashAggExecutor #7752
Conversation
Signed-off-by: TennyZhuang <zty0826@gmail.com>
Signed-off-by: TennyZhuang <zty0826@gmail.com>
Signed-off-by: TennyZhuang <zty0826@gmail.com>
// SAFETY: `keys_in_batch` is a subset of `held_keys`, which is a | ||
// `HashMap`, so we can ensure that every `&mut agg_group_cache` is | ||
// unique. | ||
unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to https://doc.rust-lang.org/std/primitive.slice.html#method.split_at_mut, this is definitely a safe behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this magic just to avoid agg_group_cache.pop
and agg_group_cache.put
? It looks OK to me but is it worth doing so? I mean in terms of maintainability and the fact that it does add an unsafe
block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The push-pop pattern make it impossible to change the LRU cache to LFU or other advanced cache algorithms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense.
Signed-off-by: TennyZhuang <zty0826@gmail.com>
Signed-off-by: TennyZhuang <zty0826@gmail.com>
Codecov Report
@@ Coverage Diff @@
## main #7752 +/- ##
==========================================
+ Coverage 71.68% 71.72% +0.03%
==========================================
Files 1112 1112
Lines 176885 177069 +184
==========================================
+ Hits 126807 127001 +194
+ Misses 50078 50068 -10
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
// SAFETY: `keys_in_batch` is a subset of `held_keys`, which is a | ||
// `HashMap`, so we can ensure that every `&mut agg_group_cache` is | ||
// unique. | ||
unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this magic just to avoid agg_group_cache.pop
and agg_group_cache.put
? It looks OK to me but is it worth doing so? I mean in terms of maintainability and the fact that it does add an unsafe
block.
@@ -654,6 +734,8 @@ mod tests { | |||
pk_indices: PkIndices, | |||
extreme_cache_size: usize, | |||
executor_id: u64, | |||
// TODO: should we use an enum here? | |||
emit_immediate: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: What about directly using the generic EmitPolicy
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer not, it's an internal structure, and is not ready to be used for other executors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
// NULL is unexpected in watermark column, however, if it exists, we'll treat it as the largest, so emit it here. | ||
return true; | ||
}; | ||
watermark_val <= &cur_watermark.val.as_scalar_ref_impl() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be <
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we recover held_keys
after recovery?
Good catch, I’ll do some redesigning. |
Signed-off-by: TennyZhuang <zty0826@gmail.com>
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Support append-only output from HashAgg if the first group key has a watermark defined on it.
The PR is the backend implementation.
Checklist
./risedev check
(or alias,./risedev c
)Refer to a related PR or issue link (optional)
#6042