feat(batch): introduce batch AsOf join #19790

Merged
merged 12 commits into from
Jan 28, 2025

@yuhao-su yuhao-su commented Dec 12, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Add batch ASOF join functionality to the batch HashJoinExecutor.

This implementation changes the matched-row iteration so that it returns only one matched row per left row: the one that satisfies the ASOF requirement.
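As a rough illustration of "return only one matched row", the sketch below picks a single right-side candidate for one left row. It is not the code from this PR: `AsOfOp` and `pick_asof_match` are hypothetical names, the asof column is assumed to be an `i64`, and the candidates are assumed to have already matched on the equality keys.

```rust
/// Inequality operators an ASOF condition may use, read as `left op right`.
/// (Hypothetical type, for illustration only.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AsOfOp {
    Lt,
    Le,
    Gt,
    Ge,
}

/// Hypothetical helper: among right rows that already matched on the equality
/// keys, return the index of the row whose asof value is nearest to `left`
/// while still satisfying `left op right`. Returns `None` if no row qualifies.
pub fn pick_asof_match(left: i64, op: AsOfOp, right: &[i64]) -> Option<usize> {
    let mut best: Option<(usize, i64)> = None;
    for (idx, &r) in right.iter().enumerate() {
        // Skip candidates that violate the inequality condition.
        let satisfies = match op {
            AsOfOp::Lt => left < r,
            AsOfOp::Le => left <= r,
            AsOfOp::Gt => left > r,
            AsOfOp::Ge => left >= r,
        };
        if !satisfies {
            continue;
        }
        // "Nearest" means the largest qualifying value for > and >=,
        // and the smallest qualifying value for < and <=.
        let closer = match (op, best) {
            (_, None) => true,
            (AsOfOp::Gt | AsOfOp::Ge, Some((_, b))) => r > b,
            (AsOfOp::Lt | AsOfOp::Le, Some((_, b))) => r < b,
        };
        if closer {
            best = Some((idx, r));
        }
    }
    best.map(|(idx, _)| idx)
}
```

For a condition such as `t1.v2 > t2.v2`, this keeps the candidate with the largest `t2.v2` below `t1.v2`, so each left row yields at most one output row.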

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • My PR contains critical fixes that are necessary to be merged into the latest release.

Documentation

Usage is the same as #18683

Ties in the asof column in the right table.

The query result is non-deterministic when the right table contains ties in the inequality condition column. ASOF JOIN finds the nearest record in the right table and matches only one row for each record from the left table. Therefore, when the right table has ties, the right-table row that appears in the query result may differ between runs.
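The effect of ties can be seen in a small sketch. It assumes a condition of the form `left_ts >= right.ts` and a tie-break of "first row encountered wins"; `RightRow` and `nearest_at_or_before` are hypothetical names, not identifiers from this PR, and the actual executor may break ties differently.

```rust
/// Hypothetical right-side row: the asof column plus some payload.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RightRow {
    pub ts: i64,
    pub payload: &'static str,
}

/// For the condition `left_ts >= right.ts`, return the row with the largest
/// qualifying `ts`. On a tie, the row encountered first is kept, so the
/// result depends on the order in which `rows` is visited.
pub fn nearest_at_or_before(left_ts: i64, rows: &[RightRow]) -> Option<RightRow> {
    let mut best: Option<RightRow> = None;
    for &row in rows {
        // Rows after the left timestamp do not satisfy the condition.
        if row.ts > left_ts {
            continue;
        }
        match best {
            // Keep the current best when it is at least as near (tie included).
            Some(b) if b.ts >= row.ts => {}
            _ => best = Some(row),
        }
    }
    best
}
```

Feeding the same rows in a different order can return a different payload for the tied `ts`, which is the kind of run-to-run difference described above when the visiting order of right-side rows is not fixed.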

  • My PR needs documentation updates.
Release note

stream_error: 'Invalid input syntax: AsOf join requires exactly 1 ineuquality condition'
- sql: CREATE TABLE t1(v1 varchar, v2 int, v3 int); CREATE TABLE t2(v1 varchar, v2 int, v3 int); SELECT t1.v1 t1_v1, t1.v2 t1_v2, t2.v1 t2_v1, t2.v2 t2_v2 FROM t1 ASOF JOIN t2 ON t1.v1 = t2.v1 || 'a' and t1.v2 > t2.v2;
batch_plan: |-
BatchExchange { order: [], dist: Single }
└─BatchGroupTopN { order: [t2.v2 DESC], limit: 1, offset: 0, group_key: [t1.v1, t1.v2] }
Contributor

This plan seems incorrect because it will produce fewer rows than expected.

Contributor Author

I changed the implementation and this should be resolved now.

gru-agent bot commented Jan 22, 2025

This pull request has been modified. If you want me to regenerate unit test for any of the files related, please find the file in "Files Changed" tab and add a comment @gru-agent. (The github "Comment on this file" feature is in the upper right corner of each file in "Files Changed" tab.)

@yuhao-su yuhao-su requested a review from chenzl25 January 22, 2025 07:19
@yuhao-su yuhao-su added the user-facing-changes Contains changes that are visible to users label Jan 22, 2025
Contributor

Hi, there.

📝 Telemetry Reminder:
If you're implementing this feature, please consider adding telemetry metrics to track its usage. This helps us understand how the feature is being used and improve it further.
You can find the function report_event of telemetry reporting in the following files. Feel free to ask questions if you need any guidance!

  • src/frontend/src/telemetry.rs
  • src/meta/src/telemetry.rs
  • src/stream/src/telemetry.rs
  • src/storage/compactor/src/telemetry.rs
    Or call report_event_common (src/common/telemetry_event/src/lib.rs) if you find it hard to implement.
    ✨ Thank you for your contribution to RisingWave! ✨

This is an automated comment created by the peaceiris/actions-label-commenter. Responding to the bot or mentioning it won't have any effect.

@yuhao-su yuhao-su requested a review from fuyufjh January 22, 2025 19:02
Contributor

@chenzl25 chenzl25 left a comment

LGTM!

@yuhao-su yuhao-su enabled auto-merge January 27, 2025 20:33
@yuhao-su yuhao-su added this pull request to the merge queue Jan 28, 2025
Merged via the queue into main with commit c89eeed Jan 28, 2025
30 of 31 checks passed
@yuhao-su yuhao-su deleted the yuhao/batch-asof-join branch January 28, 2025 04:55