perf: nexmark q22 #8162

Closed
Tracked by #7289
lmatz opened this issue Feb 23, 2023 · 3 comments

lmatz commented Feb 23, 2023

Query:

CREATE MATERIALIZED VIEW nexmark_q22 AS
SELECT
    auction, bidder, price, channel,
    split_part(url, '/', 4) as dir1,
    split_part(url, '/', 5) as dir2,
    split_part(url, '/', 6) as dir3 
FROM 
    bid;

RW:

 StreamMaterialize { columns: [auction, bidder, price, channel, dir1, dir2, dir3, _row_id(hidden)], pk_columns: [_row_id], pk_conflict: "no check" }
 └─StreamExchange { dist: HashShard(_row_id) }
   └─StreamProject { exprs: [Field(bid, 0:Int32) as $expr1, Field(bid, 1:Int32) as $expr2, Field(bid, 2:Int32) as $expr3, Field(bid, 3:Int32) as $expr4, SplitPart(Field(bid, 4:Int32), '/':Varchar, 4:Int32) as $expr5, SplitPart(Field(bid, 4:Int32), '/':Varchar, 5:Int32) as $expr6, SplitPart(Field(bid, 4:Int32), '/':Varchar, 6:Int32) as $expr7, _row_id] }
     └─StreamFilter { predicate: (event_type = 2:Int32) }
       └─StreamRowIdGen { row_id_index: 4 }
         └─StreamSource { source: "nexmark", columns: ["event_type", "person", "auction", "bid", "_row_id"] }
(6 rows)
 Fragment 0
   StreamMaterialize { columns: [auction, bidder, price, channel, dir1, dir2, dir3, _row_id(hidden)], pk_columns: [_row_id], pk_conflict: "no check" }
       materialized table: 4294967294
     StreamExchange Hash([7]) from 1
 
 Fragment 1
   StreamProject { exprs: [Field(bid, 0:Int32) as $expr245, Field(bid, 1:Int32) as $expr246, Field(bid, 2:Int32) as $expr247, Field(bid, 3:Int32) as $expr248, SplitPart(Field(bid, 4:Int32), '/':Varchar, 4:Int32) as $expr249, SplitPart(Field(bid, 4:Int32), '/':Varchar, 5:Int32) as $expr250, SplitPart(Field(bid, 4:Int32), '/':Varchar, 6:Int32) as $expr251, _row_id] }
     StreamFilter { predicate: (event_type = 2:Int32) }
       StreamRowIdGen { row_id_index: 4 }
         StreamSource { source: "nexmark", columns: ["event_type", "person", "auction", "bid", "_row_id"] }
             source state table: 0
 
  Table 0 { columns: [partition_id, offset], primary key: [$0 ASC], value indices: [0, 1], distribution key: [] }
  Table 4294967294 { columns: [auction, bidder, price, channel, dir1, dir2, dir3, _row_id], primary key: [$7 ASC], value indices: [0, 1, 2, 3, 4, 5, 6, 7], distribution key: [7] }
(14 rows)
github-actions bot added this to the release-0.1.18 milestone Feb 23, 2023

lmatz commented Feb 23, 2023

Flink:

== Optimized Physical Plan ==
Calc(select=[bid.auction AS auction, bid.bidder AS bidder, bid.price AS price, bid.channel AS channel, SPLIT_INDEX(bid.url, _UTF-16LE'/', 3) AS dir1, SPLIT_INDEX(bid.url, _UTF-16LE'/', 4) AS dir2, SPLIT_INDEX(bid.url, _UTF-16LE'/', 5) AS dir3], where=[=(event_type, 2)])
+- WatermarkAssigner(rowtime=[dateTime], watermark=[-(dateTime, 4000:INTERVAL SECOND)])
   +- Calc(select=[event_type, person, auction, bid, CASE(=(event_type, 0), person.dateTime, =(event_type, 1), auction.dateTime, bid.dateTime) AS dateTime])
      +- TableSourceScan(table=[[default_catalog, default_database, datagen]], fields=[event_type, person, auction, bid])

== Optimized Execution Plan ==
Calc(select=[bid.auction AS auction, bid.bidder AS bidder, bid.price AS price, bid.channel AS channel, SPLIT_INDEX(bid.url, '/', 3) AS dir1, SPLIT_INDEX(bid.url, '/', 4) AS dir2, SPLIT_INDEX(bid.url, '/', 5) AS dir3], where=[(event_type = 2)])
+- WatermarkAssigner(rowtime=[dateTime], watermark=[(dateTime - 4000:INTERVAL SECOND)])
   +- Calc(select=[event_type, person, auction, bid, CASE((event_type = 0), person.dateTime, (event_type = 1), auction.dateTime, bid.dateTime) AS dateTime])
      +- TableSourceScan(table=[[default_catalog, default_database, datagen]], fields=[event_type, person, auction, bid])
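Note on comparing the two plans: the RW plan calls SplitPart with indices 4, 5, 6, while the Flink plan calls SPLIT_INDEX with 3, 4, 5. These should select the same fields, since split_part is 1-based and SPLIT_INDEX is 0-based. Below is a minimal Python sketch of that equivalence. It is not either engine's implementation; the helper names mirror the SQL functions only for readability, the sample URL is a hypothetical value shaped like a Nexmark bid URL, and only positive indices and non-empty delimiters are modeled.

```python
from typing import Optional

# Hypothetical sample value; real Nexmark bid URLs may differ in detail.
URL = "https://www.nexmark.com/aa/bb/cc/item.htm?query=1"


def split_part(s: str, delim: str, n: int) -> str:
    """Sketch of PostgreSQL-style split_part: n is 1-based.

    Returns an empty string when n is past the last field.
    Only positive n is modeled here.
    """
    if n <= 0:
        raise ValueError("this sketch only models positive field positions")
    parts = s.split(delim)
    return parts[n - 1] if n <= len(parts) else ""


def split_index(s: str, delim: str, i: int) -> Optional[str]:
    """Sketch of Flink-style SPLIT_INDEX: i is 0-based.

    Returns None (standing in for SQL NULL) when i is negative
    or past the last field.
    """
    parts = s.split(delim)
    return parts[i] if 0 <= i < len(parts) else None


if __name__ == "__main__":
    # dir1, dir2, dir3 as computed by each engine's expression
    print([split_part(URL, "/", n) for n in (4, 5, 6)])
    print([split_index(URL, "/", i) for i in (3, 4, 5)])
```

Splitting the sample URL on '/' yields an empty second field (from the "//" after the scheme), which is why the first directory segment sits at 1-based position 4 rather than 3.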


lmatz commented Feb 23, 2023

split_part(varchar,varchar,int32) | 29.032 | 28.368 | -2.3%

as shown in #6868

lmatz removed this from the release-0.18 milestone Mar 22, 2023
github-actions bot commented

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

lmatz closed this as not planned May 26, 2023