perf: nexmark q21 #8161

Closed
Tracked by #7289
lmatz opened this issue Feb 23, 2023 · 3 comments

lmatz commented Feb 23, 2023

Query:

CREATE MATERIALIZED VIEW nexmark_q21 AS
SELECT
    auction, bidder, price, channel,
    CASE
        WHEN LOWER(channel) = 'apple' THEN '0'
        WHEN LOWER(channel) = 'google' THEN '1'
        WHEN LOWER(channel) = 'facebook' THEN '2'
        WHEN LOWER(channel) = 'baidu' THEN '3'
    ELSE (regexp_match(url, '(&|^)channel_id=([^&]*)'))[2]
    END
    AS channel_id 
FROM 
    bid
WHERE 
    (regexp_match(url, '(&|^)channel_id=([^&]*)'))[2] is not null or LOWER(channel) in ('apple', 'google', 'facebook', 'baidu');

Plan:

 StreamMaterialize { columns: [auction, bidder, price, channel, channel_id, _row_id(hidden)], pk_columns: [_row_id], pk_conflict: "no check" }
 └─StreamExchange { dist: HashShard(_row_id) }
   └─StreamProject { exprs: [Field(bid, 0:Int32) as $expr1, Field(bid, 1:Int32) as $expr2, Field(bid, 2:Int32) as $expr3, Field(bid, 3:Int32) as $expr4, Case((Lower(Field(bid, 3:Int32)) = 'apple':Varchar), '0':Varchar, (Lower(Field(bid, 3:Int32)) = 'google':Varchar), '1':Varchar, (Lower(Field(bid, 3:Int32)) = 'facebook':Varchar), '2':Varchar, (Lower(Field(bid, 3:Int32)) = 'baidu':Varchar), '3':Varchar, ArrayAccess(RegexpMatch(Field(bid, 4:Int32), '(&|^)channel_id=([^&]*)':Varchar), 2:Int32)) as $expr5, _row_id] }
     └─StreamFilter { predicate: (IsNotNull(ArrayAccess(RegexpMatch(Field(bid, 4:Int32), '(&|^)channel_id=([^&]*)':Varchar), 2:Int32)) OR In(Lower(Field(bid, 3:Int32)), 'apple':Varchar, 'google':Varchar, 'facebook':Varchar, 'baidu':Varchar)) AND (event_type = 2:Int32) }
       └─StreamRowIdGen { row_id_index: 4 }
         └─StreamSource { source: "nexmark", columns: ["event_type", "person", "auction", "bid", "_row_id"] }
(6 rows)
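
For readers who want to check what the query computes per row, here is a minimal Python sketch of the CASE expression and the WHERE filter. It is only an illustration of the semantics, not how either engine evaluates the query. The function name `q21_channel_id` and the sample inputs are made up for this sketch, and Python's `re` module stands in for the engines' own regex implementations, which may differ in edge cases.

```python
import re
from typing import Optional

# Same pattern as in the query; group 2 is the value after "channel_id=".
CHANNEL_ID_RE = re.compile(r"(&|^)channel_id=([^&]*)")

# The four hard-coded channels from the CASE branches.
KNOWN_CHANNELS = {"apple": "0", "google": "1", "facebook": "2", "baidu": "3"}


def q21_channel_id(channel: str, url: str) -> Optional[str]:
    """Hypothetical helper: return channel_id for a bid row.

    Returns None when the row would be dropped by the WHERE clause,
    i.e. the channel is not one of the known four and the regex
    finds no channel_id in the url.
    """
    known = KNOWN_CHANNELS.get(channel.lower())
    if known is not None:
        return known
    match = CHANNEL_ID_RE.search(url)
    return match.group(2) if match else None


if __name__ == "__main__":
    # Sample inputs are invented for illustration.
    print(q21_channel_id("Apple", "https://example.com/?x=1"))
    print(q21_channel_id("other", "https://example.com/?x=1&channel_id=42"))
    print(q21_channel_id("other", "https://example.com/?x=1"))
```

Because the ELSE branch and the first half of the WHERE predicate use the same regex, a row that reaches the ELSE branch always has a non-null result, which is why a single `None` return is enough to model the filter here.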
@github-actions github-actions bot added this to the release-0.1.18 milestone Feb 23, 2023

lmatz commented Feb 23, 2023

Flink:

== Optimized Physical Plan ==
Calc(select=[bid.auction AS auction, bid.bidder AS bidder, bid.price AS price, bid.channel AS channel, CASE(=(LOWER(bid.channel), _UTF-16LE'apple':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), _UTF-16LE'0':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", =(LOWER(bid.channel), _UTF-16LE'google':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), _UTF-16LE'1':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", =(LOWER(bid.channel), _UTF-16LE'facebook':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), _UTF-16LE'2':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", =(LOWER(bid.channel), _UTF-16LE'baidu':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), _UTF-16LE'3':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", REGEXP_EXTRACT(bid.url, _UTF-16LE'(&|^)channel_id=([^&]*)', 2)) AS channel_id], where=[AND(=(event_type, 2), OR(IS NOT NULL(REGEXP_EXTRACT(bid.url, _UTF-16LE'(&|^)channel_id=([^&]*)', 2)), SEARCH(LOWER(bid.channel), Sarg[_UTF-16LE'apple':VARCHAR(8) CHARACTER SET "UTF-16LE", _UTF-16LE'baidu':VARCHAR(8) CHARACTER SET "UTF-16LE", _UTF-16LE'facebook':VARCHAR(8) CHARACTER SET "UTF-16LE", _UTF-16LE'google':VARCHAR(8) CHARACTER SET "UTF-16LE"]:VARCHAR(8) CHARACTER SET "UTF-16LE")))])
+- WatermarkAssigner(rowtime=[dateTime], watermark=[-(dateTime, 4000:INTERVAL SECOND)])
   +- Calc(select=[event_type, person, auction, bid, CASE(=(event_type, 0), person.dateTime, =(event_type, 1), auction.dateTime, bid.dateTime) AS dateTime])
      +- TableSourceScan(table=[[default_catalog, default_database, datagen]], fields=[event_type, person, auction, bid])

== Optimized Execution Plan ==
Calc(select=[bid.auction AS auction, bid.bidder AS bidder, bid.price AS price, bid.channel AS channel, CASE((LOWER(bid.channel) = 'apple'), '0', (LOWER(bid.channel) = 'google'), '1', (LOWER(bid.channel) = 'facebook'), '2', (LOWER(bid.channel) = 'baidu'), '3', REGEXP_EXTRACT(bid.url, '(&|^)channel_id=([^&]*)', 2)) AS channel_id], where=[((event_type = 2) AND (REGEXP_EXTRACT(bid.url, '(&|^)channel_id=([^&]*)', 2) IS NOT NULL OR SEARCH(LOWER(bid.channel), Sarg[_UTF-16LE'apple', _UTF-16LE'baidu', _UTF-16LE'facebook', _UTF-16LE'google'])))])
+- WatermarkAssigner(rowtime=[dateTime], watermark=[(dateTime - 4000:INTERVAL SECOND)])
   +- Calc(select=[event_type, person, auction, bid, CASE((event_type = 0), person.dateTime, (event_type = 1), auction.dateTime, bid.dateTime) AS dateTime])
      +- TableSourceScan(table=[[default_catalog, default_database, datagen]], fields=[event_type, person, auction, bid])


lmatz commented Feb 23, 2023

I don't see any difference between the two plans. I guess this query just tests whose CASE WHEN and regexp_match implementations are faster.

@lmatz lmatz removed this from the release-0.18 milestone Mar 22, 2023
@github-actions

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

@lmatz lmatz closed this as not planned May 27, 2023