bug: batch query dropped table may get wrong result #7615

Closed
zwang28 opened this issue Jan 31, 2023 · 7 comments

@zwang28 (Contributor) commented Jan 31, 2023

Describe the bug

As mentioned in #5446

Batch query when dropping
Similar to the above, we may still be able to scan a dropped table whose data is being cleaned up by the storage, because catalog deletion is propagated to the frontend asynchronously. This may cause wrong results or even break some assumptions in batch executors.

To be specific:

  • The state-clean compaction filter is allowed to delete a dropped table's KVs as soon as drop_streaming_jobs has succeeded.
  • A frontend may query a table that has already been dropped by another frontend, because catalog notifications are delivered asynchronously.
  • "Associate notification version with epoch" doesn't help when the frontend uses an epoch older than the drop-table epoch.

For example:

  1. Frontend_1 drops MV_1 and the drop succeeds.
  2. A storage compaction removes part of MV_1's KVs.
  3. Frontend_2 receives no notification due to a network issue, so neither its pinned epoch nor its catalog is updated.
  4. Frontend_2 queries MV_1 with an epoch older than the drop-table epoch and gets wrong query results (see the sketch below).
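
To make the compaction-filter side of this concrete, here is a minimal, self-contained sketch in Rust. All types and names are hypothetical, not RisingWave's actual code; it only illustrates why a table_id-based state-clean filter reclaims a dropped table's KVs regardless of epoch, so a frontend still pinned to an epoch older than the drop can observe a partially compacted table.

```rust
use std::collections::HashSet;

/// Simplified key layout: (table_id, user key, epoch). Hypothetical.
struct FullKey {
    table_id: u32,
    user_key: Vec<u8>,
    epoch: u64,
}

/// A table_id-based state-clean filter, sketched after the behavior described
/// above: once drop_streaming_jobs succeeds, the table id joins the dropped set
/// and all of its KVs become eligible for deletion.
struct StateCleanFilter {
    dropped_table_ids: HashSet<u32>,
}

impl StateCleanFilter {
    /// Returns true if the KV should be removed during compaction.
    /// The epoch is deliberately ignored, which is exactly why a reader
    /// pinned to an epoch older than the drop can see partial data.
    fn should_remove(&self, key: &FullKey) -> bool {
        self.dropped_table_ids.contains(&key.table_id)
    }
}

fn main() {
    let filter = StateCleanFilter {
        dropped_table_ids: HashSet::from([42]), // MV_1's table id, hypothetical
    };
    let key = FullKey { table_id: 42, user_key: b"pk-1".to_vec(), epoch: 100 };
    // Reclaimed even though some frontend may still hold a pinned epoch <= 100.
    println!(
        "remove key {:?} of table {} at epoch {}: {}",
        key.user_key, key.table_id, key.epoch, filter.should_remove(&key)
    );
}
```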

To Reproduce

No response

Expected behavior

No response

Additional context

No response

@zwang28 zwang28 added the type/bug label Jan 31, 2023
@github-actions github-actions bot added this to the release-0.1.17 milestone Jan 31, 2023
@zwang28 (Contributor, Author) commented Jan 31, 2023

@Li0k @Little-Wallace @hzxa21
I don't see any mechanism in the current codebase that prevents this issue. Please correct me if I'm wrong.

@Li0k (Contributor) commented Jan 31, 2023

After discussing with zheng, here is a summary of the main points:

  1. We discussed making the compaction filter also take the watermark into account, so that data still reachable in the example above would not already have been reclaimed. But this is not easy to do. In the LSM storage engine we keep at least one version of each key to serve reads, and state clean relies on the assumption that a dropped table's data will never be accessed again, so it simply filters by table_id. If state clean had to respect the watermark, it would have to keep at least one version of every key under all conditions; in other words, state clean could not reclaim anything and the data would be retained permanently unless it were explicitly deleted. So it is difficult to solve the problem by introducing the watermark into compaction.
  2. Inspired by TTL, we discussed whether we could specify a min_epoch to prevent external access to the deleted data. However, the bug is not caused by using a wrong epoch but by accessing data of a table that no longer exists, so that solution cannot be applied directly. According to zheng's analysis, the bug occurs after drop_streaming_jobs, i.e. the local_state_store corresponding to the table should already have been deleted. Since the data accessed by a batch query comes from committed_version rather than local_state_store, we could try checking read_version_mapping to determine whether the table still exists before issuing the read request, to avoid the bug (a sketch of such a check follows below).

The above conclusion is not entirely correct: when using a batch cluster, there may be no local_state_store at all, all the data comes from committed_version, and that update is asynchronous, so the catalog cannot synchronously tell that the table has been deleted.
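
For illustration, here is a rough sketch of the "check read_version_mapping before issuing the read request" idea, with the caveat above noted in the comments. All types and names are hypothetical stand-ins rather than the actual read path.

```rust
use std::collections::HashMap;

type TableId = u32;

/// Hypothetical stand-in for a per-table local read version (local_state_store side).
struct ReadVersion;

#[derive(Debug)]
enum ReadError {
    TableDropped(TableId),
}

/// Sketch of "check read_version_mapping to determine whether the table exists
/// before the read request". Caveat: on a batch cluster there may be no local
/// read version at all and every read is served from committed_version, so an
/// absent entry does not prove the table was dropped; this check alone cannot
/// close the race.
fn check_table_exists(
    read_version_mapping: &HashMap<TableId, ReadVersion>,
    table_id: TableId,
) -> Result<(), ReadError> {
    if read_version_mapping.contains_key(&table_id) {
        Ok(())
    } else {
        Err(ReadError::TableDropped(table_id))
    }
}

fn main() {
    let mapping: HashMap<TableId, ReadVersion> = HashMap::new();
    // On a batch cluster this would also reject a perfectly valid read,
    // which is why the conclusion above is "not entirely correct".
    println!("{:?}", check_table_exists(&mapping, 42));
}
```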


@zwang28 (Contributor, Author) commented Feb 1, 2023

It seems neither the frontend nor the compute node has a deterministic way to tell that a table has been dropped when serving a batch query. 😐

@hzxa21 (Collaborator) commented Feb 3, 2023

Can we store the list of table ids in the hummock version and update it atomically on table creation/drop? That way we know which tables are visible in every version/snapshot.
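
A minimal sketch of this idea, assuming a hypothetical, simplified version struct that carries the set of visible table ids (not the real HummockVersion type):

```rust
use std::collections::BTreeSet;

type TableId = u32;
type Epoch = u64;

/// Hypothetical, simplified version object: the set of visible tables is
/// updated atomically together with the version itself on table creation/drop.
struct Version {
    max_committed_epoch: Epoch,
    visible_table_ids: BTreeSet<TableId>,
}

impl Version {
    /// Any reader holding this version can deterministically answer
    /// "does this table exist in this snapshot?".
    fn table_visible(&self, table_id: TableId) -> bool {
        self.visible_table_ids.contains(&table_id)
    }
}

fn main() {
    // Version before MV_1 (table id 42, hypothetical) is dropped.
    let before = Version { max_committed_epoch: 10, visible_table_ids: BTreeSet::from([1, 42]) };
    // Version produced after the drop: the table id list shrinks atomically with it.
    let after = Version { max_committed_epoch: 11, visible_table_ids: BTreeSet::from([1]) };

    println!("table 42 visible at epoch {}: {}", before.max_committed_epoch, before.table_visible(42));
    println!("table 42 visible at epoch {}: {}", after.max_committed_epoch, after.table_visible(42));
}
```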

@zwang28 (Contributor, Author) commented Feb 14, 2023

Can we store the list of table ids in the hummock version and update it atomically on table creation/drop? That way we know which tables are visible in every version/snapshot.

One problem with this solution:

  1. First, the compute node's uncommitted data for the dropped table is removed.
  2. However, the new hummock version generated after step 1, which contains the latest table id list, may be delayed arbitrarily long before the compute node receives it.
  3. During that window, build_read_version_tuple cannot reject a batch query over the dropped table, because the table id list in its old hummock version still contains it. Meanwhile, the query unexpectedly degrades into a committed-only read, because the uncommitted data was already removed in step 1. The user may therefore find that monotonic reads are broken across two consecutive reads in one session, or even get inconsistent data aggregated from multiple compute nodes (see the sketch below).
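
A sketch of that window, reusing the same hypothetical version shape plus a toy stand-in for the compute node's uncommitted data; serve_batch_read is a made-up name standing in for the real read path around build_read_version_tuple:

```rust
use std::collections::{BTreeSet, HashMap};

type TableId = u32;

/// Same hypothetical shape as in the sketch above: visible tables travel with the version.
struct Version {
    visible_table_ids: BTreeSet<TableId>,
}

/// Toy stand-in for the compute node's per-table uncommitted data.
struct LocalUncommitted {
    rows: HashMap<TableId, Vec<String>>,
}

/// Hypothetical guard on the read path: it can only consult the hummock
/// version the compute node currently holds.
fn serve_batch_read(version: &Version, local: &LocalUncommitted, table_id: TableId) -> Option<Vec<String>> {
    if !version.visible_table_ids.contains(&table_id) {
        // Would reject the dropped table, but only once the new version arrives.
        return None;
    }
    // The stale version still lists the table, so the read proceeds and silently
    // degrades to committed-only data, because step 1 already removed the
    // uncommitted part on this node.
    Some(local.rows.get(&table_id).cloned().unwrap_or_default())
}

fn main() {
    // Step 1: uncommitted data of dropped table 42 has already been removed.
    let local = LocalUncommitted { rows: HashMap::new() };
    // Step 2: the version carrying the updated table id list has not arrived yet.
    let stale = Version { visible_table_ids: BTreeSet::from([1, 42]) };
    // Step 3: the guard cannot reject the query; the result reflects committed
    // data only, so monotonic reads can appear broken within one session.
    println!("{:?}", serve_batch_read(&stale, &local, 42));
}
```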

@zwang28 zwang28 modified the milestones: release-0.18, release-0.19 Mar 21, 2023
@hzxa21 (Collaborator) commented Mar 27, 2023

Link #8796

Why is this related to #8796?

@lmatz (Contributor) commented Mar 27, 2023

Link #8796

Why is this related to #8796?

Sorry, it should be #7002; misclick.

@zwang28 zwang28 modified the milestones: release-0.19, release-0.20 Apr 24, 2023
@zwang28 zwang28 closed this as not planned May 24, 2023