bug: batch query dropped table may get wrong result #7615

Closed
zwang28 opened this issue Jan 31, 2023 · 7 comments

@zwang28 (Contributor) commented Jan 31, 2023

Describe the bug

As mentioned in #5446

Batch query when dropping
Similar to the above, we may still be able to scan a dropped table whose data is being cleaned up by the storage, because catalog deletion is propagated to the frontend asynchronously. This may cause wrong results or even break some assumptions in batch executors.

To be specific:

  • The state-clean compaction filter is allowed to delete a dropped table's KVs as soon as drop_streaming_jobs has succeeded.
  • A frontend may query a table that has already been dropped by another frontend, because catalog notifications are delivered asynchronously.
  • "Associate notification version with epoch" doesn't help when the frontend uses an epoch older than the drop-table epoch.

For example:

  1. Frontend_1 drops MV_1 and the drop succeeds.
  2. A storage compaction removes part of MV_1's KVs.
  3. Frontend_2 receives no notification due to a network issue, so neither its pinned epoch nor its catalog is updated.
  4. Frontend_2 queries MV_1 with an epoch older than the drop-table epoch and gets wrong query results (see the sketch below).
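
To make the compaction-filter side of this concrete, here is a minimal, self-contained sketch in Rust. All types and names are hypothetical, not RisingWave's actual code; it only illustrates why a table_id-based state-clean filter reclaims a dropped table's KVs regardless of epoch, so a frontend still pinned to an epoch older than the drop can observe a partially compacted table.

```rust
use std::collections::HashSet;

/// Simplified key layout: (table_id, user key, epoch). Hypothetical.
struct FullKey {
    table_id: u32,
    user_key: Vec<u8>,
    epoch: u64,
}

/// A table_id-based state-clean filter, sketched after the behavior described
/// above: once drop_streaming_jobs succeeds, the table id joins the dropped set
/// and all of its KVs become eligible for deletion.
struct StateCleanFilter {
    dropped_table_ids: HashSet<u32>,
}

impl StateCleanFilter {
    /// Returns true if the KV should be removed during compaction.
    /// The epoch is deliberately ignored, which is exactly why a reader
    /// pinned to an epoch older than the drop can see partial data.
    fn should_remove(&self, key: &FullKey) -> bool {
        self.dropped_table_ids.contains(&key.table_id)
    }
}

fn main() {
    let filter = StateCleanFilter {
        dropped_table_ids: HashSet::from([42]), // MV_1's table id, hypothetical
    };
    let key = FullKey { table_id: 42, user_key: b"pk-1".to_vec(), epoch: 100 };
    // Reclaimed even though some frontend may still hold a pinned epoch <= 100.
    println!(
        "remove key {:?} of table {} at epoch {}: {}",
        key.user_key, key.table_id, key.epoch, filter.should_remove(&key)
    );
}
```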

To Reproduce

No response

Expected behavior

No response

Additional context

No response

@zwang28 zwang28 added the type/bug label Jan 31, 2023
@github-actions github-actions bot added this to the release-0.1.17 milestone Jan 31, 2023
@zwang28 (Contributor, Author) commented Jan 31, 2023

@Li0k @Little-Wallace @hzxa21
I don't see any mechanism in the current codebase that prevents this issue. Please correct me if I'm wrong.

@Li0k (Contributor) commented Jan 31, 2023

After discussing with zheng, here is a summary of the main points:

  1. We discussed making the compaction filter also take the watermark into account, so that data still reachable in the example above would not already have been reclaimed. But this is not easy to do. In the LSM storage engine we keep at least one version of each key to serve reads, and state clean relies on the assumption that a dropped table's data will never be accessed again, so it simply filters by table_id. If state clean had to respect the watermark, it would have to keep at least one version of every key under all conditions; in other words, state clean could not reclaim anything and the data would be retained permanently unless it were explicitly deleted. So it is difficult to solve the problem by introducing the watermark into compaction.
  2. Inspired by TTL, we discussed whether we could specify a min_epoch to prevent external access to the deleted data. However, the bug is not caused by using a wrong epoch but by accessing data of a table that no longer exists, so that solution cannot be applied directly. According to zheng's analysis, the bug occurs after drop_streaming_jobs, i.e. the local_state_store corresponding to the table should already have been deleted. Since the data accessed by a batch query comes from committed_version rather than local_state_store, we could try checking read_version_mapping to determine whether the table still exists before issuing the read request, to avoid the bug (a sketch of such a check follows below).

The above conclusion is not entirely correct: when using a batch cluster, there may be no local_state_store at all, all the data comes from committed_version, and that update is asynchronous, so the catalog cannot synchronously tell that the table has been deleted.
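
For illustration, here is a rough sketch of the "check read_version_mapping before issuing the read request" idea, with the caveat above noted in the comments. All types and names are hypothetical stand-ins rather than the actual read path.

```rust
use std::collections::HashMap;

type TableId = u32;

/// Hypothetical stand-in for a per-table local read version (local_state_store side).
struct ReadVersion;

#[derive(Debug)]
enum ReadError {
    TableDropped(TableId),
}

/// Sketch of "check read_version_mapping to determine whether the table exists
/// before the read request". Caveat: on a batch cluster there may be no local
/// read version at all and every read is served from committed_version, so an
/// absent entry does not prove the table was dropped; this check alone cannot
/// close the race.
fn check_table_exists(
    read_version_mapping: &HashMap<TableId, ReadVersion>,
    table_id: TableId,
) -> Result<(), ReadError> {
    if read_version_mapping.contains_key(&table_id) {
        Ok(())
    } else {
        Err(ReadError::TableDropped(table_id))
    }
}

fn main() {
    let mapping: HashMap<TableId, ReadVersion> = HashMap::new();
    // On a batch cluster this would also reject a perfectly valid read,
    // which is why the conclusion above is "not entirely correct".
    println!("{:?}", check_table_exists(&mapping, 42));
}
```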


@zwang28 (Contributor, Author) commented Feb 1, 2023

It seems neither the frontend nor the compute node has a deterministic way to tell that a table has been dropped when serving a batch query. 😐

@hzxa21 (Collaborator) commented Feb 3, 2023

Can we store the list of table ids in the hummock version and update it atomically on table creation/drop? That way we know which tables are visible in every version/snapshot.
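
A minimal sketch of this idea, assuming a hypothetical, simplified version struct that carries the set of visible table ids (not the real HummockVersion type):

```rust
use std::collections::BTreeSet;

type TableId = u32;
type Epoch = u64;

/// Hypothetical, simplified version object: the set of visible tables is
/// updated atomically together with the version itself on table creation/drop.
struct Version {
    max_committed_epoch: Epoch,
    visible_table_ids: BTreeSet<TableId>,
}

impl Version {
    /// Any reader holding this version can deterministically answer
    /// "does this table exist in this snapshot?".
    fn table_visible(&self, table_id: TableId) -> bool {
        self.visible_table_ids.contains(&table_id)
    }
}

fn main() {
    // Version before MV_1 (table id 42, hypothetical) is dropped.
    let before = Version { max_committed_epoch: 10, visible_table_ids: BTreeSet::from([1, 42]) };
    // Version produced after the drop: the table id list shrinks atomically with it.
    let after = Version { max_committed_epoch: 11, visible_table_ids: BTreeSet::from([1]) };

    println!("table 42 visible at epoch {}: {}", before.max_committed_epoch, before.table_visible(42));
    println!("table 42 visible at epoch {}: {}", after.max_committed_epoch, after.table_visible(42));
}
```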

@zwang28 (Contributor, Author) commented Feb 14, 2023

Can we store the list of table ids in the hummock version and update it atomically on table creation/drop? That way we know which tables are visible in every version/snapshot.

One problem with this solution:

  1. First, the compute node's uncommitted data for the dropped table is removed.
  2. However, the new hummock version generated after step 1, which contains the latest table id list, may be delayed arbitrarily long before the compute node receives it.
  3. During that window, build_read_version_tuple cannot reject a batch query over the dropped table, because the table id list in its old hummock version still contains it. Meanwhile, the query unexpectedly degrades into a committed-only read, because the uncommitted data was already removed in step 1. The user may therefore find that monotonic reads are broken across two consecutive reads in one session, or even get inconsistent data aggregated from multiple compute nodes (see the sketch below).
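
A sketch of that window, reusing the same hypothetical version shape plus a toy stand-in for the compute node's uncommitted data; serve_batch_read is a made-up name standing in for the real read path around build_read_version_tuple:

```rust
use std::collections::{BTreeSet, HashMap};

type TableId = u32;

/// Same hypothetical shape as in the sketch above: visible tables travel with the version.
struct Version {
    visible_table_ids: BTreeSet<TableId>,
}

/// Toy stand-in for the compute node's per-table uncommitted data.
struct LocalUncommitted {
    rows: HashMap<TableId, Vec<String>>,
}

/// Hypothetical guard on the read path: it can only consult the hummock
/// version the compute node currently holds.
fn serve_batch_read(version: &Version, local: &LocalUncommitted, table_id: TableId) -> Option<Vec<String>> {
    if !version.visible_table_ids.contains(&table_id) {
        // Would reject the dropped table, but only once the new version arrives.
        return None;
    }
    // The stale version still lists the table, so the read proceeds and silently
    // degrades to committed-only data, because step 1 already removed the
    // uncommitted part on this node.
    Some(local.rows.get(&table_id).cloned().unwrap_or_default())
}

fn main() {
    // Step 1: uncommitted data of dropped table 42 has already been removed.
    let local = LocalUncommitted { rows: HashMap::new() };
    // Step 2: the version carrying the updated table id list has not arrived yet.
    let stale = Version { visible_table_ids: BTreeSet::from([1, 42]) };
    // Step 3: the guard cannot reject the query; the result reflects committed
    // data only, so monotonic reads can appear broken within one session.
    println!("{:?}", serve_batch_read(&stale, &local, 42));
}
```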

@zwang28 zwang28 modified the milestones: release-0.18, release-0.19 Mar 21, 2023
@hzxa21 (Collaborator) commented Mar 27, 2023

Link #8796

Why is this related to #8796?

@lmatz (Contributor) commented Mar 27, 2023

Link #8796

Why is this related to #8796?

Sorry, it should be #7002; misclick.

@zwang28 zwang28 modified the milestones: release-0.19, release-0.20 Apr 24, 2023
@zwang28 zwang28 closed this as not planned May 24, 2023