Transaction whose execution time exceeds max-txn-ttl can still return results #48907
Comments
Since t1 is started before t2 and has the same ttl, it seems that t1 should be aborted like t2 due to timeout.
The TTL keepalive routine actually starts on receiving the pessimistic lock response of the primary key. So, the TTL of t1 expires slightly later than the TTL of t2, depending on the time interval between t2's delete and t1's update. And if you execute the select immediately after the update, it won't return a timeout error. In addition, the error
Thanks for your reply. We have tested the case you mentioned. Its form is similar to the first case in this issue. It seems that t1's select returns a timeout error in an unexpected way.
Notice that t1's update(o1) is issued 5 minutes after t2's delete. o1 is blocked for some seconds and then executed; it seems that the lock held by t2's delete is removed at that time, so t2's next select(o2) returns a timeout error. Then we execute t1's next select(o3) immediately after the update. Even though it runs immediately after the update, it still returns a timeout error, as o2 did. If the TTL keepalive routine starts at t1's update, t1's select(o3) should succeed instead of failing. This situation seems unexpected.
We also discovered another phenomenon. If the update of t1 (o1) waits long enough, like 7 minutes (max-txn-ttl is 5 min), the result of the later select (o3) is returned successfully, just like the first case. Otherwise, it times out directly, just like the second case. This also seems unexpected.
Sorry, the previous explanation is not completely true. The TTL keepalive routine starts on t1's update (o3); however, the txn uptime is calculated from the start ts, so t1's TTL < t2's TTL.
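A tiny numeric sketch of that point (illustrative JavaScript with made-up timestamps, not TiDB's actual code): because uptime is measured from each transaction's start ts, the transaction that began earlier crosses max-txn-ttl first.
// Illustrative only: uptime is measured from the txn's start ts.
const maxTxnTTL = 5 * 60 * 1000;        // 5 min, as configured in this issue
const t1Start = 0;                      // t1 begins first
const t2Start = 30 * 1000;              // t2 begins 30 s later (made-up gap)
const now = 5 * 60 * 1000 + 10 * 1000;  // 5 min 10 s after t1 began
const uptime = (start) => now - start;
console.log("t1 expired:", uptime(t1Start) > maxTxnTTL); // true
console.log("t2 expired:", uptime(t2Start) > maxTxnTTL); // false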
@wengsy150943 Since the update(o1) had been blocked for a while, it seems that the keepalive routine was running at that moment and this line was executed, so select(o3) got the timeout error. Would you mind providing the table schema you used in this case? I'd like to try it locally.
@lxr599 The keepalive routine started when update(o1) got executed; however, it waits 10s before checking the TTL for the first time. I guess that's why select(o3) returned successfully. Could you try another case and let t1: 1) execute update(o1); 2) wait more than 15s; 3) execute select(o3)?
Thanks, @lxr599 @wengsy150943! You've really done some in-depth testing.
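For concreteness, a minimal Deno-style sketch of that suggested sequence (a sketch only, not the author's test: it reuses the simple schema shared just below, and assumes mysql, TIDB_HOST, log, pp, sleep, and close are the helper functions from zyguan's gist):
Deno.test('tidb-48907-suggested-sketch', async () => {
  const c1 = await mysql.createConnection({ host: TIDB_HOST, port: 4000, user: "root", database: 'test' });
  const c2 = await mysql.createConnection({ host: TIDB_HOST, port: 4000, user: "root", database: 'test' });
  try {
    await c1.query("drop table if exists t1");
    await c1.query("create table t1 (a int, b int)");
    await c1.query("insert into t1 values (3, 2)");
    await c1.query("begin");
    await c2.query("begin");
    log("t2 delete:", await pp(c2.query("delete from t1 where a=3")));        // t2 locks the row
    await sleep(5 * 60 * 1000);                                               // wait past max-txn-ttl (5 min here)
    log("t1 update(o1):", await pp(c1.query("update t1 set a=4 where a=3"))); // 1) keepalive starts here
    await sleep(15 * 1000);                                                   // 2) wait more than 15s, past the first 10s TTL check
    log("t1 select(o3):", await pp(c1.query("select * from t1")));            // 3) the select in question
  } finally {
    await close(c1);
    await close(c2);
  }
});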
The schema we used here is very simple:
create table t1 (a int, b int);
insert into t1 values(3,2);
insert into t1 values(2,3); -- in fact, this record is not necessary
To get more precise time control, I execute these commands in one line after t1's begin.
The two phenomena we discovered are shown below; hope that they help.
It seems that there is a 10s delay for the timeout response after txn begin (instead of after update(o1)), as you said. BTW, during testing,
I've run some tests locally. It seems the TTL manager works as expected, and the results are shown below. Here is the code I used; you can run it with the command:
deno test -A https://gist.githubusercontent.com/zyguan/bf2ba336fc7902f3b279535417905554/raw/471cf22ea251b05edfd6dad0d9aabdbf107036aa/test-tidb-48907.js
Also, thanks to your cases, I found another issue: #49151.
Thanks for your test cases. However, the last case we mentioned is not included, and we find that it behaves differently when executed manually versus by the test tool. Its test case is shown below:
Deno.test('tidb-48907-c', async () => {
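  // Note: mysql, TIDB_HOST, log, pp, sleep, and close are assumed to be the
  // helper functions defined in zyguan's gist (test-tidb-48907.js) linked above.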
const c1 = await mysql.createConnection({ host: TIDB_HOST, port: 4000, user: "root", database: 'test' });
const c2 = await mysql.createConnection({ host: TIDB_HOST, port: 4000, user: "root", database: 'test' });
try {
await c1.query("set @@tidb_general_log=1, @@innodb_lock_wait_timeout=600");
await c1.query("drop table if exists t1");
await c1.query("create table t1 (a int, b int)");
await c1.query("insert into t1 values (3, 2), (2, 3)");
log("show config", await pp(c1.query("show config where name like '%max-txn-ttl'")));
await c1.query("begin");
log("t1 select(o1):", await pp(c2.query("select * from t1")));
await c2.query("begin");
log("t2 delete:", await pp(c2.query("delete from t1 where a=3")));
log("wait 5 minutes");
await sleep((5*60)*1000);
log("continue");
log("t1 update(o3):", await pp(c1.query("update t1 set a=4 where a=3")));
log("wait 5 minutes");
await sleep((5*60)*1000);
log("continue");
log("t1 select(o3'):", await pp(c1.query("select * from t1")));
log("t2 commit:", await pp(c2.query("commit")));
log("t1 commit:", await pp(c1.query("commit")));
} finally {
await close(c1);
await close(c2);
}
});
The last select of t1 fails when executed by the test tool; however, it can be executed successfully when run manually. Could you try executing this case locally? Maybe this phenomenon is related to the JDBC or mysql client. BTW, responding with a timeout after
This case can be reproduced when the update matches and changes 0 rows. ): My fault.
So I will close it; if there are any updates, you can reopen it.
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
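A minimal sketch of the scenario, reconstructed from the discussion above (assumptions: max-txn-ttl is set to 5 min, and mysql, TIDB_HOST, log, pp, sleep, and close are the helper functions from zyguan's gist):
Deno.test('tidb-48907-repro-sketch', async () => {
  const c1 = await mysql.createConnection({ host: TIDB_HOST, port: 4000, user: "root", database: 'test' });
  const c2 = await mysql.createConnection({ host: TIDB_HOST, port: 4000, user: "root", database: 'test' });
  try {
    await c1.query("drop table if exists t1");
    await c1.query("create table t1 (a int, b int)");
    await c1.query("insert into t1 values (3, 2)");
    await c1.query("begin");                                              // t1 starts first
    await c2.query("begin");                                              // t2 starts later
    log("t2 delete:", await pp(c2.query("delete from t1 where a=3")));    // t2 locks the row
    await sleep(5 * 60 * 1000);                                           // wait past max-txn-ttl
    log("t1 update:", await pp(c1.query("update t1 set a=4 where a=3"))); // blocked briefly, then succeeds
    log("t1 select:", await pp(c1.query("select * from t1")));            // returns rows (unexpected)
    log("t2 select:", await pp(c2.query("select * from t1")));            // times out as expected
  } finally {
    await close(c1);
    await close(c2);
  }
});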
2. What did you expect to see? (Required)
t1 starts before t2, but t2 times out, so at least the first select after update in t1 should also time out instead of returning a result.
3. What did you see instead (Required)
The select after update and the update itself in t1 were executed successfully.
4. What is your TiDB version? (Required)
tidb_version(): Release Version: v7.4.0
Edition: Community
Git Commit Hash: 38cb4f3
Git Branch: heads/refs/tags/v7.4.0
UTC Build Time: 2023-10-10 14:18:50
GoVersion: go1.21.1
Race Enabled: false
Check Table Before Drop: false
Store: tikv