现象描述
客户在夜间批量执行数据处理时发生了死锁现象,是由不同的会话并发删除数据引起的,这个问题原因是比较简单,但想通过这个案例让大家熟悉如何去排查死锁问题,如何去阅读死锁日志这才是目的。通过模拟用户死锁现象后,死锁日志如下:
*** (1) TRANSACTION:
TRANSACTION 39474, ACTIVE 58 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1200, 4 row lock(s), undo log entries 3
MySQL thread id 9, OS thread handle 123145525800960, query id 77 localhost root updating
DELETE FROM t1 WHERE id = 4
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 114 page no 4 n bits 80 index PRIMARY of table `dhy`.`t1` trx id 39474 lock_mode X locks rec but not gap waiting
Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
0: len 4; hex 00000004; asc ;;
1: len 6; hex 000000009a33; asc 3;;
2: len 7; hex 02000001471399; asc G ;;
3: len 2; hex 6464; asc dd;;
*** (2) TRANSACTION:
TRANSACTION 39475, ACTIVE 46 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1200, 4 row lock(s), undo log entries 3
MySQL thread id 10, OS thread handle 123145526104064, query id 78 localhost root updating
DELETE FROM t1 WHERE id = 3
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 114 page no 4 n bits 80 index PRIMARY of table `dhy`.`t1` trx id 39475 lock_mode X locks rec but not gap
Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
0: len 4; hex 00000004; asc ;;
1: len 6; hex 000000009a33; asc 3;;
2: len 7; hex 02000001471399; asc G ;;
3: len 2; hex 6464; asc dd;;
Record lock, heap no 6 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
0: len 4; hex 00000005; asc ;;
1: len 6; hex 000000009a33; asc 3;;
2: len 7; hex 02000001471375; asc G u;;
3: len 2; hex 6565; asc ee;;
Record lock, heap no 7 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
0: len 4; hex 00000006; asc ;;
1: len 6; hex 000000009a33; asc 3;;
2: len 7; hex 02000001471351; asc G Q;;
3: len 2; hex 6666; asc ff;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 114 page no 4 n bits 80 index PRIMARY of table `dhy`.`t1` trx id 39475 lock_mode X locks rec but not gap waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
0: len 4; hex 00000003; asc ;;
1: len 6; hex 000000009a32; asc 2;;
2: len 7; hex 01000001462e1f; asc F. ;;
3: len 2; hex 6363; asc cc;;
*** WE ROLL BACK TRANSACTION (2)
如何阅读死锁日志
要排查死锁问题我们就要学会如何查看死锁日志,但MySQL死锁日志看起来并不是很直观需要我们一步一步耐心分析。 我们将上面的死锁日志拆分阅读,我们可以得出以下信息:
- 两个事务的事务ID TRANSACTION 39474 TRANSACTION 39475
- 事务39474在执行delete语句是发生了锁等待
DELETE FROM t1 WHERE id = 4
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 114 page no 4 n bits 80 index PRIMARY of table `dhy`.`t1` trx id 39474 lock_mode X locks rec but not gap waiting
Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
0: len 4; hex 00000004; asc ;; //聚集索引的值
1: len 6; hex 000000009a33; asc 3;; //事务ID
2: len 7; hex 02000001471399; asc G ;; //undo 记录
3: len 2; hex 6464; asc dd;; //非主键字段的值
通过以上信息可以得出事务39474执行delete语句时,锁等待发生在申请ID=4这条记录上的X锁“lock_mode X locks rec but not gap waiting”
- 事务39475持有锁的信息
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 114 page no 4 n bits 80 index PRIMARY of table `dhy`.`t1` trx id 39475 lock_mode X locks rec but not gap
Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
0: len 4; hex 00000004; asc ;;
1: len 6; hex 000000009a33; asc 3;;
2: len 7; hex 02000001471399; asc G ;;
3: len 2; hex 6464; asc dd;;
Record lock, heap no 6 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
0: len 4; hex 00000005; asc ;;
1: len 6; hex 000000009a33; asc 3;;
2: len 7; hex 02000001471375; asc G u;;
3: len 2; hex 6565; asc ee;;
Record lock, heap no 7 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
0: len 4; hex 00000006; asc ;;
1: len 6; hex 000000009a33; asc 3;;
2: len 7; hex 02000001471351; asc G Q;;
3: len 2; hex 6666; asc ff;;
事务39475持有在ID=4,5,6上X锁
- 事务39475同样在执行delete语句时发生了所等待
*** (2) TRANSACTION:
TRANSACTION 39475, ACTIVE 46 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1200, 4 row lock(s), undo log entries 3
MySQL thread id 10, OS thread handle 123145526104064, query id 78 localhost root updating
DELETE FROM t1 WHERE id = 3
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 114 page no 4 n bits 80 index PRIMARY of table `dhy`.`t1` trx id 39475 lock_mode X locks rec but not gap waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 32
0: len 4; hex 00000003; asc ;;
1: len 6; hex 000000009a32; asc 2;;
2: len 7; hex 01000001462e1f; asc F. ;;
3: len 2; hex 6363; asc cc;;
申请ID=3上的X锁时发生了所等待,执行的语句是:DELETE FROM t1 WHERE id = 3,那么可以得出39474在id=3上持有了X锁,但是在死锁日志中并没有显示出事务39474持有锁的信息 那么这两个事务加锁的顺序应是: 1. 事务39474持有了id=3上的X锁 2. 事务39475持有了id=4上的X锁 3. 事务39474申请id=4上X锁时发生了锁等待执行的语句是:DELETE FROM t1 WHERE id = 4 4. 事务39475申请id=3上X锁时触发了死锁,因为此时双方都在申请对方持有的锁不能进行下去了。
- 事务2被回滚
*** WE ROLL BACK TRANSACTION (2)
- 事务39475持有ID = 4, 5, 6上的X锁是由哪个语句引起的,无法直观从死锁日志里看出。可以通过打开general日志或者binlog或者业务代码来查看整个事务逻辑
实验步骤及表结构
搭建可按实验步骤自己模拟测试,表结构及数据如下:
CREATE TABLE t1 (id int unsigned NOT NULL PRIMARY KEY, c1
varchar(10));
INSERT INTO t1 VALUES (1, 'aa'), (2, 'bb'), (3, 'cc'), (4, 'dd'), (5, 'ee'), (6, 'ff');
操作步骤如表所示:
总结
这个案例根本原因是两个会话同时删除数据时,没有控制好删除的顺序造成了死锁,这就需要我们在做应用开发时对数据库操作一定要注意操作数据的前后关系、是否有数据依赖、会话之间是否会操作相同的数据。 通过这个案例我们也了解到了应如何去阅读和分析死锁日志。