MySQL · 引擎特性 · 手动分析InnoDB B+Tree结构
说明
本文用查找一条数据库表记录的例子来分析InnoDB B+Tree的结构
先用sysbench插入一千万条记录:
sysbench --db-driver=mysql --mysql-user=username --mysql-password=password --mysql-db=sbtest \
--table_size=10000000 --tables=1 oltp_read_write --mysql-host=127.0.0.1 prepare
生成的sbtest1.ibd大小为2.3G,用hexdump来查看内容
随意选一条记录,比如id=3905000,通过手动找到这条记录来分析B+Tree的结构。
mysql> select * from sbtest1 where id=3905000 \G
*************************** 1. row ***************************
id: 3905000
k: 7152454
c: 33061989913-01978996152-96897051302-66804054532-36658200903-75265952777-90162670547-62113775555-84037309450-68725639441
pad: 93314475890-72810819110-74153294523-75348581725-15287112137
1 row in set (0.00 sec)
Page format
详细请查看: https://dev.mysql.com/doc/internals/en/innodb-page-overview.html
Name Size(bytes) File Header 38 Page Header 56 Infimum + Supremum Records 16 User Records 不定 Free Space 不定 Page Directory 不定 Fil Trailer 8查看B+Tree root page
page 3是root page,先用hexdump查看file header
page 3的offset是3161024 = 49152,file header size是38 bytes
$hexdump -s 49152 -n 38 -C sbtest1.ibd
0000c000 58 0f 5c bd 00 00 00 03 ff ff ff ff ff ff ff ff |X.\.............|
0000c010 00 00 00 00 99 4a f4 5a 45 bf 00 00 00 00 00 00 |.....J.ZE.......|
0000c020 00 00 00 00 00 20 |..... |
page type是0x45bf,表示B+Tree节点
再看page header,offset是3161024 + 38 = 49190,page header size是56 bytes
$hexdump -s 49190 -n 56 sbtest1.ibd
0000c026 00 1d 06 4f 80 75 00 00 00 00 06 47 00 02 00 72 |...O.u.....G...r|
0000c036 00 73 00 00 00 00 00 00 00 00 00 02 00 00 00 00 |.s..............|
0000c046 00 00 00 38 00 00 00 20 00 00 00 02 00 f2 00 00 |...8... ........|
0000c056 00 20 00 00 00 02 00 32 |. .....2|
page directory slots数目为29(0x001d)
page level为2(0x0002),这个是根节点。叶子节点level总是为0,所以这棵B+Tree深度为3。
root page directory
每个slot占2 bytes,29个slot共58 bytes。 所以root page directory offset为 3(161024) - 58 - 8(trailer) = 65470
$hexdump -s 65470 -n 58 -C sbtest1.ibd
0000ffbe 00 70 05 ec 05 b8 05 84 05 50 05 1c 04 e8 04 b4 |.p.......P......|
0000ffce 04 80 04 4c 04 18 03 e4 03 b0 03 7c 03 48 03 14 |...L.......|.H..|
0000ffde 02 e0 02 ac 02 78 02 44 02 10 01 dc 01 a8 01 74 |.....x.D.......t|
0000ffee 01 40 01 0c 00 d8 00 a4 00 63 |.@.......c|
page directory按primary key逆序排列,0x0063是infimum record offset,0x0070是supremum record offset
$hexdump -s 49246 -n 12 -C sbtest1.ibd
0000c05e 01 00 02 00 1a 69 6e 66 69 6d 75 6d |.....infimum|
“01 00 02 00 1a” 是record header,最后1个byte 0x1a表示next record offset
3161024+0x63+0x1a=49277,找到第1条记录
$hexdump -s 49277 -n 8 sbtest1.ibd
0000c07d 80 00 00 01 00 00 00 24 |.......$|
slot
offset
record(hex)
primary key
page no
0
0x0063
69 6e 66 69 6d 75 6d 00
1
36
1
0x00a4
80 03 59 53 00 00 00 27
219475
39
2
0x00d8
80 08 b5 7f 00 00 00 2b
570751
43
3
0x010c
80 0e 11 ab 00 00 00 2f
922027
47
4
0x0140
80 13 6d d7 00 00 00 33
1273303
51
5
0x0174
80 18 ca 03 00 00 00 37
1624579
55
6
0x01a8
80 1e 26 2f 00 00 00 3b
1975855
59
7
0x01dc
80 23 82 5b 00 00 00 3f
2327131
63
8
0x0210
80 28 de 87 00 00 90 00
2678407
36864
9
0x0244
80 2e 3a b3 00 00 90 04
3029683
36868
10
0x0278
80 33 96 df 00 00 90 08
3380959
36872
11
0x02ac
80 38 f3 0b 00 00 90 0c
3732235
36876
12
0x02e0
80 3e 4f 37 00 00 90 10
4083511
36880
13
0x0314
80 43 ab 63 00 00 90 14
4434787
36884
14
0x0348
80 49 07 8f 00 00 90 18
4786063
36888
15
0x037c
80 4e 63 bb 00 00 90 1c
5137339
36892
16
0x03b0
80 53 bf e7 00 00 90 20
5488615
36896
17
0x03e4
80 59 1c 13 00 00 90 24
5839891
36900
18
0x0418
80 5e 78 3f 00 00 90 28
6191167
36904
19
0x044c
80 63 d4 6b 00 00 90 2c
6542443
36908
20
0x0480
80 69 30 97 00 00 90 30
6893719
36912
21
0x04b4
80 6e 8c c3 00 00 90 34
7244995
36916
22
0x04e8
80 73 e8 ef 00 00 90 38
7596271
36920
23
0x051c
80 79 45 1b 00 00 90 3c
7947547
36924
24
0x0550
80 7e a1 47 00 01 be 00
8298823
114176
25
0x0584
80 83 fd 73 00 01 be 04
8650099
114180
26
0x05b8
80 89 59 9f 00 01 be 08
9001375
114184
27
0x05ec
80 8e b5 cb 00 01 be 0c
9352651
114188
28
0x0070
73 75 70 72 65 6d 75 6d
我们要找的记录primary key为3905000,在以上29个slots中做binary search,发现可能在slot 11和slot 12之间 slot 11的primary key为3732235,并不是我们要找的3905000,需要继续在slot 11的record list中查找。
查看page 0 slot 11的record list
record header共5个bytes,第5个byte记录了next record的相对于current record的offset, 即next record offset = current record offset + current record header的第5个byte 非叶子节点record body为8 bytes
offset record(hex) primary key page no 684 80 38 f3 0b 00 00 90 0c 3732235 36876 697 80 3a 4a 16 00 00 90 0d 3820054 36877 710 80 3b a1 21 00 00 90 0e 3907873 36878 723 80 3c f8 2c 00 00 90 0f 3995692 36879 736 80 3e 4f 37 00 00 90 10 4083511 36880 … … … … 1529 80 90 0c d6 00 01 be 0d 9440470 114189 1542 80 91 63 e1 00 01 be 0e 9528289 114190 1555 80 92 ba ec 00 01 be 0f 9616108 114191 1568 80 94 11 f7 00 01 be 10 9703927 114192 1581 80 95 69 02 00 01 be 11 9791746 114193 1594 80 96 c0 0d 00 01 be 12 9879565 114194 1607 80 98 17 18 00 01 be 13 9967384 114195到此发现我们要找的记录(primary key 3905000)可能在offset 697所指向的page 36877
查看page 36877
先看file header, offset = 36877 * (16*1024) = 604192768, size为38 bytes
$hexdump -s 604192768 -n 38 -C sbtest1.ibd
24034000 d1 d6 61 2a 00 00 90 0d 00 00 90 0c 00 00 90 0e |..a*............|
24034010 00 00 00 00 3b ff 8b 03 45 bf 00 00 00 00 00 00 |....;...E.......|
24034020 00 00 00 00 00 20 |..... |
page type是0x45bf,确认了是B+Tree节点
再看page header, offset = 36877 * (16*1024) + 38 = 604192806, size为56 bytes
$hexdump -s 604192806 -n 56 -C sbtest1.ibd
24034026 01 2d 3d 8f 84 b5 00 00 00 00 3d 87 00 02 04 b2 |.-=.......=.....|
24034036 04 b3 00 00 00 00 00 00 00 00 00 01 00 00 00 00 |................|
24034046 00 00 00 38 00 00 00 00 00 00 00 00 00 00 00 00 |...8............|
24034056 00 00 00 00 00 00 00 00 |........|
page directory slots数目为301(0x012d)
page level为1(0x0001),还不是叶子节点。
再看page diretory 每个slot占2 bytes,301个slot共602 bytes。 所以page directory offset为 36877(161024) - 602 - 8(trailer) = 604208542
$hexdump -s 604208542 -n 602 -C sbtest1.ibd
24037d9e 00 70 3d 2c 3c f8 3c c4 3c 90 3c 5c 3c 28 3b f4 |.p=,<.<.<.<\<(;.|
24037dae 3b c0 3b 8c 3b 58 3b 24 3a f0 3a bc 3a 88 3a 54 |;.;.;X;$:.:.:.:T|
24037dbe 3a 20 39 ec 39 b8 39 84 39 50 39 1c 38 e8 38 b4 |: 9.9.9.9P9.8.8.|
24037dce 38 80 38 4c 38 18 37 e4 37 b0 37 7c 37 48 37 14 |8.8L8.7.7.7|7H7.|
24037dde 36 e0 36 ac 36 78 36 44 36 10 35 dc 35 a8 35 74 |6.6.6x6D6.5.5.5t|
24037dee 35 40 35 0c 34 d8 34 a4 34 70 34 3c 34 08 33 d4 |5@5.4.4.4p4<4.3.|
24037dfe 33 a0 33 6c 33 38 33 04 32 d0 32 9c 32 68 32 34 |3.3l383.2.2.2h24|
24037e0e 32 00 31 cc 31 98 31 64 31 30 30 fc 30 c8 30 94 |2.1.1.1d100.0.0.|
24037e1e 30 60 30 2c 2f f8 2f c4 2f 90 2f 5c 2f 28 2e f4 |0`0,/./././\/(..|
24037e2e 2e c0 2e 8c 2e 58 2e 24 2d f0 2d bc 2d 88 2d 54 |.....X.$-.-.-.-T|
24037e3e 2d 20 2c ec 2c b8 2c 84 2c 50 2c 1c 2b e8 2b b4 |- ,.,.,.,P,.+.+.|
24037e4e 2b 80 2b 4c 2b 18 2a e4 2a b0 2a 7c 2a 48 2a 14 |+.+L+.*.*.*|*H*.|
24037e5e 29 e0 29 ac 29 78 29 44 29 10 28 dc 28 a8 28 74 |).).)x)D).(.(.(t|
24037e6e 28 40 28 0c 27 d8 27 a4 27 70 27 3c 27 08 26 d4 |(@(.'.'.'p'<'.&.|
24037e7e 26 a0 26 6c 26 38 26 04 25 d0 25 9c 25 68 25 34 |&.&l&8&.%.%.%h%4|
24037e8e 25 00 24 cc 24 98 24 64 24 30 23 fc 23 c8 23 94 |%.$.$.$d$0#.#.#.|
24037e9e 23 60 23 2c 22 f8 22 c4 22 90 22 5c 22 28 21 f4 |#`#,"."."."\"(!.|
24037eae 21 c0 21 8c 21 58 21 24 20 f0 20 bc 20 88 20 54 |!.!.!X!$ . . . T|
24037ebe 20 20 1f ec 1f b8 1f 84 1f 50 1f 1c 1e e8 1e b4 | .......P......|
24037ece 1e 80 1e 4c 1e 18 1d e4 1d b0 1d 7c 1d 48 1d 14 |...L.......|.H..|
24037ede 1c e0 1c ac 1c 78 1c 44 1c 10 1b dc 1b a8 1b 74 |.....x.D.......t|
24037eee 1b 40 1b 0c 1a d8 1a a4 1a 70 1a 3c 1a 08 19 d4 |.@.......p.<....|
24037efe 19 a0 19 6c 19 38 19 04 18 d0 18 9c 18 68 18 34 |...l.8.......h.4|
24037f0e 18 00 17 cc 17 98 17 64 17 30 16 fc 16 c8 16 94 |.......d.0......|
24037f1e 16 60 16 2c 15 f8 15 c4 15 90 15 5c 15 28 14 f4 |.`.,.......\.(..|
24037f2e 14 c0 14 8c 14 58 14 24 13 f0 13 bc 13 88 13 54 |.....X.$.......T|
24037f3e 13 20 12 ec 12 b8 12 84 12 50 12 1c 11 e8 11 b4 |. .......P......|
24037f4e 11 80 11 4c 11 18 10 e4 10 b0 10 7c 10 48 10 14 |...L.......|.H..|
24037f5e 0f e0 0f ac 0f 78 0f 44 0f 10 0e dc 0e a8 0e 74 |.....x.D.......t|
24037f6e 0e 40 0e 0c 0d d8 0d a4 0d 70 0d 3c 0d 08 0c d4 |.@.......p.<....|
24037f7e 0c a0 0c 6c 0c 38 0c 04 0b d0 0b 9c 0b 68 0b 34 |...l.8.......h.4|
24037f8e 0b 00 0a cc 0a 98 0a 64 0a 30 09 fc 09 c8 09 94 |.......d.0......|
24037f9e 09 60 09 2c 08 f8 08 c4 08 90 08 5c 08 28 07 f4 |.`.,.......\.(..|
24037fae 07 c0 07 8c 07 58 07 24 06 f0 06 bc 06 88 06 54 |.....X.$.......T|
24037fbe 06 20 05 ec 05 b8 05 84 05 50 05 1c 04 e8 04 b4 |. .......P......|
24037fce 04 80 04 4c 04 18 03 e4 03 b0 03 7c 03 48 03 14 |...L.......|.H..|
24037fde 02 e0 02 ac 02 78 02 44 02 10 01 dc 01 a8 01 74 |.....x.D.......t|
24037fee 01 40 01 0c 00 d8 00 a4 00 63 |.@.......c|
解析page directory
slot offset record(hex) primary key page no 1 0x00a4 80 3a 4a f1 00 00 cd 8d 3820273 52621 2 0x00d8 80 3a 4c 15 00 00 cd 91 3820565 52625 3 0x010c 80 3a 4d 39 00 00 cd 95 3820857 52629 4 0x0140 80 3a 4e 5d 00 00 cd 99 3821149 52633 5 0x0174 80 3a 4f 81 00 00 cd 9d 3821441 52637 6 0x01a8 80 3a 50 a5 00 00 cd a1 3821733 52641 7 0x01dc 80 3a 51 c9 00 00 cd a5 3822025 52645 8 0x0210 80 3a 52 ed 00 00 cd a9 3822317 52649 9 0x0244 80 3a 54 11 00 00 cd ad 3822609 52653 10 0x0278 80 3a 55 35 00 00 cd b1 3822901 52657 … … … … … 288 0x3af0 80 3b 92 4d 00 00 d2 09 3904077 53769 289 0x3b24 80 3b 93 71 00 00 d2 0d 3904369 53773 290 0x3b58 80 3b 94 95 00 00 d2 11 3904661 53777 291 0x3b8c 80 3b 95 b9 00 00 d2 15 3904953 53781 292 0x3bc0 80 3b 96 dd 00 00 d2 19 3905245 53785 293 0x3bf4 80 3b 98 01 00 00 d2 1d 3905537 53789 294 0x3c28 80 3b 99 25 00 00 d2 21 3905829 53793 295 0x3c5c 80 3b 9a 49 00 00 d2 25 3906121 53797 296 0x3c90 80 3b 9b 6d 00 00 d2 29 3906413 53801 297 0x3cc4 80 3b 9c 91 00 00 d2 2d 3906705 53805 298 0x3cf8 80 3b 9d b5 00 00 d2 31 3906997 53809 299 0x3d2c 80 3b 9e d9 00 00 d2 35 3907289 53813在以上slots中做binary search,发现我们要找的记录(primary key 3905000)可能在slot 291和slot 292之间 继续查看slot 291的第1个record所指向的page 53781
查看page 53781
先看file header, offset = 53781 * (16*1024) = 881147904, size为38 bytes
$hexdump -s 881147904 -n 38 -C sbtest1.ibd
34854000 09 d0 e4 a7 00 00 d2 15 00 00 d2 14 00 00 d2 16 |................|
34854010 00 00 00 00 3b f4 ac 4b 45 bf 00 00 00 00 00 00 |....;..KE.......|
34854020 00 00 00 00 00 20 |..... |
page type是0x45bf,确认了是B+Tree节点
再看page header, offset = 53781 * (16*1024) + 38 = 881147942, size为56 bytes
$hexdump -s 881147942 -n 56 -C sbtest1.ibd
34854026 00 13 3b c8 80 4b 00 00 00 00 3a ff 00 02 00 48 |..;..K....:....H|
34854036 00 49 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |.I..............|
34854046 00 00 00 38 00 00 00 00 00 00 00 00 00 00 00 00 |...8............|
34854056 00 00 00 00 00 00 00 00 |........|
page directory slots数目为19(0x0013)
page level为0(0x0000),这是叶子节点了,是真正存数据的page
再看page diretory 每个slot占2 bytes,19个slot共38 bytes。 所以page directory offset为 53781(161024) - 38 - 8(trailer) = 881164242
$hexdump -s 881164242 -n 38 -C sbtest1.ibd
34857fd2 00 70 36 ef 33 af 30 6f 2d 2f 29 ef 26 af 23 6f |.p6.3.0o-/).&.#o|
34857fe2 20 2f 1c ef 19 af 16 6f 13 2f 0f ef 0c af 09 6f | /.....o./.....o|
34857ff2 06 2f 02 ef 00 63 |./...c|
解析page directory
slot | offset | record(hex) | primary key | page no
----- | ------ | ------ | -------- | --------
1 | 0x02ef | 80 3b 95 bc 00 00 00 00 | 3904956 | 0
2 | 0x062f | 80 3b 95 c0 00 00 00 00 | 3904960 | 0
3 | 0x096f | 80 3b 95 c4 00 00 00 00 | 3904964 | 0
4 | 0x0caf | 80 3b 95 c8 00 00 00 00 | 3904968 | 0
5 | 0x0fef | 80 3b 95 cc 00 00 00 00 | 3904972 | 0
6 | 0x132f | 80 3b 95 d0 00 00 00 00 | 3904976 | 0
7 | 0x166f | 80 3b 95 d4 00 00 00 00 | 3904980 | 0
8 | 0x19af | 80 3b 95 d8 00 00 00 00 | 3904984 | 0
9 | 0x1cef | 80 3b 95 dc 00 00 00 00 | 3904988 | 0
10 | 0x202f | 80 3b 95 e0 00 00 00 00 | 3904992 | 0
11 | 0x236f | 80 3b 95 e4 00 00 00 00 | 3904996 | 0
12 | 0x26af | 80 3b 95 e8 00 00 00 00 | 3905000 | 0
13 | 0x29ef | 80 3b 95 ec 00 00 00 00 | 3905004 | 0
14 | 0x2d2f | 80 3b 95 f0 00 00 00 00 | 3905008 | 0
15 | 0x306f | 80 3b 95 f4 00 00 00 00 | 3905012 | 0
16 | 0x33af | 80 3b 95 f8 00 00 00 00 | 3905016 | 0
17 | 0x36ef | 80 3b 95 fc 00 00 00 00 | 3905020 | 0
OK,在slot 12找到了primary key 3905000
查看primary key 3905000的记录
mysql> select * from sbtest1 where id=3905000 \G
*************************** 1. row ***************************
id: 3905000
k: 7152454
c: 33061989913-01978996152-96897051302-66804054532-36658200903-75265952777-90162670547-62113775555-84037309450-68725639441
pad: 93314475890-72810819110-74153294523-75348581725-15287112137
1 row in set (0.00 sec)
打印256 bytes验证下是不是我们要找的记录 offset为 53781(161024)+0x26af=881157807 id = 3905000 = 0x3b95e8 k = 7152454 = 0x6d2346 剩下的是c和pad的字符串,内容是对的。
hexdump -s 881157807 -n 256 -C /home/ming.lin/one_key_env_57/run_primary/dbs/testdb/sbtest1.ibd
348566af 80 3b 95 e8 00 00 00 00 0e d3 cb 00 00 00 0f 24 |.;.............$|
348566bf ec 80 6d 23 46 33 33 30 36 31 39 38 39 39 31 33 |..m#F33061989913|
348566cf 2d 30 31 39 37 38 39 39 36 31 35 32 2d 39 36 38 |-01978996152-968|
348566df 39 37 30 35 31 33 30 32 2d 36 36 38 30 34 30 35 |97051302-6680405|
348566ef 34 35 33 32 2d 33 36 36 35 38 32 30 30 39 30 33 |4532-36658200903|
348566ff 2d 37 35 32 36 35 39 35 32 37 37 37 2d 39 30 31 |-75265952777-901|
3485670f 36 32 36 37 30 35 34 37 2d 36 32 31 31 33 37 37 |62670547-6211377|
3485671f 35 35 35 35 2d 38 34 30 33 37 33 30 39 34 35 30 |5555-84037309450|
3485672f 2d 36 38 37 32 35 36 33 39 34 34 31 20 39 33 33 |-68725639441 933|
3485673f 31 34 34 37 35 38 39 30 2d 37 32 38 31 30 38 31 |14475890-7281081|
3485674f 39 31 31 30 2d 37 34 31 35 33 32 39 34 35 32 33 |9110-74153294523|
3485675f 2d 37 35 33 34 38 35 38 31 37 32 35 2d 31 35 32 |-75348581725-152|
3485676f 38 37 31 31 32 31 33 37 20 3c 78 00 01 90 00 d0 |87112137 <x.....|
3485677f 80 3b 95 e9 00 00 00 00 0e d3 cb 00 00 00 0f 24 |.;.............$|
3485678f f9 80 57 82 ca 37 38 33 32 36 37 32 37 31 32 39 |..W..78326727129|
3485679f 2d 32 30 30 39 36 39 37 37 37 38 30 2d 38 34 31 |-20096977780-841
至此我们手动找到了primary key为3905000的记录。
小结
page directory很关键,先在page directory中做binary search,定位到某个slot,再遍历slot上的record list
文章来源:
Author:数据库内核月报
link:http://10.101.233.47:4000/monthly/2020/04/06/