MYSQL_GTIDS_SLAVE服务器报错1236
故障原因:
是主服务器数据丢失使用xtrbackup备份恢复后导致mysql的slave服务器gitd状态报错
1.slave服务器
老规矩看报错信息:
Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replicated to the slave. Suggest to replicate any transactions that master has rolled back from slave to master, and/or commit empty transactions on master to account for transactions that have been' 这里可以反应出是主服务器的gtid小于slave服务器导致不可用(也就是可能是主服务器事物和从服务器的事物不一致,很有可能我恢复时覆盖了binlog日志导致)
解决思路:
主从复制原理说人话就是两个线程,一个sql线程和一个binlogdump线程两者之间传输复制日志,其中binlogdump线程负责读取master的binlog日志,而io线程负责读取binlog线程的日志而写入slave服务器的中继日志,再通过slave自己的sql线程写入到或者落盘到slave服务器。通过原理不难发现既然报错说的是:slave服务器的gtid大于master的gtid事物id,就已经能够判定不存在123层原因(最近123层设备有点拉),且slave的sql线程挂了,就说明在读取relay日志写入到mysql时出错,重点排查slave读取master的gtid位置
解决步骤
排查已经处理gtid事物
1.slave服务器
mysql> show global variables like '%gtid%';
+----------------------------------+---------------------------------------------+
| Variable_name | Value |
+----------------------------------+---------------------------------------------+
| binlog_gtid_simple_recovery | ON |
| enforce_gtid_consistency | ON |
| gtid_executed | b3c0ef51-0fab-11ee-8901-0050563ef6ab:5681-2 | #这里
| gtid_executed_compression_period | 1000 |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | |
| session_track_gtids | OFF |
+----------------------------------+---------------------------------------------+
2.master服务器
mysql> show global variables like '%gtid%';
+----------------------------------+------------------------------------------+
| Variable_name | Value |
+----------------------------------+------------------------------------------+
| binlog_gtid_simple_recovery | ON |
| enforce_gtid_consistency | ON |
| gtid_executed | b3c0ef51-0fab-11ee-8901-0050563ef6ab:1-2 | 这里
| gtid_executed_compression_period | 1000 |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | |
| session_track_gtids | OFF |
+----------------------------------+------------------------------------------+
好家伙差这么多
slave重置gtid处理值
mysql> STOP slave;停止主从
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> RESET SLAVE ALL;从新设置主从
Query OK, 0 rows affected (0.00 sec)
mysql> RESET MASTER;重新设置以复制状态集
Query OK, 0 rows affected (0.02 sec)
mysql> change master to master_host='10.0.0.161',master_user='用户',master_password='密码不',master_port=3306,master_auto_position=1;从新建立gtid
Query OK, 0 rows affected, 2 warnings (0.01 sec)
mysql> START SLAVE;启动复制
Query OK, 0 rows affected (0.01 sec)
mysql> SHOW SLAVE STATUS\G;
......................
Slave_IO_Running: Yes 两个大yes 代表ok
Slave_SQL_Running: Yes 两个大yes 代表ok
.................
期间问题:
这个可能会导致丢事物,取决于你恢复时怎么恢复,这相当于重建gtid
dame,为了保证数据统一我吧主服务器的数据导了一遍给slave以防数据不统一,这不是最完美办法,如果不导也可以选择基于binlog日志保持数据一致,发生事故时,数据没有变化太多加上这个库的数据不太重要,一些监控数据。
吃饭啥也不是:今天中午吃啥子,罗罗罗飞鱼