How to perform Switchover/Failover in MySql InnoDB ReplicaSet
In the previous blog, we saw Part I: How to Configure InnoDB ReplicaSet MySQL Click here to read more. We will use the same configuration to perform Switchover & Failover in InnoDB ReplicaSet.
Below are the setup information.
Hostname | IP | Role | Server_ID | Port | Installation Link |
test-machine01 | 192.168.114.177 | Master | 1 | 3306 | Click Here |
test-machine01 | 192.168.114.177 | Replica-1 | 3 | 3307 | Click Here |
test-machine02 | 192.168.114.176 | Replica-2 | 2 | 3306 | Click Here |
test-machine02 | 192.168.114.176 | Replica-3 | 4 | 3307 |
We will use two function Replicaset.setPrimaryInstance for planned Switchover and Replicaset.forcePrimaryInstance for failover in case of the master is inaccessible.
Step 1. Login to mysqlshell: Login to mysqlshell using clusteradmin user and check the status of InnoDB ReplicaSet.
[root@test-machine01 ~]# mysqlsh
MySQL Shell 8.0.23
Copyright (c) 2016, 2021, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.
Type '\help' or '\?' for help; '\quit' to exit.
MySQL JS >
MySQL JS > \connect rsadmin@test-machine01:3306
Creating a session to 'rsadmin@test-machine01:3306'
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 120
Server version: 8.0.22 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.
MySQL test-machine01:3306 ssl JS >
MySQL test-machine01:3306 ssl JS >
MySQL test-machine01:3306 ssl JS > var rs =dba.getReplicaSet()
You are connected to a member of replicaset 'TestReplicaSet'.
MySQL test-machine01:3306 ssl JS >
MySQL test-machine01:3306 ssl JS > rs.status()
{
"replicaSet": {
"name": "TestReplicaSet",
"primary": "test-machine01:3306",
"status": "AVAILABLE",
"statusText": "All instances available.",
"topology": {
"test-machine01:3306": {
"address": "test-machine01:3306",
"instanceRole": "PRIMARY",
"mode": "R/W",
"status": "ONLINE"
},
"test-machine01:3307": {
"address": "test-machine01:3307",
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for master to send event",
"replicationLag": null
},
"status": "ONLINE"
},
"test-machine02:3306": {
"address": "test-machine02:3306",
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for master to send event",
"replicationLag": null
},
"status": "ONLINE"
},
"test-machine02:3307": {
"address": "test-machine02:3307",
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for master to send event",
"replicationLag": null
},
"status": "ONLINE"
}
},
"type": "ASYNC"
}
}
MySQL test-machine01:3306 ssl JS >
Step 2. Perform DryRun: Perform dryrun of switchover process to new master ‘test-machine02:3306’ to check for any potential issues before performing actual switchover. You can use \help Replicaset.setPrimaryInstance to get full commands options.
MySQL test-machine01:3306 ssl JS > \help Replicaset.setPrimaryInstance
NAME
setPrimaryInstance - Performs a safe PRIMARY switchover, promoting the
given instance.
SYNTAX
<ReplicaSet>.setPrimaryInstance(instance, options)
MySQL test-machine01:3306 ssl JS > rs.setPrimaryInstance('test-machine02:3306',{dryRun: true})
test-machine02:3306 will be promoted to PRIMARY of 'TestReplicaSet'.
The current PRIMARY is test-machine01:3306.
* Connecting to replicaset instances
** Connecting to test-machine01:3306
** Connecting to test-machine01:3307
** Connecting to test-machine02:3306
** Connecting to test-machine02:3307
** Connecting to test-machine01:3306
** Connecting to test-machine01:3307
** Connecting to test-machine02:3306
** Connecting to test-machine02:3307
* Performing validation checks
** Checking async replication topology...
** Checking transaction state of the instance...
* Synchronizing transaction backlog at test-machine02:3306
* Updating metadata
* Acquiring locks in replicaset instances
** Pre-synchronizing SECONDARIES
** Acquiring global lock at PRIMARY
** Acquiring global lock at SECONDARIES
* Updating replication topology
** Configuring test-machine01:3306 to replicate from test-machine02:3306
** Changing replication source of test-machine01:3307 to test-machine02:3306
** Changing replication source of test-machine02:3307 to test-machine02:3306
test-machine02:3306 was promoted to PRIMARY.
dryRun finished.
MySQL test-machine01:3306 ssl JS >
Step 3. Actual Switchover: Use the below command to perform switchover. Once completion of switchover, process is done. Use rs.status() function to check for replicaset status. You will notice “primary”: “test-machine02:3306”
MySQL test-machine01:3306 ssl JS > rs.setPrimaryInstance('test-machine02:3306');
test-machine02:3306 will be promoted to PRIMARY of 'TestReplicaSet'.
The current PRIMARY is test-machine01:3306.
* Connecting to replicaset instances
** Connecting to test-machine01:3306
** Connecting to test-machine01:3307
** Connecting to test-machine02:3306
** Connecting to test-machine02:3307
** Connecting to test-machine01:3306
** Connecting to test-machine01:3307
** Connecting to test-machine02:3306
** Connecting to test-machine02:3307
* Performing validation checks
** Checking async replication topology...
** Checking transaction state of the instance...
* Synchronizing transaction backlog at test-machine02:3306
* Updating metadata
* Acquiring locks in replicaset instances
** Pre-synchronizing SECONDARIES
** Acquiring global lock at PRIMARY
** Acquiring global lock at SECONDARIES
* Updating replication topology
** Configuring test-machine01:3306 to replicate from test-machine02:3306
** Changing replication source of test-machine01:3307 to test-machine02:3306
** Changing replication source of test-machine02:3307 to test-machine02:3306
test-machine02:3306 was promoted to PRIMARY.
MySQL test-machine01:3306 ssl JS >
MySQL test-machine01:3306 ssl JS >
MySQL test-machine01:3306 ssl JS > rs.status();
{
"replicaSet": {
"name": "TestReplicaSet",
"primary": "test-machine02:3306",
"status": "AVAILABLE",
"statusText": "All instances available.",
"topology": {
"test-machine01:3306": {
"address": "test-machine01:3306",
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for master to send event",
"replicationLag": null
},
"status": "ONLINE"
},
"test-machine01:3307": {
"address": "test-machine01:3307",
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for master to send event",
"replicationLag": null
},
"status": "ONLINE"
},
"test-machine02:3306": {
"address": "test-machine02:3306",
"instanceRole": "PRIMARY",
"mode": "R/W",
"status": "ONLINE"
},
"test-machine02:3307": {
"address": "test-machine02:3307",
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for master to send event",
"replicationLag": null
},
"status": "ONLINE"
}
},
"type": "ASYNC"
}
}
MySQL test-machine01:3306 ssl JS >
Step 4: Let simulate Failover: Let stop mysql instance : test-machine02:3306 which is new master after switchover to simulate failover scenarios. Once mysql instance : test-machine02:3306 is stopped check replicaset status using function rs.status(). You will notice “test-machine02:3306” : “status”: “UNAVAILABLE”.
[root@test-machine02 ~]# ps -ef|grep mysqld
mysql 26028 1 0 10:25 ? 00:01:37 /u01/mysql-8/bin/mysqld --defaults-group-suffix=@inst2
mysql 27425 1 0 Mar18 ? 00:15:00 /u01/mysql-8/bin/mysqld --defaults-group-suffix=@inst1
[root@test-machine02 ~]#
[root@test-machine02 ~]# systemctl stop mysqld@inst1
[root@test-machine02 ~]#
MySQL test-machine01:3306 ssl JS > rs.status();
ReplicaSet.status: Failed to execute query on Metadata server test-machine02:3306: Lost connection to MySQL server during query (MySQL Error 2013)
MySQL test-machine01:3306 ssl JS >
MySQL test-machine01:3306 ssl JS >
MySQL test-machine01:3306 ssl JS > \quit
Bye!
[root@test-machine01 ~]# mysqlsh
MySQL Shell 8.0.23
Copyright (c) 2016, 2021, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.
Type '\help' or '\?' for help; '\quit' to exit.
MySQL JS > \connect rsadmin@test-machine01:3306
Creating a session to 'rsadmin@test-machine01:3306'
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 11709
Server version: 8.0.22 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.
MySQL test-machine01:3306 ssl JS >
MySQL test-machine01:3306 ssl JS > dba.getReplicaSet()
You are connected to a member of replicaset 'TestReplicaSet'.
<ReplicaSet:TestReplicaSet>
MySQL test-machine01:3306 ssl JS > var rs =dba.getReplicaSet()
You are connected to a member of replicaset 'TestReplicaSet'.
MySQL test-machine01:3306 ssl JS >
MySQL test-machine01:3306 ssl JS > rs.status();
ERROR: Unable to connect to the PRIMARY of the replicaset TestReplicaSet: MySQL Error 2003: Could not open connection to 'test-machine02:3306': Can't connect to MySQL server on 'test-machine02' (111)
Cluster change operations will not be possible unless the PRIMARY can be reached.
If the PRIMARY is unavailable, you must either repair it or perform a forced failover.
See \help forcePrimaryInstance for more information.
WARNING: MYSQLSH 51118: PRIMARY instance is unavailable
{
"replicaSet": {
"name": "TestReplicaSet",
"primary": "test-machine02:3306",
"status": "UNAVAILABLE",
"statusText": "PRIMARY instance is not available, but there is at least one SECONDARY that could be force-promoted.",
"topology": {
"test-machine01:3306": {
"address": "test-machine01:3306",
"fenced": true,
"instanceErrors": [
"ERROR: Replication I/O thread (receiver) has stopped with an error."
],
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"expectedSource": "test-machine02:3306",
"receiverLastError": "error reconnecting to master 'mysql_innodb_rs_1@test-machine02:3306' - retry-time: 60 retries: 3 message: Can't connect to MySQL server on 'test-machine02' (111)",
"receiverLastErrorNumber": 2003,
"receiverLastErrorTimestamp": "2021-03-22 13:50:56.318987",
"receiverStatus": "ERROR",
"receiverThreadState": "",
"replicationLag": null,
"source": "test-machine02:3306"
},
"status": "ERROR",
"transactionSetConsistencyStatus": null
},
"test-machine01:3307": {
"address": "test-machine01:3307",
"fenced": true,
"instanceErrors": [
"ERROR: Replication I/O thread (receiver) has stopped with an error."
],
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"expectedSource": "test-machine02:3306",
"receiverLastError": "error reconnecting to master 'mysql_innodb_rs_3@test-machine02:3306' - retry-time: 60 retries: 3 message: Can't connect to MySQL server on 'test-machine02' (111)",
"receiverLastErrorNumber": 2003,
"receiverLastErrorTimestamp": "2021-03-22 13:50:56.318986",
"receiverStatus": "ERROR",
"receiverThreadState": "",
"replicationLag": null,
"source": "test-machine02:3306"
},
"status": "ERROR",
"transactionSetConsistencyStatus": null
},
"test-machine02:3306": {
"address": "test-machine02:3306",
"connectError": "Could not open connection to 'test-machine02:3306': Can't connect to MySQL server on 'test-machine02' (111)",
"fenced": null,
"instanceRole": "PRIMARY",
"mode": null,
"status": "UNREACHABLE"
},
"test-machine02:3307": {
"address": "test-machine02:3307",
"fenced": true,
"instanceErrors": [
"ERROR: Replication I/O thread (receiver) has stopped with an error."
],
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"expectedSource": "test-machine02:3306",
"receiverLastError": "error reconnecting to master 'mysql_innodb_rs_4@test-machine02:3306' - retry-time: 60 retries: 3 message: Can't connect to MySQL server on 'test-machine02' (111)",
"receiverLastErrorNumber": 2003,
"receiverLastErrorTimestamp": "2021-03-22 13:50:56.314898",
"receiverStatus": "ERROR",
"receiverThreadState": "",
"replicationLag": null,
"source": "test-machine02:3306"
},
"status": "ERROR",
"transactionSetConsistencyStatus": null
}
},
"type": "ASYNC"
}
}
MySQL test-machine01:3306 ssl JS >
Step 5: Dryrun for Failover: Run function forcePrimaryInstance with dryrun option to ‘test-machine01:3306’ to check for any potential error. You can use command \help Replicaset.forcePrimaryInstance to get full options.
MySQL test-machine01:3306 ssl JS > \help Replicaset.forcePrimaryInstance
NAME
forcePrimaryInstance - Performs a failover in a replicaset with an
unavailable PRIMARY.
SYNTAX
<ReplicaSet>.forcePrimaryInstance(instance, options)
rs.forcePrimaryInstance('test-machine01:3306',{dryRun: true})
MySQL test-machine01:3306 ssl JS >
MySQL test-machine01:3306 ssl JS > rs.forcePrimaryInstance('test-machine01:3306',{dryRun: true})
* Connecting to replicaset instances
** Connecting to test-machine01:3306
** Connecting to test-machine01:3307
** Connecting to test-machine02:3307
* Waiting for all received transactions to be applied
** Waiting for received transactions to be applied at test-machine01:3307
** Waiting for received transactions to be applied at test-machine01:3306
** Waiting for received transactions to be applied at test-machine02:3307
test-machine01:3306 will be promoted to PRIMARY of the replicaset and the former PRIMARY will be invalidated.
* Checking status of last known PRIMARY
NOTE: test-machine02:3306 is UNREACHABLE
* Checking status of promoted instance
NOTE: test-machine01:3306 has status ERROR
* Checking transaction set status
* Promoting test-machine01:3306 to a PRIMARY...
* Updating metadata...
test-machine01:3306 was force-promoted to PRIMARY.
NOTE: Former PRIMARY test-machine02:3306 is now invalidated and must be removed from the replicaset.
* Updating source of remaining SECONDARY instances
** Changing replication source of test-machine01:3307 to test-machine01:3306
** Changing replication source of test-machine02:3307 to test-machine01:3306
Failover finished successfully.
dryRun finished.
MySQL test-machine01:3306 ssl JS >
Step 6. Actual Failover: Use below command to perform actual failover. Once process is finished you will notice “primary”: “test-machine01:3306” and “test-machine02:3306” “status”: “INVALIDATED”
MySQL test-machine01:3306 ssl JS > rs.forcePrimaryInstance('test-machine01:3306')
* Connecting to replicaset instances
** Connecting to test-machine01:3306
** Connecting to test-machine01:3307
** Connecting to test-machine02:3307
* Waiting for all received transactions to be applied
** Waiting for received transactions to be applied at test-machine01:3307
** Waiting for received transactions to be applied at test-machine01:3306
** Waiting for received transactions to be applied at test-machine02:3307
test-machine01:3306 will be promoted to PRIMARY of the replicaset and the former PRIMARY will be invalidated.
* Checking status of last known PRIMARY
NOTE: test-machine02:3306 is UNREACHABLE
* Checking status of promoted instance
NOTE: test-machine01:3306 has status ERROR
* Checking transaction set status
* Promoting test-machine01:3306 to a PRIMARY...
* Updating metadata...
test-machine01:3306 was force-promoted to PRIMARY.
NOTE: Former PRIMARY test-machine02:3306 is now invalidated and must be removed from the replicaset.
* Updating source of remaining SECONDARY instances
** Changing replication source of test-machine01:3307 to test-machine01:3306
** Changing replication source of test-machine02:3307 to test-machine01:3306
Failover finished successfully.
MySQL test-machine01:3306 ssl JS >
MySQL test-machine01:3306 ssl JS > rs.status()
{
"replicaSet": {
"name": "TestReplicaSet",
"primary": "test-machine01:3306",
"status": "AVAILABLE_PARTIAL",
"statusText": "The PRIMARY instance is available, but one or more SECONDARY instances are not.",
"topology": {
"test-machine01:3306": {
"address": "test-machine01:3306",
"instanceRole": "PRIMARY",
"mode": "R/W",
"status": "ONLINE"
},
"test-machine01:3307": {
"address": "test-machine01:3307",
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for master to send event",
"replicationLag": null
},
"status": "ONLINE"
},
"test-machine02:3306": {
"address": "test-machine02:3306",
"connectError": "Could not open connection to 'test-machine02:3306': Can't connect to MySQL server on 'test-machine02' (111)",
"fenced": null,
"instanceRole": null,
"mode": null,
"status": "INVALIDATED"
},
"test-machine02:3307": {
"address": "test-machine02:3307",
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for master to send event",
"replicationLag": null
},
"status": "ONLINE"
}
},
"type": "ASYNC"
}
}
MySQL test-machine01:3306 ssl JS >
Step 7. Lets remove “test-machine02:3306”: Remove mysql instance “test-machine02:3306” from topology using function rs.removeInstance.
MySQL test-machine01:3306 ssl JS >
MySQL test-machine01:3306 ssl JS > rs.removeInstance('test-machine02:3306')
ERROR: Unable to connect to the target instance test-machine02:3306. Please make sure the instance is available and try again. If the instance is permanently not reachable, use the 'force' option to remove it from the replicaset metadata and skip reconfiguration of that instance.
ReplicaSet.removeInstance: Could not open connection to 'test-machine02:3306': Can't connect to MySQL server on 'test-machine02' (111) (MySQL Error 2003)
MySQL test-machine01:3306 ssl JS >
MySQL test-machine01:3306 ssl JS >
MySQL test-machine01:3306 ssl JS >
MySQL test-machine01:3306 ssl JS > rs.removeInstance('test-machine02:3306',{force:true})
NOTE: Unable to connect to the target instance test-machine02:3306. The instance will only be removed from the metadata, but its replication configuration cannot be updated. Please, take any necessary actions to make sure that the instance will not replicate from the replicaset if brought back online.
NOTE: test-machine02:3306 is invalidated, replication sync will be skipped.
Metadata for instance 'test-machine02:3306' was deleted, but instance configuration could not be updated.
MySQL test-machine01:3306 ssl JS >
MySQL test-machine01:3306 ssl JS >
MySQL test-machine01:3306 ssl JS > rs.status()
{
"replicaSet": {
"name": "TestReplicaSet",
"primary": "test-machine01:3306",
"status": "AVAILABLE",
"statusText": "All instances available.",
"topology": {
"test-machine01:3306": {
"address": "test-machine01:3306",
"instanceRole": "PRIMARY",
"mode": "R/W",
"status": "ONLINE"
},
"test-machine01:3307": {
"address": "test-machine01:3307",
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for master to send event",
"replicationLag": null
},
"status": "ONLINE"
},
"test-machine02:3307": {
"address": "test-machine02:3307",
"instanceRole": "SECONDARY",
"mode": "R/O",
"replication": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Slave has read all relay log; waiting for more updates",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for master to send event",
"replicationLag": null
},
"status": "ONLINE"
}
},
"type": "ASYNC"
}
}
MySQL test-machine01:3306 ssl JS >
MySQL test-machine01:3306 ssl JS >
Reference : MySQL InnoDB ReplicaSet : https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-innodb-replicaset.html
Reference : MySQL Database Architectures : https://www.youtube.com/watch?v=NYrcHMdwrpo
Part I : How to Configure InnoDB ReplicaSet Click here to read more.
Part III : How To Bootstrap MySQL Router For InnoDB ReplicaSet Click here to read more
This document is only for learning purpose and always validate in the LAB environment first before applying in the LIVE environment.
Hope so you like this article!
Please share your valuable feedback/comments/subscribe and follow us below and don’t forget to click on the bell icon to get the latest update. Click here to know more about our pursuit.