How to perform Switchover/Failover in MySql InnoDB ReplicaSet

How to perform Switchover/Failover in MySql InnoDB ReplicaSet


In the previous blog, we saw Part I: How to Configure InnoDB ReplicaSet MySQL Click here to read more. We will use the same configuration to perform Switchover & Failover in InnoDB ReplicaSet.


Below are the setup information.

HostnameIPRoleServer_IDPortInstallation Link
test-machine01192.168.114.177Master13306Click Here
test-machine01192.168.114.177Replica-133307Click Here
test-machine02192.168.114.176Replica-223306Click Here
test-machine02192.168.114.176Replica-343307

We will use two function Replicaset.setPrimaryInstance for planned Switchover and Replicaset.forcePrimaryInstance for failover in case of the master is inaccessible.


Step 1. Login to mysqlshell: Login to mysqlshell using clusteradmin user and check the status of InnoDB ReplicaSet.

[root@test-machine01 ~]# mysqlsh
MySQL Shell 8.0.23

Copyright (c) 2016, 2021, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.
 MySQL  JS >
 MySQL  JS > \connect  rsadmin@test-machine01:3306
Creating a session to 'rsadmin@test-machine01:3306'
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 120
Server version: 8.0.22 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.
 MySQL  test-machine01:3306 ssl  JS >



 MySQL  test-machine01:3306 ssl  JS >
 MySQL  test-machine01:3306 ssl  JS > var rs =dba.getReplicaSet()
You are connected to a member of replicaset 'TestReplicaSet'.
 MySQL  test-machine01:3306 ssl  JS >

 MySQL  test-machine01:3306 ssl  JS > rs.status()
{
    "replicaSet": {
        "name": "TestReplicaSet",
        "primary": "test-machine01:3306",
        "status": "AVAILABLE",
        "statusText": "All instances available.",
        "topology": {
            "test-machine01:3306": {
                "address": "test-machine01:3306",
                "instanceRole": "PRIMARY",
                "mode": "R/W",
                "status": "ONLINE"
            },
            "test-machine01:3307": {
                "address": "test-machine01:3307",
                "instanceRole": "SECONDARY",
                "mode": "R/O",
                "replication": {
                    "applierStatus": "APPLIED_ALL",
                    "applierThreadState": "Slave has read all relay log; waiting for more updates",
                    "receiverStatus": "ON",
                    "receiverThreadState": "Waiting for master to send event",
                    "replicationLag": null
                },
                "status": "ONLINE"
            },
            "test-machine02:3306": {
                "address": "test-machine02:3306",
                "instanceRole": "SECONDARY",
                "mode": "R/O",
                "replication": {
                    "applierStatus": "APPLIED_ALL",
                    "applierThreadState": "Slave has read all relay log; waiting for more updates",
                    "receiverStatus": "ON",
                    "receiverThreadState": "Waiting for master to send event",
                    "replicationLag": null
                },
                "status": "ONLINE"
            },
            "test-machine02:3307": {
                "address": "test-machine02:3307",
                "instanceRole": "SECONDARY",
                "mode": "R/O",
                "replication": {
                    "applierStatus": "APPLIED_ALL",
                    "applierThreadState": "Slave has read all relay log; waiting for more updates",
                    "receiverStatus": "ON",
                    "receiverThreadState": "Waiting for master to send event",
                    "replicationLag": null
                },
                "status": "ONLINE"
            }
        },
        "type": "ASYNC"
    }
}
 MySQL  test-machine01:3306 ssl  JS >


Step 2. Perform DryRun: Perform dryrun of switchover process to new master ‘test-machine02:3306’ to check for any potential issues before performing actual switchover. You can use \help Replicaset.setPrimaryInstance to get full commands options.

 MySQL  test-machine01:3306 ssl  JS > \help Replicaset.setPrimaryInstance
NAME
      setPrimaryInstance - Performs a safe PRIMARY switchover, promoting the
                           given instance.

SYNTAX
      <ReplicaSet>.setPrimaryInstance(instance, options)

 MySQL  test-machine01:3306 ssl  JS > rs.setPrimaryInstance('test-machine02:3306',{dryRun: true})
test-machine02:3306 will be promoted to PRIMARY of 'TestReplicaSet'.
The current PRIMARY is test-machine01:3306.

* Connecting to replicaset instances
** Connecting to test-machine01:3306
** Connecting to test-machine01:3307
** Connecting to test-machine02:3306
** Connecting to test-machine02:3307
** Connecting to test-machine01:3306
** Connecting to test-machine01:3307
** Connecting to test-machine02:3306
** Connecting to test-machine02:3307

* Performing validation checks
** Checking async replication topology...
** Checking transaction state of the instance...

* Synchronizing transaction backlog at test-machine02:3306

* Updating metadata

* Acquiring locks in replicaset instances
** Pre-synchronizing SECONDARIES
** Acquiring global lock at PRIMARY
** Acquiring global lock at SECONDARIES

* Updating replication topology
** Configuring test-machine01:3306 to replicate from test-machine02:3306
** Changing replication source of test-machine01:3307 to test-machine02:3306
** Changing replication source of test-machine02:3307 to test-machine02:3306

test-machine02:3306 was promoted to PRIMARY.

dryRun finished.

 MySQL  test-machine01:3306 ssl  JS >


Step 3. Actual Switchover: Use the below command to perform switchover. Once completion of switchover, process is done. Use rs.status() function to check for replicaset status. You will notice “primary”: “test-machine02:3306”

MySQL  test-machine01:3306 ssl  JS > rs.setPrimaryInstance('test-machine02:3306');
test-machine02:3306 will be promoted to PRIMARY of 'TestReplicaSet'.
The current PRIMARY is test-machine01:3306.

* Connecting to replicaset instances
** Connecting to test-machine01:3306
** Connecting to test-machine01:3307
** Connecting to test-machine02:3306
** Connecting to test-machine02:3307
** Connecting to test-machine01:3306
** Connecting to test-machine01:3307
** Connecting to test-machine02:3306
** Connecting to test-machine02:3307

* Performing validation checks
** Checking async replication topology...
** Checking transaction state of the instance...

* Synchronizing transaction backlog at test-machine02:3306

* Updating metadata

* Acquiring locks in replicaset instances
** Pre-synchronizing SECONDARIES
** Acquiring global lock at PRIMARY
** Acquiring global lock at SECONDARIES

* Updating replication topology
** Configuring test-machine01:3306 to replicate from test-machine02:3306
** Changing replication source of test-machine01:3307 to test-machine02:3306
** Changing replication source of test-machine02:3307 to test-machine02:3306

test-machine02:3306 was promoted to PRIMARY.

 MySQL  test-machine01:3306 ssl  JS >



 MySQL  test-machine01:3306 ssl  JS >
 MySQL  test-machine01:3306 ssl  JS > rs.status();
{
    "replicaSet": {
        "name": "TestReplicaSet",
        "primary": "test-machine02:3306",
        "status": "AVAILABLE",
        "statusText": "All instances available.",
        "topology": {
            "test-machine01:3306": {
                "address": "test-machine01:3306",
                "instanceRole": "SECONDARY",
                "mode": "R/O",
                "replication": {
                    "applierStatus": "APPLIED_ALL",
                    "applierThreadState": "Slave has read all relay log; waiting for more updates",
                    "receiverStatus": "ON",
                    "receiverThreadState": "Waiting for master to send event",
                    "replicationLag": null
                },
                "status": "ONLINE"
            },
            "test-machine01:3307": {
                "address": "test-machine01:3307",
                "instanceRole": "SECONDARY",
                "mode": "R/O",
                "replication": {
                    "applierStatus": "APPLIED_ALL",
                    "applierThreadState": "Slave has read all relay log; waiting for more updates",
                    "receiverStatus": "ON",
                    "receiverThreadState": "Waiting for master to send event",
                    "replicationLag": null
                },
                "status": "ONLINE"
            },
            "test-machine02:3306": {
                "address": "test-machine02:3306",
                "instanceRole": "PRIMARY",
                "mode": "R/W",
                "status": "ONLINE"
            },
            "test-machine02:3307": {
                "address": "test-machine02:3307",
                "instanceRole": "SECONDARY",
                "mode": "R/O",
                "replication": {
                    "applierStatus": "APPLIED_ALL",
                    "applierThreadState": "Slave has read all relay log; waiting for more updates",
                    "receiverStatus": "ON",
                    "receiverThreadState": "Waiting for master to send event",
                    "replicationLag": null
                },
                "status": "ONLINE"
            }
        },
        "type": "ASYNC"
    }
}
 MySQL  test-machine01:3306 ssl  JS >


Step 4: Let simulate Failover: Let stop mysql instance : test-machine02:3306 which is new master after switchover to simulate failover scenarios. Once mysql instance : test-machine02:3306 is stopped check replicaset status using function rs.status(). You will notice “test-machine02:3306” : “status”: “UNAVAILABLE”.

[root@test-machine02 ~]# ps -ef|grep mysqld
mysql     26028      1  0 10:25 ?        00:01:37 /u01/mysql-8/bin/mysqld --defaults-group-suffix=@inst2
mysql     27425      1  0 Mar18 ?        00:15:00 /u01/mysql-8/bin/mysqld --defaults-group-suffix=@inst1

[root@test-machine02 ~]#
[root@test-machine02 ~]# systemctl stop mysqld@inst1
[root@test-machine02 ~]#

 MySQL  test-machine01:3306 ssl  JS > rs.status();
ReplicaSet.status: Failed to execute query on Metadata server test-machine02:3306: Lost connection to MySQL server during query (MySQL Error 2013)
 MySQL  test-machine01:3306 ssl  JS >
 MySQL  test-machine01:3306 ssl  JS >


 MySQL  test-machine01:3306 ssl  JS > \quit
Bye!
[root@test-machine01 ~]# mysqlsh
MySQL Shell 8.0.23

Copyright (c) 2016, 2021, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.
 MySQL  JS > \connect rsadmin@test-machine01:3306
Creating a session to 'rsadmin@test-machine01:3306'
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 11709
Server version: 8.0.22 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.
 MySQL  test-machine01:3306 ssl  JS >
 MySQL  test-machine01:3306 ssl  JS > dba.getReplicaSet()
You are connected to a member of replicaset 'TestReplicaSet'.
<ReplicaSet:TestReplicaSet>
 MySQL  test-machine01:3306 ssl  JS >  var rs =dba.getReplicaSet()
You are connected to a member of replicaset 'TestReplicaSet'.
 MySQL  test-machine01:3306 ssl  JS >
 MySQL  test-machine01:3306 ssl  JS > rs.status();
ERROR: Unable to connect to the PRIMARY of the replicaset TestReplicaSet: MySQL Error 2003: Could not open connection to 'test-machine02:3306': Can't connect to MySQL server on 'test-machine02' (111)
Cluster change operations will not be possible unless the PRIMARY can be reached.
If the PRIMARY is unavailable, you must either repair it or perform a forced failover.
See \help forcePrimaryInstance for more information.
WARNING: MYSQLSH 51118: PRIMARY instance is unavailable
{
    "replicaSet": {
        "name": "TestReplicaSet",
        "primary": "test-machine02:3306",
        "status": "UNAVAILABLE",
        "statusText": "PRIMARY instance is not available, but there is at least one SECONDARY that could be force-promoted.",
        "topology": {
            "test-machine01:3306": {
                "address": "test-machine01:3306",
                "fenced": true,
                "instanceErrors": [
                    "ERROR: Replication I/O thread (receiver) has stopped with an error."
                ],
                "instanceRole": "SECONDARY",
                "mode": "R/O",
                "replication": {
                    "applierStatus": "APPLIED_ALL",
                    "applierThreadState": "Slave has read all relay log; waiting for more updates",
                    "expectedSource": "test-machine02:3306",
                    "receiverLastError": "error reconnecting to master 'mysql_innodb_rs_1@test-machine02:3306' - retry-time: 60 retries: 3 message: Can't connect to MySQL server on 'test-machine02' (111)",
                    "receiverLastErrorNumber": 2003,
                    "receiverLastErrorTimestamp": "2021-03-22 13:50:56.318987",
                    "receiverStatus": "ERROR",
                    "receiverThreadState": "",
                    "replicationLag": null,
                    "source": "test-machine02:3306"
                },
                "status": "ERROR",
                "transactionSetConsistencyStatus": null
            },
            "test-machine01:3307": {
                "address": "test-machine01:3307",
                "fenced": true,
                "instanceErrors": [
                    "ERROR: Replication I/O thread (receiver) has stopped with an error."
                ],
                "instanceRole": "SECONDARY",
                "mode": "R/O",
                "replication": {
                    "applierStatus": "APPLIED_ALL",
                    "applierThreadState": "Slave has read all relay log; waiting for more updates",
                    "expectedSource": "test-machine02:3306",
                    "receiverLastError": "error reconnecting to master 'mysql_innodb_rs_3@test-machine02:3306' - retry-time: 60 retries: 3 message: Can't connect to MySQL server on 'test-machine02' (111)",
                    "receiverLastErrorNumber": 2003,
                    "receiverLastErrorTimestamp": "2021-03-22 13:50:56.318986",
                    "receiverStatus": "ERROR",
                    "receiverThreadState": "",
                    "replicationLag": null,
                    "source": "test-machine02:3306"
                },
                "status": "ERROR",
                "transactionSetConsistencyStatus": null
            },
            "test-machine02:3306": {
                "address": "test-machine02:3306",
                "connectError": "Could not open connection to 'test-machine02:3306': Can't connect to MySQL server on 'test-machine02' (111)",
                "fenced": null,
                "instanceRole": "PRIMARY",
                "mode": null,
                "status": "UNREACHABLE"
            },
            "test-machine02:3307": {
                "address": "test-machine02:3307",
                "fenced": true,
                "instanceErrors": [
                    "ERROR: Replication I/O thread (receiver) has stopped with an error."
                ],
                "instanceRole": "SECONDARY",
                "mode": "R/O",
                "replication": {
                    "applierStatus": "APPLIED_ALL",
                    "applierThreadState": "Slave has read all relay log; waiting for more updates",
                    "expectedSource": "test-machine02:3306",
                    "receiverLastError": "error reconnecting to master 'mysql_innodb_rs_4@test-machine02:3306' - retry-time: 60 retries: 3 message: Can't connect to MySQL server on 'test-machine02' (111)",
                    "receiverLastErrorNumber": 2003,
                    "receiverLastErrorTimestamp": "2021-03-22 13:50:56.314898",
                    "receiverStatus": "ERROR",
                    "receiverThreadState": "",
                    "replicationLag": null,
                    "source": "test-machine02:3306"
                },
                "status": "ERROR",
                "transactionSetConsistencyStatus": null
            }
        },
        "type": "ASYNC"
    }
}
 MySQL  test-machine01:3306 ssl  JS >


Step 5: Dryrun for Failover: Run function forcePrimaryInstance with dryrun option to ‘test-machine01:3306’ to check for any potential error. You can use command \help Replicaset.forcePrimaryInstance to get full options.

MySQL  test-machine01:3306 ssl  JS > \help Replicaset.forcePrimaryInstance
NAME
      forcePrimaryInstance - Performs a failover in a replicaset with an
                             unavailable PRIMARY.

SYNTAX
      <ReplicaSet>.forcePrimaryInstance(instance, options)

rs.forcePrimaryInstance('test-machine01:3306',{dryRun: true})


 MySQL  test-machine01:3306 ssl  JS >
 MySQL  test-machine01:3306 ssl  JS > rs.forcePrimaryInstance('test-machine01:3306',{dryRun: true})
* Connecting to replicaset instances
** Connecting to test-machine01:3306
** Connecting to test-machine01:3307
** Connecting to test-machine02:3307

* Waiting for all received transactions to be applied
** Waiting for received transactions to be applied at test-machine01:3307
** Waiting for received transactions to be applied at test-machine01:3306
** Waiting for received transactions to be applied at test-machine02:3307
test-machine01:3306 will be promoted to PRIMARY of the replicaset and the former PRIMARY will be invalidated.

* Checking status of last known PRIMARY
NOTE: test-machine02:3306 is UNREACHABLE
* Checking status of promoted instance
NOTE: test-machine01:3306 has status ERROR
* Checking transaction set status
* Promoting test-machine01:3306 to a PRIMARY...

* Updating metadata...

test-machine01:3306 was force-promoted to PRIMARY.
NOTE: Former PRIMARY test-machine02:3306 is now invalidated and must be removed from the replicaset.
* Updating source of remaining SECONDARY instances
** Changing replication source of test-machine01:3307 to test-machine01:3306
** Changing replication source of test-machine02:3307 to test-machine01:3306

Failover finished successfully.

dryRun finished.

 MySQL  test-machine01:3306 ssl  JS >


Step 6. Actual Failover: Use below command to perform actual failover. Once process is finished you will notice “primary”: “test-machine01:3306” and “test-machine02:3306” “status”: “INVALIDATED”

 MySQL  test-machine01:3306 ssl  JS > rs.forcePrimaryInstance('test-machine01:3306')
* Connecting to replicaset instances
** Connecting to test-machine01:3306
** Connecting to test-machine01:3307
** Connecting to test-machine02:3307

* Waiting for all received transactions to be applied
** Waiting for received transactions to be applied at test-machine01:3307
** Waiting for received transactions to be applied at test-machine01:3306
** Waiting for received transactions to be applied at test-machine02:3307
test-machine01:3306 will be promoted to PRIMARY of the replicaset and the former PRIMARY will be invalidated.

* Checking status of last known PRIMARY
NOTE: test-machine02:3306 is UNREACHABLE
* Checking status of promoted instance
NOTE: test-machine01:3306 has status ERROR
* Checking transaction set status
* Promoting test-machine01:3306 to a PRIMARY...

* Updating metadata...

test-machine01:3306 was force-promoted to PRIMARY.
NOTE: Former PRIMARY test-machine02:3306 is now invalidated and must be removed from the replicaset.
* Updating source of remaining SECONDARY instances
** Changing replication source of test-machine01:3307 to test-machine01:3306
** Changing replication source of test-machine02:3307 to test-machine01:3306

Failover finished successfully.

 MySQL  test-machine01:3306 ssl  JS >



 MySQL  test-machine01:3306 ssl  JS >  rs.status()
{
    "replicaSet": {
        "name": "TestReplicaSet",
        "primary": "test-machine01:3306",
        "status": "AVAILABLE_PARTIAL",
        "statusText": "The PRIMARY instance is available, but one or more SECONDARY instances are not.",
        "topology": {
            "test-machine01:3306": {
                "address": "test-machine01:3306",
                "instanceRole": "PRIMARY",
                "mode": "R/W",
                "status": "ONLINE"
            },
            "test-machine01:3307": {
                "address": "test-machine01:3307",
                "instanceRole": "SECONDARY",
                "mode": "R/O",
                "replication": {
                    "applierStatus": "APPLIED_ALL",
                    "applierThreadState": "Slave has read all relay log; waiting for more updates",
                    "receiverStatus": "ON",
                    "receiverThreadState": "Waiting for master to send event",
                    "replicationLag": null
                },
                "status": "ONLINE"
            },
            "test-machine02:3306": {
                "address": "test-machine02:3306",
                "connectError": "Could not open connection to 'test-machine02:3306': Can't connect to MySQL server on 'test-machine02' (111)",
                "fenced": null,
                "instanceRole": null,
                "mode": null,
                "status": "INVALIDATED"
            },
            "test-machine02:3307": {
                "address": "test-machine02:3307",
                "instanceRole": "SECONDARY",
                "mode": "R/O",
                "replication": {
                    "applierStatus": "APPLIED_ALL",
                    "applierThreadState": "Slave has read all relay log; waiting for more updates",
                    "receiverStatus": "ON",
                    "receiverThreadState": "Waiting for master to send event",
                    "replicationLag": null
                },
                "status": "ONLINE"
            }
        },
        "type": "ASYNC"
    }
}
 MySQL  test-machine01:3306 ssl  JS >



Step 7. Lets remove “test-machine02:3306”: Remove mysql instance “test-machine02:3306” from topology using function rs.removeInstance.

MySQL  test-machine01:3306 ssl  JS >
 MySQL  test-machine01:3306 ssl  JS > rs.removeInstance('test-machine02:3306')
ERROR: Unable to connect to the target instance test-machine02:3306. Please make sure the instance is available and try again. If the instance is permanently not reachable, use the 'force' option to remove it from the replicaset metadata and skip reconfiguration of that instance.
ReplicaSet.removeInstance: Could not open connection to 'test-machine02:3306': Can't connect to MySQL server on 'test-machine02' (111) (MySQL Error 2003)
 MySQL  test-machine01:3306 ssl  JS >
 MySQL  test-machine01:3306 ssl  JS >


 MySQL  test-machine01:3306 ssl  JS >
 MySQL  test-machine01:3306 ssl  JS > rs.removeInstance('test-machine02:3306',{force:true})
NOTE: Unable to connect to the target instance test-machine02:3306. The instance will only be removed from the metadata, but its replication configuration cannot be updated. Please, take any necessary actions to make sure that the instance will not replicate from the replicaset if brought back online.
NOTE: test-machine02:3306 is invalidated, replication sync will be skipped.
Metadata for instance 'test-machine02:3306' was deleted, but instance configuration could not be updated.

 MySQL  test-machine01:3306 ssl  JS >
 MySQL  test-machine01:3306 ssl  JS >
 MySQL  test-machine01:3306 ssl  JS >  rs.status()
{
    "replicaSet": {
        "name": "TestReplicaSet",
        "primary": "test-machine01:3306",
        "status": "AVAILABLE",
        "statusText": "All instances available.",
        "topology": {
            "test-machine01:3306": {
                "address": "test-machine01:3306",
                "instanceRole": "PRIMARY",
                "mode": "R/W",
                "status": "ONLINE"
            },
            "test-machine01:3307": {
                "address": "test-machine01:3307",
                "instanceRole": "SECONDARY",
                "mode": "R/O",
                "replication": {
                    "applierStatus": "APPLIED_ALL",
                    "applierThreadState": "Slave has read all relay log; waiting for more updates",
                    "receiverStatus": "ON",
                    "receiverThreadState": "Waiting for master to send event",
                    "replicationLag": null
                },
                "status": "ONLINE"
            },
            "test-machine02:3307": {
                "address": "test-machine02:3307",
                "instanceRole": "SECONDARY",
                "mode": "R/O",
                "replication": {
                    "applierStatus": "APPLIED_ALL",
                    "applierThreadState": "Slave has read all relay log; waiting for more updates",
                    "receiverStatus": "ON",
                    "receiverThreadState": "Waiting for master to send event",
                    "replicationLag": null
                },
                "status": "ONLINE"
            }
        },
        "type": "ASYNC"
    }
}
 MySQL  test-machine01:3306 ssl  JS >
 MySQL  test-machine01:3306 ssl  JS >


Reference : MySQL InnoDB ReplicaSet : https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-innodb-replicaset.html

Reference : MySQL Database Architectures : https://www.youtube.com/watch?v=NYrcHMdwrpo

Part I : How to Configure InnoDB ReplicaSet Click here to read more.

Part III : How To Bootstrap MySQL Router For InnoDB ReplicaSet Click here to read more

This document is only for learning purpose and always validate in the LAB environment first before applying in the LIVE environment.


Hope so you like this article
!
Please share your valuable feedback/comments/subscribe and follow us below and don’t forget to click on the bell icon to get the latest update. Click here to know more about our pursuit.

 102 Total Views,  1 Views Today

How useful was this post?

Click on a star to rate it!

Average rating 0 / 5. Vote count: 0

No votes so far! Be the first to rate this post.

As you found this post useful...

Follow us on social media!

We are sorry that this post was not useful for you!

Let us improve this post!

Tell us how we can improve this post?

Spread the love

Leave a Reply

Your email address will not be published. Required fields are marked *

13 − three =

Follow us on Social Media
Open chat
1
Contact Us:
Hi,

Greetings of the day! How can we help you?

Thanks.
Team DBsGuru
Direct Call:+91-9310167776
Email: info@dbsguru.com