Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, a first embodiment of the present invention provides a method for switching between a main database and a standby database, including:
step S10, a join local database service operation is performed.
The execution main body of the method of the present invention may be a database management platform or a database server, and the specific application to the main database or the standby database will be specifically described in the following description, and certainly not limited to other devices or servers capable of implementing switching between the main database and the standby database.
The method mainly ensures that the synchronization process of the main data and the standby data of the database can be monitored by utilizing MariaDB and binlog logs, and ensures quick and automatic switching when the database is abnormal.
In specific implementation, database agent software DBAgent (or other software capable of realizing the same function) can be deployed on equipment where the main database and the standby database are located, wherein the DBAgent is database agent software and can be used for monitoring the service operation condition of the database, analyzing binlog log files and switching control processes when a fault occurs; in the MariaDB database execution process, when the binlog log file is generated, the binlog position, namely the current log position (long shaping value), is given correspondingly.
Specifically, the dbagents of the master and slave databases regularly monitor the running conditions of the local databases and the latest GTID in the current binlog log, and simultaneously, the master and slave databases share the running conditions of the master and slave databases with each other.
When the local database is a main database and the opposite database is a standby database:
the DBAgent of the main database is connected with a service port of the main database by a specific user through a MYSQL protocol at regular time, inquires and obtains the current position (current gtid _ current _ pos) of the local database, performs record updating, and sends the record updating to the DBAgent of the standby database through a heartbeat message interface for recording.
When the local database is a standby database and the opposite database is a main database:
the DBAgent of the standby database is connected with a service port of the standby database by a specific user through a MYSQL protocol at regular time, inquires and obtains the current position (current gtid _ current _ pos) of the local database, performs record updating, and sends the record updating to the DBAgent of the main database through a heartbeat message interface for recording.
And step S20, receiving the current position of the opposite terminal database sent by the opposite terminal database.
Specifically, when the local database is the primary database and the opposite database is the backup database:
and the DBagent of the main database receives the heartbeat message sent by the DBagent of the standby database, and records the current gtid _ current _ pos position of the standby database.
When the local database is a standby database and the opposite database is a main database:
and the DBagent of the standby database receives the heartbeat message sent by the DBagent of the main database and records the current gtid _ current _ pos position of the main database.
Step S30, if the number of failures in the service operation for connecting the local database exceeds the preset number, determining that the local database is abnormal, and performing an alarm operation or performing a master-slave switching operation between the local database and the opposite-end database, or if the number of failures in the current position of the opposite-end database sent by the receiving opposite-end database exceeds the preset number, performing an alarm operation or performing a master-slave switching operation between the local database and the opposite-end database.
Specifically, when the local database is the standby database and the opposite database is the main database:
if the DBAgent of the backup database fails to connect the database service for more than a specified number of times (for example, 10 times, which can be preset by the system or modified by the user according to the actual use condition), it is determined that the current main database is abnormal, and the connection detection operation is continuously executed.
When the local database is a main database and the opposite database is a standby database:
if the DBAgent of the backup database fails to connect the database service for more than a specified number of times (for example, 10 times, which may be preset by the system or modified by the user according to the actual use condition), determining that the current primary database is abnormal, analyzing the log file of the primary database, obtaining the last position of the primary database, and executing the primary/backup switching operation according to the last position, wherein the following method can be adopted in the concrete implementation:
1. acquiring the last gtid _ current _ pos position of a main database in a mode of analyzing a binlog log file of the database, and recording the position as a last _ pos position;
2. the DBagent of the main database sends a switching request message to the DBagent of the standby database, wherein the message comprises a last _ pos position of the main database;
3. the DBagent of the main database continues to detect whether the database service is normal or not at regular time until the database service is recovered;
4. and the DBagent of the main database initiates a main/standby switching process according to the last _ pos position of the main database in the switching request, wherein:
if the last position of the main database is not empty, comparing the last position of the main database with the current position of the standby database, wherein if the current position of the standby database is equal to the last position of the main database, the automatic switching operation is executed, and if the current position of the standby database is smaller than the last position of the main database, whether the automatic switching operation is executed or not is decided according to business requirements;
and if the last position of the main database is empty, comparing the current position of the main database with the current position of the standby database, wherein if the current position of the standby database is greater than or equal to the current position of the main database, the automatic switching operation is executed, and otherwise, whether the automatic switching operation is executed or not is decided according to the service requirement.
The automatic switching operation specifically includes: and switching the standby database to be the main database, and restoring the main-standby copying relation after the service of the standby database is restored.
Or, when the host database is the main database and the opposite database is the standby database:
and if the DBagent of the master database does not receive the heartbeat message sent by the DBagent of the master database for more than the specified times, executing alarm operation.
When the host database is a standby database and the opposite database is a main database:
if the DBagent of the standby database exceeds the specified times and does not receive the heartbeat message sent by the DBagent of the main database, the MYSQL protocol is connected with a service port of the main database to obtain the current position gtid _ current _ pos of the main database, if the DBagent process of the main database is successfully obtained, the database service is normal, the main/standby switch is not required to be initiated, only an alarm is required to be sent and a relevant log is recorded, if the DBagent process of the main database is failed to be obtained, the DBagent process and the database service of the main database are abnormal, and at the moment, a main/standby switch request can be initiated.
In addition, as shown in fig. 2, a master-slave copy monitoring management platform architecture diagram is specifically described as follows:
when a master database executes DDL statements or SQL updating transactions, a globally unique sequence GTID number is distributed for the transactions, binary binlog log files are written into the slave database before the transactions are submitted and are synchronized, binlog log information synchronization data of the master database are read from the slave database, and the corresponding SQL statements are replied and executed, so that data synchronization with the master database is achieved;
database agent software DBAgents monitor the service operation condition of a local database at fixed time, access the local database through MYSQL protocol connection to monitor whether the database is normal, simultaneously acquire the current gtid _ current _ pos position of the local database and send heartbeat messages to an opposite terminal DBAgent at fixed time, after receiving the heartbeat messages of the opposite terminal, the DBAgents analyze and record the gtid _ current _ pos position of the opposite terminal database, compare the gtid _ current _ pos positions of the local database and the opposite terminal database, judge the data synchronization condition of a master database and a slave database according to the difference, and perform related measures such as real-time alarm when the difference between the gtid _ current _ pos of the master database and the slave database exceeds a set range.
Further, after the database agent software DBAgent cannot receive heartbeat messages sent by the opposite end DBAgent within a certain time, actively connecting the opposite end database service through an MYSQL protocol, judging whether the opposite end database service is normal, simultaneously acquiring the gtid _ current _ pos position of the opposite end database, judging that the opposite end database is abnormal if the connection fails, and deciding whether to initiate the main-standby switching according to the main role and the slave role of the database.
By the scheme, the invention provides a master-slave database switching method, which can monitor and master-slave data synchronization difference conditions in real time and indirectly know the write transaction load change condition of a system; a third-party management component is removed, and the problem that the whole system is unavailable due to the fault of the third-party management component is directly avoided; the installation and deployment are simplified while the processing logic is simplified, the DBagent can be bundled with the DB, and the deployment of each database node is uniform and simple; the data inconsistency of the main and standby machine databases is quickly identified, the main and standby machine switching can be automatically carried out when the data is consistent, and a new main and standby copy relationship is established. When the data are inconsistent, the differential SQL script can be automatically generated, so that the time is saved for manual intervention, and the data consistency is highly ensured; and under the condition that the DBagent per se has normal process abnormal database service, unnecessary error switching is avoided.
As shown in fig. 3, a second embodiment of the present invention provides a method for switching between a main database and a standby database, including:
and step S11, executing the operation of connecting the local database service, and acquiring the current position of the local database at regular time.
Specifically, the dbagents of the master and slave databases regularly monitor the running conditions of the local databases and the latest GTID in the current binlog log, and simultaneously, the master and slave databases share the running conditions of the master and slave databases with each other.
When the local database is a main database and the opposite database is a standby database:
the DBAgent of the main database is connected with a service port of the main database by a specific user through a MYSQL protocol at regular time, inquires and obtains the current position (current gtid _ current _ pos) of the local database, performs record updating, and sends the record updating to the DBAgent of the standby database through a heartbeat message interface for recording.
When the local database is a standby database and the opposite database is a main database:
the DBAgent of the standby database is connected with a service port of the standby database by a specific user through a MYSQL protocol at regular time, inquires and obtains the current position (current gtid _ current _ pos) of the local database, performs record updating, and sends the record updating to the DBAgent of the main database through a heartbeat message interface for recording.
Step S21, receiving the current location of the peer database sent by the peer database, and comparing the current location of the local database with the current location of the peer database.
Specifically, when the local database is the primary database and the opposite database is the backup database:
the DBagent of the main database receives the heartbeat message sent by the DBagent of the standby database, records the current gtid _ current _ pos position of the standby database, and compares the current gtid _ current _ pos position of the main database with the current gtid _ current _ pos position of the standby database.
When the local database is a standby database and the opposite database is a main database:
the DBagent of the standby database receives the heartbeat message sent by the DBagent of the main database, records the current gtid _ current _ pos position of the main database, and compares the current gtid _ current _ pos position of the main database with the current gtid _ current _ pos position of the standby database.
And step S40, when the difference between the current position of the local database and the current position of the opposite-end database is greater than a preset threshold value, executing an alarm operation.
Specifically, when the difference between the current gtid _ current _ pos location of the primary database and the current gtid _ current _ pos location of the secondary database is greater than a configured threshold (the threshold may be preset by the system or modified by the user according to the actual use condition), a corresponding measure, such as an alarm, is taken.
By the scheme, the invention provides a master-slave database switching method, which can monitor and master-slave data synchronization difference conditions in real time and indirectly know the write transaction load change condition of a system; a third-party management component is removed, and the problem that the whole system is unavailable due to the fault of the third-party management component is directly avoided; the installation and deployment are simplified while the processing logic is simplified, the DBagent can be bundled with the DB, and the deployment of each database node is uniform and simple; the data inconsistency of the main and standby machine databases is quickly identified, the main and standby machine switching can be automatically carried out when the data is consistent, and a new main and standby copy relationship is established. When the data are inconsistent, the differential SQL script can be automatically generated, so that the time is saved for manual intervention, and the data consistency is highly ensured; and under the condition that the DBagent per se has normal process abnormal database service, unnecessary error switching is avoided.
Further, in order to better ensure the consistency of data during the switching of the stored databases, referring to fig. 4, a flowchart of the steps of determining that the local database is abnormal and executing an alarm operation or performing a primary/standby switching operation between the local database and the opposite-end database in the specific embodiment of the present invention if the number of failures of the service operation of connecting the local database exceeds a preset number.
As an embodiment, the step S30 includes:
and step S31, if the local database is a standby database and the opposite database is a main database, executing an alarm operation and continuing to execute a connection detection operation.
Specifically, when the local database is the standby database and the opposite database is the main database:
if the DBAgent of the backup database fails to connect the database service for more than a specified number of times (for example, 10 times, which can be preset by the system or modified by the user according to the actual use condition), it is determined that the current main database is abnormal, and the connection detection operation is continuously executed.
Step S32, if the local database is a primary database and the opposite database is a backup database, analyzing the log file of the local database, obtaining the last location of the local database, and executing the primary/backup switching operation according to the last location.
Specifically, when the local database is the primary database and the opposite database is the backup database:
if the DBAgent of the backup database fails to connect the database service for more than a specified number of times (for example, 10 times, which may be preset by the system or modified by the user according to the actual use condition), determining that the current primary database is abnormal, analyzing the log file of the primary database, obtaining the last position of the primary database, and executing the primary/backup switching operation according to the last position, wherein the following method can be adopted in the concrete implementation:
1. acquiring the last gtid _ current _ pos position of a main database in a mode of analyzing a binlog log file of the database, and recording the position as a last _ pos position;
2. the DBagent of the main database sends a switching request message to the DBagent of the standby database, wherein the message comprises a last _ pos position of the main database;
3. the DBagent of the main database continues to detect whether the database service is normal or not at regular time until the database service is recovered;
4. and the DBagent of the main database initiates a main/standby switching process according to the last _ pos position of the main database in the switching request, wherein:
if the last position of the main database is not empty, comparing the last position of the main database with the current position of the standby database, wherein if the current position of the standby database is equal to the last position of the main database, the automatic switching operation is executed, and if the current position of the standby database is smaller than the last position of the main database, whether the automatic switching operation is executed or not is decided according to business requirements;
and if the last position of the main database is empty, comparing the current position of the main database with the current position of the standby database, wherein if the current position of the standby database is greater than or equal to the current position of the main database, the automatic switching operation is executed, and otherwise, whether the automatic switching operation is executed or not is decided according to the service requirement.
Through the scheme, the invention provides a method for switching the main database and the standby database, which better realizes real-time monitoring and mastering the synchronous difference condition of the main database and the standby database through different processing modes executed by the main database and the standby database when the local database is abnormal, simplifies the operation, ensures the data consistency and avoids error switching at the same time.
Further, in order to better ensure the consistency of data during the switching of the storage database, fig. 5 is a flowchart illustrating steps of executing the active/standby switching operation in the embodiment of the present invention.
As an embodiment, the step S32 includes:
step S321, if the last position of the main database is not empty, comparing the last position of the main database with the current position of the standby database, wherein if the current position of the standby database is equal to the last position of the main database, an automatic switching operation is executed, and if the current position of the standby database is smaller than the last position of the main database, a decision is made whether to execute the automatic switching operation according to a service requirement.
Specifically, if the last _ pos position of the primary database is not empty, comparing whether last GTID position last _ pos of the primary database is consistent with current GTID _ current _ pos of the secondary database, if the GTID _ current _ pos of the secondary database is equal to last _ pos of the primary database, performing automatic switching operation, and if the GTID _ current _ pos of the secondary database is smaller than last _ pos of the primary database, it indicates that data of the primary database is not completely synchronized to the primary database, and data of the primary database is inconsistent, at this time, it may be decided whether to automatically switch and record related log information according to service requirements.
According to the MariaDB replication principle, the situation that the gtid _ current _ pos of the standby database is larger than the last _ pos of the main database can not occur.
Step S322, if the last position of the main database is empty, comparing the current position of the main database with the current position of the standby database, wherein if the current position of the standby database is greater than or equal to the current position of the main database, the automatic switching operation is executed, otherwise, the automatic switching operation is decided whether to be executed according to the service requirement.
Specifically, if the last _ pos position of the primary database is empty, whether the current GTID _ current _ pos position of the primary database is consistent with the current GTID _ current _ pos position of the secondary database is compared, if the secondary database is greater than or equal to the GTID _ current _ pos position of the primary database, automatic switching operation is performed, otherwise, it is indicated that the data of the primary database is not completely synchronized to the secondary database and the data of the primary and secondary databases are inconsistent, and at this time, whether automatic switching is required and relevant log information is recorded can be determined according to service requirements.
Through the scheme, the invention provides a main and standby database switching method, which better realizes real-time monitoring and mastering of the synchronous difference condition of the main and standby databases, simplifies operation, ensures data consistency and avoids error switching at the same time by executing corresponding main and standby switching operation according to different conditions.
Further, in order to better ensure the consistency of the data during the switching of the storage database, referring to fig. 6, a flowchart of the steps of performing the automatic switching operation in the embodiment of the present invention is shown.
As an embodiment, the step S32 includes:
step S323, the standby database is switched to be a main database, and the main-standby copying relation is restored after the service of the standby database is restored.
Specifically, the DBagent switches the standby database into a main database, sets the main original database into a standby database, and restores the copying relation of the main database and the standby database again after the service of the standby database is restored; when the data of the main database and the standby database are inconsistent, a binlog log file of the original main database needs to be analyzed, all SQL statements related to GTID after the analysis is carried out at the position of the terminal _ current _ pos of the standby database generate a forward SQL statement, and the data of the original main database and the standby database are recovered to be consistent after the supplementary execution of the original standby database is finished.
By the scheme, the method for switching the main and standby databases better realizes real-time monitoring and mastering of the synchronous difference condition of the main and standby databases by executing automatic switching operation, simplifies operation, ensures data consistency and avoids error switching.
Further, in order to better ensure the consistency of the data during the switching of the storage database, referring to fig. 7, a flowchart of the step of acquiring the last location of the local database in the embodiment of the present invention is shown.
As an embodiment, the step S32 includes:
step S324, obtain the current log file list of the local database.
Specifically, when the host database service is abnormal, the DBAgent acquires the last GTID position last _ pos of the database by calling the mysql brinlog tool.
And acquiring a current binlog file list through index files under a database binlog binary log folder.
Step S325, start scanning from the last log file in the current log file list, and obtain the last location of the local database from the last event of the last log file.
Specifically, scanning is started from the last binlog LOG file, the last GTID EVENT (GTID _ LOG _ EVENT) of the binlog file is found through scanning, and the obtained last GTID is last _ pos.
In step S326, if the last event cannot be found from the last log file, the last log file of the last log file is scanned until the last event is found.
Specifically, if the current binlog does not find a gtid event, i.e., no transaction commit is recorded in this binlog file, then the last binlog file is scanned and iterated until the last gtid is found.
Further, the method for searching the designated gtid _ current _ pos location through the binlog log file is the same as the method for searching last _ pos described above.
In addition, as shown in fig. 8, the change of the binlog log position corresponding to the master-slave database before and after the maridb master-slave switching operation is performed.
Wherein, PosA is the position of gtid _ current _ pos in the normal heartbeat message reported last time by the original main database; last _ pos is the maximum GTID position where the Last write transaction of the original master database is actually written into the binlog binary log; PosB is a GTID position corresponding to the last transaction of the binlog log copied from the main database by the database of the original standby machine; when the Last _ pos and the PosB are overlapped and equal, the main data and the standby data are completely consistent, and the main data and the standby data can be automatically switched; when Last _ pos is unknown (abnormal conditions of physical machines such as power failure and network disconnection), and when PosB is larger than or equal to PosA, whether the main data and the standby data are consistent or unknown; after the main and standby machines are switched, the relationship between the Last _ pos position and the PosB needs to be confirmed again after the original host machine recovers the abnormity, if the Last _ pos is larger than the PosB, the binlog difference exists between the main and standby machines, otherwise, the data of the main and standby machines are consistent; when Last _ pos is unknown (abnormal conditions of physical machines such as power failure, network disconnection and the like) and PosB is smaller than PosA, the data of the original standby machine is obviously lagged behind the data of the original host, and the data of the main standby machine is inconsistent.
Through the scheme, the invention provides the method for switching the main database and the standby database, so that the synchronous difference condition of the main database and the standby database can be better monitored and mastered in real time, the operation is simplified, the data consistency is ensured, and meanwhile, the wrong switching is avoided.
Based on the implementation of the method embodiment, the invention also provides a corresponding device embodiment.
As shown in fig. 9, a first embodiment of the present invention provides a device for switching between a main database and a standby database, including:
a connection module 100 for performing a connect native database service operation.
The method mainly ensures that the synchronization process of the main data and the standby data of the database can be monitored by utilizing MariaDB and binlog logs, and ensures quick and automatic switching when the database is abnormal.
In specific implementation, database agent software DBAgent (or other software capable of realizing the same function) can be deployed on equipment where the main database and the standby database are located, wherein the DBAgent is database agent software and can be used for monitoring the service operation condition of the database, analyzing binlog log files and switching control processes when a fault occurs; in the MariaDB database execution process, when the binlog log file is generated, the binlog position, namely the current log position (long shaping value), is given correspondingly.
Specifically, the dbagents of the master and slave databases regularly monitor the running conditions of the local databases and the latest GTID in the current binlog log, and simultaneously, the master and slave databases share the running conditions of the master and slave databases with each other.
When the local database is a main database and the opposite database is a standby database:
the DBAgent of the main database is connected with a service port of the main database by a specific user through a MYSQL protocol at regular time, inquires and obtains the current position (current gtid _ current _ pos) of the local database, performs record updating, and sends the record updating to the DBAgent of the standby database through a heartbeat message interface for recording.
When the local database is a standby database and the opposite database is a main database:
the DBAgent of the standby database is connected with a service port of the standby database by a specific user through a MYSQL protocol at regular time, inquires and obtains the current position (current gtid _ current _ pos) of the local database, performs record updating, and sends the record updating to the DBAgent of the main database through a heartbeat message interface for recording.
A location module 200, configured to receive the current location of the peer database sent by the peer database.
Specifically, when the local database is the primary database and the opposite database is the backup database:
and the DBagent of the main database receives the heartbeat message sent by the DBagent of the standby database, and records the current gtid _ current _ pos position of the standby database.
When the local database is a standby database and the opposite database is a main database:
and the DBagent of the standby database receives the heartbeat message sent by the DBagent of the main database and records the current gtid _ current _ pos position of the main database.
An executing module 300, configured to determine that the local database is abnormal if the number of failures in the service operation for connecting the local database exceeds a preset number, and execute an alarm operation or perform a primary-standby switching operation between the local database and the opposite-end database, or execute an alarm operation or perform a primary-standby switching operation between the local database and the opposite-end database if the number of failures in the current position of the opposite-end database sent by the receiving opposite-end database exceeds the preset number.
Specifically, when the local database is the standby database and the opposite database is the main database:
if the DBAgent of the backup database fails to connect the database service for more than a specified number of times (for example, 10 times, which can be preset by the system or modified by the user according to the actual use condition), it is determined that the current main database is abnormal, and the connection detection operation is continuously executed.
When the local database is a main database and the opposite database is a standby database:
if the DBAgent of the backup database fails to connect the database service for more than a specified number of times (for example, 10 times, which may be preset by the system or modified by the user according to the actual use condition), determining that the current primary database is abnormal, analyzing the log file of the primary database, obtaining the last position of the primary database, and executing the primary/backup switching operation according to the last position, wherein the following method can be adopted in the concrete implementation:
1. acquiring the last gtid _ current _ pos position of a main database in a mode of analyzing a binlog log file of the database, and recording the position as a last _ pos position;
2. the DBagent of the main database sends a switching request message to the DBagent of the standby database, wherein the message comprises a last _ pos position of the main database;
3. the DBagent of the main database continues to detect whether the database service is normal or not at regular time until the database service is recovered;
4. and the DBagent of the main database initiates a main/standby switching process according to the last _ pos position of the main database in the switching request, wherein:
if the last position of the main database is not empty, comparing the last position of the main database with the current position of the standby database, wherein if the current position of the standby database is equal to the last position of the main database, the automatic switching operation is executed, and if the current position of the standby database is smaller than the last position of the main database, whether the automatic switching operation is executed or not is decided according to business requirements;
and if the last position of the main database is empty, comparing the current position of the main database with the current position of the standby database, wherein if the current position of the standby database is greater than or equal to the current position of the main database, the automatic switching operation is executed, and otherwise, whether the automatic switching operation is executed or not is decided according to the service requirement.
The automatic switching operation specifically includes: and switching the standby database to be the main database, and restoring the main-standby copying relation after the service of the standby database is restored.
Or, when the host database is the main database and the opposite database is the standby database:
and if the DBagent of the master database does not receive the heartbeat message sent by the DBagent of the master database for more than the specified times, executing alarm operation.
When the host database is a standby database and the opposite database is a main database:
if the DBagent of the standby database exceeds the specified times and does not receive the heartbeat message sent by the DBagent of the main database, the MYSQL protocol is connected with a service port of the main database to obtain the current position gtid _ current _ pos of the main database, if the DBagent process of the main database is successfully obtained, the database service is normal, the main/standby switch is not required to be initiated, only an alarm is required to be sent and a relevant log is recorded, if the DBagent process of the main database is failed to be obtained, the DBagent process and the database service of the main database are abnormal, and at the moment, a main/standby switch request can be initiated.
In addition, as shown in fig. 2, a master-slave copy monitoring management platform architecture diagram is specifically described as follows:
when a master database executes DDL statements or SQL updating transactions, a globally unique sequence GTID number is distributed for the transactions, binary binlog log files are written into the slave database before the transactions are submitted and are synchronized, binlog log information synchronization data of the master database are read from the slave database, and the corresponding SQL statements are replied and executed, so that data synchronization with the master database is achieved;
database agent software DBAgents monitor the service operation condition of a local database at fixed time, access the local database through MYSQL protocol connection to monitor whether the database is normal, simultaneously acquire the current gtid _ current _ pos position of the local database and send heartbeat messages to an opposite terminal DBAgent at fixed time, after receiving the heartbeat messages of the opposite terminal, the DBAgents analyze and record the gtid _ current _ pos position of the opposite terminal database, compare the gtid _ current _ pos positions of the local database and the opposite terminal database, judge the data synchronization condition of a master database and a slave database according to the difference, and perform related measures such as real-time alarm when the difference between the gtid _ current _ pos of the master database and the slave database exceeds a set range.
Further, after the database agent software DBAgent cannot receive heartbeat messages sent by the opposite end DBAgent within a certain time, actively connecting the opposite end database service through an MYSQL protocol, judging whether the opposite end database service is normal, simultaneously acquiring the gtid _ current _ pos position of the opposite end database, judging that the opposite end database is abnormal if the connection fails, and deciding whether to initiate the main-standby switching according to the main role and the slave role of the database.
By the scheme, the invention provides a master-slave database switching device, and the method can monitor and master the master-slave data synchronization difference situation in real time and indirectly know the write transaction load change situation of the system; a third-party management component is removed, and the problem that the whole system is unavailable due to the fault of the third-party management component is directly avoided; the installation and deployment are simplified while the processing logic is simplified, the DBagent can be bundled with the DB, and the deployment of each database node is uniform and simple; the data inconsistency of the main and standby machine databases is quickly identified, the main and standby machine switching can be automatically carried out when the data is consistent, and a new main and standby copy relationship is established. When the data are inconsistent, the differential SQL script can be automatically generated, so that the time is saved for manual intervention, and the data consistency is highly ensured; and under the condition that the DBagent per se has normal process abnormal database service, unnecessary error switching is avoided.
As shown in fig. 10, a second embodiment of the present invention provides a device for switching between a main database and a standby database, where:
the connection module 100 is further configured to perform a service operation of connecting the local database, and periodically obtain a current location of the local database.
Specifically, the dbagents of the master and slave databases regularly monitor the running conditions of the local databases and the latest GTID in the current binlog log, and simultaneously, the master and slave databases share the running conditions of the master and slave databases with each other.
When the local database is a main database and the opposite database is a standby database:
the DBAgent of the main database is connected with a service port of the main database by a specific user through a MYSQL protocol at regular time, inquires and obtains the current position (current gtid _ current _ pos) of the local database, performs record updating, and sends the record updating to the DBAgent of the standby database through a heartbeat message interface for recording.
When the local database is a standby database and the opposite database is a main database:
the DBAgent of the standby database is connected with a service port of the standby database by a specific user through a MYSQL protocol at regular time, inquires and obtains the current position (current gtid _ current _ pos) of the local database, performs record updating, and sends the record updating to the DBAgent of the main database through a heartbeat message interface for recording.
The location module 200 is further configured to receive a current location of the peer database sent by the peer database, and compare the current location of the local database with the current location of the peer database.
Specifically, when the local database is the primary database and the opposite database is the backup database:
the DBagent of the main database receives the heartbeat message sent by the DBagent of the standby database, records the current gtid _ current _ pos position of the standby database, and compares the current gtid _ current _ pos position of the main database with the current gtid _ current _ pos position of the standby database.
When the local database is a standby database and the opposite database is a main database:
the DBagent of the standby database receives the heartbeat message sent by the DBagent of the main database, records the current gtid _ current _ pos position of the main database, and compares the current gtid _ current _ pos position of the main database with the current gtid _ current _ pos position of the standby database.
The device also includes:
and an alarm module 400, configured to execute an alarm operation when a difference between the current location of the local database and the current location of the opposite-end database is greater than a preset threshold.
Specifically, when the difference between the current gtid _ current _ pos location of the primary database and the current gtid _ current _ pos location of the secondary database is greater than a configured threshold (the threshold may be preset by the system or modified by the user according to the actual use condition), a corresponding measure, such as an alarm, is taken.
By the scheme, the invention provides a master-slave database switching device, and the method can monitor and master the master-slave data synchronization difference situation in real time and indirectly know the write transaction load change situation of the system; a third-party management component is removed, and the problem that the whole system is unavailable due to the fault of the third-party management component is directly avoided; the installation and deployment are simplified while the processing logic is simplified, the DBagent can be bundled with the DB, and the deployment of each database node is uniform and simple; the data inconsistency of the main and standby machine databases is quickly identified, the main and standby machine switching can be automatically carried out when the data is consistent, and a new main and standby copy relationship is established. When the data are inconsistent, the differential SQL script can be automatically generated, so that the time is saved for manual intervention, and the data consistency is highly ensured; and under the condition that the DBagent per se has normal process abnormal database service, unnecessary error switching is avoided.
Further, in order to better ensure the consistency of data during the switching of the reserve databases, the executing module 300 is further configured to execute an alarm operation and continue to execute a connection detection operation if the local database is the reserve database and the opposite database is the primary database; if the local database is a main database and the opposite-end database is a standby database, analyzing the log file of the local database, acquiring the last position of the local database, and executing the main-standby switching operation according to the last position.
Specifically, when the local database is the standby database and the opposite database is the main database:
if the DBAgent of the backup database fails to connect the database service for more than a specified number of times (for example, 10 times, which can be preset by the system or modified by the user according to the actual use condition), it is determined that the current main database is abnormal, and the connection detection operation is continuously executed.
Then, when the local database is the main database and the opposite database is the standby database:
if the DBAgent of the backup database fails to connect the database service for more than a specified number of times (for example, 10 times, which may be preset by the system or modified by the user according to the actual use condition), determining that the current primary database is abnormal, analyzing the log file of the primary database, obtaining the last position of the primary database, and executing the primary/backup switching operation according to the last position, wherein the following method can be adopted in the concrete implementation:
1. acquiring the last gtid _ current _ pos position of a main database in a mode of analyzing a binlog log file of the database, and recording the position as a last _ pos position;
2. the DBagent of the main database sends a switching request message to the DBagent of the standby database, wherein the message comprises a last _ pos position of the main database;
3. the DBagent of the main database continues to detect whether the database service is normal or not at regular time until the database service is recovered;
4. and the DBagent of the main database initiates a main/standby switching process according to the last _ pos position of the main database in the switching request, wherein:
if the last position of the main database is not empty, comparing the last position of the main database with the current position of the standby database, wherein if the current position of the standby database is equal to the last position of the main database, the automatic switching operation is executed, and if the current position of the standby database is smaller than the last position of the main database, whether the automatic switching operation is executed or not is decided according to business requirements;
and if the last position of the main database is empty, comparing the current position of the main database with the current position of the standby database, wherein if the current position of the standby database is greater than or equal to the current position of the main database, the automatic switching operation is executed, and otherwise, whether the automatic switching operation is executed or not is decided according to the service requirement.
Through the scheme, the invention provides a main and standby database switching device, and the method better realizes real-time monitoring and mastering of the synchronous difference condition of the main and standby databases through different processing modes executed by the main and standby databases when the local database is abnormal, simplifies operation, ensures data consistency and avoids error switching at the same time.
Further, in order to better ensure the consistency of data during the switching of the backup database, the executing module 300 is further configured to compare the last location of the primary database with the current location of the backup database if the last location of the primary database is not empty, wherein if the current location of the backup database is equal to the last location of the primary database, an automatic switching operation is executed, and if the current location of the backup database is smaller than the last location of the primary database, a decision is made as to whether to execute the automatic switching operation according to a service requirement; and if the last position of the main database is empty, comparing the current position of the main database with the current position of the standby database, wherein if the current position of the standby database is greater than or equal to the current position of the main database, the automatic switching operation is executed, otherwise, whether the automatic switching operation is executed or not is decided according to the service requirement.
Specifically, if the last _ pos position of the primary database is not empty, comparing whether last GTID position last _ pos of the primary database is consistent with current GTID _ current _ pos of the secondary database, if the GTID _ current _ pos of the secondary database is equal to last _ pos of the primary database, performing automatic switching operation, and if the GTID _ current _ pos of the secondary database is smaller than last _ pos of the primary database, it indicates that data of the primary database is not completely synchronized to the primary database, and data of the primary database is inconsistent, at this time, it may be decided whether to automatically switch and record related log information according to service requirements.
According to the MariaDB replication principle, the situation that the gtid _ current _ pos of the standby database is larger than the last _ pos of the main database can not occur.
Then, if the main database last _ pos position is empty, comparing whether the current GTID _ current _ pos position of the main database is consistent with the current GTID _ current _ pos position of the standby database, if the standby database is larger than or equal to the GTID _ current _ pos position of the main database, performing automatic switching operation, otherwise, indicating that the data of the main database is not completely synchronized to the standby database and the data of the main and standby databases are inconsistent, and at this time, deciding whether to automatically switch and record related log information according to service requirements.
Through the scheme, the invention provides the main and standby database switching device, and the method better realizes real-time monitoring and mastering of the synchronous difference condition of the main and standby databases by executing corresponding main and standby switching operation according to different conditions, simplifies the operation, ensures the data consistency and avoids error switching at the same time.
Further, in order to better ensure the consistency of data during the switching of the backup database, the execution module 300 is further configured to switch the backup database to be the primary database, and resume the primary/backup copy relationship after the service of the backup database is restored.
Specifically, the DBagent switches the standby database into a main database, sets the main original database into a standby database, and restores the copying relation of the main database and the standby database again after the service of the standby database is restored; when the data of the main database and the standby database are inconsistent, a binlog log file of the original main database needs to be analyzed, all SQL statements related to GTID after the analysis is carried out at the position of the terminal _ current _ pos of the standby database generate a forward SQL statement, and the data of the original main database and the standby database are recovered to be consistent after the supplementary execution of the original standby database is finished.
Through the scheme, the invention provides the main and standby database switching device, and the method better realizes real-time monitoring and mastering of the synchronous difference condition of the main and standby databases by executing automatic switching operation, simplifies operation, ensures data consistency and avoids error switching.
Further, in order to better ensure the consistency of data during the switching of the storage databases, the execution module 300 is further configured to obtain a current log file list of the local database; scanning is started from the last log file in the current log file list, and the last position of the local database is obtained from the last event of the last log file; and if the last event cannot be found from the last log file, scanning the last log file of the last log file until the last event is found.
Specifically, when the host database service is abnormal, the DBAgent acquires the last GTID position last _ pos of the database by calling the mysql brinlog tool.
And acquiring a current binlog file list through index files under a database binlog binary log folder.
Then, scanning is started from the last binlog LOG file, the last GTID EVENT (GTID _ LOG _ EVENT) of the binlog file is found through scanning, and the obtained last GTID is last _ pos.
Then, if the current binlog does not find a gtid event, i.e., no transaction commit is recorded in this binlog file, then the last binlog file is scanned and the loop is repeated until the last gtid is found.
Further, the method for searching the designated gtid _ current _ pos location through the binlog log file is the same as the method for searching last _ pos described above.
In addition, as shown in fig. 8, the change of the binlog log position corresponding to the master-slave database before and after the maridb master-slave switching operation is performed.
Wherein, PosA is the position of gtid _ current _ pos in the normal heartbeat message reported last time by the original main database; last _ pos is the maximum GTID position where the Last write transaction of the original master database is actually written into the binlog binary log; PosB is a GTID position corresponding to the last transaction of the binlog log copied from the main database by the database of the original standby machine; when the Last _ pos and the PosB are overlapped and equal, the main data and the standby data are completely consistent, and the main data and the standby data can be automatically switched; when Last _ pos is unknown (abnormal conditions of physical machines such as power failure and network disconnection), and when PosB is larger than or equal to PosA, whether the main data and the standby data are consistent or unknown; after the main and standby machines are switched, the relationship between the Last _ pos position and the PosB needs to be confirmed again after the original host machine recovers the abnormity, if the Last _ pos is larger than the PosB, the binlog difference exists between the main and standby machines, otherwise, the data of the main and standby machines are consistent; when Last _ pos is unknown (abnormal conditions of physical machines such as power failure, network disconnection and the like) and PosB is smaller than PosA, the data of the original standby machine is obviously lagged behind the data of the original host, and the data of the main standby machine is inconsistent.
Through the scheme, the invention provides the main and standby database switching device, which better realizes real-time monitoring and mastering of the synchronous difference condition of the main and standby databases, simplifies operation, ensures data consistency and avoids error switching.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.