CN112486718A - Database fault automatic switching method and device and computer storage medium

Info

Publication number: CN112486718A
Authority: CN (China)
Prior art keywords: library, server, database, slave, master library
Legal status: Pending
Application number: CN202011387011.1A (filed 2020-11-30; priority date 2020-11-30)
Other languages: Chinese (zh)
Inventors: 刘颖麒, 赖军寿
Current assignee: Shenzhen Yeahka Technology Co., Ltd.
Original assignee: Shenzhen Yeahka Technology Co., Ltd.
Application filed 2020-11-30 by Shenzhen Yeahka Technology Co., Ltd.
Published as CN112486718A on 2021-03-12

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/0775: Error detection; error correction; monitoring; responding to the occurrence of a fault (fault tolerance); error or fault reporting or storing; content or structure details of the error report, e.g. specific table structure, specific error fields
    • G06F 11/079: Error detection; error correction; monitoring; responding to the occurrence of a fault; root cause analysis, i.e. error or fault diagnosis
    • G06F 16/23: Information retrieval; database structures for structured data, e.g. relational data; updating
    • G06F 16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; distributed database system architectures therefor
    • G06F 21/6218: Security arrangements; protecting data; protecting access to data via a platform, e.g. using keys or access control rules, to a system of files or objects, e.g. local or distributed file system or database

Abstract

The invention discloses an automatic database failover method comprising the following steps: when a failure of the server where the master library is located is detected, performing failover; and performing at least one of the following steps: calling a shutdown API to power off the master library server while the failover is performed; and blocking operations in the master library server related to writing data. The invention also discloses a device and a computer-readable storage medium. The method solves the prior-art problem that data become disordered by double writes when the old and new master libraries survive at the same time.

Description

Database fault automatic switching method and device and computer storage medium
Technical Field
The invention relates to the technical field of databases, and in particular to an automatic database failover method, an automatic database failover device, and a computer storage medium.
Background
The MySQL database has many high-availability management schemes; the following two are in common use:
1. Keepalived. This scheme can switch to a slave library when the master library server goes down, but it cannot switch when the database process has failed while the server itself survives. If a master library has several slave libraries, it cannot repoint the other slave libraries to the new master library for master-slave replication. Nor, when master-slave synchronization delay has left the data inconsistent, can it complete the missing data to restore consistency.
2. MHA. MHA is the most widely used high-availability management solution for the MySQL database. It remedies the shortcomings of Keepalived: it can switch when the database has failed while the server survives, and when several slave libraries exist it can repoint the other slave libraries to the new master library for master-slave synchronization and complete their missing data to keep them consistent. MHA is commonly used together with a VIP (virtual IP), so that after a switch applications can direct traffic to the new master library without restarting to modify the configured master library address. Besides requiring in-house development of scripts for switching the VIP, shutting servers down, and sending notifications, MHA has one major shortcoming: it cannot guarantee that the master library is really down before switching. When that judgment is wrong, the old and new master libraries survive at the same time, data are written to both, and the data become disordered.
Therefore the prior art suffers from disordered data caused by double writes when the old and new master libraries survive at the same time.
Disclosure of Invention
The main object of the invention is to provide an automatic database failover method and device and a computer storage medium, aiming to solve the prior-art problem of disordered data caused by double writes when the old and new master libraries survive at the same time.
To achieve the above object, the invention provides an automatic database failover method comprising the following steps:
when a failure of the server where the master library is located is detected, performing failover; and at least one of the following steps:
calling a shutdown API to power off the master library server while the failover is performed;
blocking operations in the master library server related to writing data.
In one embodiment, detecting whether the server where the master library is located has failed comprises:
the MHA management node detecting in real time whether the master library server is reachable;
when the master library server is detected to be unreachable, logging in to a secondary check server and detecting again whether the master library server is reachable;
and when the secondary check server also detects that the master library server is unreachable, determining that the master library server has failed.
In one embodiment, the step of performing failover comprises:
saving the binary log events of the master library;
selecting a new master library from a preset number of slave libraries according to a preset rule;
reading the differential relay logs from the new master library and copying them to the remaining slave libraries;
copying the saved binary log events to the new master library;
and performing data completion on the remaining slave libraries according to the data in the new master library, then switching them to synchronize data from the new master library.
In one embodiment, the step of selecting a new master library from a preset number of slave libraries according to a preset rule comprises:
selecting, from the preset number of slave libraries, the slave library containing the newest data as the new master library.
In another embodiment, the step of selecting a new master library from a preset number of slave libraries according to a preset rule comprises:
designating in advance one slave library from the preset number of slave libraries as the new master library.
In one embodiment, the step of calling a shutdown API to power off the master library server comprises:
the MHA management node concurrently calling a preset number of shutdown APIs to remotely power off the master library server;
wherein the shutdown operations of the plurality of shutdown APIs are idempotent.
In one embodiment, the step of blocking operations in the master library server related to writing data comprises:
the suicide process concurrently calling the health check APIs of a preset number of health check services at a preset time interval to perform a communication health check;
when the health check APIs of the preset number of health check services find that the suicide process has failed to communicate with the outside world a preset number of consecutive times, preliminarily determining that the suicide process is in a network-islanding predicament;
the suicide process disconnecting all database instances on the master library server, waiting a preset time, and then checking whether any connection from an application exists;
when no connection from an application is detected, finally determining that the suicide process is in a network-islanding predicament;
setting all database instances on the master library server to read-only, so that data cannot be changed;
and executing the following actions in order, moving on to the next action whenever one fails, until one action succeeds:
removing all virtual IPs from the master library server;
shutting down all database instances on the master library server;
and powering off the master library server.
In one embodiment, the method further comprises:
sending a failover result notification after the failover is completed.
To achieve the above object, the invention further provides a device comprising a memory, a processor, and a database failover program stored in the memory and executable on the processor; when executed by the processor, the database failover program implements the steps of the database failover method described above.
To achieve the above object, the invention further provides a computer-readable storage medium storing a database failover program which, when executed by a processor, implements the steps of the database failover method described above.
With the method, the device, and the computer storage medium described above, when the MHA management node detects that the master library server has failed, one slave library is selected from the slave libraries and switched to become the new master library, and the remaining slave libraries are switched to synchronize data from it. During the switch, the MHA management node calls the shutdown API to power off the original master library; at the same time, the suicide process blocks operations related to writing data on the master library server. Only one of the two measures needs to succeed: when one fails but the other succeeds, the goal is still reached, which solves the prior-art problem of disordered data caused by double writes when the old and new master libraries survive at the same time.
Drawings
FIG. 1 is a schematic diagram of an apparatus according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a database failover method according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a database failover method according to a second embodiment of the present invention;
FIG. 4 is a flowchart illustrating a database failover method according to a third embodiment of the present invention;
FIG. 5 is a flowchart illustrating a database failover method according to a fourth embodiment of the present invention;
FIG. 6 is a flowchart illustrating a database failover method according to a fifth embodiment of the present invention.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The main solution of the embodiments of the invention is: when a failure of the server where the master library is located is detected, performing failover; and at least one of the following steps: calling a shutdown API to power off the master library server while the failover is performed; blocking operations in the master library server related to writing data. When the MHA management node detects that the master library server has failed, one slave library is selected from the slave libraries and switched to become the new master library, and the remaining slave libraries synchronize data from it. During the switch, the MHA management node calls the shutdown API to power off the original master library; at the same time, the suicide process blocks operations related to writing data on the master library server. Only one of the two measures needs to succeed: when one fails but the other succeeds, the goal is still reached, solving the prior-art problem of disordered data caused by double writes when the old and new master libraries survive at the same time.
One implementation is shown in FIG. 1, which is a schematic structural diagram of a device according to an embodiment of the invention.
Processor 1100 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 1100. The processor 1100 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. A software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM or EPROM, or registers. The storage medium is located in the memory 1200, and the processor 1100 reads the information in the memory 1200 and performs the steps of the above method in combination with its hardware.
It will be appreciated that the memory 1200 in the embodiments of the invention may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be Read-Only Memory (ROM), Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), or flash memory. The volatile memory may be Random Access Memory (RAM), which serves as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 1200 of the systems and methods described in the embodiments of the invention is intended to comprise, without being limited to, these and any other suitable types of memory.
For a software implementation, the techniques described in this disclosure may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described in this disclosure. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Based on this structure, embodiments of the automatic database failover method of the invention are provided.
Referring to FIG. 2, FIG. 2 shows a first embodiment of the automatic database failover method of the invention, which includes the following steps:
step S110, when the server where the main library is located is detected to be in failure, failure switching is carried out.
In this embodiment, the automatic database failover is mainly completed based on the High Availability management of the MySQL database of the MHA, and the MHA (Master High Availability, High Availability of the Master database) is a High Availability solution for solving the MySQL database failover, which is developed by the DeNA corporation of japan, youshimat, and is an excellent set of High Availability software for performing failover and Master-slave promotion in the MySQL High Availability environment. In the MySQL fault switching process, the MHA can automatically complete the fault switching operation of the database within 0-30 seconds, and in the fault switching process, the MHA can ensure the consistency of data to the maximum extent so as to achieve high availability in the true sense.
When the MHA is applied, the MHA consists of an MHA Manager (management Node) and an MHA Node (data Node), wherein the MHA Manager management Node is used for periodically detecting the running state of a master (master library) Node in the MySQL database cluster, in the MHA application, the master can carry out reading and writing data operations, the slave can only carry out reading data operations, and the MHA Manager can only monitor the running condition of the master.
MHA Manager: multiple master/slave clusters are typically managed deployed individually on a single machine, each master/slave cluster being referred to as an application. Or be deployed on a slave node.
MHA Node: running on each Mysql server/Master/slave/Manager, it expedites failover by monitoring scripts that have the ability to parse and clean logs.
So when MHA Manager detects that the server where the master library is located fails, the failure includes the failure of the hardware of the server of the master library and the failure of the network that cannot be accessed through ssh (secure shell protocol). Network failures are classified as recoverable and non-recoverable. When the MHA Manager (management node) detects that a hardware failure and a network unrecoverable failure occur in a server where the main library is located, the MHA Manager (management node) calls the MHA main program to perform failover.
The step of performing failover comprises the following steps:
Step S111: the binary log events of the master library are saved.
In this embodiment, when the master library server fails, i.e., the master library goes down, the MHA Node (data node) saves the binary log events of the master library.
Step S112: a new master library is selected from a preset number of slave libraries according to a preset rule.
In this embodiment, in an MHA application, the slave library containing the newest data may be selected from the preset number of slave libraries as the new master library. For example, an MHA application comprises one master library and three slave libraries; the master library has logs up to 102, but the three slave libraries have received only 100, 101, and 99 respectively when the master library fails. The MHA Node (data node) identifies the state of each slave library and selects slave2, whose newest log is 101, as the new master library. Alternatively, one slave library may be designated in advance from the preset number of slave libraries as the new master library; for example, in an MHA application comprising one master library and three slave libraries, the first slave library slave1 is designated in advance as the new master library.
Step S113: the differential relay logs are read from the new master library and copied to the remaining slave libraries.
In this embodiment, the differential relay logs read from the new master library are copied to the remaining slave libraries; here MHA is preferably combined with semi-synchronous replication. If even one slave has received the latest binary log, MHA can apply it to all the other slave servers, thereby ensuring the data consistency of all nodes. For example, since slave2 contains the newest log 101 while slave1 has 100 and slave3 has 99, log 101 is copied to slave1, and logs 100 and 101 are copied to slave3, keeping the data consistent.
Step S114: the saved binary log events are copied to the new master library.
In this embodiment, the binary log events saved from the failed master library are copied to the new master library. For example, given that the master library had logs up to 102 when its server failed, log 102 is copied to slave2 when slave2 is selected as the new master library, and logs 101 and 102 are copied to slave1 when slave1 is selected as the new master library.
Step S115: data completion is performed on the remaining slave libraries according to the data in the new master library, and they are switched to synchronize data from the new master library.
In this embodiment, the remaining slave libraries are brought up to date according to the data of the new master library; for example, log 102 is copied to slave1 and slave3. Slave1 and slave3 are then pointed at the new master library to synchronize data from it. A minimal sketch of steps S112 to S115 follows.
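The following Python sketch condenses steps S112 to S115 under the simplifying assumptions taken from the example above: log positions are plain integers, and "copying" an event just means listing it in a completion plan. The names are illustrative and are not MHA's actual interfaces.

    def elect_new_master(slave_positions):
        """Step S112, first rule: pick the slave whose log is newest."""
        return max(slave_positions, key=slave_positions.get)

    def completion_plan(slave_positions, saved_master_position):
        """Steps S113-S115: list the log events each library still needs
        so that every library ends up with all events through the
        master's last binary log event saved in step S111."""
        return {
            slave: list(range(pos + 1, saved_master_position + 1))
            for slave, pos in slave_positions.items()
        }

    slaves = {"slave1": 100, "slave2": 101, "slave3": 99}
    saved_master_position = 102   # binary log events saved in step S111

    new_master = elect_new_master(slaves)   # -> "slave2"
    plan = completion_plan(slaves, saved_master_position)
    # plan == {"slave1": [101, 102], "slave2": [102], "slave3": [100, 101, 102]}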
And at least one of the following steps is performed:
Step S120: a shutdown API is called to power off the master library server while the failover is performed.
In this embodiment, the MHA Manager (management node) calls, through the mhafailover program (failover program), a shutdown API on the management server to power off the master library server while the failover is performed.
Step S130: operations in the master library server related to writing data are blocked.
In this embodiment, the master library server runs a killmysql program (the suicide process); when the master library server fails, the killmysql program blocks operations in the master library server related to writing data.
In the technical solution provided by this embodiment, when the MHA management node detects that the master library server has failed, one slave library is selected from the slave libraries and switched to become the new master library, and the remaining slave libraries are switched to synchronize data from it. During the switch, the MHA management node calls the shutdown API to power off the original master library; at the same time, the suicide process blocks operations related to writing data on the master library server. Only one of the two measures needs to succeed: when one fails but the other succeeds, the goal is still reached, which solves the prior-art problem of disordered data caused by double writes when the old and new master libraries survive at the same time. A minimal sketch of this either-measure-suffices logic follows.
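The sketch below is an illustrative outline, not the patented implementation: the two stub functions merely stand in for the shutdown API call (third embodiment) and the suicide process (fourth embodiment).

    import concurrent.futures

    def call_shutdown_apis():
        """Stub for measure 1: the MHA management node remotely powers
        off the old master library server (see the third embodiment)."""
        return True   # pretend the remote shutdown succeeded

    def block_write_operations():
        """Stub for measure 2: the killmysql suicide process blocks
        write-related operations (see the fourth embodiment)."""
        return False  # pretend this measure failed

    def fence_old_master():
        # Attempt both measures concurrently; fencing succeeds if EITHER
        # measure succeeds, so one failure cannot lead to double writes.
        measures = [call_shutdown_apis, block_write_operations]
        with concurrent.futures.ThreadPoolExecutor(len(measures)) as pool:
            results = list(pool.map(lambda m: m(), measures))
        return any(results)

    print("old master fenced:", fence_old_master())   # -> True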
Referring to FIG. 3, FIG. 3 shows a second embodiment of the automatic database failover method of the invention, comprising:
Compared with the first embodiment, the second embodiment adds step S210, step S220, and step S230; the other steps are the same as in the first embodiment and are not repeated.
Step S210: the MHA management node detects in real time whether the master library server is reachable.
In this embodiment, the MHA management node detects in real time whether the master library server is reachable over ssh (Secure Shell protocol). If the master library server is reachable over ssh, it is working normally; if not, the likely cause is a hardware failure or a network failure of the master library server that has brought the master library down.
Step S220: when the master library server is detected to be unreachable, a secondary check server is logged in to detect again whether the master library server is reachable.
In this embodiment, when the MHA management node detects that the master library server is unreachable over ssh, the MHA main program in the MHA management node logs in to the secondary check server to detect again whether the master library server is reachable over ssh. If it is reachable, a network problem occurred during the first detection and has since recovered; if it is not reachable, the likely cause is a hardware failure or an unrecoverable network failure of the master library server that has brought the master library down.
Step S230: when the secondary check server also detects that the master library server is unreachable, it is determined that the master library server has failed, and failover is performed.
In this embodiment, when the secondary check server detects that the master library server is unreachable over ssh, it is determined that the master library server has failed and the master library is down, and failover is performed.
And at least one of the following steps is performed:
Step S240: a shutdown API is called to power off the master library server while the failover is performed.
Step S250: operations in the master library server related to writing data are blocked.
In the technical solution provided by this embodiment, the MHA management node detects in real time whether the master library server is reachable; when the server is first found unreachable, the MHA main program in the MHA management node logs in to the secondary check server to detect again whether it is reachable; and only when the second detection also finds the server unreachable is the master library server judged to have failed and failover performed. Judging reachability with two detections ensures the accuracy of the judgment and avoids performing failover directly on a mistaken first detection, as the sketch below illustrates.
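A minimal Python sketch of the two-stage check follows. It approximates "reachable over ssh" with a TCP probe of port 22 and, for brevity, runs the secondary check as a local re-probe, whereas the embodiment runs it from the secondary check server itself; all names are illustrative.

    import socket

    def ssh_reachable(host, port=22, timeout=3.0):
        """Treat a successful TCP connection to the ssh port as 'reachable'."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def recheck_via_secondary(secondary_host, master_host):
        """Stand-in for step S220: log in to the secondary check server
        and probe the master again from there. Shown as a local re-probe
        here for brevity."""
        return ssh_reachable(master_host)

    def master_has_failed(master_host, secondary_host):
        if ssh_reachable(master_host):
            return False   # step S210: the master is working normally
        # Steps S220/S230: confirm from the secondary check server before
        # declaring a failure, so a transient network glitch near the
        # manager does not trigger a needless failover.
        return not recheck_via_secondary(secondary_host, master_host)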
Referring to FIG. 4, FIG. 4 shows a third embodiment of the automatic database failover method of the invention, comprising:
Step S310: when a failure of the server where the master library is located is detected, failover is performed.
And at least one of the following steps is performed:
Compared with the first embodiment, the third embodiment adds step S320; the other steps are the same as in the first embodiment and are not repeated.
Step S320: the MHA management node concurrently calls a preset number of shutdown APIs to remotely power off the master library server; the shutdown operations of the plurality of shutdown APIs are idempotent.
In this embodiment, the shutdown API service provides the following functions: 1. authentication, i.e., checking whether the user's password is correct and whether the user has the authority to query the management IP and to shut the server down; 2. querying the management IP of a server; 3. remotely shutting a server down.
Every database server deploys a getserverinfo program (a program that gathers server information). The program periodically collects the local IP and the management IP and queries whether they exist in the CMDB (configuration management database): if they do not exist it inserts them, and if they exist it checks whether they are consistent and updates the record in the CMDB when they are not. Records inserted into or updated in the CMDB are reported to the DBA (database administrator) by mail.
The shutdown API periodically pulls the complete mapping between database-server IPs and management IPs from the CMDB and caches it locally. When it receives a request to query a management IP or to shut a server down, it first queries the CMDB for the management IP; if that fails, it takes the management IP from the local cache, as sketched below.
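The query-then-fall-back-to-cache behaviour can be sketched as follows; the CMDB query is stubbed out and the cache path and addresses are placeholders, since the patent specifies neither.

    import json
    import os
    import tempfile

    CACHE_PATH = os.path.join(tempfile.gettempdir(), "mgmt_ip_cache.json")

    def query_cmdb_mapping():
        """Stub for pulling the full server-IP -> management-IP mapping
        from the CMDB; the addresses are made up for illustration."""
        return {"10.0.0.11": "10.1.0.11", "10.0.0.12": "10.1.0.12"}

    def refresh_local_cache():
        """Periodic task: cache the complete mapping locally."""
        with open(CACHE_PATH, "w") as f:
            json.dump(query_cmdb_mapping(), f)

    def management_ip(server_ip):
        """Prefer a live CMDB query; fall back to the local cache when
        the CMDB query fails."""
        try:
            return query_cmdb_mapping()[server_ip]
        except Exception:
            with open(CACHE_PATH) as f:
                return json.load(f).get(server_ip)

    refresh_local_cache()
    print(management_ip("10.0.0.11"))   # -> 10.1.0.11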
When a request to query a management IP or to shut down is received, the user's password and authority are checked first. If the check succeeds, the request is executed. For a shutdown request, the management IP is taken out; the shutdown API service is deployed on several different servers, and the mhafailover program (failover program) in the MHA management node simultaneously and concurrently calls all configured shutdown APIs to perform a remote shutdown over IPMI (Intelligent Platform Management Interface), the shutdown operations of the plural shutdown APIs being idempotent. The IPMI remote-shutdown network is independent of the external communication network of the database, so even when the database's external communication network fails, the independent IPMI network can still reach the server where the database is located and the remote shutdown can proceed. For example, three shutdown API services are preferred, so that a single point of failure cannot prevent the shutdown; it suffices that any one of the three APIs shuts the server down successfully.
Step S330: operations in the master library server related to writing data are blocked.
In the technical solution provided by this embodiment, the mhafailover program (failover program) in the MHA management node simultaneously and concurrently calls all configured shutdown APIs to perform a remote shutdown over IPMI (Intelligent Platform Management Interface), and the shutdown operations of the plural shutdown APIs are idempotent. The master library server is powered off as long as any one API call succeeds, as the following sketch shows.
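The concurrent, idempotent shutdown calls can be sketched like this. The endpoint URLs and the plain HTTP interface are assumptions for illustration; the patent states only that several independently deployed shutdown APIs are called concurrently and that a single success suffices.

    import concurrent.futures
    import urllib.request

    # Placeholder endpoints for three independently deployed shutdown API
    # services (three are preferred so that no single point of failure
    # can prevent the shutdown).
    SHUTDOWN_ENDPOINTS = [
        "http://shutdown-a.example/poweroff",
        "http://shutdown-b.example/poweroff",
        "http://shutdown-c.example/poweroff",
    ]

    def call_shutdown(endpoint, management_ip):
        """Ask one shutdown API to power off the server at the given
        management IP over IPMI. Powering off a machine that is already
        off changes nothing, which makes the operation idempotent."""
        req = urllib.request.Request(endpoint,
                                     data=management_ip.encode(),
                                     method="POST")
        try:
            with urllib.request.urlopen(req, timeout=5) as resp:
                return resp.getcode() == 200
        except OSError:
            return False

    def shutdown_master(management_ip):
        """Call every configured shutdown API concurrently; the master is
        treated as powered off as soon as ANY one call succeeds."""
        workers = len(SHUTDOWN_ENDPOINTS)
        with concurrent.futures.ThreadPoolExecutor(workers) as pool:
            results = pool.map(lambda ep: call_shutdown(ep, management_ip),
                               SHUTDOWN_ENDPOINTS)
        return any(results)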
Referring to FIG. 5, FIG. 5 shows a fourth embodiment of the automatic database failover method of the invention, comprising:
Step S410: when a failure of the server where the master library is located is detected, failover is performed.
Step S420: a shutdown API is called to power off the master library server while the failover is performed.
Compared with the first embodiment, the fourth embodiment adds steps S430 through S4100; the other steps are the same as in the first embodiment and are not repeated.
Step S430: the suicide process concurrently calls the health check APIs of a preset number of health check services at a preset time interval to perform a communication health check.
In this embodiment, the killmysql program (suicide process) provides the following functions: 1. checking whether the host is in a network-islanding predicament; 2. setting all MySQL instances on the host to read-only; 3. removing all VIPs (virtual IPs) from the host; 4. shutting down all MySQL instances on the host; 5. powering off the host.
The suicide process killmysql concurrently calls the health check APIs of a preset number of health check services at a preset time interval, each round concurrently requesting the health check API of every health check service. The preset time interval may preferably be 3 seconds, and the preset number may preferably be 3, i.e., three health check services are configured.
Step S440: when the health check APIs of the preset number of health check services find that the suicide process has failed to communicate with the outside world a preset number of consecutive times, it is preliminarily determined that the suicide process is in a network-islanding predicament.
In this embodiment, it may preferably be preliminarily determined that the suicide process is in a network-islanding predicament when the killmysql program (suicide process) finds that communication with the health check APIs of the 3 health check services has failed in 5 consecutive checks.
Step S450: the suicide process disconnects all database instances on the master library server, waits a preset time, and then checks whether any connection from an application exists.
In this embodiment, the killmysql program (suicide process) disconnects all database (MySQL) instances on the master library server and may preferably wait 30 seconds before checking whether any connection from an application exists. If such a connection exists, the judgment that the suicide process is in a network-islanding predicament is withdrawn; if none exists, the process finally determines that it is in a network-islanding predicament.
Step S460: when no connection from an application is detected, it is finally determined that the suicide process is in a network-islanding predicament.
In this embodiment, when no connection from an application is found within the preset time, it is finally determined that the suicide process is in a network-islanding predicament.
Step S470: all database instances on the master library server are set to read-only, and data are not allowed to be changed.
In this embodiment, all database (MySQL) instances on the master library server are set to read-only, and data are not allowed to be changed.
The following actions are then executed in order; whenever one action fails, execution moves on to the next, until one action succeeds:
Step S480: all virtual IPs on the master library server are removed.
In this embodiment, all VIPs (virtual IPs) on the master library server are removed.
Step S490: all database instances on the master library server are shut down.
In this embodiment, all database (MySQL) instances on the master library server are shut down.
Step S4100: the master library server is powered off.
In this embodiment, the master library server is powered off.
In the technical solution provided by this embodiment, the suicide process concurrently calls the health check APIs of a preset number of health check services at a preset time interval to perform a communication health check; when those APIs report failed communication with the outside world a preset number of consecutive times, the suicide process preliminarily determines that it is in a network-islanding predicament; it disconnects all database instances on the master library server, waits a preset time, and checks whether any connection from an application exists; when none is detected, it finally determines that it is in a network-islanding predicament; it sets all database instances on the master library server to read-only so that data cannot be changed; and it executes the following actions in order, moving on whenever one fails, until one succeeds: removing all virtual IPs from the master library server; shutting down all database instances on the master library server; powering off the master library server. These steps prevent the master library's data from being writable to the greatest extent possible; a condensed sketch follows.
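The Python sketch below condenses this suicide-process logic, using the preferred values above (three health check services, a 3-second interval, five consecutive failures, a 30-second wait). Every helper is a stub standing in for behaviour the patent describes but does not specify in code, and the concurrent health check calls are shown sequentially for brevity.

    import time

    HEALTH_SERVICES = ["hc-a", "hc-b", "hc-c"]   # placeholder identifiers

    def health_check(service):
        """Stub for one health check api call; True means the service
        answered. Returning False simulates lost outside connectivity."""
        return False

    def islanded(rounds=5, interval=3.0):
        """Steps S430/S440: each round asks every health check service;
        islanding is only suspected after `rounds` consecutive rounds in
        which no service could be reached."""
        for _ in range(rounds):
            if any(health_check(s) for s in HEALTH_SERVICES):
                return False             # someone answered: not islanded
            time.sleep(interval)
        return True

    # Stubs for the local actions of steps S450 to S4100.
    def disconnect_all_instances(): pass
    def application_connections_exist(): return False
    def set_instances_read_only(): pass
    def remove_all_vips(): pass
    def stop_all_instances(): pass
    def power_off_host(): pass

    def suicide_sequence():
        if not islanded():
            return
        disconnect_all_instances()            # step S450
        time.sleep(30)                        # wait, then re-check
        if application_connections_exist():   # steps S450/S460
            return                            # an application got through
        set_instances_read_only()             # step S470: no more writes
        # Steps S480 to S4100: escalate until one action succeeds.
        for action in (remove_all_vips, stop_all_instances, power_off_host):
            try:
                action()
                break                         # stop at the first success
            except Exception:
                continue                      # fall through to next action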
Referring to FIG. 6, FIG. 6 shows a fifth embodiment of the automatic database failover method of the invention, comprising:
Step S510: when a failure of the server where the master library is located is detected, failover is performed.
Step S520: a shutdown API is called to power off the master library server while the failover is performed.
Step S530: operations in the master library server related to writing data are blocked.
Compared with the first embodiment, the fifth embodiment adds step S540; the other steps are the same as in the first embodiment and are not repeated.
Step S540: after the failover is completed, a failover result notification is sent.
In this embodiment, after the failover steps are completed, the failover result may be sent to the database administrator (DBA) by mail, WeChat, or similar means. For example, following the scenario above, slave2 is switched to become the new master library, slave1 and slave3 are pointed at the new master library slave2, and the switch result is sent to the database administrator by mail and WeChat.
In the technical solution provided by this embodiment, after the failover steps are completed, the failover result may be sent to the database administrator (DBA) by mail, WeChat, and the like, so that database administrators receive the failover result immediately, which facilitates subsequent management work. A minimal mail-notification sketch follows.
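As an illustration, a failover result mail of the kind described above could be sent with a few lines of Python; the addresses and SMTP host are placeholders, and the WeChat channel mentioned in the embodiment would need its own messaging interface.

    import smtplib
    from email.message import EmailMessage

    def notify_dba(result_text,
                   sender="mha-manager@example.com",
                   dba="dba@example.com",
                   smtp_host="localhost"):
        """Mail the failover result to the database administrator (DBA)."""
        msg = EmailMessage()
        msg["Subject"] = "MySQL failover result"
        msg["From"] = sender
        msg["To"] = dba
        msg.set_content(result_text)
        with smtplib.SMTP(smtp_host) as server:
            server.send_message(msg)

    # Example (requires a reachable SMTP server):
    # notify_dba("slave2 promoted to new master; "
    #            "slave1 and slave3 now replicate from slave2")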
The present invention also provides an apparatus comprising a memory, a processor, and a database failover program stored in the memory and operable on the processor, the database failover program, when executed by the processor, implementing the steps of the database failover method as described above.
The present invention also provides a computer-readable storage medium storing a database failover program that, when executed by a processor, implements the steps of the database failover method as described above.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering; these words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. An automatic database failover method, characterized by comprising the following steps:
when a failure of the server where the master library is located is detected, performing failover; and at least one of the following steps:
calling a shutdown API to power off the master library server while the failover is performed;
blocking operations in the master library server related to writing data.
2. The automatic database failover method according to claim 1, wherein detecting whether the server where the master library is located has failed comprises:
the MHA management node detecting in real time whether the master library server is reachable;
when the master library server is detected to be unreachable, logging in to a secondary check server and detecting again whether the master library server is reachable;
and when the secondary check server also detects that the master library server is unreachable, determining that the master library server has failed.
3. The automatic database failover method according to claim 1, wherein the step of performing failover comprises:
saving the binary log events of the master library;
selecting a new master library from a preset number of slave libraries according to a preset rule;
reading the differential relay logs from the new master library and copying them to the remaining slave libraries;
copying the saved binary log events to the new master library;
and performing data completion on the remaining slave libraries according to the data in the new master library, then switching them to synchronize data from the new master library.
4. The automatic database failover method according to claim 3, wherein the step of selecting a new master library from a preset number of slave libraries according to a preset rule comprises:
selecting, from the preset number of slave libraries, the slave library containing the newest data as the new master library.
5. The automatic database failover method according to claim 3, wherein the step of selecting a new master library from a preset number of slave libraries according to a preset rule comprises:
designating in advance one slave library from the preset number of slave libraries as the new master library.
6. The automatic database failover method according to claim 1, wherein the step of calling a shutdown API to power off the master library server comprises:
the MHA management node concurrently calling a preset number of shutdown APIs to remotely power off the master library server;
wherein the shutdown operations of the plurality of shutdown APIs are idempotent.
7. The automatic database failover method according to claim 1, wherein the step of blocking operations in the master library server related to writing data comprises:
the suicide process concurrently calling the health check APIs of a preset number of health check services at a preset time interval to perform a communication health check;
when the health check APIs of the preset number of health check services find that the suicide process has failed to communicate with the outside world a preset number of consecutive times, preliminarily determining that the suicide process is in a network-islanding predicament;
the suicide process disconnecting all database instances on the master library server, waiting a preset time, and then checking whether any connection from an application exists;
when no connection from an application is detected, finally determining that the suicide process is in a network-islanding predicament;
setting all database instances on the master library server to read-only, so that data cannot be changed;
and executing the following actions in order, moving on to the next action whenever one fails, until one action succeeds:
removing all virtual IPs from the master library server;
shutting down all database instances on the master library server;
and powering off the master library server.
8. The automatic database failover method according to claim 1, further comprising:
sending a failover result notification after the failover is completed.
9. A device, comprising a memory, a processor, and a database failover program stored in the memory and executable on the processor, wherein the database failover program, when executed by the processor, implements the steps of the database failover method according to any one of claims 1-8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a database failover program which, when executed by a processor, implements the steps of the database failover method according to any one of claims 1-8.
CN202011387011.1A (filed 2020-11-30; priority date 2020-11-30): Database fault automatic switching method and device and computer storage medium. Status: Pending. Published as CN112486718A (en).

Priority Applications (1)

CN202011387011.1A (priority date 2020-11-30; filing date 2020-11-30): Database fault automatic switching method and device and computer storage medium

Publications (1)

CN112486718A (en), published 2021-03-12

Family

ID=74938696

Country Status (1)

CN: CN112486718A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114785762A (en) * 2022-03-23 2022-07-22 深圳市飞泉云数据服务有限公司 Method and device for realizing cloud computing system, terminal equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102194009A (en) * 2011-06-09 2011-09-21 北京新媒传信科技有限公司 Database hosting method and database hosting platform system
CN103064860A (en) * 2011-10-21 2013-04-24 阿里巴巴集团控股有限公司 Database high availability implementation method and device
CN104036043A (en) * 2014-07-01 2014-09-10 浪潮(北京)电子信息产业有限公司 High availability method of MYSQL and managing node
US9984140B1 (en) * 2015-02-05 2018-05-29 Amazon Technologies, Inc. Lease based leader election system
CN109871369A (en) * 2018-12-24 2019-06-11 天翼电子商务有限公司 Database switching method, system, medium and device
CN111400285A (en) * 2020-03-25 2020-07-10 杭州浮云网络科技有限公司 MySQ L data fragment processing method, apparatus, computer device and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU ZHAO ET AL.: "Research and Application of MHA, a MySQL Database Failover Tool" (MySQL数据库故障转移工具MHA的研究与应用), Journal of Guangxi University for Nationalities (Natural Science Edition), vol. 18, no. 03, 30 September 2012 (2012-09-30), pages 62-65 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination