CN111460035A - Database system and disaster tolerance method thereof - Google Patents


Info

Publication number
CN111460035A
Authority
CN
China
Prior art keywords: equipment, data, slave, cross, log file
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010227680.6A
Other languages
Chinese (zh)
Inventor
尹恺雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202010227680.6A
Publication of CN111460035A
Legal status: Pending

Classifications

    • G06F16/27 — Replication, distribution or synchronisation of data between databases or within a distributed database system; distributed database system architectures therefor
    • G06F11/202 — Error detection or correction of the data by redundancy in hardware using active fault-masking, where processing functionality is redundant
    • G06F11/2082 — Data synchronisation (redundant persistent mass storage by mirroring)
    • G06F11/3476 — Data logging
    • G06F16/21 — Design, administration or maintenance of databases


Abstract

The application belongs to the field of computer technology and provides a database system and a disaster recovery method for it. The database system comprises a local host device, a same-machine-room slave device, a same-ground cross-machine-room slave device, a cross-ground slave device, and a data verification module. The local host device and the same-machine-room slave device are located at a first position in a first city; the same-ground cross-machine-room slave device is located at a second position in the first city; and the cross-ground slave device is located anywhere in a second city. When the local host device fails, the data verification module checks data consistency based on the log file of the local host device and the log file of the same-machine-room slave device. Deploying multiple slave devices both locally and remotely improves the disaster tolerance of the database system, and checking data consistency between the log files of the local host device and the candidate slave device before promotion ensures the accuracy of the data on the device promoted to host.

Description

Database system and disaster tolerance method thereof
Technical Field
The application belongs to the technical field of computers, and particularly relates to a database system and a disaster recovery method thereof.
Background
With the rapid development of information technology, data has become an increasingly important enterprise asset, and enterprises pay growing attention to the security of their information systems and data. Most enterprises adopt local data protection as their disaster recovery mode, which provides a basic guarantee for data security across the production system. However, local disaster recovery generally cannot survive a major disaster that destroys the machine room, which reduces data security; this gave rise to disaster recovery based on cross-ground data protection. Cross-ground data protection, in turn, may suffer data synchronization errors because the physical and logical backups of a database are not fully reliable, and existing cross-ground disaster recovery methods cannot reliably detect replication anomalies in system data, so the accuracy of the data cannot be guaranteed.
Disclosure of Invention
The embodiments of the application provide a database system and a disaster recovery method thereof, which address the problem that data accuracy cannot be guaranteed because existing disaster recovery modes cannot reliably detect replication anomalies in system data.
In a first aspect, an embodiment of the present application provides a database system, including a local host device, a same-machine-room slave device, a same-ground cross-machine-room slave device, a cross-ground slave device, and a data verification module;
the local host equipment is respectively connected with the same-machine-room slave equipment and the same-ground cross-machine-room slave equipment, and the same-ground cross-machine-room slave equipment is mounted with the cross-ground slave equipment; the local host machine equipment and the same machine room slave machine equipment are located at a first position of a first city, the same-ground cross-machine room slave machine equipment is located at a second position of the first city, and the cross-ground slave machine equipment is located at any position of a second city;
when the local host device fails, one device is selected as the target slave device from the same-machine-room slave device, the same-ground cross-machine-room slave device, and the cross-ground slave device according to a preset lifting (promotion) authority;
the data checking module is used for performing data consistency checking based on the log file of the local host device and the log file of the target slave device when the local host device fails, and upgrading the target slave device to a host device when the data consistency checking is passed.
In a possible implementation manner of the first aspect, data synchronization is implemented between the local host device and the same-machine-room slave device by using a semi-synchronous replication manner;
the local host equipment and the co-location cross-machine room slave equipment realize data synchronization by using an asynchronous replication mode;
and the co-location cross-machine room slave equipment and the cross-location slave equipment realize data synchronization by using an asynchronous replication mode.
In a possible implementation manner of the first aspect, the co-located cross-machine-room slave device mounts the cross-ground slave device in a chain mounting manner; the local host device is connected with the same-machine-room slave device through a physical dedicated line; and the local host device and the co-located cross-machine-room slave device are connected through a wireless local area network.
In a possible implementation manner of the first aspect, the database system further includes a memory database, and the data verification module is disposed in the memory database.
In a second aspect, an embodiment of the present application provides a disaster recovery method, where the disaster recovery method is applied to a database system according to the first aspect, and the disaster recovery method includes:
monitoring whether the local host equipment fails;
if the local host equipment is monitored to be in fault, selecting one piece of slave equipment from the same machine room slave equipment, the same-place cross-machine room slave equipment and the cross-place slave equipment as target slave equipment according to preset lifting permission;
performing data consistency check on the local host equipment and the target slave equipment;
and if the data consistency is checked to be passed, automatically promoting the authority of the target slave equipment to be the host authority.
In a possible implementation manner of the second aspect, the performing of the data consistency check on the local host device and the target slave device includes:
controlling the target slave device to replay the log file;
after the log file of the target slave equipment is replayed, third-party service is deployed in the memory database, and data consistency check is conducted on the log file of the local host equipment and the log file of the target slave equipment on the basis of the third-party service.
Further, after the log file of the target slave device is replayed, deploying a third-party service in the memory database, and performing data consistency check on the log file of the local host device and the log file of the target slave device based on the third-party service, including:
deploying a first probe in the local host device through the third-party service to monitor a log file of the local host device;
deploying a second probe in the target slave device through the third-party service to monitor a log file of the target slave device;
setting the key of the third-party service to the transaction ID and the value to the log file, wherein the third-party service compares the transaction ID in the log file of the local host device with the transaction ID in the log file of the target slave device to determine whether they are consistent;
if the transaction ID in the log file of the local host equipment is consistent with the transaction ID in the log file of the target slave equipment, the data consistency check is passed;
and if the transaction ID in the log file of the local host equipment is inconsistent with the transaction ID in the log file of the target slave equipment, the data consistency check is failed.
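The transaction-ID comparison described above can be sketched in a few lines; the following Python sketch is illustrative only (the log-entry structure and the `txn_id` field name are assumptions, not details from the patent):

```python
def check_consistency(master_log, slave_log):
    """Pass the data consistency check only when the master's and the
    target slave's log files contain exactly the same transaction IDs."""
    master_ids = {entry["txn_id"] for entry in master_log}
    slave_ids = {entry["txn_id"] for entry in slave_log}
    return master_ids == slave_ids
```

Comparing ID sets rather than raw log bytes matches the text's approach: the check cares about which transactions reached the slave, not about byte-level log layout.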
Further, when the local host device works normally, after the local host device completes a transaction, writing the transaction into a receipt confirmation table in a log file of the same-machine-room slave device to realize semi-synchronous replication of data between the local host device and the same-machine-room slave device;
the local host equipment writes the transaction into a log file of the local host equipment and sends transaction information to a Dump thread to inform the co-located cross-machine room slave equipment to realize asynchronous replication of data between the local host equipment and the co-located cross-machine room slave equipment;
After the same-place cross-machine-room slave device completes a transaction, the transaction is written into a receipt confirmation table in the log file of the cross-place slave device, so that asynchronous replication of data between the same-place cross-machine-room slave device and the cross-place slave device is achieved.
In a possible implementation manner of the second aspect, after the performing the data consistency check on the local master device and the target slave device, the method further includes:
and if the data consistency check fails, copying the backup data of the first backup database to the target slave equipment.
Further, the first backup database stores incremental backup data, where the incremental backup data refers to data updated from after the last full backup to before the next full backup; if the data consistency check fails, copying the backup data of the first backup database to the target slave device, including:
determining full backup data and backup updating records during the last full backup according to the incremental backup information;
searching for newly added data according to the backup updating record, and adding the newly added data into full backup data in the last full backup to obtain target backup data;
and copying the target backup data to the target slave equipment.
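The incremental-recovery steps above (take the last full backup, apply the updates found via the backup update record, copy the result to the target slave) admit a minimal sketch. The dictionary-based backup layout and the function name below are hypothetical illustrations, not the patented implementation:

```python
def build_target_backup(full_backup, update_records):
    """Merge the last full backup with the incremental updates recorded
    since it; later updates to the same key overwrite earlier ones."""
    target = dict(full_backup)          # do not mutate the stored full backup
    for key, value in update_records:   # updates applied in recorded order
        target[key] = value
    return target
```

The result is the target backup data that would then be copied to the target slave device when the consistency check fails.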
It is understood that, for the beneficial effects of the second aspect, reference may be made to the related description of the first aspect, which is not repeated here.
Compared with the prior art, the embodiments of the application have the following advantages: deploying multiple slave devices both locally and remotely improves the disaster tolerance of the database system; and when the master device fails, checking data consistency based on the log file of the local master device and the log file of the slave device ensures the accuracy of the data on the device promoted to master, effectively solving the problem that data accuracy cannot be guaranteed because existing disaster recovery modes cannot reliably detect replication anomalies in system data.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are only some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without inventive effort.
Fig. 1 is a schematic structural diagram of a database system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a database system according to another embodiment of the present application;
fig. 3 is a schematic flow chart of a disaster recovery method according to an embodiment of the present application;
fig. 4 is a schematic flow chart of a disaster recovery method according to another embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
As shown in fig. 1, an embodiment of the present application provides a database system, which includes a local master device 10, a same-room slave device 20, a same-ground cross-room slave device 30, a cross-ground slave device 40, and a data check module.
The local host device 10 is respectively connected with the same-machine-room slave device 20 and the same-ground cross-machine-room slave device 30, and the same-ground cross-machine-room slave device 30 is mounted with the cross-ground slave device 40; local master device 10 and co-machine room slave device 20 are located in a first location in a first city, co-local cross-machine room slave device 30 is located in a second location in the first city, and cross-ground slave device 40 is located in any location in a second city.
When the local master device 10 fails, one device is selected as a target slave device from the same-machine-room slave devices 20, the same-ground cross-machine-room slave devices 30 and the cross-ground slave devices 40 according to a preset lifting right.
The data checking module is used for performing data consistency checking based on the log file of the local host device and the log file of the target slave device when the local host device fails, and upgrading the target slave device to be the host device when the data consistency checking is passed.
Specifically, the same-place cross-machine-room slave device 30 mounts the cross-place slave device 40 in a chain mounting manner. The local master device 10 is connected with the same machine room slave device 20 through a physical special line. The local host device 10 and the co-located cross-machine room slave device 30 are connected through a wireless local area network.
Specifically, the co-ground cross-machine-room slave device 30 mounts the cross-ground slave device 40 in a chain mounting manner, and data synchronization between the co-ground cross-machine-room slave device 30 and the cross-ground slave device 40 can be ensured based on the chain cluster.
Connecting the local host device 10 and the same-room slave device 20 by a physical dedicated line can ensure low time delay of data transmission. Since the local master device 10 is co-located with the same room slave device 20, the connection can be made by a physical dedicated line.
The connection between the local host device 10 and the co-located cross-machine-room slave device 30 is realized through wireless local area network networking, so that the networking cost can be reduced, and the data transmission between the local host device 10 and the co-located cross-machine-room slave device 30 is realized.
Specifically, a semi-synchronous replication method is used between the local master device 10 and the slave device 20 in the same machine room to implement data synchronization; the data synchronization between the local host device 10 and the co-local cross-machine room slave device 30 is realized by using an asynchronous replication mode; the same-place cross-machine room slave device 30 and the cross-place slave device 40 realize data synchronization by using an asynchronous replication mode.
Specifically, when the local master device is working normally, after the local master device 10 completes a transaction (Commit), the transaction (Commit-Binlog) is written into a receipt confirmation table (receipt) in the log file (Relay Log) of the same-room slave device 20, to implement semi-synchronous replication of data between the local master device and the same-room slave device.
The local master device 10 writes a transaction (Commit) into its own log file (i.e., executes the transaction) and sends the transaction information to the Dump thread to notify the co-located cross-machine-room slave device 30, implementing asynchronous replication of data between them. The local master device does not ensure that the transaction information is written into the receipt acknowledgement table (receipt) in the log file (Relay Log) of the co-located cross-machine-room slave device 30; the notification only prompts the co-located cross-machine-room slave device 30 to read and apply the transaction information.
After completing a transaction, the co-located cross-machine-room slave device 30 writes the transaction into the receipt acknowledgement table in the log file of the cross-ground slave device 40, implementing asynchronous replication of data between the co-located cross-machine-room slave device 30 and the cross-ground slave device 40.
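The difference between the two replication modes described above can be shown with a minimal Python sketch. The function names and the in-memory stand-ins for the binlog, the slave's relay-log receipt table, and the Dump-thread notification queue are assumptions for illustration only:

```python
import queue

def commit_semi_sync(master_log, slave_relay_log, txn):
    """Semi-synchronous: the commit is acknowledged only after the
    transaction has also reached the slave's relay-log receipt table."""
    master_log.append(txn)
    slave_relay_log.append(txn)   # receipt confirmed before commit returns
    return "committed"

def commit_async(master_log, dump_queue, txn):
    """Asynchronous: the master commits locally and merely notifies the
    Dump thread; it does not wait for the slave to persist anything."""
    master_log.append(txn)
    dump_queue.put(txn)           # notification only, no acknowledgement
    return "committed"
```

The key contrast is where the acknowledgement happens: semi-sync returns only after the slave-side write, while async returns as soon as the notification is queued, which is why the text stresses that the master "does not ensure" delivery in the asynchronous case.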
In a possible implementation manner of this embodiment, the database system further includes a memory database, and the data verification module is disposed in the memory database. Specifically, the memory database is a redis memory database. By deploying a data verification module in the redis memory database, log file monitoring is performed on the local host device 10 and log files of each slave device through the redis memory database, so that data consistency verification is performed on the master device and the slave device.
Specifically, the log file of the local master device 10 and the log files of the respective slave devices are monitored by setting probes. Data comparison is then performed based on the GTIDs (global transaction IDs) stored in the redis in-memory database and the corresponding log files, yielding the data difference between the log files of the master and slave devices. If there is no data difference between the log files of the master device and the slave device, the data consistency check passes; if there is a difference, the check fails.
Specifically, when the local master device 10 fails, the slave device with the highest authority among the same-machine-room slave device 20, the same-ground cross-machine-room slave device 30, and the cross-ground slave device 40 is determined as the target slave device according to the preset lifting authority. In this embodiment, the lifting authority of the slave device using semi-synchronous replication (i.e., the same-room slave device 20) is set to be the highest, that of the cross-ground slave device 40 the second highest, and that of the same-ground cross-room slave device 30 the lowest.
Specifically, each slave device may monitor in real time whether the local master device 10 has failed. If a failure of the local master device 10 is detected, it is further determined whether the same-machine-room slave device 20 is working normally. If so, the same-machine-room slave device 20 is controlled to replay all of its log files; after the replay is complete, a data consistency check is performed between the data of the same-machine-room slave device 20 and the data of the local master device 10, and once the check passes, the authority of the same-machine-room slave device 20 is promoted to the master authority and it continues to execute the business tasks as the master.
If the same-machine-room slave device 20 has also failed, it is further determined whether the cross-ground slave device 40 is working normally. If so, the cross-ground slave device 40 is controlled to replay all of its log files; after the replay is complete, a data consistency check is performed between its data and the data of the local master device 10, and once the check passes, its authority is promoted to the master authority and it continues to execute the business tasks as the master.
If the cross-ground slave device 40 has also failed, it is further determined whether the same-ground cross-machine-room slave device 30 is working normally. If so, it is controlled to replay all of its log files; after the replay is complete, a data consistency check is performed between its data and the data of the local master device 10, and once the check passes, its authority is promoted to the master authority and it continues to execute the business tasks as the master.
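The failover cascade above reduces to an ordered lookup over the preset lifting authorities. The slave identifiers in this sketch are hypothetical labels, not names from the patent:

```python
# Preset lifting (promotion) order: semi-sync same-room slave first,
# then the cross-ground slave, then the same-ground cross-room slave.
PROMOTION_ORDER = [
    "same_room_slave",
    "cross_ground_slave",
    "same_ground_cross_room_slave",
]

def select_target_slave(healthy_slaves):
    """Return the highest-priority slave that is still working normally,
    or None if every slave has failed."""
    for slave in PROMOTION_ORDER:
        if slave in healthy_slaves:
            return slave
    return None
```

The selected slave would then replay its log files and undergo the data consistency check before its authority is actually promoted.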
It should be noted that the local master device 10, the same-room slave device 20, the same-ground cross-room slave device 30, and the cross-ground slave device 40 essentially never fail at the same time, so high availability of the database system is essentially guaranteed.
According to the database system provided by the embodiment of the application, the plurality of slave devices are arranged locally and in different places, so that the disaster recovery capability of the database system is improved, when the host device breaks down, the data consistency is verified based on the log file of the local host device and the log file of the slave device, the accuracy of the data of the host device can be ensured, and the problem that the data information accuracy cannot be ensured due to the fact that the replication abnormality of the system data cannot be reliably detected in the existing disaster recovery mode is effectively solved.
As shown in fig. 2, another embodiment of the present application provides a database system. Unlike the previous embodiment, the database system of this embodiment further includes a first backup database 11, a second backup database 21, a third backup database 31, and a fourth backup database 41.
The first backup database 11 is connected with the local host device 10; the second backup database 21 is connected with the same machine room slave device 20; the third backup database 31 is connected to the co-located cross-machine-room slave device 30, and the fourth backup database 41 is connected to the cross-located slave device 40.
Specifically, the first backup database 11 stores the backup data of the local master device 10, the second backup database 21 stores that of the same-room slave device 20, the third backup database 31 stores that of the same-place cross-room slave device 30, and the fourth backup database 41 stores that of the cross-ground slave device 40. These backup databases can be used for data recovery after an erroneous deletion operation, and for data recovery when the data consistency check fails. The backup data of each backup database comprises a full physical backup and incremental physical backups.
Specifically, after the failure of a device is resolved, data recovery may be performed through the backup database connected to it. That is, after the local host device 10 recovers from a failure, its data is recovered through the first backup database 11, it reacquires the host authority, and it resumes the related business tasks (read, write, add, delete, change). After the same-machine-room slave device 20 recovers from a failure, data recovery is performed through the second backup database 21; after the co-located cross-machine-room slave device 30 recovers, through the third backup database 31; and after the cross-ground slave device 40 recovers, through the fourth backup database 41.
According to the database system provided by the embodiment of the application, the backup database connected with each device is used for data recovery, so that the efficiency of database system fault recovery can be improved, and data recovery can be performed when the master data and the slave data are inconsistent on the basis of the backup database, so that the accuracy of the data is ensured.
Referring to fig. 3, fig. 3 shows a flowchart of an implementation of a disaster recovery method according to an embodiment of the present application. The disaster recovery method is applied to the database system of the previous embodiment. As shown in fig. 3, the disaster recovery method includes the following steps, detailed as follows:
S101: Monitoring whether the local host device fails.
Specifically, a heartbeat connection test is performed between the same-machine-room slave device and the local host device; if heartbeat alarm information of the local host device is received within a preset time period, it is determined that the local host device has failed. Alternatively, if no heartbeat information of the local host device is received within the preset time period, it may also be determined that the local host device has failed. Understandably, the other slave devices may also monitor the operating state of the host device, and may likewise monitor the operating state of the slave devices connected to them.
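The failure-detection rule described above — an explicit heartbeat alarm, or heartbeat silence beyond a preset window — can be sketched as follows. The timeout value and the function interface are assumptions made for illustration:

```python
import time

HEARTBEAT_TIMEOUT = 5.0  # preset time period in seconds (assumed value)

def master_failed(last_heartbeat_ts, alarm_received, now=None):
    """Return True when the local host device should be treated as failed:
    either a heartbeat alarm was received, or no heartbeat has arrived
    within the preset time window."""
    now = time.time() if now is None else now
    if alarm_received:          # explicit heartbeat alarm from the master
        return True
    # silence: no heartbeat within the preset period
    return now - last_heartbeat_ts > HEARTBEAT_TIMEOUT
```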
S102: and if the local host equipment is monitored to be in fault, selecting one slave equipment from the same machine room slave equipment, the same-place cross-machine room slave equipment and the cross-place slave equipment as target slave equipment according to preset lifting permission.
Specifically, when the local host device fails, the slave device with the highest authority needs to be selected as the target slave device from among the same-machine-room slave device, the co-located cross-machine-room slave device, and the cross-ground slave device according to the preset promotion authority. In this embodiment, the promotion authority of the slave device using semi-synchronous replication (i.e., the same-machine-room slave device) is set to be the highest, the promotion authority of the cross-ground slave device is set to be the second highest, and the promotion authority of the co-located cross-machine-room slave device is set to be the lowest.
In an embodiment, when the local host device works normally, after the local host device completes a transaction, the transaction is written into a receipt confirmation table in the log file of the same-machine-room slave device, so as to implement semi-synchronous replication of data between the local host device and the same-machine-room slave device;
the local host device writes the transaction into its own log file and sends the transaction information to a Dump thread to notify the co-located cross-machine-room slave device, so as to implement asynchronous replication of data between the local host device and the co-located cross-machine-room slave device;
after the co-located cross-machine-room slave device completes a transaction, the transaction is written into a receipt confirmation table in the log file of the cross-ground slave device, so as to implement asynchronous replication of data between the co-located cross-machine-room slave device and the cross-ground slave device.
Specifically, the preset promotion authority may be set according to the data synchronization manner between the local host device and the same-machine-room slave device, the co-located cross-machine-room slave device, and the cross-ground slave device, respectively. It can be understood that the data reliability of asynchronous replication is lower than that of semi-synchronous replication; therefore, the promotion authority of the slave device using semi-synchronous replication (i.e., the same-machine-room slave device) is set to be the highest, the promotion authority of the cross-ground slave device is set to be the second highest, and the promotion authority of the co-located cross-machine-room slave device is set to be the lowest.
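The preset promotion priority above can be sketched as an ordered candidate list, choosing the first slave device that is still working normally. The names and function interface are illustrative assumptions:

```python
# Promotion order per the embodiment: same-room (semi-synchronous) first,
# then cross-ground, then co-located cross-machine-room (names assumed).
PROMOTION_ORDER = ["same_room", "cross_ground", "same_site_cross_room"]

def select_target_slave(healthy):
    """healthy: set of slave names currently working normally.
    Returns the highest-priority healthy slave, or None if all have failed."""
    for slave in PROMOTION_ORDER:
        if slave in healthy:
            return slave
    return None
```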
S103: and carrying out data consistency check on the local host equipment and the target slave equipment.
Specifically, after the target slave device is determined, the target slave device needs to be controlled to replay its log file; the purpose of the log file replay is to ensure that the target slave device has completed synchronization of the data of the local host device. After synchronization, a data consistency check is performed to verify whether the data synchronization is abnormal, so as to ensure the accuracy of the data.
In an implementation manner of this embodiment, the step S103 includes the following steps:
controlling the target slave device to replay the log file;
after the replay of the log file of the target slave device is completed, deploying a third-party service in the in-memory database, and performing a data consistency check on the log file of the local host device and the log file of the target slave device based on the third-party service.
Specifically, each slave device monitors in real time whether the local host device fails. If the local host device fails, it is further determined whether the same-machine-room slave device is working normally; if so, the same-machine-room slave device is controlled to replay all of its log files. After the replay of all log files is completed, a data consistency check is performed on the data of the same-machine-room slave device and the data of the local host device; after the check is passed, the authority of the same-machine-room slave device is promoted to the host authority, and the same-machine-room slave device continues to execute the business tasks as the host.
Specifically, if the same-machine-room slave device has also failed, it is further determined whether the cross-ground slave device is working normally; if so, the cross-ground slave device is controlled to replay all of its log files. After the replay of all log files is completed, a data consistency check is performed on the data of the cross-ground slave device and the data of the local host device; after the check is passed, the authority of the cross-ground slave device is promoted to the host authority, and the cross-ground slave device continues to execute the business tasks as the host.
Specifically, if the cross-ground slave device has also failed, it is further determined whether the co-located cross-machine-room slave device is working normally; if so, the co-located cross-machine-room slave device is controlled to replay all of its log files. After the replay of all log files is completed, a data consistency check is performed on the data of the co-located cross-machine-room slave device and the data of the local host device; after the check is passed, the authority of the co-located cross-machine-room slave device is promoted to the host authority, and the co-located cross-machine-room slave device continues to execute the business tasks as the host.
Specifically, a third-party service is deployed in an in-memory database (Redis), and the data consistency check is performed on the log file of the local host device and the log file of the target slave device based on the third-party service.
In an implementation manner of this embodiment, deploying a third-party service in the memory database after the log file of the target slave device is replayed, and performing data consistency check on the log file of the local host device and the log file of the target slave device based on the third-party service includes the following steps:
deploying a first probe in the local host equipment through the third-party service to monitor a log file of the local host equipment;
deploying a second probe in the target slave device through the third-party service to monitor a log file of the target slave device;
setting the key of the third-party service to the transaction ID and the value to the log file, so that the third-party service compares whether the transaction ID in the log file of the local host device is consistent with the transaction ID in the log file of the target slave device;
if the transaction ID in the log file of the local host equipment is consistent with the transaction ID in the log file of the target slave equipment, the data consistency check is passed;
and if the transaction ID in the log file of the local host equipment is inconsistent with the transaction ID in the log file of the target slave equipment, the data consistency check is failed.
Specifically, a first probe is deployed to monitor the log file (binlog) of the local host device 10, and a second probe is deployed to monitor the log file (relay log) of the target slave device. The first probe and the second probe report the monitored information to the third-party service in real time; then, based on the Redis in-memory database, the key of the third-party service is set to the transaction ID and the value is set to the log file, so that the third-party service compares whether the transaction ID in the log file (binlog) of the local host device is consistent with the transaction ID in the log file (relay log) of the target slave device. If the transaction IDs are consistent, the data consistency check is passed; if they are inconsistent, the data consistency check fails. Since Redis supports list objects and has excellent single-machine performance, it is particularly suitable for storing the differences between the log files of the master and slave devices under the same GTID, enabling second-level verification of master-slave data consistency.
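The probe-and-compare check above can be sketched as follows, with a plain Python dictionary standing in for the Redis key-value store (keys are transaction IDs, values are log entries). The function and parameter names are illustrative, not from the patent:

```python
def consistency_check(binlog_events, relaylog_events):
    """binlog_events / relaylog_events: iterables of (txn_id, log_entry)
    pairs as reported by the first and second probes.
    Returns (passed, diff_ids): passed is True when both sides hold the
    same set of transaction IDs; diff_ids lists IDs present on one side only."""
    master = {txn_id: entry for txn_id, entry in binlog_events}
    target = {txn_id: entry for txn_id, entry in relaylog_events}
    diff = set(master) ^ set(target)   # symmetric difference of txn IDs
    return (not diff, sorted(diff))
```

A real deployment would write these pairs into Redis (e.g. via list objects keyed by GTID) instead of in-process dictionaries, but the comparison logic is the same.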
S104: and if the data consistency is checked to be passed, automatically promoting the authority of the target slave equipment to be the host authority.
Specifically, passing the data consistency check indicates that the data of the target slave device is completely consistent with the data of the local host device, so the authority of the target slave device can be promoted to the host authority, and the target slave device continues to execute business tasks as the host.
The disaster recovery method provided by this embodiment improves the disaster recovery capability of the database system by arranging a plurality of slave devices locally and remotely. When the host device fails, a data consistency check is performed based on the log file of the local host device and the log file of the slave device, which ensures the accuracy of the data of the device promoted to host. This effectively solves the problem that current disaster recovery approaches cannot reliably detect replication anomalies in system data and cannot guarantee the accuracy of data information.
Referring to fig. 4, fig. 4 shows a flowchart of an implementation of a disaster recovery method according to another embodiment of the present application. Different from the previous embodiment, the disaster recovery method further includes the following steps:
S105: If the data consistency check fails, copying the backup data of the first backup database to the target slave device.
Specifically, the first backup database stores the backup data of the local host device. If the data consistency check between the local host device and the target slave device to be promoted to host fails, this indicates that the synchronization of the master and slave data is faulty; therefore, the backup data stored in the first backup database needs to be synchronized to the target slave device, that is, the backup data in the first backup database is copied to the target slave device, so as to ensure the correctness of the data.
The first backup database may store full backup data; a full backup refers to a backup of all the data in the database (the local host device). For a full backup, the accuracy of the data can be ensured simply by synchronizing the backup data to the target slave device.
In a possible implementation manner of this embodiment, the first backup database stores incremental backup data, and the S105 includes:
determining full backup data and backup updating records during the last full backup according to the incremental backup information;
searching for newly added data according to the backup updating record, and adding the newly added data into full backup data in the last full backup to obtain target backup data;
and copying the target backup data to the target slave equipment.
Specifically, incremental backup data refers to data updated from after the last full backup until before the next full backup. When the incremental backup data is stored, incremental backup information, such as the full backup version corresponding to the last full backup and the backup update record corresponding to each data update, may be recorded at the same time. For the incremental backup data, the corresponding full backup version needs to be looked up according to the incremental backup information; the full backup version is then restored to the latest version (i.e., the target backup data) according to the data in the update records, and the target backup data contained in the latest version is synchronized to the target slave device, so as to ensure the accuracy of the data.
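The incremental restore described above — start from the last full backup and replay the backup update records in order — can be sketched as follows. The data structures are assumptions made for illustration:

```python
def build_target_backup(full_backup, update_records):
    """full_backup: dict mapping row_id -> row from the last full backup.
    update_records: ordered list of (row_id, new_row) changes recorded
    since that full backup.  Returns the target backup data (latest version)."""
    data = dict(full_backup)                 # copy the last full backup
    for row_id, new_row in update_records:   # replay newly added/changed rows
        data[row_id] = new_row
    return data
```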
The disaster recovery method provided by the embodiment of the application can recover data based on the backup database when the master data and the slave data are inconsistent, effectively ensuring the accuracy of the data.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application further provide a computer program product which, when run on a mobile terminal, enables the mobile terminal to implement the steps in the above method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the photographing apparatus/terminal apparatus, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, such as a USB flash disk, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals or telecommunication signals.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A database system is characterized by comprising local host equipment, same-machine-room slave equipment, same-ground cross-machine-room slave equipment, cross-ground slave equipment and a data verification module;
the local host equipment is respectively connected with the same-machine-room slave equipment and the same-ground cross-machine-room slave equipment, and the same-ground cross-machine-room slave equipment is mounted with the cross-ground slave equipment; the local host machine equipment and the same machine room slave machine equipment are located at a first position of a first city, the same-ground cross-machine room slave machine equipment is located at a second position of the first city, and the cross-ground slave machine equipment is located at any position of a second city;
when the local host equipment fails, selecting one equipment from the same-machine-room slave equipment, the same-ground cross-machine-room slave equipment and the cross-ground slave equipment as a target slave equipment according to a preset lifting authority;
the data checking module is used for performing data consistency checking based on the log file of the local host device and the log file of the target slave device when the local host device fails, and upgrading the target slave device to a host device when the data consistency checking is passed.
2. The database system of claim 1, wherein data synchronization is achieved between the local master device and the slave device of the same machine room using a semi-synchronous replication method;
the local host equipment and the co-location cross-machine room slave equipment realize data synchronization by using an asynchronous replication mode;
and the co-location cross-machine room slave equipment and the cross-location slave equipment realize data synchronization by using an asynchronous replication mode.
3. The database system of claim 1, wherein the co-located cross-room slave device mounts the cross-ground slave device by means of chain mounting; the local host equipment is connected with the same machine room slave equipment through a physical special line; and the local host equipment and the co-location cross-machine room slave equipment are connected through a wireless local area network.
4. The database system according to any one of claims 1 to 3, wherein the database system further comprises an in-memory database, the data check module being deployed in the in-memory database.
5. A disaster recovery method applied to the database system according to any one of claims 1 to 4, the disaster recovery method comprising:
monitoring whether the local host equipment fails;
if the local host equipment is monitored to be in fault, selecting one piece of slave equipment from the same machine room slave equipment, the same-place cross-machine room slave equipment and the cross-place slave equipment as target slave equipment according to preset lifting permission;
performing data consistency check on the local host equipment and the target slave equipment;
and if the data consistency is checked to be passed, automatically promoting the authority of the target slave equipment to be the host authority.
6. The disaster recovery method according to claim 5, wherein performing a data consistency check on said local master device and said target slave device comprises:
controlling the target slave device to replay the log file;
after the log file of the target slave equipment is replayed, third-party service is deployed in an internal memory database, and data consistency check is conducted on the log file of the local host equipment and the log file of the target slave equipment on the basis of the third-party service.
7. The disaster recovery method of claim 6, wherein deploying a third party service in an in-memory database after completion of replay of the log file of the target slave device, and performing a consistency check on the log file of the local master device and the log file of the target slave device based on the third party service comprises:
deploying a first probe in the local host device through the third-party service to monitor a log file of the local host device;
deploying a second probe in the target slave device through the third-party service to monitor a log file of the target slave device;
setting the keyword of the third-party service as a transaction ID, and setting the value as a log file, wherein the transaction ID in the log file of the local host equipment and the transaction ID in the log file of the target slave equipment are compared by the third-party service to determine whether the transaction ID is consistent;
if the transaction ID in the log file of the local host equipment is consistent with the transaction ID in the log file of the target slave equipment, the data consistency check is passed;
and if the transaction ID in the log file of the local host equipment is inconsistent with the transaction ID in the log file of the target slave equipment, the data consistency check is failed.
8. The disaster recovery method according to claim 5, wherein when the local master device is operating normally, after the local master device completes a transaction, the transaction is written into a receipt confirmation table in a log file of the same-room slave device, so as to implement semi-synchronous replication of data between the local master device and the same-room slave device;
the local host equipment writes the transaction into a log file of the local host equipment and sends transaction information to a Dump thread to inform the co-located cross-machine room slave equipment to realize asynchronous replication of data between the local host equipment and the co-located cross-machine room slave equipment;
after the same-place cross machine room equipment completes a transaction, the transaction is written into a receipt confirmation table in a log file of the cross-place slave equipment, so that asynchronous replication of data between the same-place cross machine room slave equipment and the cross-place slave equipment is achieved.
9. The disaster recovery method according to claim 5, wherein after said checking data consistency between said local master device and said target slave device, further comprising:
and if the data consistency check fails, copying the backup data of the first backup database to the target slave equipment.
10. The disaster recovery method according to claim 9, wherein said first backup database stores incremental backup data, said incremental backup data being updated from after a last full backup until a next full backup; if the data consistency check fails, copying the backup data of the first backup database to the target slave device, including:
determining full backup data and backup updating records during the last full backup according to the incremental backup information;
searching for newly added data according to the backup updating record, and adding the newly added data into full backup data in the last full backup to obtain target backup data;
and copying the target backup data to the target slave equipment.
CN202010227680.6A 2020-03-27 2020-03-27 Database system and disaster tolerance method thereof Pending CN111460035A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010227680.6A CN111460035A (en) 2020-03-27 2020-03-27 Database system and disaster tolerance method thereof

Publications (1)

Publication Number Publication Date
CN111460035A true CN111460035A (en) 2020-07-28

Family

ID=71685721

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010227680.6A Pending CN111460035A (en) 2020-03-27 2020-03-27 Database system and disaster tolerance method thereof

Country Status (1)

Country Link
CN (1) CN111460035A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI779624B (en) * 2020-08-24 2022-10-01 大陸商中國銀聯股份有限公司 Data read/write processing method, data center, disaster recovery system and storage medium

Similar Documents

Publication Publication Date Title
US5615329A (en) Remote data duplexing
CN107291787B (en) Main and standby database switching method and device
US8495019B2 (en) System and method for providing assured recovery and replication
CN101535961B (en) Apparatus, system, and method for detection of mismatches in continuous remote copy using metadata
CN102033786B (en) Method for repairing consistency of copies in object storage system
US10976942B2 (en) Versioning a configuration of data storage equipment
US7418564B2 (en) Storage controller, storage control system and storage control method for mirroring volumes
CN111158955B (en) High-availability system based on volume replication and multi-server data synchronization method
CN113885809B (en) Data management system and method
CN113535665B (en) Method and device for synchronizing log files between main database and standby database
CN111460035A (en) Database system and disaster tolerance method thereof
CN103092719B (en) A kind of power-off protection method of file system
WO2012131868A1 (en) Management method and management device for computer system
CN111240903A (en) Data recovery method and related equipment
CN114490570A (en) Production data synchronization method and device, data synchronization system and server
CN107544868B (en) Data recovery method and device
CN104407932A (en) Data backup method and device
CN116545845B (en) Redundant backup device, system and method for production server
CN102549550B (en) Method and system for data access
US11841734B1 (en) Synchronous block level replication across availability zones
CN113868003B (en) Verification and fault positioning method and device for server flash system
CN112562774B (en) Storage device mounting method and device, computer device and storage medium
JP4893180B2 (en) Failure recovery method for storage device, failure recovery program, and control device
CN114116885A (en) Database synchronization method, system, device and medium
CN116431386A (en) Data processing method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination