CN110635950A

CN110635950A - Double-data-center disaster recovery system

Info

Publication number: CN110635950A
Application number: CN201910939003.4A
Authority: CN
Inventors: 陈辉; 强春雨; 薛文娟; 罗文洁; 颜旭乐; 谭秀瑶; 时琛
Original assignee: Shenzhen Power Supply Co ltd
Current assignee: Shenzhen Power Supply Co ltd
Priority date: 2019-09-30
Filing date: 2019-09-30
Publication date: 2019-12-31

Abstract

The invention provides a double-data-center disaster recovery system, which comprises a first data center, a second data center and a centralized disaster recovery switching device, wherein the first data center is connected with the second data center through a network; the first data center and the second data center monitor the fault state of the local data center and obtain the running state of the data center of the opposite side according to the heartbeat response condition fed back by the opposite side; the centralized disaster recovery switching device compares the fault state and the operation state of each data center to form a first comparison result and a second comparison result respectively, identifies a fault data center and a normal data center in the first data center and the second data center according to the first comparison result and the second comparison result, and further enables the normal data center to take over all data services of the fault data center. The invention can automatically identify abnormal conditions of the double data centers and carry out corresponding switching operation.

Description

Double-data-center disaster recovery system

Technical Field

The invention relates to the technical field of data centers, in particular to a disaster recovery system with double data centers.

Background

The 95598 power supply service hotline serves thousands upon thousands of households, so its service requirements are demanding and its social impact is considerable. The customer service center is the company's public-facing window, so the reliability of its information systems is critical, and building a business continuity guarantee system for the 95598 core service system is especially important. Such a system improves the ability of the customer service center's core business systems to withstand disasters and major accidents, reduces the losses they cause, ensures the data security and operational continuity of the center's important information systems, avoids serious interruption of important social service functions, and helps maintain social and economic stability.

Business continuity is the goal of disaster recovery construction for the 95598 core service system. The customer service center can adopt a dual-active data center architecture: the two centers simultaneously accept service access from users in different areas, each center completes its service operations locally, and the data of the two centers back each other up through database logical replication. When a disaster or failure occurs at one center, business continuity requires that each of the two data centers be able to provide access services to the remote users of the other, failed center.

Therefore, a disaster recovery system capable of automatically identifying abnormal conditions and performing corresponding switching operations for dual data centers is needed.

Disclosure of Invention

The technical problem to be solved by the embodiments of the present invention is to provide a disaster recovery system for dual data centers, which can automatically identify abnormal situations and perform corresponding switching operations for the dual data centers.

In order to solve the above technical problem, an embodiment of the present invention provides a dual data center disaster recovery system, including a first data center and a second data center that are connected to each other, and a centralized disaster recovery switching device that is connected to both the first data center and the second data center; wherein,

the first data center is used for monitoring the fault state of a local data center and obtaining the running state of the second data center by receiving the heartbeat response condition fed back by the second data center after sending heartbeat request information to the second data center;

the second data center is used for monitoring the fault state of a local data center and obtaining the running state of the first data center by receiving the heartbeat response condition fed back by the first data center after sending heartbeat request information to the first data center;

the centralized disaster recovery switching device is configured to compare a fault state of the first data center with an operating state of the first data center obtained by the second data center to form a first comparison result, compare a fault state of the second data center with an operating state of the second data center obtained by the first data center to form a second comparison result, identify a fault data center and a normal data center among the first data center and the second data center according to the first comparison result and the second comparison result, and further allow the normal data center to take over all data services of the fault data center.

After the first data center and the second data center are connected, the data services they provide may be the same or different.

The first data center comprises a first local fault state monitoring module and a first peer operation state monitoring module, both connected to the centralized disaster recovery switching device; the first local fault state monitoring module is used for monitoring the fault state of the first data center; the first peer operation state monitoring module is used for obtaining the running state of the second data center by receiving the heartbeat response condition fed back by the second data center after sending heartbeat request information to the second data center;

the second data center comprises a second local fault state monitoring module and a second peer operation state monitoring module, both connected to the centralized disaster recovery switching device, and the second peer operation state monitoring module also establishes a channel connection with the first peer operation state monitoring module; the second local fault state monitoring module is used for monitoring the fault state of the second data center; the second peer operation state monitoring module is used for obtaining the running state of the first data center by receiving the heartbeat response condition fed back by the first data center after sending heartbeat request information to the first data center.

The first local fault state monitoring module comprises a first equipment state monitoring submodule and a first environment monitoring submodule; the first equipment state monitoring submodule is used for monitoring equipment health data in the first data center to obtain an equipment health state in the first data center; the first environment monitoring submodule is used for monitoring environment data in the first data center to obtain an environment state of the first data center;

the second local fault state monitoring module comprises a second equipment state monitoring submodule and a second environment monitoring submodule; the second equipment state monitoring submodule is used for monitoring equipment health data in the second data center to obtain the equipment health state in the second data center; and the second environment monitoring submodule is used for monitoring the environment data in the second data center to obtain the environment state of the second data center.

The equipment health data of the first data center and the second data center respectively comprise an equipment current value and an equipment voltage value; the environmental data of the first data center and the second data center each include a humidity and a temperature.
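As an illustration only, the local fault state monitoring described above could be sketched in Python as follows. The limit values, the reading names (`current_a`, `voltage_v`, `humidity_pct`, `temperature_c`) and the function names are assumptions made for this sketch; the disclosure names the monitored quantities but specifies neither thresholds nor a data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Closed interval of acceptable values for one monitored quantity."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


# ASSUMED limits, chosen only so the sketch runs; not taken from the source.
EQUIPMENT_LIMITS = {"current_a": Range(0.0, 16.0), "voltage_v": Range(198.0, 242.0)}
ENVIRONMENT_LIMITS = {"humidity_pct": Range(20.0, 80.0), "temperature_c": Range(10.0, 35.0)}


def out_of_range(readings: dict, limits: dict) -> list:
    """Return the names of readings that are missing or outside their limits."""
    return [
        name
        for name, allowed in limits.items()
        if name not in readings or not allowed.contains(readings[name])
    ]


def local_fault_state(equipment: dict, environment: dict) -> dict:
    """Combine the equipment health state and the environment state
    into one local fault state for a data center."""
    equipment_alarms = out_of_range(equipment, EQUIPMENT_LIMITS)
    environment_alarms = out_of_range(environment, ENVIRONMENT_LIMITS)
    return {
        "equipment_healthy": not equipment_alarms,
        "environment_ok": not environment_alarms,
        "fault": bool(equipment_alarms or environment_alarms),
        "alarms": equipment_alarms + environment_alarms,
    }
```

In this sketch a missing reading is treated the same as an out-of-range one, on the assumption that a silent sensor should not be reported as healthy.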

The first peer operation state monitoring module comprises a first heartbeat request information sending submodule, a first heartbeat response information receiving submodule and a first heartbeat monitoring management submodule connected with the centralized disaster recovery switching device; the first heartbeat request information sending submodule is used for sending heartbeat request information to the second data center; the first heartbeat response information receiving submodule is used for receiving the heartbeat response condition fed back by the second data center; the first heartbeat monitoring management submodule is used for obtaining the running state of the second data center according to the heartbeat response condition fed back by the second data center;

the second peer operation state monitoring module comprises a second heartbeat request information sending submodule, a second heartbeat response information receiving submodule and a second heartbeat monitoring management submodule connected with the centralized disaster recovery switching device; the second heartbeat request information sending submodule is used for sending heartbeat request information to the first data center; the second heartbeat response information receiving submodule is used for receiving the heartbeat response condition fed back by the first data center; and the second heartbeat monitoring management submodule is used for obtaining the running state of the first data center according to the heartbeat response condition fed back by the first data center.

The first heartbeat monitoring management submodule comprises a first timing counting unit and a first running state monitoring management unit; the first timing counting unit is configured to start timing when the first heartbeat request information sending submodule sends heartbeat request information to the second data center, and, if the first heartbeat response information receiving submodule has not received heartbeat response information fed back by the second data center once a preset time is exceeded, to add 1 to the count value; or, if the first heartbeat response information receiving submodule receives heartbeat response information fed back by the second data center within the preset time, to reset the count value to zero; the first running state monitoring management unit is used for marking the running state of the second data center as a fault if the count value of the first timing counting unit is greater than a threshold value, and otherwise marking the running state of the second data center as normal;

the second heartbeat monitoring management submodule comprises a second timing counting unit and a second running state monitoring management unit; the second timing counting unit is configured to start timing when the second heartbeat request information sending submodule sends heartbeat request information to the first data center, and, if the second heartbeat response information receiving submodule has not received heartbeat response information fed back by the first data center once the preset time is exceeded, to add 1 to the count value; or, if the second heartbeat response information receiving submodule receives heartbeat response information fed back by the first data center within the preset time, to reset the count value to zero; the second running state monitoring management unit is used for marking the running state of the first data center as a fault if the count value of the second timing counting unit is greater than the threshold value, and otherwise marking the running state of the first data center as normal.

The first heartbeat request information sending submodule or the second heartbeat request information sending submodule sends heartbeat request information to the other party at regular intervals so as to periodically detect the heartbeat connection condition between the first data center and the second data center.

The centralized disaster recovery switching device comprises a monitoring information receiving module, a monitoring information processing module, a fault information management module and a take-over module; wherein,

the monitoring information receiving module is used for receiving the fault state of the first data center and the obtained running state of the second data center, and receiving the fault state of the second data center and the obtained running state of the first data center;

the monitoring information processing module is used for comparing the fault state of the first data center with the running state of the first data center obtained by the second data center to form a first comparison result and comparing the fault state of the second data center with the running state of the second data center obtained by the first data center to form a second comparison result according to preset fault characteristic data;

the fault information management module is used for identifying a fault data center and a normal data center in the first data center and the second data center according to the first comparison result and the second comparison result;

and the take-over module is used for generating a corresponding take-over instruction according to a preset fault logic principle, so that the normal data center takes over all data services of the fault data center.

The centralized disaster recovery switching device also comprises a correction module; wherein,

and the correcting module is used for correcting and updating the preset fault logic principle.

The embodiment of the invention has the following beneficial effects:

according to the invention, the local fault state monitoring module and the opposite end running state monitoring module of each data center respectively monitor the fault state of the data center and the running state of the opposite data center, so that analysis data is provided for the centralized disaster recovery switching device, and the centralized disaster recovery switching device achieves the purposes of automatically identifying abnormal conditions of the double data centers and performing corresponding switching operation.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; other drawings that those skilled in the art obtain from them without inventive effort also fall within the scope of the present invention.

FIG. 1 is a schematic structural diagram of a dual data center disaster recovery system according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the first data center in FIG. 1;

FIG. 3 is a schematic diagram of the second data center in FIG. 1;

FIG. 4 is a schematic structural diagram of the centralized disaster recovery switching device in FIG. 1.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.

As shown in fig. 1, a dual data center disaster recovery system provided in an embodiment of the present invention includes a first data center 1 and a second data center 2 connected to each other, and a centralized disaster recovery switching device 3 connected to both the first data center 1 and the second data center 2; wherein,

the first data center 1 is used for monitoring the fault state of the local data center and obtaining the running state of the second data center 2 by receiving the heartbeat response condition fed back by the second data center 2 after sending heartbeat request information to the second data center 2;

the second data center 2 is used for monitoring the fault state of the local data center and obtaining the running state of the first data center 1 by receiving the heartbeat response condition fed back by the first data center 1 after sending the heartbeat request information to the first data center 1;

and the centralized disaster recovery switching device 3 is configured to compare the fault state of the first data center 1 with the operating state of the first data center 1 obtained by the second data center 2 to form a first comparison result, compare the fault state of the second data center 2 with the operating state of the second data center 2 obtained by the first data center 1 to form a second comparison result, identify a faulty data center and a normal data center among the first data center 1 and the second data center 2 according to the first comparison result and the second comparison result, and further enable the normal data center to take over all data services of the faulty data center.

It should be noted that, after the first data center 1 and the second data center 2 are connected, the data services they provide may be the same or different. Once either of them fails, all data services are concentrated on the normal data center through the centralized disaster recovery switching device 3, so that all data services keep running normally and disaster recovery is achieved.

In the embodiment of the present invention, as shown in fig. 2, the first data center 1 includes a first local fault state monitoring module 11 and a first peer operation state monitoring module 12, both of which are connected to the centralized disaster recovery switching device 3; the first local fault state monitoring module 11 is configured to monitor the fault state of the first data center 1; the first peer operation state monitoring module 12 is configured to obtain the operating state of the second data center 2 by receiving the heartbeat response condition fed back by the second data center 2 after sending heartbeat request information to the second data center 2;

the first local fault state monitoring module 11 includes a first equipment state monitoring submodule 111 and a first environment monitoring submodule 112; the first equipment state monitoring submodule 111 is configured to monitor equipment health data in the first data center 1 to obtain the equipment health state in the first data center 1; the first environment monitoring submodule 112 is configured to monitor environment data in the first data center 1 to obtain the environment state of the first data center 1; wherein the equipment health data includes an equipment current value and an equipment voltage value, and the environment data includes humidity and temperature;

the first peer operation state monitoring module 12 includes a first heartbeat request information sending submodule 121, a first heartbeat response information receiving submodule 122, and a first heartbeat monitoring management submodule 123 connected to the centralized disaster recovery switching device 3; the first heartbeat request information sending submodule 121 is configured to send heartbeat request information to the second data center 2; the first heartbeat response information receiving submodule 122 is configured to receive the heartbeat response condition fed back by the second data center 2; the first heartbeat monitoring management submodule 123 is configured to obtain the operating state of the second data center 2 according to the heartbeat response condition fed back by the second data center 2;

the first heartbeat monitoring management submodule 123 includes a first timing counting unit 1231 and a first operation state monitoring management unit 1232; the first timing counting unit 1231 is configured to start timing when the first heartbeat request information sending submodule 121 sends heartbeat request information to the second data center 2, and, if the first heartbeat response information receiving submodule 122 has not received heartbeat response information fed back by the second data center 2 once a preset time (for example, 10 s) is exceeded, to add 1 to the count value; or, if the first heartbeat response information receiving submodule 122 receives heartbeat response information fed back by the second data center 2 within the preset time (for example, 10 s), to clear the count value; the first operation state monitoring management unit 1232 is configured to mark the operating state of the second data center 2 as a fault if the count value of the first timing counting unit 1231 is greater than a threshold (for example, 3), and otherwise to mark the operating state of the second data center 2 as normal.
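The behaviour of the timing counting unit 1231 and the operation state monitoring management unit 1232 can be illustrated with a minimal Python sketch. The class and method names are hypothetical; the 10 s preset time and the threshold of 3 are the example values given in this description. The sketch assumes that a response arriving exactly at the preset time still counts as received in time.

```python
from typing import Optional


class HeartbeatMonitor:
    """Counts consecutive missed heartbeat responses from the peer data center."""

    def __init__(self, timeout_s: float = 10.0, threshold: int = 3) -> None:
        self.timeout_s = timeout_s    # preset time, e.g. 10 s
        self.threshold = threshold    # count threshold, e.g. 3
        self.missed = 0               # current count value

    def record_round(self, response_delay_s: Optional[float]) -> str:
        """Record one request/response round.

        response_delay_s is the time between sending the request and
        receiving the response, or None if no response arrived at all.
        """
        if response_delay_s is not None and response_delay_s <= self.timeout_s:
            self.missed = 0           # response in time: clear the count
        else:
            self.missed += 1          # preset time exceeded: add 1
        return self.peer_state()

    def peer_state(self) -> str:
        """Mark the peer as faulty only when the count exceeds the threshold."""
        return "fault" if self.missed > self.threshold else "normal"
```

Because the condition is "greater than" the threshold, with a threshold of 3 the peer is marked as faulty on the fourth consecutive missed response, and a single timely response returns it to normal.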

In the embodiment of the present invention, as shown in fig. 3, the second data center 2 includes a second local fault state monitoring module 21 and a second peer operation state monitoring module 22, both of which are connected to the centralized disaster recovery switching device 3, and the second peer operation state monitoring module 22 further establishes a channel connection with the first peer operation state monitoring module 12; the second local fault state monitoring module 21 is configured to monitor the fault state of the second data center 2; the second peer operation state monitoring module 22 is configured to obtain the operating state of the first data center 1 by receiving the heartbeat response condition fed back by the first data center 1 after sending heartbeat request information to the first data center 1;

the second local fault state monitoring module 21 includes a second equipment state monitoring submodule 211 and a second environment monitoring submodule 212; the second equipment state monitoring submodule 211 is configured to monitor equipment health data in the second data center 2 to obtain the equipment health state in the second data center 2; the second environment monitoring submodule 212 is configured to monitor environment data in the second data center 2 to obtain the environment state of the second data center 2; here too, the equipment health data includes an equipment current value and an equipment voltage value, and the environment data includes humidity and temperature;

the second peer operation state monitoring module 22 includes a second heartbeat request information sending submodule 221, a second heartbeat response information receiving submodule 222, and a second heartbeat monitoring management submodule 223 connected to the centralized disaster recovery switching device 3; the second heartbeat request information sending submodule 221 is configured to send heartbeat request information to the first data center 1; the second heartbeat response information receiving submodule 222 is configured to receive the heartbeat response condition fed back by the first data center 1; the second heartbeat monitoring management submodule 223 is configured to obtain the operating state of the first data center 1 according to the heartbeat response condition fed back by the first data center 1;

the second heartbeat monitoring management submodule 223 includes a second timing counting unit 2231 and a second operation state monitoring management unit 2232; the second timing counting unit 2231 is configured to start timing when the second heartbeat request information sending submodule 221 sends heartbeat request information to the first data center 1, and, if the second heartbeat response information receiving submodule 222 has not received heartbeat response information fed back by the first data center 1 once a preset time (for example, 10 s) is exceeded, to add 1 to the count value; or, if the second heartbeat response information receiving submodule 222 receives heartbeat response information fed back by the first data center 1 within the preset time (for example, 10 s), to clear the count value; the second operation state monitoring management unit 2232 is configured to mark the operating state of the first data center 1 as a fault if the count value of the second timing counting unit 2231 is greater than a threshold (for example, 3), and otherwise to mark the operating state of the first data center 1 as normal.

It should be noted that the first heartbeat request information sending sub-module 121 or the second heartbeat request information sending sub-module 221 sends heartbeat request information to the other party at regular intervals to periodically detect the heartbeat connection between the first data center 1 and the second data center 2, that is, periodically and automatically identify the abnormal condition of the dual data centers.
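One possible way to send heartbeat requests at regular intervals is sketched below in Python. The function name, its parameters and the injected `send_request`, `sleep` and `clock` callables are assumptions made for illustration; the description does not prescribe an interval or a scheduling method. The sketch schedules each round against a fixed deadline so that the time spent waiting for a response does not make the period drift.

```python
import time
from typing import Callable, List


def run_periodic_heartbeat(
    send_request: Callable[[], bool],
    interval_s: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[bool]:
    """Call send_request once per interval and collect its results.

    send_request is a hypothetical callable that sends one heartbeat
    request to the peer and returns True if a response came back in time.
    """
    results = []
    next_due = clock()
    for _ in range(rounds):
        results.append(send_request())
        next_due += interval_s          # fixed deadline for the next round
        delay = next_due - clock()
        if delay > 0:                   # skip sleeping if already late
            sleep(delay)
    return results
```

The `sleep` and `clock` parameters default to the standard library functions and exist only so the loop can be exercised without real waiting.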

In the embodiment of the present invention, as shown in fig. 4, the centralized disaster recovery switching device 3 includes a monitoring information receiving module 31, a monitoring information processing module 32, a fault information management module 33, and a takeover module 34; wherein,

the monitoring information receiving module 31 is configured to receive a fault state of the first data center 1 and an obtained operating state of the second data center 2, and receive a fault state of the second data center 2 and an obtained operating state of the first data center 1;

the monitoring information processing module 32 is configured to compare the fault state of the first data center 1 with the operating state of the first data center 1 obtained by the second data center 2 to form a first comparison result, and compare the fault state of the second data center 2 with the operating state of the second data center 2 obtained by the first data center 1 to form a second comparison result, according to preset fault feature data;

the fault information management module 33 is configured to identify a fault data center and a normal data center among the first data center 1 and the second data center 2 according to the first comparison result and the second comparison result; it should be noted that at most one of the first data center and the second data center may be faulty at a time; otherwise, the data centers as a whole are down;

and the takeover module 34 is configured to generate a corresponding takeover instruction according to a preset fault logic principle, so that the normal data center takes over all data services of the fault data center.
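The comparison and takeover steps can be illustrated with the following Python sketch. The description leaves the preset fault feature data and the preset fault logic principle unspecified, so the rule used in `compare` is purely an assumption for illustration, as are all names in the code: a center is treated as faulty when it reports a local fault, or when it sends no report at all and its peer has also marked it as faulty.

```python
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    NORMAL = "normal"
    FAULT = "fault"


def compare(self_reported: Optional[State], peer_observed: State) -> State:
    """Form the comparison result for one data center (ASSUMED rule).

    self_reported is the center's own fault state, or None if no report
    reached the switching device; peer_observed is the operating state
    the other center derived from heartbeats.
    """
    if self_reported is State.FAULT:
        return State.FAULT
    if self_reported is None and peer_observed is State.FAULT:
        return State.FAULT
    # A center that reports NORMAL while the peer sees FAULT is kept as
    # NORMAL here, on the assumption that the heartbeat link is at fault.
    return State.NORMAL


def decide_takeover(
    dc1_self: Optional[State],
    dc1_seen_by_dc2: State,
    dc2_self: Optional[State],
    dc2_seen_by_dc1: State,
) -> Optional[Tuple[str, str]]:
    """Return (normal_center, fault_center), or None if no takeover applies."""
    first = compare(dc1_self, dc1_seen_by_dc2)    # first comparison result
    second = compare(dc2_self, dc2_seen_by_dc1)   # second comparison result
    if first is State.FAULT and second is State.NORMAL:
        return ("dc2", "dc1")
    if second is State.FAULT and first is State.NORMAL:
        return ("dc1", "dc2")
    # Both normal: nothing to do. Both faulty: no normal center remains.
    return None
```

Returning `None` when both centers are judged faulty reflects the note above that at most one center may fail at a time for takeover to be possible.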

Furthermore, the centralized disaster recovery switching device 3 further includes a correction module 35; the correcting module 35 is configured to correct and update a preset fault logic principle.
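As a rough illustration of what correcting and updating a preset fault logic principle might look like, the Python sketch below stores the logic as a lookup table that can be amended at run time. The table encoding, the default entries, the version counter and all names are assumptions; the description does not state how the fault logic is represented.

```python
from typing import Dict, Optional, Tuple

Rule = Tuple[str, str]  # (self-reported state, peer-observed state)

# ASSUMED default logic, for illustration only.
DEFAULT_RULES: Dict[Rule, str] = {
    ("normal", "normal"): "normal",
    ("normal", "fault"): "normal",
    ("fault", "normal"): "fault",
    ("fault", "fault"): "fault",
}


class FaultLogic:
    """Holds the preset fault logic and lets a correction step update it."""

    def __init__(self, rules: Optional[Dict[Rule, str]] = None) -> None:
        self._rules = dict(DEFAULT_RULES if rules is None else rules)
        self.version = 1

    def verdict(self, self_reported: str, peer_observed: str) -> str:
        return self._rules[(self_reported, peer_observed)]

    def correct(self, updates: Dict[Rule, str]) -> None:
        """Apply corrections; reject unknown cases or invalid verdicts."""
        for case, verdict in updates.items():
            if case not in self._rules:
                raise KeyError(f"unknown case: {case!r}")
            if verdict not in ("normal", "fault"):
                raise ValueError(f"invalid verdict: {verdict!r}")
        self._rules.update(updates)
        self.version += 1
```

For example, `FaultLogic().correct({("normal", "fault"): "fault"})` would make the logic stricter by trusting the peer's heartbeat observation over the center's own report.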

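The embodiment of the invention has the following beneficial effects:

the local fault state monitoring module and the peer operation state monitoring module of each data center respectively monitor the fault state of the local data center and the running state of the opposite data center, providing analysis data to the centralized disaster recovery switching device, so that the device can automatically identify abnormal conditions of the dual data centers and perform the corresponding switching operation.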

It should be noted that, in the foregoing system embodiment, each included module is only divided according to functional logic, but is not limited to the above division as long as the corresponding function can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.

It will be understood by those skilled in the art that all or part of the steps in the method for implementing the above embodiments may be implemented by relevant hardware instructed by a program, and the program may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc.

The above disclosure describes only preferred embodiments of the present invention and is not intended to limit the scope of its claims; equivalent changes made in accordance with the claims of the present invention therefore still fall within the scope of the invention.

Claims

1. A double-data-center disaster recovery system is characterized by comprising a first data center and a second data center which are connected with each other, and a centralized disaster recovery switching device which is connected with the first data center and the second data center; wherein,

2. The dual-data-center disaster recovery system according to claim 1, wherein the first data center and the second data center provide the same or different data services after establishing the connection.

3. The dual-data-center disaster recovery system according to claim 1, wherein the first data center includes a first local failure status monitoring module and a first peer operation status monitoring module both connected to the centralized disaster recovery switching device; the first local fault state monitoring module is used for monitoring the fault state of the first data center; the first peer operation state monitoring module is configured to obtain an operation state of the second data center by receiving a heartbeat response condition fed back by the second data center after sending heartbeat request information to the second data center;

4. The dual data center disaster recovery system of claim 3 wherein said first local failure status monitoring module comprises a first equipment status monitoring submodule and a first environmental monitoring submodule; the first equipment state monitoring submodule is used for monitoring equipment health data in the first data center to obtain an equipment health state in the first data center; the first environment monitoring submodule is used for monitoring environment data in the first data center to obtain an environment state of the first data center;

5. The dual-data-center disaster recovery system of claim 4, wherein the equipment health data of the first data center and the second data center each comprise an equipment current value and an equipment voltage value; the environmental data of the first data center and the second data center each include a humidity and a temperature.

6. The dual-data-center disaster recovery system according to claim 3, wherein the first peer-to-peer operation status monitoring module comprises a first heartbeat request information sending sub-module, a first heartbeat response information receiving sub-module, and a first heartbeat monitoring management sub-module connected to the centralized disaster recovery switching device; the first heartbeat request information sending submodule is used for sending heartbeat request information to the second data center; the first heartbeat response information receiving submodule is used for receiving a heartbeat response condition fed back by the second data center; the first heartbeat monitoring management submodule is used for obtaining the running state of the second data center according to the heartbeat response condition fed back by the second data center;

7. The dual data center disaster recovery system according to claim 6, wherein the first heartbeat monitoring management sub-module comprises a first timing counting unit and a first operation status monitoring management unit; the first timing counting unit is configured to start timing when the first heartbeat request information sending sub-module sends heartbeat request information to the second data center, and, if the first heartbeat response information receiving sub-module has not received heartbeat response information fed back by the second data center once a preset time is exceeded, to add 1 to the count value; or, if the first heartbeat response information receiving sub-module receives heartbeat response information fed back by the second data center within the preset time, to reset the count value to zero; the first operation status monitoring management unit is used for marking the running state of the second data center as a fault if the count value of the first timing counting unit is greater than a threshold value, and otherwise marking the running state of the second data center as normal;

8. The dual-data-center disaster recovery system according to claim 6, wherein the first heartbeat request information sending sub-module or the second heartbeat request information sending sub-module sends heartbeat request information to the other at regular intervals to periodically detect a heartbeat connection condition between the first data center and the second data center.

9. The dual data center disaster recovery system according to claim 1, wherein the centralized disaster recovery switching device comprises a monitoring information receiving module, a monitoring information processing module, a fault information management module and a takeover module; wherein,

10. The dual data center disaster recovery system according to claim 9, wherein said centralized disaster recovery switching device further comprises a correction module; wherein,