CN111158962A - Remote disaster recovery method, device, system, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111158962A
CN111158962A (publication); CN201811320107.9A (application); CN111158962B (grant)
Authority
CN
China
Prior art keywords
data center
node
dns
dns server
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811320107.9A
Other languages
Chinese (zh)
Other versions
CN111158962B (en)
Inventor
徐海勇
王德朋
陶涛
黄岩
尚晶
郭志伟
谢帆
魏瑗珍
段云峰
余钦水
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
China Mobile Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
China Mobile Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Suzhou Software Technology Co Ltd and China Mobile Information Technology Co Ltd
Priority to CN201811320107.9A
Publication of CN111158962A
Application granted
Publication of CN111158962B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 - Error detection; error correction; monitoring
    • G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/16 - Error detection or correction of the data by redundancy in hardware
    • G06F 11/20 - ... using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F 11/202 - ... where processing functionality is redundant
    • G06F 11/2023 - Failover techniques
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. telegraphic communication
    • H04L 61/00 - Network arrangements, protocols or services for addressing or naming
    • H04L 61/45 - Network directories; name-to-address mapping
    • H04L 61/4505 - ... using standardised directories; using standardised directory access protocols
    • H04L 61/4511 - ... using domain name system [DNS]
    • H04L 67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 - Protocols
    • H04L 67/02 - Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L 67/10 - Protocols in which an application is distributed across nodes in the network
    • H04L 67/1001 - ... for accessing one among a plurality of replicated servers
    • H04L 67/1004 - Server selection for load balancing
    • H04L 67/1008 - ... based on parameters of servers, e.g. available memory or workload
    • H04L 67/50 - Network services
    • H04L 67/60 - Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a remote disaster recovery method, apparatus, system, electronic device and storage medium, to solve the inflexible data-center switching and high construction cost of the prior art. The method comprises the following steps: a master node receives first service-unavailable information sent by a slave node, the first service-unavailable information being sent when the slave node fails, wherein the master node and the slave node are both located in a first data center; the master node judges whether the number of failed slave nodes is greater than a set number threshold; if so, the master node sends the DNS server a request to invoke the DNS server's HTTP interface, so that the DNS server switches the serving data center to a second data center, wherein the second data center and the first data center are located in different areas.

Description

Remote disaster recovery method, device, system, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data disaster recovery technologies, and in particular, to a method, an apparatus, a system, an electronic device, and a storage medium for remote disaster recovery.
Background
With the development of the mobile Internet, the volume of user data grows exponentially, and the stability and availability of background services become ever more important. In a traditional single data center, extreme events such as earthquake, fire or terrorist attack can paralyse the service: user business cannot be handled normally, and recovering the service takes a long time. For this reason, many companies are now building multiple data centers.
Nowadays, the technical schemes adopted to improve service stability include the following. First, a single-data-center scheme: all cluster nodes sit in an intranet environment, and high availability and load balancing within the intranet are achieved with a virtual IP together with high-availability and load-balancing components.
Second, a "two sites, three centers" mode: two data centers in the same city plus one remote data center; the same-city data centers are synchronized synchronously, while the remote data center is synchronized asynchronously.
Third, a dual-data-center mode: high availability and load balancing are achieved inside each data center via virtual IPs, the databases of the different data centers are synchronized in master-slave mode, and the data centers are switched manually when a failure occurs. Although this scheme achieves remote disaster recovery with controllable network and cost requirements, the switching remains manual, and fault recovery takes 30 minutes to an hour.
Disclosure of Invention
The embodiments of the invention provide a remote disaster recovery method, apparatus, system, electronic device and storage medium, to solve the inflexible data-center switching and high construction cost of the prior art.
The embodiment of the invention provides a remote disaster recovery method, which comprises the following steps:
a master node receives first service-unavailable information sent by a slave node, the first service-unavailable information being sent when the slave node fails, wherein the master node and the slave node are both located in a first data center;
the master node judges whether the number of failed slave nodes is greater than a set number threshold;
if so, the master node sends the DNS server a request to invoke the DNS server's HTTP interface, so that the DNS server switches the serving data center to a second data center, wherein the second data center and the first data center are located in different areas.
Further, the master node sending the DNS server a request to invoke the DNS server's HTTP interface includes: the master node sends the DNS server call information for invoking the HTTP interface of the DNS server, wherein the call information carries an access key.
Further, the master node is elected by the nodes in the first data center.
Further, the method further comprises: the master node sends data-center-fault prompt information.
Further, the method further comprises: when the master node judges that it has itself failed, sending second service-unavailable information to each non-failed slave node in the first data center, so that the non-failed slave nodes in the first data center elect a new master node.
The embodiment of the invention provides a remote disaster recovery method, which comprises the following steps:
a DNS server receives a request from a master node to invoke the DNS server's HTTP interface, wherein the master node is located in a first data center;
and the DNS server switches the serving data center to a second data center, wherein the second data center and the first data center are located in different areas.
Further, the DNS server switching the serving data center to the second data center includes: the DNS server modifies the IP address of the first data center in its A record to the IP address of the second data center.
Further, the DNS server receiving the request of the master node to invoke its HTTP interface includes:
the DNS server receives call information, sent by the master node, for invoking the HTTP interface of the DNS server, wherein the call information carries an access key;
the DNS server switching the serving data center to a second data center then includes:
the DNS server verifies the master node according to the access key, and switches the serving data center to the second data center only when the verification passes.
The embodiment of the invention provides a remote disaster recovery device, which comprises:
a first receiving module, configured to receive first service-unavailable information sent by a slave node, the first service-unavailable information being sent when the slave node fails, wherein the remote disaster recovery apparatus and the slave node are both located in a first data center;
a judging module, configured to judge whether the number of failed slave nodes is greater than a set number threshold;
and a calling module, configured to, if the number of failed slave nodes is greater than the set number threshold, send the DNS server a request to invoke the DNS server's HTTP interface, so that the DNS server switches the serving data center to a second data center, wherein the second data center and the first data center are located in different areas.
The embodiment of the invention provides a remote disaster recovery device, which comprises:
a second receiving module, configured to receive a request from a master node to invoke the HTTP interface of the apparatus, wherein the master node is located in a first data center;
and a switching module, configured to switch the serving data center to a second data center, wherein the second data center and the first data center are located in different areas.
The embodiment of the invention provides electronic equipment, which comprises a memory and a processor;
The processor is configured to read the program in the memory and execute the following process: receiving first service-unavailable information sent by a slave node, the first service-unavailable information being sent when the slave node fails, wherein the electronic device and the slave node are both located in a first data center; judging whether the number of failed slave nodes is greater than a set number threshold; and if so, sending the DNS server a request to invoke the DNS server's HTTP interface, so that the DNS server switches the serving data center to a second data center, wherein the second data center and the first data center are located in different areas.
Further, the processor is configured to, when determining that the electronic device itself has failed, send second service-unavailable information to each non-failed slave node in the first data center, so that the non-failed slave nodes in the first data center elect a new master node.
The embodiment of the invention provides electronic equipment, which comprises a memory and a processor;
The processor is configured to read the program in the memory and execute the following process: receiving a request from a master node to invoke the HTTP interface of the electronic device, wherein the master node is located in a first data center; and switching the serving data center to a second data center, wherein the second data center and the first data center are located in different areas.
Further, the processor is specifically configured to modify the IP address of the first data center in its A record to the IP address of the second data center.
An embodiment of the present invention further provides an electronic device, where the electronic device includes: the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus;
the memory has stored therein a computer program which, when executed by the processor, causes the processor to perform the method steps of any of the above applied to a master node of a data center.
An embodiment of the present invention further provides an electronic device, where the electronic device includes: the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus;
the memory has stored therein a computer program which, when executed by the processor, causes the processor to perform the method steps of any of the above applied to a DNS server.
Embodiments of the present invention further provide a computer-readable storage medium, which stores a computer program executable by an electronic device, and when the program runs on the electronic device, the electronic device is caused to execute any of the method steps applied to a master node of a data center.
An embodiment of the present invention further provides a computer-readable storage medium, which stores a computer program executable by an electronic device, and when the program runs on the electronic device, the electronic device is caused to execute any one of the method steps applied to a DNS server.
The embodiment of the invention provides a remote disaster recovery system which comprises the remote disaster recovery device applied to the main node of a data center, a slave node and the remote disaster recovery device applied to a DNS server.
Further, the slave node is further configured to send the first service-unavailable information to a load balancing server when a failure occurs; the load balancing server is configured to receive the first service-unavailable information sent by the slave nodes and to remove the failed slave nodes when it judges that the number of failed slave nodes is not greater than the set number threshold.
The embodiments of the invention provide a remote disaster recovery method, apparatus, system, electronic device and storage medium. In the method, a master node receives first service-unavailable information sent by a slave node when that slave node fails, the master node and the slave node both being located in a first data center; the master node judges whether the number of failed slave nodes is greater than a set number threshold; if so, the master node sends the DNS server a request to invoke its HTTP interface, so that the DNS server switches the serving data center to a second data center located in a different area from the first. Because the master node collects the failure reports from the slave nodes, judges whether the number of failed nodes exceeds the threshold, and, when it does, automatically invokes the DNS server's HTTP interface to complete the switch, the data center can be switched automatically to achieve remote disaster recovery.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic process diagram of a remote disaster recovery method according to embodiment 1 of the present invention;
fig. 2 is a schematic process diagram of a remote disaster recovery method according to embodiment 5 of the present invention;
fig. 3 is a schematic process diagram of a remote disaster recovery method according to embodiment 7 of the present invention;
fig. 4 is a schematic structural diagram of a remote disaster recovery device according to embodiment 8 of the present invention;
fig. 5 is a schematic structural diagram of a remote disaster recovery device according to embodiment 9 of the present invention;
fig. 6 is a schematic structural diagram of a remote disaster recovery system according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a remote disaster recovery system according to an embodiment of the present invention;
fig. 8 is an electronic device according to embodiment 11 of the present invention;
fig. 9 is an electronic device provided in embodiment 12 of the present invention;
fig. 10 is an electronic device provided in embodiment 13 of the present invention;
fig. 11 is an electronic device provided in embodiment 14 of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the attached drawings, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
fig. 1 is a schematic process diagram of a remote disaster recovery method according to an embodiment of the present invention, where the process includes the following steps:
s101: the method comprises the steps that a master node receives first service unavailable information sent by a slave node, wherein the first service unavailable information is sent when the slave node fails, and the master node and the slave node are both located in a first data center.
In the embodiment of the invention, to prevent a data center's service from being paralysed under extreme conditions such as earthquake, fire or terrorist attack, a remote disaster recovery technique is adopted.
To adopt the remote disaster recovery technique, a remote disaster recovery system is constructed. The system contains two data centers which, to ensure they cannot both suffer a serious disaster such as an earthquake or fire at the same time, are located in different areas. To enable automatic switching between the two data centers, the system also contains a DNS server providing DNS service.
Each data center contains one master node and a plurality of slave nodes. To implement centralized management of the nodes, the master node is elected from all nodes through an election mechanism; the election may be decided according to node performance or node load, and the specific election process is prior art, not described in detail in the embodiments of the present invention. After the master node is elected, the remaining nodes become slave nodes. Each node in the data center can detect its own service availability, i.e. judge whether it has failed, and because the nodes are interconnected, a node that detects its own failure can send service-unavailable information to the other nodes.
Specifically, since the Consul software supports detecting whether nodes of the data center have failed, i.e. health checking, Consul may be deployed on each node so that the nodes of the data center form a Consul cluster; deploying Consul on each node is prior art and is not described in detail in the embodiments of the present invention.
Each node monitors its own service availability through the Consul software configured on it. When a slave node detects, through Consul's health-check function, that it has failed and its service is unavailable, it sends first service-unavailable information about the node failure to the master node; the first service-unavailable information may carry the slave node's identification information so that the master node can perform subsequent statistics.
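As a concrete illustration of such a local health check, each node could register a service definition like the following with its Consul agent; the service name, port, and `/health` path are illustrative assumptions, not taken from the patent:

```python
import json

# Minimal Consul service definition with an HTTP health check; the local
# agent periodically probes /health and marks the service as failed when
# the probe does not succeed.
check_definition = {
    "service": {
        "name": "app-service",   # assumed service name
        "port": 8080,            # assumed port
        "check": {
            "http": "http://localhost:8080/health",  # endpoint the agent probes
            "interval": "10s",   # how often the agent runs the check
            "timeout": "2s",     # check fails if no reply within this time
        },
    }
}

# Typically written to a file such as /etc/consul.d/app.json and loaded
# by the Consul agent running on each node.
print(json.dumps(check_definition, indent=2))
```

A failed check is what triggers the node to report itself unavailable to the master, as described above.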
S102: and the master node judges whether the number of the fault slave nodes is greater than a set number threshold value.
When the master node receives the first service-unavailable information sent by failed slave nodes, it counts the number of failed slave nodes.
A number threshold is preset and is smaller than the number of all nodes in the first data center. The number threshold may be proportional to the number of all nodes in the first data center where the master node is located; for example, it may be one third or one half of that number, and preferably one half.
After obtaining the number of failed slave nodes, the master node compares it with the preset number threshold to judge whether it is greater than the set threshold.
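The judgment in S102 amounts to a simple count-and-compare; a minimal sketch (function and node names are illustrative, not from the patent) might look like:

```python
def should_failover(failed_slaves: set, total_nodes: int, ratio: float = 0.5) -> bool:
    """Return True when the number of failed slave nodes exceeds the set
    number threshold, here taken as a fraction of all nodes (default: half)."""
    threshold = int(total_nodes * ratio)  # e.g. one half of all nodes
    return len(failed_slaves) > threshold

# With 10 nodes and a one-half threshold, 3 failures is not enough...
print(should_failover({"node-2", "node-5", "node-7"}, total_nodes=10))  # False
# ...but 6 failures triggers the switch.
print(should_failover({f"node-{i}" for i in range(6)}, total_nodes=10))  # True
```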
S103: if so, the master node invokes the HTTP interface of the DNS server, so that the DNS server switches the serving data center to a second data center, the second data center and the first data center being located in different areas.
If the number of the fault slave nodes is larger than the set number threshold value as a result of the judgment, the first data center which is currently in service cannot continue to provide service. In this case, a major disaster such as an earthquake or a fire may occur at the location of the first data center, so that a machine failure or a network failure may occur in the data center, and at this time, for stability and availability of the background service, a remote disaster recovery technique needs to be adopted to switch from the currently failed first data center to another available data center.
To accomplish the automatic switchover from the currently serving first data center to the second data center, the master node causes the DNS server to accomplish the switchover of data centers by sending a request to the DNS server to invoke the HTTP interface of the DNS server.
After the DNS server receives the request, sent by the master node, to invoke its HTTP interface, it modifies, through that HTTP interface, the IP address of the currently serving first data center in its A record to the IP address of the second data center, completing the switch from the first data center to the second. To ensure that the data center switched to is usable and will not suffer extreme events such as earthquake or fire together with the first data center, the second data center is located in a different area from the first; preferably the physical distance between the two data centers is large, and a distance threshold can be set so that the distance between them exceeds it.
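From the master's side, that call could be sketched as follows; the endpoint path, payload fields and header name are assumptions, since the patent specifies only that an HTTP interface is invoked and that the call may carry an access key:

```python
import json
import urllib.request

def build_switch_request(domain: str, standby_ip: str, access_key: str) -> urllib.request.Request:
    """Assemble the HTTP request asking the DNS server to repoint the
    service domain's A record at the second (standby) data center."""
    body = json.dumps({"domain": domain, "type": "A", "value": standby_ip}).encode()
    return urllib.request.Request(
        "https://dns.example.internal/api/a-record",  # hypothetical endpoint
        data=body,
        method="PUT",
        headers={
            "X-Access-Key": access_key,  # hypothetical header carrying the key
            "Content-Type": "application/json",
        },
    )

def trigger_failover(domain: str, standby_ip: str, access_key: str) -> bool:
    req = build_switch_request(domain, standby_ip, access_key)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return 200 <= resp.status < 300  # True when the switch was accepted
```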
If the judgment result is that the number of failed slave nodes is not greater than the set number threshold, it indicates that no major disaster such as an earthquake or fire has occurred, and the master node need not take further action.
In the embodiment of the invention, the master node receives the first service-unavailable information sent by the slave nodes, judges whether the number of failed slave nodes in the data center exceeds the set number threshold, and, when it does, automatically invokes the HTTP interface of the DNS server so that the DNS server completes the data-center switch; the data center can thus be switched automatically to achieve remote disaster recovery.
Example 2:
in order to ensure the security of invoking the HTTP interface of the DNS server, on the basis of the foregoing embodiment, in an embodiment of the present invention, the master node sending the DNS server a request to invoke the DNS server's HTTP interface includes:
the master node sends the DNS server first call information for invoking the HTTP interface of the DNS server, the first call information carrying an access key.
When the number of failed slave nodes of the first data center is greater than the set number threshold, the master node sends the DNS server the first call information for invoking its HTTP interface. To secure the HTTP interface, the first call information carries the access key: only when the access key passes the DNS server's verification does the DNS server invoke its HTTP interface and complete the data-center switch through it.
In the embodiment of the invention, when the master node invokes the HTTP interface of the DNS server, sending call information that carries an access key improves the security of the DNS server's data-center switch.
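On the server side, the verification described in this embodiment could be as simple as a constant-time comparison of the carried key against the configured one; the key value below is a placeholder, not from the patent:

```python
import hmac

CONFIGURED_ACCESS_KEY = "s3cret-access-key"  # placeholder value

def verify_access_key(presented_key: str) -> bool:
    """Constant-time comparison, avoiding timing leaks about the key."""
    return hmac.compare_digest(presented_key, CONFIGURED_ACCESS_KEY)

print(verify_access_key("s3cret-access-key"))  # True: switch is allowed
print(verify_access_key("wrong-key"))          # False: request rejected
```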
Example 3:
in order to facilitate centralized management of all nodes of the data center, on the basis of the foregoing embodiments, in an embodiment of the present invention, the master node is elected by the nodes in the first data center.
In order to ensure that the operation of the whole data center is not affected when the current master node fails, the method further comprises:
when the master node judges that it has itself failed, sending second service-unavailable information to each non-failed slave node in the first data center, so that the non-failed slave nodes in the first data center elect a new master node.
To facilitate centralized management of all nodes of the data center, one node is elected from all nodes of the first data center as the master node and the remaining nodes serve as slave nodes. The specific election process is prior art and is not described in detail in the embodiments of the invention.
Because all nodes in the data center are interconnected, and the slave nodes send first service-unavailable information to the master node when they fail, the master node can identify the non-failed slave nodes when it fails itself, and sends second service-unavailable information about the master-node failure to each of them. After the non-failed slave nodes receive this information, they elect a new master node through the election mechanism.
In the embodiment of the invention, electing one of the nodes in the serving data center as the master node facilitates centralized management of all nodes in the data center, and if the master node fails, a new master is still elected from the remaining non-failed slave nodes, ensuring the availability of the data center's service.
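The patent defers the election mechanism to the prior art; one minimal illustrative rule (an assumption, not the patent's method) is to let the healthy node with the smallest identifier become the new master:

```python
def elect_master(node_health: dict) -> str:
    """node_health maps node id -> True if the node is healthy.
    Returns the id of the newly elected master (smallest healthy id)."""
    healthy = sorted(nid for nid, ok in node_health.items() if ok)
    if not healthy:
        raise RuntimeError("no healthy node available to become master")
    return healthy[0]

# node-1 (the old master) has failed, so node-2 takes over.
cluster = {"node-1": False, "node-2": True, "node-3": True}
print(elect_master(cluster))  # node-2
```

Real deployments would typically rely on the cluster software's own leader election (e.g. Consul's) rather than a hand-rolled rule.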
Example 4:
in order to enable a worker to timely maintain and repair the data center, on the basis of the foregoing embodiments, in an embodiment of the present invention, after the master node determines that the number of the failed slave nodes is greater than a set number threshold, the method further includes:
and the main node sends data center fault prompt information.
When the master node judges that the number of the fault slave nodes is larger than the set number threshold, the situation shows that the service is unavailable due to the fact that the fault occurs in the data center which is currently in service. And at the moment, the main node of the data center with the current fault sends prompt information of the fault of the data center. The fault node restoration method can be used for sending the fault node to system maintenance personnel through mails or short messages and informing the system maintenance personnel to carry out manual restoration on the fault node. The specific host node can record a mobile phone number or an email address of system maintainers, and when a data center fault occurs, the host node of the data center sends prompt information of the data center fault to the stored mobile phone number or email address of the system maintainers to inform the system maintainers of manual recovery.
In the embodiment of the invention, the master node sends data center failure prompt information to the staff, so that the staff learn of the data center's problem in time and can take maintenance measures promptly.
Example 5:
Fig. 2 is a schematic process diagram of a remote disaster recovery method according to an embodiment of the present invention, where the process includes the following steps:
S201: the DNS server receives a request from a master node to call the DNS server's HTTP interface, where the master node is located in a first data center.
When the master node of the first data center determines that the number of failed slave nodes is greater than the set number threshold, the first data center currently performing the service can no longer continue to provide it: a major disaster such as an earthquake or a fire has occurred at the location of the first data center, causing machine failures or network failures in the data center. At this time, a remote disaster recovery technique needs to be adopted to preserve the stability and availability of the background service.
When the disaster recovery technique is adopted, the first data center currently performing the service needs to be switched to another available data center, and the master node in the currently serving data center sends a request to the DNS server to call the DNS server's HTTP interface.
After receiving the master node's request to call the HTTP interface, the DNS server invokes the HTTP interface to complete the switching of the data center.
S202: the DNS server switches the serving data center to a second data center, where the second data center and the first data center are located in different areas.
When the DNS server receives the master node's request to call its HTTP interface, the DNS server modifies its A record through that interface: the IP address of the currently serving first data center in the A record is changed to the IP address of the second data center, completing the switch from the first data center to the second. To ensure that the switched-to data center is available and will not encounter the same extreme situation (earthquake, fire, and the like) as the first data center, the second data center is located in a different area from the first data center.
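A minimal sketch of the A-record switch performed through the DNS server's HTTP interface. The endpoint path, payload field names and authentication header are assumptions, since the embodiment does not specify a concrete API, but the operation mirrors the one described above: replace the first data center's IP address with the second data center's for the service domain.

```python
import json
import urllib.request

# Sketch of the A-record update the master node triggers on the DNS server.
# The /records endpoint and the X-Access-Key header are hypothetical.

def build_a_record_update(domain, old_ip, new_ip):
    """Payload describing the switch: same A record name, new address."""
    return {"type": "A", "name": domain, "old_value": old_ip, "value": new_ip}

def switch_datacenter(dns_api_base, access_key, domain, old_ip, new_ip):
    payload = build_a_record_update(domain, old_ip, new_ip)
    req = urllib.request.Request(
        f"{dns_api_base}/records",                # hypothetical endpoint
        data=json.dumps(payload).encode(),
        headers={"X-Access-Key": access_key,      # hypothetical auth header
                 "Content-Type": "application/json"},
        method="PUT",
    )
    return urllib.request.urlopen(req)            # performs the actual call
```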
In the embodiment of the invention, the DNS server switches the currently serving first data center to the second data center by modifying its A record, so that the switch from the failed data center to an available data center is completed automatically, realizing remote disaster recovery and solving the problem that at present the data center can only be switched manually.
Example 6:
In order to switch the failed serving data center to an available data center, on the basis of the foregoing embodiments, in an embodiment of the present invention, the switching, by the DNS server, of the serving data center to the second data center includes:
the DNS server modifying the IP address of the first data center in the DNS server's A record to the IP address of the second data center.
Because two data centers are built to realize remote disaster recovery, and the two have the same function but are located in different areas, the DNS server stores identification information for both data centers, and the identification information of each data center is unique. Specifically, a corresponding identification bit is set in the server for the identification information of the first data center currently performing the service, marking it as the serving data center. If the currently serving first data center fails, the DNS server receives the request, sent by the master node of the first data center, to call the DNS server's HTTP interface; the DNS server then changes the serving data center from the first data center to the second data center, clears the identification bit set for the identification information of the first data center, and sets the identification bit for the identification information of the second data center. Alternatively, because only the master node of the currently serving data center sends requests to call the DNS server's HTTP interface, and the correspondence between a master node and its data center can be known in advance and does not generally change, the request sent by the master node may carry the identification information of the first data center where the master node is located; after receiving the request, the DNS server changes the serving data center from the first data center to the second data center according to that identification information.
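The identification-bit bookkeeping described above can be sketched as follows; the data center identifiers and IP addresses are illustrative.

```python
# Sketch of the DNS server's active-data-center bookkeeping: both centers
# are registered, and exactly one carries the "serving" identification bit.
# Switching clears the bit on the failed center and sets it on the standby.

class DataCenterRegistry:
    def __init__(self, centers):
        # centers: {dc_id: ip}; the first registered center starts as serving
        self.centers = dict(centers)
        self.active = next(iter(self.centers))

    def switch_from(self, failed_dc_id):
        """Switch away from failed_dc_id; returns the now-serving center."""
        if failed_dc_id != self.active:
            return self.active          # request from a non-serving DC: ignore
        standby = next(dc for dc in self.centers if dc != failed_dc_id)
        self.active = standby           # clear old bit, set new bit
        return standby
```

For example, with `DataCenterRegistry({"dc-1": "10.0.0.1", "dc-2": "10.1.0.1"})`, a failure report from `"dc-1"` makes `"dc-2"` the serving center.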
When the currently serving first data center becomes unavailable due to a failure, the master node of the first data center sends a request to the DNS server to call its HTTP interface, and the DNS server modifies its A record through that interface. Because the DNS server stores the IP address of the standby second data center, when it receives the request from the master node of the first data center to call its HTTP interface, it modifies the IP address of the first data center in its A record to the IP address of the second data center, so that the serving data center is switched from the failed first data center to the available second data center.
In the embodiment of the invention, the DNS server modifies the IP address of the first data center in its A record to the IP address of the second data center, completing the switch from the failed data center to the available data center.
Example 7:
In order to ensure the security of calling the HTTP interface of the DNS server, on the basis of the foregoing embodiments, in an embodiment of the present invention, the receiving, by the DNS server, of the master node's request to call the DNS server's HTTP interface includes:
the DNS server receiving first call information, sent by the master node, for calling the HTTP interface of the DNS server, where the first call information carries an AccessKey;
the DNS server switching the serving data center to a second data center comprises:
the DNS server verifying the master node according to the AccessKey and, when the verification passes, switching the serving data center to the second data center.
In order to ensure the security of calling the HTTP interface of the DNS server, the DNS server can automatically generate an AccessKey used for identity authentication and send it to any master node that needs to call the DNS server's HTTP interface.
When the master node needs to call the HTTP interface of the DNS server, it sends call information for calling that interface to the DNS server. In order for the call to succeed, the first call information carries the AccessKey for authentication.
The DNS server receives the first call information sent by the master node and verifies its authenticity: it checks whether the AccessKey carried in the first call information is the one the DNS server generated and sent. If so, the DNS server invokes the HTTP interface and completes the data center switch through it.
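The AccessKey check described above can be sketched as follows. The embodiment only states that the DNS server generates a key and later verifies the key carried in the call information; a constant-time comparison is assumed here as the standard way to perform such a check without leaking timing information, and the key format is illustrative.

```python
import hmac
import secrets

# Sketch of the AccessKey issuance and verification described above.
# The 32-byte hex token is an assumption; the embodiment does not fix a format.

def issue_access_key():
    """Key the DNS server generates and hands to the master node."""
    return secrets.token_hex(32)

def verify_access_key(stored_key, presented_key):
    """Constant-time comparison of the stored and presented keys."""
    return hmac.compare_digest(stored_key, presented_key)
```

Only when `verify_access_key` returns `True` does the DNS server go on to perform the data center switch.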
The above embodiments are described below as one specific embodiment; as shown in Fig. 3, the method includes the following steps:
S301: the data center suffers a machine failure or a network failure.
When an extreme situation such as an earthquake or a fire occurs at a certain location, the data center located there suffers problems that make the service unavailable, such as machine failures or network failures; specifically, nodes in the data center fail.
S302: the master node receives first service unavailability information sent by the slave nodes.
Since all nodes in the data center are configured with the Consul software, when a slave node fails, the Consul deployment on that slave node triggers its failure handling program and sends first service unavailability information about the slave node's failure to the master node of the data center.
S303: judge whether the number of failed nodes in the data center exceeds half of the number of nodes in the data center.
The master node of the data center counts the number of failed slave nodes according to the received service unavailability information sent by the slave nodes, and judges whether that number exceeds half of the number of nodes in the data center.
S304: if yes, automatically call the HTTP interface of the DNS server, notifying the DNS server to update its A record and redirect it to an available data center.
If the number of failed slave nodes exceeds half of the number of nodes in the data center, the master node automatically calls the HTTP interface of the DNS server so that the DNS server modifies its A record, switching the currently failed data center to the available data center reserved in advance.
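The decision in steps S303 and S304 can be sketched as follows; the callback that triggers the DNS switch stands in for the HTTP interface call and is an illustrative assumption.

```python
# Sketch of the failover decision in steps S303/S304: the master counts the
# slave nodes that reported "service unavailable" and triggers the DNS switch
# only once more than half of the data center's nodes are down.

def should_failover(total_nodes, failed_nodes):
    """True when the failed nodes exceed half of all nodes in the center."""
    return len(failed_nodes) > total_nodes / 2

def on_unavailable_report(state, node_id, total_nodes, trigger_dns_switch):
    """Record one failure report; fire the switch at the majority threshold."""
    already_over = should_failover(total_nodes, state)
    state.add(node_id)                  # state: set of failed node ids
    if not already_over and should_failover(total_nodes, state):
        trigger_dns_switch()            # e.g. call the DNS HTTP interface
```

With 6 nodes, 3 failures are exactly half and do not trigger the switch; the 4th report does.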
The load balancing server located in the data center also receives the first service unavailability information sent by the slave nodes; if it determines that the number of failed slave nodes does not exceed half of the number of nodes in the data center, it simply removes the corresponding failed slave nodes, achieving load balancing within the data center and keeping the data center's service available.
S305: notify system maintenance personnel by e-mail or SMS to recover the failed machine nodes.
When the master node determines that the number of failed slave nodes exceeds half of the number of nodes in the data center, the data center has failed and its service is unavailable. At this time, the master node of the failed data center notifies system maintenance personnel by sending e-mails or SMS messages, and the personnel can manually recover the failed nodes of the failed data center.
After the first data center is manually restored and its service becomes available, in order to reduce the service pressure on the second data center, the staff can manually update the A record of the DNS server and switch the serving data center from the second data center back to the first, so that the restored first data center provides the service externally.
The steps of this specific embodiment have been described in detail in the foregoing embodiments of the present invention and are not repeated here.
Example 8:
Based on the same technical concept, an embodiment of the invention provides a remote disaster recovery apparatus applied to a master node. As shown in Fig. 4, the apparatus provided in an embodiment of the present invention includes:
a first receiving module 401, configured to receive first service unavailability information sent by a slave node, where the first service unavailability information is sent when the slave node fails, and the remote disaster recovery apparatus and the slave node are both located in a first data center;
a judging module 402, configured to judge whether the number of failed slave nodes is greater than a set number threshold;
a calling module 403, configured to send a request for calling an HTTP interface of the DNS server to the DNS server if it is determined that the number of faulty slave nodes is greater than the set number threshold, so that the DNS server switches a data center for service to a second data center, where the second data center is located in a different area from the first data center.
Further, the calling module 403 is specifically configured to send, to the DNS server, call information for calling the HTTP interface of the DNS server, where the call information carries an AccessKey.
Further, the apparatus further comprises: and the prompt module 404 is configured to send data center fault prompt information.
Further, the apparatus further comprises: an election module 405, configured to, when it is determined that the master node where the apparatus is located fails, send second service unavailability information to each non-failed slave node in the first data center, so that the non-failed slave nodes in the first data center elect a new master node.
Example 9:
Based on the same technical concept, an embodiment of the invention provides a remote disaster recovery apparatus applied to a DNS server. As shown in Fig. 5, the apparatus provided in the embodiment of the present invention includes:
a second receiving module 501, configured to receive a request from a master node to call the DNS server's HTTP interface, where the master node is located in a first data center;
a switching module 502, configured to switch a data center performing a service to a second data center by the DNS server, where the second data center and the first data center are located in different areas.
Further, the switching module 502 is specifically configured to modify the IP address of the first data center in the DNS server's A record to the IP address of the second data center.
Further, the second receiving module 501 is specifically configured to receive call information, sent by the master node, for calling the HTTP interface of the DNS server, where the call information carries an AccessKey; the switching module 502 is further specifically configured to verify the master node according to the AccessKey and, when the verification passes, switch the serving data center to the second data center.
Example 10:
Fig. 6 is a schematic structural diagram of a remote disaster recovery system according to an embodiment of the present invention, where the remote disaster recovery system includes a master node 601 and a DNS server 602. Wherein:
The master node 601 is configured to receive first service unavailability information sent by a slave node, where the first service unavailability information is sent when the slave node fails, and the master node and the slave node are both located in a first data center; to judge whether the number of failed slave nodes is greater than a set number threshold; and if so, to send a request to the DNS server to call the DNS server's HTTP interface, so that the DNS server switches the serving data center to a second data center, where the second data center and the first data center are located in different areas.
The DNS server 602 is configured to receive the master node's request to call the DNS server's HTTP interface, where the master node is located in the first data center; and to switch the serving data center to a second data center, where the second data center and the first data center are located in different areas.
In order to solve the problem that switching from a failed data center to an available data center cannot be automatically completed in the prior art, the embodiment of the invention provides a remote disaster recovery system.
The master node is specifically configured to send, to the DNS server, call information for calling the HTTP interface of the DNS server, where the call information carries an AccessKey.
The master node is elected by the nodes in the first data center.
The master node is further configured to send data center failure prompt information.
The master node is further configured to, when determining that it has itself failed, send second service unavailability information to each non-failed slave node in the first data center, so that the non-failed slave nodes in the first data center elect a new master node.
The DNS server is specifically configured to modify the IP address of the first data center in its A record to the IP address of the second data center.
The DNS server is specifically configured to receive call information, sent by the master node, for calling the DNS server's HTTP interface, where the call information carries an AccessKey; and to verify the master node according to the AccessKey and, when the verification passes, switch the serving data center to the second data center.
When the remote disaster recovery technique is adopted, a remote disaster recovery system as shown in Fig. 7 is constructed.
First, the remote disaster recovery system comprises two data centers, each comprising a load layer, an application layer and a database layer. The load layer comprises a load balancing server; to realize load balancing within a single data center, LVS is used for request forwarding, and a virtual IP with Keepalived ensures that the control node has no single point of failure. The application layer is mainly used for application system deployment and comprises a plurality of application systems; each application system is deployed and installed on multiple nodes using a sharded deployment method, realizing load balancing and high availability of the application system. The database layer is used to install a distributed database; the databases are synchronized, synchronously or asynchronously, to ensure data consistency, and a unified access mode is provided so that access is transparent to the applications.
In order to realize remote disaster recovery, the system also comprises a DNS server providing DNS service; the DNS server completes the switch between the two data centers by changing its A record, thereby realizing remote disaster recovery. In order to ensure high availability of the DNS service itself, an additional data center is built to host it, with a primary DNS server and a standby DNS server, so that the DNS service does not become unavailable due to a machine failure.
The system further includes a load balancing server 603, configured to receive the first service unavailability information sent by the slave nodes, and to remove a failed slave node when it determines that the number of failed slave nodes is not greater than the set number threshold.
Example 11:
on the basis of the above embodiments, the embodiment of the present invention further provides an electronic device 800, as shown in fig. 8, including a memory 801 and a processor 802;
the processor 802 is configured to read the program in the memory 801 and execute the following processes:
receiving first service unavailable information sent by a slave node, wherein the first service unavailable information is sent when the slave node fails, and the master node and the slave node are both located in a first data center;
judging whether the number of the fault slave nodes is larger than a set number threshold value or not;
if so, sending a request to the DNS server to call the DNS server's HTTP interface, so that the DNS server switches the serving data center to a second data center, where the second data center and the first data center are located in different areas.
In FIG. 8, the bus architecture may include any number of interconnected buses and bridges, with one or more processors represented by processor 802 and various circuits of memory represented by memory 801 being linked together. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein.
Alternatively, the processor 802 may be a CPU (Central Processing Unit), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a CPLD (Complex Programmable Logic Device).
The processor is specifically configured to send, to the DNS server, call information for calling the HTTP interface of the DNS server, where the call information carries an AccessKey.
The processor is further configured to send data center failure prompt information.
The processor is further configured to, when determining that the master node where it is located has failed, send second service unavailability information to each non-failed slave node in the first data center, so that the non-failed slave nodes in the first data center elect a new master node.
Example 12:
on the basis of the foregoing embodiments, an electronic device 900 is further provided in an embodiment of the present invention, as shown in fig. 9, and includes a memory 901 and a processor 902;
the processor 902 is configured to read the program in the memory 901, and execute the following processes:
the processor is used for reading the program in the memory and executing the following processes: receiving a request of a main node for calling an HTTP interface of the main node, wherein the main node is positioned in a first data center; and switching the data center for service into a second data center, wherein the second data center and the first data center are located in different areas.
In fig. 9, the bus architecture may include any number of interconnected buses and bridges, with one or more processors, represented by processor 902, and various circuits, represented by memory 901, linked together. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein.
Alternatively, the processor 902 may be a CPU (Central Processing Unit), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a CPLD (Complex Programmable Logic Device).
The processor is specifically configured to modify the IP address of the first data center in its own A record to the IP address of the second data center.
The processor is specifically configured to receive call information, sent by the master node, for calling the HTTP interface of the DNS server, where the call information carries an AccessKey; and to verify the master node according to the AccessKey and, when the verification passes, switch the serving data center to the second data center.
Example 13:
on the basis of the foregoing embodiments, an embodiment of the present invention further provides an electronic device 1000, as shown in fig. 10, including: the system comprises a processor 1001, a communication interface 1002, a memory 1003 and a communication bus 1004, wherein the processor 1001, the communication interface 1002 and the memory 1003 are communicated with each other through the communication bus 1004;
the memory 1003 has stored therein a computer program which, when executed by the processor 1001, causes the processor 1001 to perform the steps of:
receiving first service unavailable information sent by a slave node, wherein the first service unavailable information is sent when the slave node fails, and the master node and the slave node are both located in a first data center;
judging whether the number of the fault slave nodes is larger than a set number threshold value or not;
if so, sending a request to the DNS server to call the DNS server's HTTP interface, so that the DNS server switches the serving data center to a second data center, where the second data center and the first data center are located in different areas.
Further, the processor 1001 sends, to the DNS server, call information for calling the HTTP interface of the DNS server, where the call information carries an AccessKey.
Further, the master node is elected by the nodes in the first data center.
Further, the processor 1001 sends a data center failure prompt message.
Further, when the processor 1001 determines that the master node where it is located has failed, it sends second service unavailability information to each non-failed slave node in the first data center, so that the non-failed slave nodes in the first data center elect a new master node.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface 1002 is used for communication between the electronic apparatus and other apparatuses.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Alternatively, the memory may be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit, a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like.
Example 14:
On the basis of the foregoing embodiments, an embodiment of the present invention further provides an electronic device 1100, as shown in Fig. 11, including: a processor 1101, a communication interface 1102, a memory 1103 and a communication bus 1104, where the processor 1101, the communication interface 1102 and the memory 1103 communicate with each other through the communication bus 1104;
the memory 1103 has stored therein a computer program that, when executed by the processor 1101, causes the processor 1101 to perform the steps of:
receiving a request from a master node to call the device's own HTTP interface, where the master node is located in a first data center;
and switching the data center for service into a second data center, wherein the second data center and the first data center are located in different areas.
Further, the processor 1101 modifies the IP address of the first data center in its own A record to the IP address of the second data center.
Further, the processor 1101 receives call information, sent by the master node, for calling the HTTP interface of the DNS server, where the call information carries an AccessKey;
and the processor 1101 verifies the master node according to the AccessKey; when the verification passes, the serving data center is switched to the second data center.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface 1102 is used for communication between the electronic apparatus and other apparatuses.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Alternatively, the memory may be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit, a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like.
Example 15:
on the basis of the foregoing embodiments, an embodiment of the present invention further provides a computer-readable storage medium, in which a computer program executable by an electronic device is stored, and when the program is run on the electronic device, the electronic device is caused to execute the following steps:
the memory having stored therein a computer program that, when executed by the processor, causes the processor to perform the steps of:
receiving first service unavailable information sent by a slave node, wherein the first service unavailable information is sent when the slave node fails, and the master node and the slave node are both located in a first data center;
judging whether the number of the fault slave nodes is larger than a set number threshold value or not;
if so, sending a request to the DNS server to call the DNS server's HTTP interface, so that the DNS server switches the serving data center to a second data center, where the second data center and the first data center are located in different areas.
Further, the processor sends, to the DNS server, call information for calling the HTTP interface of the DNS server, where the call information carries an AccessKey.
Further, the master node is elected by the nodes in the first data center.
Further, the processor sends data center fault prompt information.
Further, when the processor determines that the master node where it is located has failed, it sends second service unavailability information to each non-failed slave node in the first data center, so that the non-failed slave nodes in the first data center elect a new master node.
The computer-readable storage medium may be any available medium or data storage device that can be accessed by a processor in an electronic device, including but not limited to magnetic memory such as floppy disks, hard disks, magnetic tape and magneto-optical disks (MO), optical memory such as CDs, DVDs, BDs and HVDs, and semiconductor memory such as ROM, EPROM, EEPROM, non-volatile memory (NAND flash) and solid-state disks (SSD).
Example 16:
on the basis of the foregoing embodiments, an embodiment of the present invention further provides a computer-readable storage medium, in which a computer program executable by an electronic device is stored, and when the program is run on the electronic device, the electronic device is caused to execute the following steps:
the memory 1103 has stored therein a computer program that, when executed by the processor 1101, causes the processor 1101 to perform the steps of:
receiving a request from a master node to call the device's own HTTP interface, where the master node is located in a first data center;
and switching the data center for service into a second data center, wherein the second data center and the first data center are located in different areas.
Further, the processor switching the serving data center to the second data center comprises:
and the processor modifies the IP address of the first data center in the record of the processor A into the IP address of the second data center.
Further, the receiving, by the processor, of the request from the master node comprises:
receiving call information, sent by the master node, for calling the HTTP interface of the DNS server, wherein the call information carries an access key;
and the processor switching the serving data center to the second data center comprises:
the processor verifying the master node according to the access key and, when the verification passes, switching the serving data center to the second data center.
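The DNS-server-side behaviour just described can be illustrated with the following sketch: verify the caller's access key, and only then repoint the service's A record from the first data center's IP address to the second data center's. The domain name, key value and IP addresses are made-up examples, and a real DNS server would of course update its zone data rather than a Python dictionary.

```python
# Illustrative sketch of the access-key check and A-record switch described
# above. The A record store, the valid key and the IPs are all assumptions.

A_RECORDS = {"service.example.com": "10.0.1.10"}   # first data center's IP
VALID_ACCESS_KEYS = {"demo-access-key"}

def switch_data_center(domain, access_key, second_dc_ip):
    """Verify the caller, then point the domain's A record at the second DC."""
    if access_key not in VALID_ACCESS_KEYS:
        return False  # verification failed: keep serving the first data center
    A_RECORDS[domain] = second_dc_ip  # the A record now resolves to the second DC
    return True

ok = switch_data_center("service.example.com", "demo-access-key", "10.1.2.20")
print(ok, A_RECORDS["service.example.com"])  # True 10.1.2.20
```

Because clients resolve the service by domain name, rewriting the A record is enough to steer all new traffic to the second data center without any client-side change, which is the core of the disaster-recovery switch described here.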
The computer-readable storage medium may be any available medium or data storage device that can be accessed by a processor in an electronic device, including but not limited to magnetic memory such as floppy disks, hard disks, magnetic tape and magneto-optical disks (MO); optical memory such as CDs, DVDs, BDs and HVDs; and semiconductor memory such as ROM, EPROM, EEPROM, non-volatile memory (NAND flash) and solid-state disks (SSDs).
In summary, the present invention provides a remote disaster recovery method, apparatus, system, electronic device and storage medium. In the method, a master node receives first service-unavailability information sent by a slave node when that slave node fails, the master node and the slave node both being located in a first data center; the master node judges whether the number of failed slave nodes is greater than a set number threshold; if so, the master node sends a request to the DNS server for calling an HTTP interface of the DNS server, so that the DNS server switches the serving data center to a second data center located in a different area from the first data center. Because the master node collects the service-unavailability information from the slave nodes and automatically calls the DNS server's HTTP interface once the number of failed slave nodes exceeds the threshold, the data center switch is completed automatically, thereby achieving automatic remote disaster recovery.
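The master-node decision summarised above reduces to a simple threshold check followed by a call to the DNS server's HTTP interface. The sketch below illustrates that control flow only; the function names and the return values are illustrative assumptions, and the DNS server's HTTP interface is replaced by a plain callback.

```python
# Minimal sketch of the master-node side of the method: count failed slaves
# and trigger the DNS switch once the count exceeds the threshold.
# Names, signatures and return values are assumptions for illustration.

def handle_slave_failures(failed_slaves, threshold, call_dns_http_interface):
    """Trigger the data-center switch only when failures exceed the threshold."""
    if len(failed_slaves) > threshold:
        # In the patent this is an HTTP request to the DNS server,
        # carrying an access key for verification.
        return call_dns_http_interface()
    return "no_switch"

def fake_dns_interface():
    # Stand-in for the DNS server's HTTP interface.
    return "switched_to_second_data_center"

print(handle_slave_failures(["s1", "s2", "s3"], 2, fake_dns_interface))
# switched_to_second_data_center
```

Note the asymmetry this threshold creates: isolated slave failures are handled inside the first data center (claim 20 has the load balancing server simply remove them), and only a failure count above the threshold escalates to a full cross-region switch.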
For the system/apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
It is to be noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or operation from another entity or operation without necessarily requiring or implying any actual such relationship or order between such entities or operations.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (20)

1. A remote disaster recovery method is characterized by comprising the following steps:
the method comprises the steps that a master node receives first service-unavailability information sent by a slave node, wherein the first service-unavailability information is sent when the slave node fails, and the master node and the slave node are both located in a first data center;
the master node judges whether the number of failed slave nodes is greater than a set number threshold;
and if so, the master node sends a request to the DNS server for calling an HTTP interface of the DNS server, so that the DNS server switches the serving data center to a second data center, wherein the second data center and the first data center are located in different areas.
2. The method of claim 1, wherein the master node sending a request to a DNS server to invoke an HTTP interface of the DNS server comprises:
and the master node sends call information for calling an HTTP interface of the DNS server to the DNS server, wherein the call information carries an access key.
3. The method of claim 1, wherein the master node is elected by the nodes in the first data center.
4. The method of claim 1, wherein the method further comprises:
and the master node sends data center failure prompt information.
5. The method of claim 1, wherein the method further comprises:
and when the master node judges that it has failed, sending second service-unavailability information to each non-failed slave node in the first data center, so that each non-failed slave node in the first data center reselects a new master node.
6. A remote disaster recovery method is characterized by comprising the following steps:
the method comprises the steps that a DNS server receives a request, sent by a master node, for calling the DNS server's own HTTP interface, wherein the master node is located in a first data center;
and the DNS server switches the serving data center to a second data center, wherein the second data center and the first data center are located in different areas.
7. The method of claim 6, wherein the DNS server switching the serving data center to a second data center comprises:
and the DNS server modifies the IP address of the first data center in its own A record to the IP address of the second data center.
8. The method of claim 6 or 7, wherein the DNS server receiving the request from the master node to call the DNS server's own HTTP interface comprises:
the DNS server receiving call information, sent by the master node, for calling the HTTP interface of the DNS server, wherein the call information carries an access key;
and the DNS server switching the serving data center to the second data center comprises:
the DNS server verifying the master node according to the access key and, when the verification passes, switching the serving data center to the second data center.
9. A remote disaster recovery device, comprising:
a first receiving module, configured to receive first service-unavailability information sent by a slave node, wherein the first service-unavailability information is sent when the slave node fails, and the remote disaster recovery apparatus and the slave node are both located in a first data center;
a judging module, configured to judge whether the number of failed slave nodes is greater than a set number threshold;
and a calling module, configured to, if the number of failed slave nodes is greater than the set number threshold, send a request to the DNS server for calling an HTTP interface of the DNS server, so that the DNS server switches the serving data center to a second data center, wherein the second data center and the first data center are located in different areas.
10. A remote disaster recovery device, comprising:
a second receiving module, configured to receive a request, sent by a master node, for calling the apparatus's own HTTP interface, wherein the master node is located in a first data center;
and a switching module, configured to switch the serving data center to a second data center, wherein the second data center and the first data center are located in different areas.
11. An electronic device, comprising a memory and a processor;
the processor is configured to read the program in the memory and execute the following process: receiving first service-unavailability information sent by a slave node, the first service-unavailability information being sent when the slave node fails, wherein the electronic device and the slave node are both located in a first data center; judging whether the number of failed slave nodes is greater than a set number threshold; and if so, sending a request to the DNS server for calling an HTTP interface of the DNS server, so that the DNS server switches the serving data center to a second data center, wherein the second data center and the first data center are located in different areas.
12. The electronic device of claim 11, wherein the processor is configured to send second service unavailability information to each non-failed slave node in the first data center to cause each non-failed slave node in the first data center to reselect a new master node when determining that it has failed.
13. An electronic device, comprising a memory and a processor;
the processor is configured to read the program in the memory and execute the following process: receiving a request, sent by a master node, for calling the electronic device's own HTTP interface, wherein the master node is located in a first data center; and switching the serving data center to a second data center, wherein the second data center and the first data center are located in different areas.
14. The electronic device of claim 13, wherein the processor is specifically configured to modify the IP address of the first data center in its own A record to the IP address of the second data center.
15. An electronic device, comprising: the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus;
the memory has stored therein a computer program which, when executed by the processor, causes the processor to carry out the steps of the method of any one of claims 1-5.
16. An electronic device, comprising: the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus;
the memory has stored therein a computer program which, when executed by the processor, causes the processor to carry out the steps of the method of any one of claims 6 to 8.
17. A computer-readable storage medium, characterized in that it stores a computer program executable by an electronic device, which program, when run on the electronic device, causes the electronic device to carry out the steps of the method according to any one of claims 1-5.
18. A computer-readable storage medium, having stored thereon a computer program executable by an electronic device, for causing the electronic device to perform the steps of the method of any one of claims 6-8, when the program is run on the electronic device.
19. A remote disaster recovery system, characterized by comprising the remote disaster recovery apparatus according to claim 9 applied to a master node of a data center, a slave node, and the remote disaster recovery apparatus according to claim 10 applied to a DNS server.
20. The system of claim 19, wherein the slave node is further configured to send the first service-unavailability information to a load balancing server when it fails;
and the load balancing server is configured to receive the first service-unavailability information sent by the slave nodes and to remove the failed slave nodes when it judges that the number of failed slave nodes is not greater than the set number threshold.
CN201811320107.9A 2018-11-07 2018-11-07 Remote disaster recovery method, device and system, electronic equipment and storage medium Active CN111158962B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811320107.9A CN111158962B (en) 2018-11-07 2018-11-07 Remote disaster recovery method, device and system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811320107.9A CN111158962B (en) 2018-11-07 2018-11-07 Remote disaster recovery method, device and system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111158962A true CN111158962A (en) 2020-05-15
CN111158962B CN111158962B (en) 2023-10-13

Family

ID=70554533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811320107.9A Active CN111158962B (en) 2018-11-07 2018-11-07 Remote disaster recovery method, device and system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111158962B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111817895A (en) * 2020-07-14 2020-10-23 济南浪潮数据技术有限公司 Master control node switching method, device, equipment and storage medium
CN111917846A (en) * 2020-07-19 2020-11-10 中信银行股份有限公司 Kafka cluster switching method, device and system, electronic equipment and readable storage medium
CN112286723A (en) * 2020-09-30 2021-01-29 北京大米科技有限公司 Computer room disaster recovery control method, terminal and storage medium
CN112306755A (en) * 2020-11-13 2021-02-02 苏州浪潮智能科技有限公司 High-availability implementation method and system based on micro front-end architecture
CN112711382A (en) * 2020-12-31 2021-04-27 百果园技术(新加坡)有限公司 Data storage method and device based on distributed system and storage node
CN113572902A (en) * 2021-07-21 2021-10-29 携程旅游信息技术(上海)有限公司 IVR (Interactive Voice response) remote voice response communication method, system, equipment and storage medium
CN114461438A (en) * 2022-04-12 2022-05-10 北京易鲸捷信息技术有限公司 Distributed database disaster recovery system and method of asymmetric center mode
CN115022334A (en) * 2022-05-13 2022-09-06 深信服科技股份有限公司 Flow distribution method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103607297A (en) * 2013-11-07 2014-02-26 上海爱数软件有限公司 Fault processing method of computer cluster system
CN104040526A (en) * 2012-01-09 2014-09-10 微软公司 Assignment of resources in virtual machine pools
CN105138441A (en) * 2015-06-30 2015-12-09 中标软件有限公司 HAC system, warning method based on HAC system and warning system based on HAC system
US20170353544A1 (en) * 2016-06-06 2017-12-07 Verizon Patent And Licensing Inc. Automated multi-network failover for data centers
CN108011995A (en) * 2017-12-19 2018-05-08 北京星河星云信息技术有限公司 Strange land implementation method more living, strange land service platform more living and storage medium
CN108289034A (en) * 2017-06-21 2018-07-17 新华三大数据技术有限公司 A kind of fault discovery method and apparatus

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104040526A (en) * 2012-01-09 2014-09-10 微软公司 Assignment of resources in virtual machine pools
CN103607297A (en) * 2013-11-07 2014-02-26 上海爱数软件有限公司 Fault processing method of computer cluster system
CN105138441A (en) * 2015-06-30 2015-12-09 中标软件有限公司 HAC system, warning method based on HAC system and warning system based on HAC system
US20170353544A1 (en) * 2016-06-06 2017-12-07 Verizon Patent And Licensing Inc. Automated multi-network failover for data centers
CN108289034A (en) * 2017-06-21 2018-07-17 新华三大数据技术有限公司 A kind of fault discovery method and apparatus
CN108011995A (en) * 2017-12-19 2018-05-08 北京星河星云信息技术有限公司 Strange land implementation method more living, strange land service platform more living and storage medium

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111817895A (en) * 2020-07-14 2020-10-23 济南浪潮数据技术有限公司 Master control node switching method, device, equipment and storage medium
CN111817895B (en) * 2020-07-14 2023-04-07 济南浪潮数据技术有限公司 Master control node switching method, device, equipment and storage medium
CN111917846A (en) * 2020-07-19 2020-11-10 中信银行股份有限公司 Kafka cluster switching method, device and system, electronic equipment and readable storage medium
CN112286723A (en) * 2020-09-30 2021-01-29 北京大米科技有限公司 Computer room disaster recovery control method, terminal and storage medium
CN112306755A (en) * 2020-11-13 2021-02-02 苏州浪潮智能科技有限公司 High-availability implementation method and system based on micro front-end architecture
CN112306755B (en) * 2020-11-13 2022-10-18 苏州浪潮智能科技有限公司 High-availability implementation method and system based on micro front-end architecture
CN112711382A (en) * 2020-12-31 2021-04-27 百果园技术(新加坡)有限公司 Data storage method and device based on distributed system and storage node
CN112711382B (en) * 2020-12-31 2024-04-26 百果园技术(新加坡)有限公司 Data storage method and device based on distributed system and storage node
CN113572902A (en) * 2021-07-21 2021-10-29 携程旅游信息技术(上海)有限公司 IVR (Interactive Voice response) remote voice response communication method, system, equipment and storage medium
CN114461438A (en) * 2022-04-12 2022-05-10 北京易鲸捷信息技术有限公司 Distributed database disaster recovery system and method of asymmetric center mode
CN115022334A (en) * 2022-05-13 2022-09-06 深信服科技股份有限公司 Flow distribution method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111158962B (en) 2023-10-13

Similar Documents

Publication Publication Date Title
CN111158962B (en) Remote disaster recovery method, device and system, electronic equipment and storage medium
CN105187249B (en) A kind of fault recovery method and device
CN103580902B (en) Computer information system and dynamic disaster recovery method thereof
CN105406980B (en) A kind of multinode backup method and device
CN107659948B (en) Method and device for controlling access of AP (access point)
CN105933407B (en) method and system for realizing high availability of Redis cluster
CN107508694B (en) Node management method and node equipment in cluster
CN106487486B (en) Service processing method and data center system
CN113489691B (en) Network access method, network access device, computer readable medium and electronic equipment
CN101908980A (en) Network management upgrading method and system
CN112181660A (en) High-availability method based on server cluster
US20080082630A1 (en) System and method of fault tolerant reconciliation for control card redundancy
CN111800484B (en) Service anti-destruction replacing method for mobile edge information service system
EP4060514A1 (en) Distributed database system and data disaster backup drilling method
CN112527567A (en) System disaster tolerance method, device, equipment and storage medium
CN105577444A (en) Wireless controller management method and wireless controller
CN111338858B (en) Disaster recovery method and device for double machine rooms
CN105959145B (en) A kind of method and system for the concurrent management server being applicable in high availability cluster
CN111309515B (en) Disaster recovery control method, device and system
US8370897B1 (en) Configurable redundant security device failover
CN112486718B (en) Database fault automatic switching method, device and computer storage medium
CN111953808A (en) Data transmission switching method of dual-machine dual-active architecture and architecture construction system
JP2011145861A (en) Disaster time automatic switching system and method for processing the same
JP4879823B2 (en) Supervisory control system
CN114598594A (en) Method, system, medium and device for processing application faults under multiple clusters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant