CN115695155A

CN115695155A - Disaster recovery switching management system, method, terminal and storage medium

Info

Publication number: CN115695155A
Application number: CN202211205590.2A
Authority: CN
Inventors: 杨梅; 陈世亮; 王震; 杨磊; 唐苏; 刘勇
Original assignee: China Telecom Digital Intelligence Technology Co Ltd
Current assignee: China Telecom Digital Intelligence Technology Co Ltd
Priority date: 2022-09-30
Filing date: 2022-09-30
Publication date: 2023-02-03

Abstract

The application provides a disaster recovery switching management system, a disaster recovery switching management method, a terminal and a storage medium. The disaster recovery switching management system comprises: a customer data center service system, used for supporting the operation and maintenance of customer service systems with a low time delay requirement; an edge cloud production node, used for deploying customer service systems with a high time delay requirement and providing operation and maintenance; a disaster recovery center, used for backing up the service data of the customer data center service system and performing disaster recovery of its services; an edge cloud disaster backup node, used for backing up the service data of the edge cloud production node and performing disaster recovery of its services; and a disaster recovery switching management platform, used for unified disaster recovery management of all of these modules and nodes. The disaster recovery center takes over when the customer data center service system fails, and the edge cloud disaster backup node takes over when the edge cloud production node fails. This scheme realizes disaster recovery switching management, improves fault takeover efficiency, and meets the takeover requirements of applications that have higher requirements on time delay and bandwidth.

Description

Disaster recovery switching management system, method, terminal and storage medium

Technical Field

The invention relates to the technical field of cloud computing information, in particular to a disaster recovery switching management system, a disaster recovery switching management method, a terminal and a storage medium.

Background

In a traditional production and disaster recovery architecture, application-level disaster recovery is implemented in the same city or at a remote site. The disaster recovery management platform that controls switching is generally deployed in the disaster recovery center, and switching and takeover of the disaster recovery system are performed through that platform. However, with the development of 5G networks, enterprises and institutions run many applications that require ultra-low time delay and ultra-large bandwidth, and a service system can now be built by combining edge cloud nodes with a traditional single data center. When an application system is switched under such a single data center plus edge cloud architecture, a traditional management platform deployed at the remote disaster recovery site cannot meet the requirements for unified management and low-delay disaster recovery of services on edge cloud nodes. An intelligent disaster recovery management system based on the 5G edge cloud is therefore needed to uniformly manage the switching and takeover processes of the service systems of the production center and the edge cloud nodes.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides a disaster recovery switching management system, a method, a terminal and a storage medium. For a service system built on a single data center plus edge cloud production architecture, the switching and takeover processes of the production center and the edge cloud nodes are managed in a unified way. This improves service takeover efficiency under the edge cloud service architecture and meets the takeover requirements of applications with high requirements on time delay and bandwidth.

In order to achieve the purpose, the invention adopts the following technical scheme:

A disaster recovery switching management system, comprising:

a customer data center service system, used for supporting the operation and maintenance of customer service systems with a low time delay requirement, where a low time delay requirement means that the required time delay is greater than an initially set delay threshold;

an edge cloud production node, used for deploying customer service systems with a high time delay requirement and providing operation and maintenance, where a high time delay requirement means that the required time delay is less than or equal to the initially set delay threshold;

a disaster recovery center, used for backing up the service data of the customer data center service system and performing disaster recovery of its services;

an edge cloud disaster backup node, used for backing up the service data of the edge cloud production node and performing disaster recovery of its services;

a disaster recovery switching management platform, used for unified disaster recovery switching management and scheduling of the customer data center service system, the edge cloud production node, the disaster recovery center and the edge cloud disaster backup node. When the customer data center service system or the edge cloud production node fails, the platform alarms the management personnel based on fault information monitored in real time. According to the alarm information, the management personnel initiate the disaster recovery center service takeover and the edge cloud disaster backup node takeover, so that the backup side provides the functions of the failed part of the customer data center service system and the edge cloud production node. After the failed part is repaired, the platform stops the service support function of the disaster recovery center or the edge cloud disaster backup node and reschedules the customer data center service system and the edge cloud production node to continue providing services externally.
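The threshold rule and the takeover pairing described above can be sketched as follows. This is a minimal illustration only: the names `Site`, `TAKEOVER_TARGET` and `place_service`, and the 20 ms threshold in the usage lines, are hypothetical and do not come from the application.

```python
from enum import Enum


class Site(Enum):
    DATA_CENTER = "customer data center service system"
    EDGE_PRODUCTION = "edge cloud production node"
    DR_CENTER = "disaster recovery center"
    EDGE_BACKUP = "edge cloud disaster backup node"


# Takeover pairing stated in the text: each production site has one kind of backup site.
TAKEOVER_TARGET = {
    Site.DATA_CENTER: Site.DR_CENTER,
    Site.EDGE_PRODUCTION: Site.EDGE_BACKUP,
}


def place_service(delay_requirement_ms: float, delay_threshold_ms: float) -> Site:
    """Return the production site for a service, given its required time delay."""
    # Requirement <= threshold is a "high" time delay requirement: deploy on the edge cloud.
    if delay_requirement_ms <= delay_threshold_ms:
        return Site.EDGE_PRODUCTION
    # Requirement > threshold is a "low" time delay requirement: deploy in the data center.
    return Site.DATA_CENTER


if __name__ == "__main__":
    threshold_ms = 20.0  # hypothetical value; the text does not specify a threshold
    for name, requirement_ms in [("billing", 200.0), ("video-analytics", 10.0)]:
        site = place_service(requirement_ms, threshold_ms)
        print(f"{name}: runs on {site.value}, backed up by {TAKEOVER_TARGET[site].value}")
```

Keeping the pairing in a single lookup table reflects the fixed mapping in the text: the data center is always taken over by the disaster recovery center, and an edge cloud production node by an edge cloud disaster backup node.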

In order to optimize the technical scheme, the specific measures adopted further comprise:

Further, a disaster recovery switching management method based on the disaster recovery switching management system includes the following steps:

S1: the disaster recovery switching management platform monitors the working states of the customer data center service system and the edge cloud production node in real time, and when a fault is detected in any subsystem of the customer data center service system or the edge cloud production node, the platform sends an alarm to the management personnel;

S2: the management personnel locate the specific position of the fault according to the alarm information, that is, determine which subsystem has failed, select the switching script corresponding to the fault position in the disaster recovery center or the edge cloud disaster backup node, and perform service switching according to the preset switching flow; after switching, the disaster recovery center or the edge cloud disaster backup node provides the data support service through the switching script;

S3: after the fault position is repaired, the management personnel perform service switch-back through the disaster recovery switching management platform, that is, the service support function of the disaster recovery center or the edge cloud disaster backup node is stopped, and the customer data center service system and the edge cloud production node continue to provide services externally.

Further, in step S2, if the fault scenario involves the edge cloud production node, the disaster recovery switching system selects an edge cloud disaster backup node whose time delay is less than or equal to the initially set delay threshold, that is, a node that meets the low time delay requirement.
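A minimal sketch of this selection rule is given below. It assumes each candidate node exposes a measured time delay in milliseconds; the names `EdgeBackupNode` and `select_edge_backup` are hypothetical. Choosing the lowest-delay node among the eligible ones is also an assumption, since the text only requires that the delay not exceed the threshold.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class EdgeBackupNode:
    name: str
    delay_ms: float  # measured time delay of the node (assumed metric)


def select_edge_backup(
    nodes: Iterable[EdgeBackupNode], delay_threshold_ms: float
) -> Optional[EdgeBackupNode]:
    """Return a backup node whose delay is <= the threshold, or None if none qualifies."""
    eligible = [n for n in nodes if n.delay_ms <= delay_threshold_ms]
    # Assumption: prefer the lowest delay among eligible nodes.
    return min(eligible, key=lambda n: n.delay_ms, default=None)


if __name__ == "__main__":
    candidates = [
        EdgeBackupNode("mec-a", 35.0),
        EdgeBackupNode("mec-b", 8.0),
        EdgeBackupNode("mec-c", 15.0),
    ]
    print(select_edge_backup(candidates, delay_threshold_ms=20.0))
```

Returning `None` when no node qualifies leaves the fallback decision to the caller, for example the management personnel handling the alarm.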

Further, a terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method as claimed in any one of the above when executing the computer program.

Further, a computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of the preceding claims.

The invention has the following beneficial effects: the disaster recovery switching platform cooperates with the 5G MEC edge cloud (edge cloud service production nodes and edge cloud disaster backup nodes). Because 5G MEC edge cloud nodes are widely distributed and close to the production center, a 5G MEC edge cloud disaster backup node that satisfies the service's experience requirements on time delay, bandwidth and the like can be matched during disaster recovery according to how well it matches the key information of the service, and switching and switch-back are performed according to a preset switching flow. Service continuity is thereby guaranteed while the service's requirements for ultra-low time delay and ultra-high bandwidth, and hence the user experience of the service system, are also guaranteed.

Drawings

FIG. 1 is a schematic flow diagram of the overall scheme of the present invention.

Detailed Description

The present invention will now be described in further detail with reference to the accompanying drawings.

Referring to fig. 1, the main scheme of the present application is as follows:

A disaster recovery switching management system, comprising:

Module 1: production center (customer data center service system). It supports the operation and maintenance of the customer service system and, at the same time, backs up data to the disaster recovery center.

Module 2: disaster recovery center. It backs up the service data of the production center and performs disaster recovery of the services. If service recovery is performed on a 5G MEC edge cloud node, it sends the service backup data and the backup system to that 5G MEC edge cloud node.

Module 3: 5G MEC edge cloud service production node. It is used for deploying service systems with a higher requirement on time delay.

Module 4: disaster recovery switching platform. It provides unified disaster recovery switching management and scheduling for the production center, the edge cloud production nodes, the disaster recovery center and the edge cloud disaster backup nodes. The production center's services are taken over by the disaster recovery center, and an edge cloud production node is taken over by an edge cloud service disaster backup node; the specific takeover sequence and takeover scripts are configured in advance on the disaster recovery management platform. When the production center or an edge cloud production node fails, the platform detects the fault from monitoring data and raises an alarm, and a manager initiates the disaster recovery center takeover and the edge cloud disaster backup node takeover according to the alarm information. After the fault is repaired, the disaster recovery management platform can initiate the switch-back according to the preset flow.

Module 5: MEC management platform. It provides unified management and scheduling of the MEC edge cloud nodes, including state monitoring and pulling up (starting) services on edge cloud production nodes and disaster backup nodes; the specific pull-up scripts are configured on the disaster recovery switching management platform.

Module 6: a plurality of 5G MEC edge cloud service disaster backup nodes. They are used for deploying the disaster recovery systems of service systems with a higher time delay requirement. When a disaster occurs, the disaster recovery switching management platform automatically recommends suitable edge cloud disaster backup nodes, based on the service's requirements on time delay and bandwidth, to provide services externally.
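The automatic recommendation in Module 6 takes both time delay and bandwidth into account. One possible sketch is shown below, assuming each backup node reports its delay, its free bandwidth and a health flag. The names `ServiceRequirement`, `BackupNodeStatus` and `recommend_backup_nodes`, and the ranking order, are illustrative assumptions, not details given in the application.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ServiceRequirement:
    max_delay_ms: float        # largest acceptable time delay for the service
    min_bandwidth_mbps: float  # smallest acceptable bandwidth for the service


@dataclass(frozen=True)
class BackupNodeStatus:
    name: str
    delay_ms: float
    free_bandwidth_mbps: float
    healthy: bool = True


def recommend_backup_nodes(
    req: ServiceRequirement, nodes: Iterable[BackupNodeStatus]
) -> List[BackupNodeStatus]:
    """Return the nodes that satisfy both requirements, best candidate first."""
    suitable = [
        n for n in nodes
        if n.healthy
        and n.delay_ms <= req.max_delay_ms
        and n.free_bandwidth_mbps >= req.min_bandwidth_mbps
    ]
    # Assumed ranking: lower delay first, then more free bandwidth.
    return sorted(suitable, key=lambda n: (n.delay_ms, -n.free_bandwidth_mbps))


if __name__ == "__main__":
    requirement = ServiceRequirement(max_delay_ms=20.0, min_bandwidth_mbps=500.0)
    status = [
        BackupNodeStatus("mec-a", 12.0, 300.0),
        BackupNodeStatus("mec-b", 15.0, 900.0),
        BackupNodeStatus("mec-c", 9.0, 800.0),
        BackupNodeStatus("mec-d", 5.0, 1000.0, healthy=False),
    ]
    print([n.name for n in recommend_backup_nodes(requirement, status)])
```

Returning an ordered list rather than a single node leaves the final choice to the platform or the manager.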

A disaster recovery switching management method; the specific disaster recovery switching process is as follows:

1. The disaster recovery switching management platform monitors the states of the production center service system and the edge cloud production node.

2. When the disaster recovery switching management platform detects that the customer production center or the edge cloud production node has a fault, the platform sends an alarm to a manager.

3. The manager locates the fault according to the alarm information, selects a disaster recovery switching flow and performs system switching.

4. The disaster recovery management platform executes the disaster recovery center takeover and the edge cloud disaster backup node takeover according to the preset flow and switching scripts, and obtains the switching state. In an edge cloud fault scenario, the disaster recovery switching system can automatically select an edge cloud disaster backup node that meets the low time delay requirement and automatically execute the switching script. After switching, the disaster backup architecture diagram on the disaster recovery management platform shows that services are being provided externally by the disaster recovery center and the edge cloud disaster backup node.

5. After the production center fault is repaired, the manager performs service switch-back through the disaster recovery management platform, stopping the disaster recovery center and the edge cloud service disaster backup node and starting the production center and the edge cloud production node. The architecture diagram on the disaster recovery management platform then shows that services are being provided by the production center and the edge cloud production node.
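The five steps above amount to a small state machine: production serving, alarm raised, backup serving, and back to production. The sketch below illustrates that reading; the class `SwitchController`, its method names and the three-state model are assumptions made for illustration, and the real platform would run switching scripts where this code only records log entries.

```python
from enum import Enum, auto
from typing import List


class Mode(Enum):
    PRODUCTION = auto()         # production center / edge production node is serving
    ALARMED = auto()            # fault detected and manager alerted, not yet switched
    DISASTER_RECOVERY = auto()  # disaster recovery center / edge backup node is serving


class SwitchController:
    def __init__(self) -> None:
        self.mode = Mode.PRODUCTION
        self.log: List[str] = []

    def report_health(self, healthy: bool) -> None:
        """Steps 1-2: monitoring input; a production fault raises an alarm."""
        if not healthy and self.mode is Mode.PRODUCTION:
            self.mode = Mode.ALARMED
            self.log.append("alarm sent to manager")

    def manager_switch(self) -> None:
        """Steps 3-4: the manager initiates takeover after an alarm."""
        if self.mode is not Mode.ALARMED:
            raise RuntimeError("switching requires an active alarm")
        self.mode = Mode.DISASTER_RECOVERY
        self.log.append("backup site serving")

    def manager_switch_back(self, fault_repaired: bool) -> None:
        """Step 5: the manager switches back once the fault is repaired."""
        if self.mode is not Mode.DISASTER_RECOVERY:
            raise RuntimeError("switch-back requires disaster recovery mode")
        if not fault_repaired:
            raise RuntimeError("fault is not repaired yet")
        self.mode = Mode.PRODUCTION
        self.log.append("production site serving")


if __name__ == "__main__":
    controller = SwitchController()
    controller.report_health(healthy=False)
    controller.manager_switch()
    controller.manager_switch_back(fault_repaired=True)
    print(controller.mode.name, controller.log)
```

Both transitions are triggered by explicit manager calls rather than by the monitor, matching the text, in which the platform alarms and the manager initiates switching and switch-back.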

The following is further detailed by way of specific embodiments:

1. Through the system, the management personnel configure in advance the service takeover flows and scripts of the disaster recovery center and the edge cloud disaster backup nodes, which specifically include scripts and state feedback for component types such as middleware, databases and application servers. The switch-back configuration comprises the service takeover flows and scripts of the production center and the edge cloud production node, and likewise includes scripts and state feedback for middleware, databases, application servers and the like.

2. The customer production center reports service monitoring information to the switching management platform, including the service flow and the service's experience requirements such as time delay and bandwidth.

3. The customer edge cloud production node reports service monitoring information to the switching management platform, including the service flow and the service's requirements on time delay, bandwidth and the like.

4. The disaster recovery switching management platform receives the service monitoring information of the production center and the edge cloud production node and evaluates the service state.

5. When a fault occurs in the customer production center or the edge cloud, the disaster recovery switching management platform generates alarm information according to preset rules and sends it to the relevant management personnel through the disaster recovery management platform, email and short messages.

6. The management personnel initiate automatic switching according to the switching script automatically recommended by the system. The disaster recovery switching platform switches the service system according to the preset script and displays the access flow chart, switching flow and execution state of the service system on a large visualization screen. In an edge cloud fault scenario, the disaster recovery switching system can automatically select an edge cloud disaster backup node that meets the low time delay requirement and automatically execute the switching script.

7. After the production fault is repaired, a manager initiates switch-back according to the switching script automatically recommended by the system. The disaster recovery switching platform switches the service system according to the preset script and displays the access flow chart, switching flow and execution state of the service system on a large visualization screen.
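The preconfigured flows in this embodiment pair an ordered list of per-component scripts with state feedback that the platform can display. A minimal sketch under stated assumptions follows: the scripts are stand-in Python callables rather than real middleware or database scripts, the names `FlowStep` and `run_flow` are hypothetical, and stopping at the first failed step is an assumed policy that the text does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class FlowStep:
    component: str              # e.g. "middleware", "database", "application server"
    script: Callable[[], bool]  # stand-in for a takeover script; returns True on success


def run_flow(steps: List[FlowStep]) -> List[Tuple[str, str]]:
    """Run the steps in their configured order and return (component, state) feedback."""
    feedback: List[Tuple[str, str]] = []
    failed = False
    for step in steps:
        if failed:
            # Assumed policy: steps after a failure are not executed.
            feedback.append((step.component, "skipped"))
            continue
        ok = step.script()
        feedback.append((step.component, "succeeded" if ok else "failed"))
        failed = not ok
    return feedback


if __name__ == "__main__":
    # Hypothetical takeover flow; a switch-back flow would be a second list of the same shape.
    takeover_flow = [
        FlowStep("middleware", lambda: True),
        FlowStep("database", lambda: True),
        FlowStep("application server", lambda: True),
    ]
    for component, state in run_flow(takeover_flow):
        print(f"{component}: {state}")
```

The returned feedback list is the kind of per-step execution state that a visualization screen could show next to the switching flow.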

It should be noted that terms such as "upper", "lower", "left", "right", "front" and "back" used in the present invention are for clarity of description only and are not intended to limit the scope of the present invention; changes or adjustments to the relative relationships they describe, made without substantive change to the technical content, are also regarded as within the scope of the present invention.

The above is only a preferred embodiment of the present invention. The protection scope of the present invention is not limited to the above embodiment, and all technical solutions that belong to the idea of the present invention fall within its protection scope. It should be noted that, for those skilled in the art, modifications and adaptations made without departing from the principles of the present invention are also regarded as within the protection scope of the present invention.

Claims

1. A disaster recovery switching management system, comprising:

a disaster recovery switching management platform, used for unified disaster recovery switching management and scheduling of a customer data center service system, an edge cloud production node, a disaster recovery center and an edge cloud disaster backup node; when the customer data center service system or the edge cloud production node fails, the disaster recovery switching management platform alarms the management personnel based on fault information monitored in real time; according to the alarm information, the management personnel initiate the disaster recovery center service takeover and the edge cloud disaster backup node takeover, so that the backup side provides the functions of the failed part of the customer data center service system and the edge cloud production node; after the failed part is repaired, the disaster recovery switching management platform stops the service support function of the disaster recovery center or the edge cloud disaster backup node and reschedules the customer data center service system and the edge cloud production node to continue providing services externally.

2. A disaster recovery switching management method based on the disaster recovery switching management system as claimed in claim 1, comprising the steps of:

S2: the management personnel locate the specific position of the fault according to the alarm information, that is, determine which subsystem has failed, select the switching script corresponding to the fault position in the disaster recovery center or the edge cloud disaster backup node, and perform service switching according to the preset switching flow; after switching, the disaster recovery center or the edge cloud disaster backup node provides the data support service through the switching script;

3. The method according to claim 2, wherein in step S2, if the fault scenario involves the edge cloud production node, the disaster recovery switching system selects an edge cloud disaster backup node whose time delay is less than or equal to the initially set delay threshold, that is, a node that meets the low time delay requirement.

4. A terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 2 to 3 when executing the computer program.

5. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of a method according to any one of claims 2 to 3.