CN114095343A - Disaster recovery method, device, equipment and storage medium based on double-active system - Google Patents

Info

Publication number
CN114095343A
Authority
CN
China
Prior art keywords
execution
task node
acquiring
machine room
task
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111369384.0A
Other languages
Chinese (zh)
Inventor
骆国军
尹彪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention relates to the technical fields of digital healthcare and artificial intelligence, and in particular discloses a disaster recovery method, device, equipment and storage medium based on a dual-active system. The method comprises the following steps: when a disaster is detected in a first machine room, acquiring configuration information of the first machine room, matching a preset fault scenario according to the configuration information, and acquiring a corresponding directed acyclic execution flow according to the fault scenario; acquiring an execution request sent by a client, and generating an execution instruction for each task node according to the directed acyclic execution flow based on the execution request; acquiring, according to the execution instruction, a mapping relation between the corresponding task node and an execution server together with the corresponding execution script information; and determining a target execution server according to the mapping relation and running the execution script information on the target execution server, so as to switch from the first machine room to the second machine room. In this way, the invention can improve the fault response speed and solve the problem of data loss and damage caused by man-made or natural disasters.

Description

Disaster recovery method, device, equipment and storage medium based on double-active system
Technical Field
The invention relates to the technical field of digital medical treatment and artificial intelligence, in particular to a disaster recovery method, a disaster recovery device, disaster recovery equipment and a storage medium based on a dual-active system.
Background
With the rapid development of big data and cloud computing, data volumes are growing explosively, and users' requirements for data security are growing with them. When a data system fails because of a man-made or natural disaster, how to respond quickly to the failure and keep users' service data safe is a problem that operation and maintenance technicians need to solve.
Existing operation and maintenance technology relies on the open-source Apache Airflow middleware. Scripts must be configured for every system in the existing operation and maintenance setup, so the amount of script configuration is very large. This configuration is usually done manually, which is laborious and error-prone; without a tool to support script orchestration and configuration, operation and maintenance staff cannot respond quickly when one of the anticipated disasters occurs. In addition, each system involves multiple components, and each component needs dedicated operation and maintenance personnel. When scripts are executed manually, the personnel responsible for each component must cooperate to execute them, so the degree of automation is low, the cost is high and the efficiency is low.
Disclosure of Invention
The invention provides a disaster recovery method, device, equipment and storage medium based on a dual-active system, which can improve the fault response speed and solve the problem of data loss and damage caused by man-made or natural disasters.
In order to solve the technical problems, the invention adopts a technical scheme that: the disaster recovery method based on the double-active system comprises the following steps:
when detecting that a disaster occurs in a first machine room providing data service, acquiring configuration information of the first machine room, matching a preset fault scene according to the configuration information, and acquiring a corresponding directed acyclic execution flow according to the fault scene;
acquiring an execution request sent by a client, and generating an execution instruction of each task node according to the directed acyclic execution flow based on the execution request;
acquiring a mapping relation between the corresponding task node and an execution server and execution script information of the corresponding task node according to the execution instruction;
and determining a target execution server according to the mapping relation, sending the execution script information to the target execution server, and running the execution script information on the target execution server to realize switching the first machine room to the second machine room.
According to an embodiment of the present invention, before the obtaining configuration information of a first machine room providing data services when a disaster occurs in the first machine room is detected, matching a preset fault scenario according to the configuration information, and obtaining a corresponding directed acyclic execution flow according to the fault scenario, the method further includes:
pre-constructing a plurality of fault scenes;
a plurality of task nodes and an execution sequence of the task nodes are pre-configured for each fault scene, and a directed acyclic execution flow corresponding to the fault scene is generated according to the task nodes and the execution sequence;
and carrying out association mapping on each fault scene and the corresponding directed acyclic execution flow.
According to an embodiment of the present invention, the pre-configuring, for each of the failure scenarios, a plurality of task nodes and an execution sequence of the task nodes, and the generating a directed acyclic execution flow corresponding to the failure scenario according to the task nodes and the execution sequence includes:
for a fault scene that the first machine room has a disaster, pre-configuring a first task node, a second task node, a third task node, a fourth task node and a fifth task node with execution script information of each task node;
and configuring the execution sequence according to the arrangement sequence of the first task node, the second task node, the third task node, the fourth task node and the fifth task node, and generating the directed acyclic execution flow according to the task nodes and the execution sequence.
According to an embodiment of the present invention, the obtaining an execution request sent by a client, and generating an execution instruction of each task node according to the directed acyclic execution flow based on the execution request includes:
acquiring an execution request sent by a client, splitting the directed acyclic execution flow according to the execution request, and acquiring a plurality of task nodes and an execution sequence of the task nodes;
and generating a corresponding execution instruction according to each task node and sequentially transmitting the execution instructions according to the execution sequence.
According to an embodiment of the present invention, the generating a corresponding execution instruction according to each task node and transmitting the execution instruction according to the execution sequence further includes:
acquiring an execution state of a current task node, and judging whether the execution state is a completion state;
and if the execution state of the current task node is the completion state, transmitting a next execution instruction according to the execution sequence until the execution state of the last task node of the execution sequence is the completion state.
According to an embodiment of the present invention, before obtaining the execution state of the current task node, the method further includes:
and after the execution script information is operated on the target execution server, acquiring an operation result, determining the execution state of the task node according to the operation result, and storing the execution state in a database.
According to an embodiment of the present invention, the passing the next execution instruction according to the execution order until the execution state of the last task node in the execution order is the completion state further includes:
and acquiring the transmission state of each execution instruction, and storing the transmission state in a database.
In order to solve the technical problem, the invention adopts another technical scheme that: the disaster recovery device based on the double-active system comprises:
a first acquisition module, configured to acquire configuration information of a first machine room providing data service when a disaster is detected in the first machine room, match a preset fault scenario according to the configuration information, and acquire a corresponding directed acyclic execution flow according to the fault scenario;
a generating module, configured to acquire an execution request sent by a client and generate an execution instruction for each task node according to the directed acyclic execution flow based on the execution request;
a second acquisition module, configured to acquire, according to the execution instruction, the mapping relation between the corresponding task node and an execution server and the execution script information of the corresponding task node;
and a sending and switching module, configured to determine a target execution server according to the mapping relation, send the execution script information to the target execution server, and run the execution script information on the target execution server so as to switch from the first machine room to the second machine room.
In order to solve the technical problems, the invention adopts another technical scheme that: there is provided a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the above disaster recovery method based on a dual-active system when executing the computer program.
In order to solve the technical problems, the invention adopts another technical scheme that: there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described disaster recovery method based on a dual active system.
The beneficial effects of the invention are as follows: when a disaster occurs, the fault scenario is matched in time and the directed acyclic execution flow is acquired automatically. This meets the automatic orchestration requirements of large server fleets and complex service scenarios in the operation and maintenance field, removes the need to orchestrate and configure scripts on the spot, lowers the skill level required of operation and maintenance personnel, effectively improves the fault response speed, and solves the problem of data loss and damage caused by man-made or natural disasters.
Drawings
Fig. 1 is a schematic flow chart of a disaster recovery method based on a dual-active system according to a first embodiment of the present invention;
Fig. 2 is a schematic flow chart of step S102 in the disaster recovery method based on a dual-active system according to the first embodiment of the present invention;
Fig. 3 is a schematic flow chart of a disaster recovery method based on a dual-active system according to a second embodiment of the present invention;
Fig. 4 is a schematic flow chart of step S302 in the disaster recovery method based on a dual-active system according to the second embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a disaster recovery device based on a dual-active system according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a computer storage medium according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first", "second" and "third" in the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first", "second" or "third" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise. All directional indicators (such as up, down, left, right, front, rear, and so on) in the embodiments of the present invention are only used to explain the relative positional relationship, movement and the like of the components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indicator changes accordingly. Furthermore, the terms "include" and "have", as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
The disaster recovery method is implemented on a dual-active system. The dual-active system comprises a disaster recovery module, a first machine room and a second machine room; the first and second machine rooms both provide data service and are both connected to the disaster recovery module. Further, the disaster recovery module comprises: onedrive, Airflow, RabbitMQ, a Celery Worker, an execution server, and MySQL. onedrive handles operations such as component management, script configuration, execution driver configuration and directed acyclic execution flow configuration. In this embodiment, Airflow is a platform for programmatically authoring, scheduling and monitoring workflows. Based on a directed acyclic execution flow (i.e. a directed acyclic graph, DAG), Airflow can define a group of dependent tasks that are executed in order according to their dependencies. Airflow provides a rich set of command line tools for system management and control, and its Web management interface makes it convenient to manage scheduled tasks and monitor their running state in real time, which eases operation, maintenance and management of the system. In this embodiment, onedrive is used to arrange the ordered task nodes, the corresponding execution script information, and the mapping relation between task nodes and execution servers.
Airflow is the task flow execution engine: it receives the execution request, generates execution instructions according to the directed acyclic execution flow, and passes them to RabbitMQ. RabbitMQ is message forwarding middleware that receives the execution instructions and forwards them to the Celery Worker. The Celery Worker is a work node that receives an execution instruction, acquires the mapping relation between the task node and its target execution server according to the instruction (one task node corresponds to one target execution server), sends the execution script information to the target execution server and runs it there, and stores the running result in a database. In this embodiment, MySQL is used as the database. It receives commands and performs the corresponding operations; the commands are written as SQL statements and may include operations such as deleting a file and acquiring file content. In one implementation, two MySQL instances are configured: one stores the delivery state of the execution instructions passed on by Airflow and the running results of the execution script information on the target execution servers, and the other stores the sending state of the execution requests sent by onedrive. The dual-active system of this embodiment combines onedrive and Airflow and can switch between the two data service systems when a fault occurs.
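The message path described above (a scheduler emits execution instructions, a broker forwards them, a worker consumes them and records results) can be illustrated with a small in-process sketch. It is not the patented implementation: `queue.Queue` merely stands in for RabbitMQ, a dictionary stands in for the MySQL result store, and the names `ExecutionInstruction`, `scheduler_emit` and `worker_consume` are hypothetical.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionInstruction:
    """Hypothetical message format; the source does not specify one."""
    flow_id: str
    task_node: str


def scheduler_emit(broker, flow_id, task_nodes):
    """Scheduler role (Airflow in the text): publish one instruction per node."""
    for node in task_nodes:
        broker.put(ExecutionInstruction(flow_id, node))


def worker_consume(broker, results):
    """Worker role (Celery Worker in the text): drain the broker, record results."""
    while True:
        try:
            instruction = broker.get_nowait()
        except queue.Empty:
            break
        # A real worker would run the node's script on its execution server here.
        results[instruction.task_node] = "success"


broker = queue.Queue()  # stands in for RabbitMQ
results = {}            # stands in for the MySQL result store
scheduler_emit(broker, "first_room_disaster", ["pause_traffic", "promote_database"])
worker_consume(broker, results)
print(results)
```

Keeping the broker between scheduler and worker means the scheduler never needs to know which execution server a node runs on; that lookup stays on the worker side.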
Fig. 1 is a schematic flow chart of a disaster recovery method based on a dual-active system according to a first embodiment of the present invention. It should be noted that the method of the present invention is not limited to the flow sequence shown in fig. 1 if the results are substantially the same. As shown in fig. 1, the method comprises the steps of:
Step S101: when a disaster is detected in a first machine room providing data service, acquiring configuration information of the first machine room, matching a preset fault scenario according to the configuration information, and acquiring a corresponding directed acyclic execution flow according to the fault scenario.
In step S101, the dual-active system includes a first machine room and a second machine room, both of which provide data services in this embodiment; the first machine room may be the active machine room and the second machine room the standby machine room. The configuration information of a machine room includes its composition architecture; for example, a plurality of servers are configured in the first machine room, and the server types include traffic distribution programs, Web programs, databases, and the like. A user can use onedrive to preset common fault scenarios and pre-configure a corresponding directed acyclic execution flow for each of them. When a disaster occurs, the preset directed acyclic execution flow is obtained according to the fault scenario, and the system executes it to avoid the impact of the fault. In this embodiment, switching between different fault scenarios can be done with one key. The directed acyclic execution flow, which is the coping strategy for a fault scenario, consists of task nodes with a defined execution order. While the flow is executed, control passes from one task node to the next automatically, without manual operation, which improves response speed and execution accuracy.
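As a rough illustration of step S101, the sketch below matches a machine room's configuration against preset fault scenarios and looks up the associated flow. The source does not say how matching is performed, so the rule used here (the room must contain every server type the scenario expects) is purely an assumption, as are the names `SCENARIO_REQUIREMENTS`, `SCENARIO_FLOWS`, `match_scenario` and `flow_for_room`.

```python
from typing import Dict, List, Optional, Set

# Hypothetical registry: fault scenario -> server types the room must contain.
SCENARIO_REQUIREMENTS: Dict[str, Set[str]] = {
    "first_room_disaster": {"traffic_distribution", "web", "database"},
}

# Hypothetical association mapping: fault scenario -> ordered task nodes.
SCENARIO_FLOWS: Dict[str, List[str]] = {
    "first_room_disaster": [
        "pause_traffic",
        "promote_database",
        "switch_web_database",
        "resume_traffic",
        "switch_operator_traffic",
    ],
}


def match_scenario(room_config: dict) -> Optional[str]:
    """Return the first preset scenario whose required server types are all present."""
    server_types = set(room_config.get("server_types", []))
    for scenario, required in SCENARIO_REQUIREMENTS.items():
        if required <= server_types:
            return scenario
    return None


def flow_for_room(room_config: dict) -> Optional[List[str]]:
    """Look up the pre-configured flow for the matched scenario, if any."""
    scenario = match_scenario(room_config)
    return SCENARIO_FLOWS.get(scenario) if scenario else None


room = {"name": "first_room", "server_types": ["traffic_distribution", "web", "database"]}
print(match_scenario(room))
print(flow_for_room(room))
```

Because scenarios and flows are registered ahead of time, the lookup at disaster time is a dictionary access rather than an ad hoc scripting effort.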
Step S102: acquiring an execution request sent by the client, and generating an execution instruction for each task node according to the directed acyclic execution flow based on the execution request.
In step S102, after the dual-active system automatically detects that a disaster has occurred in the first machine room and acquires the corresponding coping strategy (the directed acyclic execution flow), the user can trigger execution of the flow with one key through the client. After Airflow receives the execution request, it generates execution instructions according to the directed acyclic execution flow and passes them to RabbitMQ, and RabbitMQ forwards them to the Celery Worker. After an execution instruction is delivered, this embodiment also stores its delivery state in the database, so that the execution state of the directed acyclic execution flow can be queried in real time.
Further, referring to fig. 2, step S102 further includes the following steps:
Step S201: acquiring the execution request sent by the client, and splitting the directed acyclic execution flow according to the execution request to obtain a plurality of task nodes and the execution order of the task nodes.
Step S202: generating a corresponding execution instruction for each task node and delivering the execution instructions one by one in the execution order.
In step S202, the execution state of the current task node is obtained, and it is determined whether the execution state is a completion state; if the execution state of the current task node is the completion state, transmitting the next execution instruction according to the execution sequence until the execution state of the last task node of the execution sequence is the completion state; and if the execution state of the current task node is an incomplete state, suspending the transmission of the next execution instruction until the execution state of the current task node is a complete state. In the embodiment, after the execution instruction is transmitted each time, the transmission state of the execution instruction is further acquired, and the transmission state is stored in the database. The delivery status includes a delivery success status and a delivery failure status.
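The gating behaviour of step S202 (deliver one instruction, wait for the current node to reach the completion state, then deliver the next, recording each delivery state) could look roughly like the following. The callbacks `send` and `is_complete`, the `max_polls` limit and the function name `dispatch_in_order` are hypothetical; a production loop would also sleep between polls rather than spin.

```python
from typing import Callable, List, Tuple


def dispatch_in_order(
    task_nodes: List[str],
    send: Callable[[str], bool],
    is_complete: Callable[[str], bool],
    max_polls: int = 100,
) -> List[Tuple[str, str]]:
    """Deliver instructions one at a time; hold the next until the current completes.

    Returns the delivery log as (task_node, "success" | "failure") pairs.
    """
    delivery_log = []
    for node in task_nodes:
        delivered = send(node)
        delivery_log.append((node, "success" if delivered else "failure"))
        if not delivered:
            break
        for _ in range(max_polls):
            if is_complete(node):
                break
        else:
            # The node never reached the completion state: do not deliver the next one.
            break
    return delivery_log


# Tiny simulation: node A needs two polls before it reports completion.
sent = []
polls_left = {"A": 2, "B": 0}


def fake_send(node):
    sent.append(node)
    return True


def fake_is_complete(node):
    if polls_left[node] == 0:
        return True
    polls_left[node] -= 1
    return False


log = dispatch_in_order(["A", "B"], fake_send, fake_is_complete)
print(log)
```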
Step S103: acquiring, according to the execution instruction, the mapping relation between the corresponding task node and the execution server and the execution script information of the corresponding task node.
In step S103, a task node corresponds to an execution server, and the execution script information corresponding to the task node is executed on the corresponding execution server. The onedrive can arrange the mapping relationship between the task nodes and the execution server and the execution script information of the corresponding task nodes, and the embodiment directly acquires the execution server of the task node corresponding to the execution instruction and the corresponding execution script information from the onedrive after receiving the execution instruction.
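A minimal sketch of the lookup in step S103 follows, assuming the mapping is held as a simple table keyed by task node. The host names, script paths and the names `NodeBinding`, `BINDINGS` and `resolve_binding` are all hypothetical; the source only states that one task node corresponds to one execution server and one piece of execution script information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeBinding:
    """One task node maps to exactly one execution server and one script."""
    server: str
    script: str


# Hypothetical bindings; real values would come from the orchestration tool.
BINDINGS = {
    "pause_traffic": NodeBinding("lb-01.example.internal", "/opt/dr/pause_traffic.sh"),
    "promote_database": NodeBinding("db-02.example.internal", "/opt/dr/promote_database.sh"),
}


def resolve_binding(task_node: str) -> NodeBinding:
    """Return the execution server and script configured for a task node."""
    try:
        return BINDINGS[task_node]
    except KeyError:
        raise LookupError(f"no execution server configured for {task_node!r}") from None


binding = resolve_binding("pause_traffic")
print(binding.server, binding.script)
```

Failing loudly on an unmapped node is a deliberate choice here: silently skipping a step in a failover sequence would be worse than stopping.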
Step S104: determining a target execution server according to the mapping relation, sending the execution script information to the target execution server, and running the execution script information on the target execution server so as to switch from the first machine room to the second machine room.
In step S104, the Celery Worker connects to the target execution server through SSH, a security protocol built on the application layer and the transport layer. SSH is currently a reliable protocol that provides security for remote login sessions and other network services, and it effectively prevents information leakage during remote management. The Celery Worker uses the paramiko library for Python to control the execution server; the connecting user and either a password or password-free login can be configured in the back end, and sudo and su operations are supported. In this embodiment, the execution script information is run on the target execution server of each task node according to the directed acyclic execution flow, and once all task nodes have been executed the first machine room has been switched to the second machine room.
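The SSH step might be sketched with paramiko as below. This is an illustration under assumptions, not the worker's actual code: the function names, the use of `bash` to run the script, the `sudo -n` wrapper and the strict host-key policy are all choices made for the example. paramiko is a third-party package and is imported lazily, so the command-building helper can be exercised without it.

```python
import shlex


def build_remote_command(script_path: str, use_sudo: bool = False) -> str:
    """Build the shell command that runs a script on the execution server."""
    command = "bash " + shlex.quote(script_path)
    # "sudo -n" fails instead of prompting, which suits unattended execution.
    return "sudo -n " + command if use_sudo else command


def run_script_over_ssh(host, username, script_path, password=None, use_sudo=False):
    """Run a script on a remote host; returns (exit_code, stdout, stderr).

    With password=None, paramiko falls back to keys or an agent
    (the "password-free" case in the text).
    """
    import paramiko  # third-party dependency, needed only for the remote call

    client = paramiko.SSHClient()
    client.load_system_host_keys()
    client.set_missing_host_key_policy(paramiko.RejectPolicy())
    try:
        client.connect(host, username=username, password=password)
        _stdin, stdout, stderr = client.exec_command(
            build_remote_command(script_path, use_sudo)
        )
        out = stdout.read().decode()
        err = stderr.read().decode()
        exit_code = stdout.channel.recv_exit_status()
        return exit_code, out, err
    finally:
        client.close()


print(build_remote_command("/opt/dr/pause_traffic.sh"))
```

A call such as `run_script_over_ssh("lb-01.example.internal", "ops", "/opt/dr/pause_traffic.sh", use_sudo=True)` would then return the exit code and output for the worker to record; the host, user and path shown are placeholders.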
Further, after the execution script information is run on the target execution server, this embodiment further includes: acquiring the running result, determining the execution state of the task node according to the running result, and storing the execution state in a database. If the execution script information runs successfully on the target execution server, the task node is complete; if it fails, the task node is not complete and execution must be terminated. Storing the execution state of each task node in the database makes it easy to query and monitor the execution result of each task node in the directed acyclic execution flow.
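Recording and querying the per-node execution state can be sketched as follows. The embodiment uses MySQL; the standard-library `sqlite3` module is swapped in here only so the example is self-contained. The table layout, the state names and the rule that exit code 0 means success are assumptions.

```python
import sqlite3


def open_state_store(path=":memory:"):
    """Create the (hypothetical) task_state table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task_state ("
        "task_node TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    return conn


def record_result(conn, task_node, exit_code):
    """Derive the execution state from the script's exit code and persist it."""
    state = "completed" if exit_code == 0 else "failed"  # assumed convention
    conn.execute(
        "INSERT OR REPLACE INTO task_state (task_node, state) VALUES (?, ?)",
        (task_node, state),
    )
    conn.commit()
    return state


def get_state(conn, task_node):
    """Return the stored state, or None if the node has not run yet."""
    row = conn.execute(
        "SELECT state FROM task_state WHERE task_node = ?", (task_node,)
    ).fetchone()
    return row[0] if row else None


conn = open_state_store()
record_result(conn, "pause_traffic", 0)
print(get_state(conn, "pause_traffic"))
```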
The disaster recovery method based on a dual-active system in the first embodiment of the invention matches the fault scenario in time and acquires the directed acyclic execution flow automatically when a disaster occurs. It thereby meets the automatic orchestration requirements of large server fleets and complex service scenarios in the operation and maintenance field, removes the need to orchestrate and configure scripts on the spot, lowers the skill level required of operation and maintenance personnel, effectively improves the fault response speed, and solves the problem of data loss and damage caused by man-made or natural disasters.
Fig. 3 is a schematic flow chart of a disaster recovery method based on a dual-active system according to a second embodiment of the present invention. It should be noted that the method of the present invention is not limited to the flow sequence shown in fig. 3 if the results are substantially the same. As shown in fig. 3, the method comprises the steps of:
Step S301: pre-constructing a plurality of fault scenarios.
In step S301, a fault scenario includes a disaster in either machine room of the dual-active system that prevents the system from providing data services normally. In one application scenario, assume the dual-active system has 10 servers, 5 in the Shenzhen machine room and 5 in the Shanghai machine room. The 5 servers in each machine room are 2 traffic distribution programs, 2 Web programs and 1 database. The Shenzhen and Shanghai databases form a master-slave architecture, with the Shenzhen database as the master. In the initial state, both Shenzhen and Shanghai provide data services. In this fault scenario, when a disaster occurs in the Shenzhen machine room, all data traffic needs to be switched from the Shenzhen machine room to the Shanghai machine room.
Step S302: pre-configuring a plurality of task nodes and the execution order of the task nodes for each fault scenario, and generating a directed acyclic execution flow corresponding to the fault scenario according to the task nodes and the execution order.
In step S302, common fault scenarios can be preset with onedrive, task nodes are pre-configured for each fault scenario, and each task node is designated to execute in a certain order. For example, suppose the task nodes pre-configured for a fault scenario are A, B and C. They can be executed sequentially or in parallel: A and B may execute simultaneously with C executing after both have finished, or A, B and C may execute one after another. A directed acyclic execution flow is generated from the task nodes and the execution order, and this flow can be executed by Airflow.
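The two orderings in the A, B, C example can be reproduced with the standard-library `graphlib` module (Python 3.9+). This sketch only illustrates how a dependency description yields batches of nodes that may run together; it is not how Airflow itself schedules tasks, and the function names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def execution_batches(dependencies):
    """Group task nodes into batches; nodes within one batch may run in parallel.

    `dependencies` maps each node to the set of nodes that must finish first.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the flow is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


def is_acyclic(dependencies):
    """Check that a proposed flow really is a directed acyclic graph."""
    try:
        execution_batches(dependencies)
    except CycleError:
        return False
    return True


parallel_then_c = {"A": set(), "B": set(), "C": {"A", "B"}}
sequential = {"A": set(), "B": {"A"}, "C": {"B"}}

print(execution_batches(parallel_then_c))  # [['A', 'B'], ['C']]
print(execution_batches(sequential))       # [['A'], ['B'], ['C']]
```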
Further, referring to fig. 4, step S302 further includes:
Step S401: for the fault scenario in which a disaster occurs in the first machine room, pre-configuring a first task node, a second task node, a third task node, a fourth task node and a fifth task node, together with execution script information of each task node.
In step S401, in the fault scenario in which a disaster occurs in the first machine room, the first machine room is configured with a plurality of servers whose types include a traffic distribution program, a Web program, and a database. The first task node is configured to suspend the traffic distribution program, the second task node is configured to promote the database, the third task node is configured to switch the master database accessed by the Web program, the fourth task node is configured to resume the traffic distribution program, and the fifth task node is configured to switch operator traffic distribution.
Step S402: configuring an execution sequence according to the arrangement order of the first task node, the second task node, the third task node, the fourth task node and the fifth task node, and generating a directed acyclic execution flow according to the task nodes and the execution sequence.
Specifically, assume that the first machine room is the Shenzhen machine room and the second machine room is the Shanghai machine room. When a disaster occurs in the Shenzhen machine room, all data traffic needs to be switched from the Shenzhen machine room to the Shanghai machine room. Because the master database is in Shenzhen, to avoid data inconsistency the first task node is executed first, that is, traffic distribution is suspended. The second task node is then executed to promote the Shanghai database to master database and to check data consistency. Next, the third task node switches the master database accessed by the Shanghai Web program to the Shanghai master database, the fourth task node resumes the Shanghai traffic distribution program, and finally the fifth task node switches all operator traffic distribution to the Shanghai traffic distribution program. To improve the success rate of the switchover, the operations must be performed in the above order; if they are instead performed manually and ad hoc, they are slow and error-prone.
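The following sketch shows one way the arrangement order of the five task nodes could be turned into a linear directed acyclic execution flow, as in step S402. The node identifiers and the helper `chain_to_dag` are hypothetical names chosen for this illustration, and the standard-library `graphlib` module stands in for whatever workflow engine is actually used.

```python
from graphlib import TopologicalSorter

# Hypothetical identifiers for the five task nodes, in their arrangement order.
SHENZHEN_FAILOVER_NODES = [
    "pause_traffic_distribution",   # first task node
    "promote_shanghai_database",    # second task node
    "switch_web_master_database",   # third task node
    "resume_traffic_distribution",  # fourth task node
    "switch_operator_traffic",      # fifth task node
]


def chain_to_dag(nodes):
    """Map each node to the set holding its immediate predecessor (empty for the first)."""
    return {
        node: ({nodes[i - 1]} if i > 0 else set())
        for i, node in enumerate(nodes)
    }


dag = chain_to_dag(SHENZHEN_FAILOVER_NODES)

# A linear chain has exactly one valid topological order: the arrangement order.
order = list(TopologicalSorter(dag).static_order())
```

Because every node depends on the one before it, the engine cannot reorder the steps, which is what guarantees that the database is promoted only after traffic distribution has been suspended.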
Step S303: performing association mapping between each fault scenario and the corresponding directed acyclic execution flow.
The fault scenarios and the directed acyclic execution flows are in one-to-one correspondence: each fault scenario is pre-configured with one coping strategy, namely its directed acyclic execution flow. When a fault occurs and the fault scenario is determined, the coping strategy can be called and executed automatically, without configuring script information on site. This effectively improves the fault response speed, avoids loss of user data, and improves user experience.
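The association mapping can be pictured as a simple lookup table. In the sketch below, the scenario keys, the flow identifiers and the functions `flow_for` and `is_one_to_one` are all assumptions made for illustration; the Shanghai entry is a hypothetical mirror of the Shenzhen scenario described above.

```python
# Hypothetical registry: fault scenario identifier -> execution flow identifier.
SCENARIO_FLOWS = {
    "shenzhen_room_disaster": "shenzhen_to_shanghai_failover",
    "shanghai_room_disaster": "shanghai_to_shenzhen_failover",
}


def flow_for(scenario: str) -> str:
    """Return the preconfigured flow for a scenario, or fail loudly if none exists."""
    try:
        return SCENARIO_FLOWS[scenario]
    except KeyError:
        raise LookupError(
            f"no execution flow preconfigured for scenario {scenario!r}"
        ) from None


def is_one_to_one(mapping) -> bool:
    """Check that no two scenarios share the same execution flow."""
    return len(set(mapping.values())) == len(mapping)
```

Failing loudly on an unknown scenario is a design choice of this sketch: it surfaces a missing preconfiguration immediately instead of silently running the wrong flow.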
Step S304: when detecting that a disaster occurs in a first machine room providing data service, acquiring configuration information of the first machine room, matching a preset fault scene according to the configuration information, and acquiring a corresponding directed acyclic execution flow according to the fault scene.
In this embodiment, step S304 in fig. 3 is similar to step S101 in fig. 1, and for brevity, is not described herein again.
Step S305: acquiring an execution request sent by the client, and generating an execution instruction of each task node according to the directed acyclic execution flow based on the execution request.
In this embodiment, step S305 in fig. 3 is similar to step S102 in fig. 1, and for brevity, is not described herein again.
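For illustration, the passing of execution instructions recited in the claims (the next instruction is passed only after the current task node reaches the completion state, and execution states are stored in a database) might be sketched as follows. The table name `task_state`, the state strings, the function `run_flow` and the use of an in-memory SQLite database are assumptions of this sketch; stopping at the first node that does not complete is likewise an assumption, since the text does not describe failure handling.

```python
import sqlite3


def run_flow(order, execute, db):
    """Pass instructions one at a time; continue only after a 'completed' state."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS task_state (node TEXT PRIMARY KEY, state TEXT)"
    )
    for node in order:
        # execute() stands in for running the node's script on its server.
        state = "completed" if execute(node) else "failed"
        db.execute("INSERT OR REPLACE INTO task_state VALUES (?, ?)", (node, state))
        db.commit()
        if state != "completed":
            return False  # do not pass the next execution instruction
    return True


# Demonstration with a stand-in executor in which node B does not complete.
db = sqlite3.connect(":memory:")
ok = run_flow(["A", "B", "C"], lambda node: node != "B", db)
states = dict(db.execute("SELECT node, state FROM task_state"))
```

In the demonstration, node C is never started because node B did not reach the completion state, and the database holds a record for A and B only.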
Step S306: acquiring, according to the execution instruction, a mapping relation between the corresponding task node and the execution server, and execution script information of the corresponding task node.
In this embodiment, step S306 in fig. 3 is similar to step S103 in fig. 1, and for brevity, is not described herein again.
Step S307: determining a target execution server according to the mapping relation, sending the execution script information to the target execution server, and running the execution script information on the target execution server to switch from the first machine room to the second machine room.
In this embodiment, step S307 in fig. 3 is similar to step S104 in fig. 1, and for brevity, is not described herein again.
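Steps S306 and S307 can be sketched as a lookup followed by a dispatch. In the sketch below, the host names, script paths, the class `NodeBinding` and the function `dispatch` are hypothetical; the transport used to send and run the script is injected as a callable so that the example makes no claim about how the script actually reaches the server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeBinding:
    server: str  # hypothetical host name of the execution server
    script: str  # hypothetical path of the execution script


# Hypothetical mapping relation: task node -> (execution server, script).
BINDINGS = {
    "pause_traffic_distribution": NodeBinding("sh-lb-01", "scripts/pause_lb.sh"),
    "promote_shanghai_database": NodeBinding("sh-db-01", "scripts/promote_db.sh"),
}


def dispatch(node, bindings, send):
    """Resolve the target server for a node and hand its script to the sender."""
    binding = bindings[node]  # KeyError here means the node was never configured
    return send(binding.server, binding.script)


# Stand-in sender that only records what would be sent.
sent = []


def fake_send(server, script):
    sent.append((server, script))
    return True


result = dispatch("promote_shanghai_database", BINDINGS, fake_send)
```

Injecting the sender keeps the mapping logic testable in isolation; in a real deployment it would be replaced by whatever remote execution mechanism the system provides.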
On the basis of the first embodiment, the disaster recovery method based on the dual-active system according to the second embodiment of the present invention presets the fault scenarios caused by disasters and pre-configures the corresponding directed acyclic execution flows before a disaster occurs. When a disaster occurs, the fault scenario can therefore be matched in time and the directed acyclic execution flow obtained automatically, with no need to arrange and configure scripts on site. This effectively increases the fault response speed and addresses the problem of data loss and damage caused by man-made or natural disasters.
Fig. 5 is a schematic structural diagram of a disaster recovery device based on a dual-active system according to an embodiment of the present invention. As shown in fig. 5, the apparatus 50 includes a first obtaining module 51, a generating module 52, a second obtaining module 53, and a sending and switching module 54.
The first obtaining module 51 is configured to, when detecting that a disaster occurs in a first machine room that provides data services, obtain configuration information of the first machine room, match a preset fault scenario according to the configuration information, and obtain a corresponding directed acyclic execution flow according to the fault scenario;
the generating module 52 is configured to obtain an execution request sent by the client, and generate an execution instruction of each task node according to the directed acyclic execution flow based on the execution request;
the second obtaining module 53 is configured to obtain, according to the execution instruction, a mapping relationship between a corresponding task node and the execution server and execution script information of the corresponding task node;
the sending and switching module 54 is configured to determine a target execution server according to the mapping relationship, send the execution script information to the target execution server, and run the execution script information on the target execution server to implement switching the first machine room to the second machine room.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present invention. As shown in fig. 6, the computer device 60 includes a processor 61 and a memory 62 coupled to the processor 61.
The memory 62 stores program instructions for implementing the disaster recovery method based on dual active systems according to any of the embodiments described above.
The processor 61 is operative to execute program instructions stored by the memory 62 to implement disaster recovery.
The processor 61 may also be referred to as a CPU (Central Processing Unit). The processor 61 may be an integrated circuit chip having signal processing capabilities. The processor 61 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a computer storage medium according to an embodiment of the present invention. The computer storage medium of the embodiment of the present invention stores a program file 71 capable of implementing all the methods described above, wherein the program file 71 may be stored in the computer storage medium in the form of a software product, and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned computer storage media include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, as well as terminal devices such as a computer, a server, a mobile phone, and a tablet.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a logical division, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A disaster recovery method based on a double-active system is characterized by comprising the following steps:
when detecting that a disaster occurs in a first machine room providing data service, acquiring configuration information of the first machine room, matching a preset fault scene according to the configuration information, and acquiring a corresponding directed acyclic execution flow according to the fault scene;
acquiring an execution request sent by a client, and generating an execution instruction of each task node according to the directed acyclic execution flow based on the execution request;
acquiring a mapping relation between the corresponding task node and an execution server and execution script information of the corresponding task node according to the execution instruction;
and determining a target execution server according to the mapping relation, sending the execution script information to the target execution server, and running the execution script information on the target execution server to realize switching the first machine room to the second machine room.
2. The disaster recovery method according to claim 1, wherein before the step of acquiring configuration information of a first machine room providing data services when a disaster occurs in the first machine room is detected, matching a preset fault scenario according to the configuration information, and acquiring a corresponding directed acyclic execution flow according to the fault scenario, the method further comprises:
pre-constructing a plurality of fault scenes;
a plurality of task nodes and an execution sequence of the task nodes are pre-configured for each fault scene, and a directed acyclic execution flow corresponding to the fault scene is generated according to the task nodes and the execution sequence;
and carrying out association mapping on each fault scene and the corresponding directed acyclic execution flow.
3. The disaster recovery method according to claim 2, wherein the step of pre-configuring a plurality of task nodes and an execution sequence of the task nodes for each of the failure scenarios, and the step of generating a directed acyclic execution flow corresponding to the failure scenario according to the task nodes and the execution sequence comprises:
for a fault scene that the first machine room has a disaster, pre-configuring a first task node, a second task node, a third task node, a fourth task node and a fifth task node with execution script information of each task node;
and configuring the execution sequence according to the arrangement sequence of the first task node, the second task node, the third task node, the fourth task node and the fifth task node, and generating the directed acyclic execution flow according to the task nodes and the execution sequence.
4. The disaster recovery method according to claim 1, wherein the obtaining of the execution request sent by the client, and the generating of the execution instruction of each task node according to the directed acyclic execution flow based on the execution request comprises:
acquiring an execution request sent by a client, splitting the directed acyclic execution flow according to the execution request, and acquiring a plurality of task nodes and an execution sequence of the task nodes;
and generating a corresponding execution instruction according to each task node and sequentially transmitting the execution instructions according to the execution sequence.
5. The disaster recovery method according to claim 1, wherein said generating a corresponding execution instruction according to each of said task nodes and transmitting said execution instruction according to said execution sequence further comprises:
acquiring an execution state of a current task node, and judging whether the execution state is a completion state;
and if the execution state of the current task node is the completion state, transmitting a next execution instruction according to the execution sequence until the execution state of the last task node of the execution sequence is the completion state.
6. The disaster recovery method according to claim 5, further comprising, before obtaining the execution state of the current task node:
and after the execution script information is operated on the target execution server, acquiring an operation result, determining the execution state of the task node according to the operation result, and storing the execution state in a database.
7. The disaster recovery method according to claim 5, wherein said passing the next execution instruction according to the execution sequence until the execution state of the last task node in the execution sequence is the completion state further comprises:
and acquiring the transmission state of each execution instruction, and storing the transmission state in a database.
8. A disaster recovery device based on a dual-active system is characterized by comprising:
a first acquisition module, configured to, when detecting that a disaster occurs in a first machine room providing data service, acquire configuration information of the first machine room, match a preset fault scene according to the configuration information, and acquire a corresponding directed acyclic execution flow according to the fault scene;
the generating module is used for acquiring an execution request sent by a client and generating an execution instruction of each task node according to the directed acyclic execution flow based on the execution request;
the second acquisition module is used for acquiring the mapping relation between the corresponding task node and the execution server and the execution script information of the corresponding task node according to the execution instruction;
and the sending and switching module is used for determining a target execution server according to the mapping relation, sending the execution script information to the target execution server, and running the execution script information on the target execution server to realize switching the first machine room to the second machine room.
9. A computer device, comprising: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the disaster recovery method based on dual active systems according to any of claims 1 to 7 when executing the computer program.
10. A computer storage medium, on which a computer program is stored, which, when being executed by a processor, implements the dual active system based disaster recovery method according to any one of claims 1 to 7.
CN202111369384.0A 2021-11-18 2021-11-18 Disaster recovery method, device, equipment and storage medium based on double-active system Pending CN114095343A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111369384.0A CN114095343A (en) 2021-11-18 2021-11-18 Disaster recovery method, device, equipment and storage medium based on double-active system

Publications (1)

Publication Number Publication Date
CN114095343A true CN114095343A (en) 2022-02-25

Family

ID=80301729

Country Status (1)

Country Link
CN (1) CN114095343A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116582618A (en) * 2023-07-13 2023-08-11 天津金城银行股份有限公司 Method and device for realizing high availability of electric pin, machine room management platform and computer

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160026542A1 (en) * 2014-07-22 2016-01-28 Cisco Technology Inc. Pre-Computation of Backup Topologies in Computer Networks
WO2017125015A1 (en) * 2016-01-18 2017-07-27 中兴通讯股份有限公司 Method for processing workflow of distributed system and workflow engine system
US20190095293A1 (en) * 2016-07-27 2019-03-28 Tencent Technology (Shenzhen) Company Limited Data disaster recovery method, device and system
CN111338858A (en) * 2020-02-18 2020-06-26 中国工商银行股份有限公司 Disaster recovery method and device for double machine rooms
CN111698152A (en) * 2019-03-15 2020-09-22 华为技术有限公司 Fault protection method, node and storage medium
CN111897671A (en) * 2020-07-23 2020-11-06 平安证券股份有限公司 Failure recovery method, computer device, and storage medium
CN112181724A (en) * 2020-09-23 2021-01-05 支付宝(杭州)信息技术有限公司 Big data disaster tolerance method and device and electronic equipment
CN112291082A (en) * 2020-09-30 2021-01-29 北京大米科技有限公司 Computer room disaster recovery processing method, terminal and storage medium
CN112463440A (en) * 2020-11-13 2021-03-09 中国建设银行股份有限公司 Disaster recovery switching method, system, storage medium and computer equipment
US20210248042A1 (en) * 2020-02-06 2021-08-12 Bank Of America Corporation Multi-layered disaster recovery manager
CN113419952A (en) * 2021-06-22 2021-09-21 中国联合网络通信集团有限公司 Cloud service management scene testing device and method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
虎嵩林;梁英;姜伟;李伟;: "一种大规模网络上的服务组合流程搜索方法", 计算机研究与发展, no. 09 *
陈刚;羌铃铃;: "如何实现智能网双平面容灾", 通信技术, no. 04 *
陈敏;李旺;: "计算机网络中的故障定位技术研究", 国外电子测量技术, no. 07 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116582618A (en) * 2023-07-13 2023-08-11 天津金城银行股份有限公司 Method and device for realizing high availability of electric pin, machine room management platform and computer
CN116582618B (en) * 2023-07-13 2023-10-10 天津金城银行股份有限公司 Method and device for realizing high availability of electric pin, machine room management platform and computer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination