WO2023005075A1 - Disaster recovery method and system for data, terminal device and computer storage medium - Google Patents

Disaster recovery method and system for data, terminal device and computer storage medium Download PDF

Info

Publication number
WO2023005075A1
WO2023005075A1 · PCT/CN2021/132314 · CN2021132314W
Authority
WO
WIPO (PCT)
Prior art keywords
data
disaster recovery
task
database
preset
Prior art date
Application number
PCT/CN2021/132314
Other languages
French (fr)
Chinese (zh)
Inventor
周可
崖飞虎
范筝
乔一航
邸帅
卢道和
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2023005075A1 publication Critical patent/WO2023005075A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1464Management of the backup or restore process for networked environments

Definitions

  • the present application relates to the technical field of financial technology (Fintech), and in particular to a data disaster recovery method, system, terminal equipment, and computer storage medium.
  • the primary and backup clusters run in two different computer rooms, each running an independent account system and using an independent operations management and control system.
  • the big data remote disaster recovery solution considers data disaster recovery only on the offline side; the basic components involved are mainly Hadoop (Apache Hadoop, an open-source software framework that supports data-intensive distributed applications, released under the Apache 2.0 license), Hive (Apache Hive, a data warehouse tool built on Hadoop), and the big data platform job scheduling system.
  • the existing disaster recovery strategy for big data clusters is to use cross-computer-room data synchronization tools to synchronize the main cluster's daily changed data to the disaster recovery cluster, so that the system can switch to the disaster recovery cluster when the main cluster is unavailable.
  • in the existing big data cluster disaster recovery solution, after switching to the disaster recovery environment, the entire process of importing, processing, and exporting business data to the business system must be rerun in the disaster recovery environment to complete the disaster recovery switchover; this makes the switchover time-consuming, so it cannot be completed quickly and efficiently.
  • the main purpose of this application is to provide a data disaster recovery method, system, terminal device, and computer storage medium, aiming to realize rapid and fine-grained disaster recovery switching when the main cluster can no longer provide services, thereby improving disaster recovery efficiency.
  • the present application provides a data disaster recovery method, the data disaster recovery method is applied to data disaster recovery equipment, the data disaster recovery method includes:
  • detecting the synchronization state of the task parameters, so as to determine, according to the synchronization state, the target node to be re-executed among the task nodes, and triggering a disaster recovery mechanism to execute the target node.
  • the present application also provides a data disaster recovery system
  • the data disaster recovery system includes:
  • a connection module configured to establish a communication connection with the disaster recovery database of the preset main cluster
  • a workflow reading module configured to read the workflow executed by the preset main cluster through the communication connection
  • an acquisition module configured to acquire task parameters of each task node in the workflow according to a preset relationship chain model, wherein the relationship chain model is constructed based on the lineage relationships between data and data processing tasks;
  • the recovery module is configured to detect the synchronization state of the task parameters, determine the target node to be re-executed among the task nodes according to the synchronization state, and trigger a disaster recovery mechanism to execute the target node.
  • each functional module of the data disaster recovery system of the present application implements, at runtime, the steps of the above data disaster recovery method.
  • the present application also provides a terminal device, which includes: a memory, a processor, and a data disaster recovery program stored in the memory and runnable on the processor; when the data disaster recovery program is executed by the processor, the steps of the above data disaster recovery method are implemented.
  • the present application also provides a computer storage medium, on which a data disaster recovery program is stored; when the data disaster recovery program is executed by a processor, the steps of the above data disaster recovery method are implemented.
  • the present application also provides a computer program product, the computer program product includes a computer program, and when the computer program is executed by a processor, the steps of the above-mentioned data disaster recovery method are implemented.
  • this application provides a data disaster recovery method, system, terminal device, computer storage medium, and computer program product. Through the method, the data disaster recovery device establishes a communication connection with the disaster recovery database of the preset main cluster; reads, through the communication connection, the workflow being executed by the preset main cluster; obtains the task parameters of each task node in the workflow according to a preset relationship chain model, where the relationship chain model is constructed based on the lineage relationships between data and data processing tasks; and detects the synchronization state of the task parameters, so as to determine, according to the synchronization state, the target node to be re-executed among the task nodes, and triggers the disaster recovery mechanism to execute the target node.
  • when a disaster occurs in the main cluster and it cannot continue to provide services, a disaster recovery switchover is required and the disaster recovery cluster replaces the main cluster. The data disaster recovery device under the disaster recovery cluster establishes a communication connection with the disaster recovery database of the main cluster, and through that connection reads the workflow the preset main cluster was executing when the disaster occurred.
  • using the relationship chain model constructed from the lineage relationships between data and data processing tasks, the device obtains the task parameters of each task node in the workflow; finally, it detects the synchronization status of each node's task parameters, determines from that status the target nodes to be re-executed, and, once they are determined, triggers the disaster recovery mechanism to re-execute them.
  • this application uses a complete relationship chain model constructed in advance from the lineage relationships between data and data processing tasks, combined with the synchronization status of the task parameters of the task nodes in the workflow.
  • the disaster recovery operation of switching to the disaster recovery cluster when a disaster occurs in the main cluster does not need to rerun all business data tasks in the disaster recovery environment; only the task nodes identified for re-execution, determined from the relationship chain model combined with the synchronization state, are rerun. This achieves rapid disaster recovery switchover and fast recovery of the nodes to be re-executed, realizing fast and fine-grained disaster recovery switching and thereby improving disaster recovery efficiency.
  • FIG. 1 is a schematic diagram of the device structure of the terminal-device hardware operating environment involved in the solution of an embodiment of the present application;
  • FIG. 2 is a schematic flow chart of an embodiment of the data disaster recovery method of the present application;
  • FIG. 3 shows the lineage data acquisition and processing flow involved in an embodiment of the data disaster recovery method of the present application;
  • FIG. 4 shows the first lineage relationship, between data processing execution tasks and data, involved in an embodiment of the data disaster recovery method of the present application;
  • FIG. 5 shows the second lineage relationship, between data processing execution tasks and data processing tasks, involved in an embodiment of the data disaster recovery method of the present application;
  • FIG. 6 is an example data processing workflow involved in an embodiment of the data disaster recovery method of the present application;
  • FIG. 7 shows the processing flow of the second lineage relationship involved in an embodiment of the data disaster recovery method of the present application;
  • FIG. 8 shows the relationship between data processing tasks and task execution IDs involved in an embodiment of the data disaster recovery method of the present application;
  • FIG. 9 shows the lineage relationship between data and data processing tasks involved in an embodiment of the data disaster recovery method of the present application;
  • FIG. 10 shows the data synchronization flow involved in an embodiment of the data disaster recovery method of the present application;
  • FIG. 11 shows the disaster recovery processing flow involved in an embodiment of the data disaster recovery method of the present application;
  • FIG. 12 is a schematic diagram of a disaster recovery scenario involved in an embodiment of the data disaster recovery method of the present application;
  • FIG. 13 is a schematic diagram of the functional modules of an embodiment of the data disaster recovery system of the present application.
  • FIG. 1 is a schematic diagram of a device structure of a hardware operating environment of a terminal device involved in the solution of an embodiment of the present application.
  • the terminal device in the embodiment of this application may be a data disaster recovery device configured under a disaster recovery cluster to perform disaster recovery when a disaster occurs in the main cluster and it cannot continue to provide services.
  • the data disaster recovery device may be a smartphone, a PC (Personal Computer), a tablet computer, a portable computer, and so on.
  • the terminal device may include: a processor 1001 , such as a CPU, a communication bus 1002 , a user interface 1003 , a network interface 1004 , and a memory 1005 .
  • the communication bus 1002 is used to realize connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include standard wired and wireless interfaces.
  • the network interface 1004 may include a standard wired interface and a wireless interface (such as a Wi-Fi interface).
  • the memory 1005 can be a high-speed RAM memory, or a non-volatile memory, such as disk storage.
  • the memory 1005 may also be a storage device independent of the aforementioned processor 1001 .
  • the structure of the terminal device shown in FIG. 1 does not constitute a limitation on the terminal device; it may include more or fewer components than shown, combine some components, or use a different component arrangement.
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a data disaster recovery program.
  • the network interface 1004 is mainly used to connect to the background server and perform data communication with the background server;
  • the user interface 1003 is mainly used to connect to the client and perform data communication with the client; and the processor 1001 can be used to invoke the data disaster recovery program stored in the memory 1005 and perform the operations described in the following embodiments of the data disaster recovery method of this application.
  • the active and standby clusters (or called the main cluster and the disaster recovery cluster) run in two different computer rooms respectively, and each runs an independent account system, and uses an independent operation and maintenance management and control system.
  • in terms of cluster delivery, the primary and standby clusters are delivered separately.
  • the typical data processing flow of the big data platform is as follows:
  • Apache Sqoop is an open-source tool, mainly used for data transfer between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, Oracle, etc.)
  • the big data cluster disaster recovery strategy is to synchronize the daily changing data of the main cluster to the disaster recovery cluster through cross-computer room data synchronization tools, and switch to the disaster recovery cluster when the main cluster is unavailable.
  • in the existing big data cluster disaster recovery solution, after switching to the disaster recovery environment, the entire process of importing, processing, and exporting business data to the business system must be rerun, which takes a long time, so the disaster recovery switchover cannot be completed quickly and efficiently.
  • FIG. 2 is a schematic flow chart of the first embodiment of the data disaster recovery and recovery method of the present application.
  • the method is applied to data disaster recovery equipment configured for disaster recovery (for convenience, referred to below as the disaster recovery device); the data disaster recovery method of this application includes:
  • Step S10 establishing a communication connection with the disaster recovery database of the preset main cluster
  • the disaster recovery device first establishes a communication connection with the disaster recovery database of the preset primary cluster where a disaster occurs.
  • the preset main cluster is the cluster hosting the big data platform that is carrying out the data extraction, data processing, and data export procedures in the big data remote disaster recovery scenario.
  • the disaster recovery database of the preset main cluster is an off-site backup of the database of the scheduling system of the preset main cluster.
  • step S10 may include:
  • Step S101: when the service of the preset main cluster is unavailable, establish a communication connection with the disaster recovery database of the preset main cluster.
  • the disaster recovery process performed by the disaster recovery device takes place when a disaster occurs in the preset main cluster and it can no longer provide data processing services.
  • when a disaster occurs in the preset main cluster currently performing data processing, the disaster recovery device immediately establishes a communication connection with that cluster's disaster recovery database.
  • the preset main cluster currently performing data processing is in IDC1
  • the standby disaster recovery cluster is in IDC2 in a different place.
  • a disaster occurs in the preset main cluster in IDC1, making the services it provides unavailable (that is, the data processing procedure cannot be completed), or leaving it entirely unable to continue providing services.
  • the disaster recovery device in the disaster recovery cluster in IDC2, at a different site, then establishes a communication connection with the disaster recovery database of the preset main cluster.
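As a sketch, the connection step (Step S10) might look like the following; the function name, the retry policy, and the use of sqlite3 in place of the replicated scheduler database are illustrative assumptions, not details from the patent:

```python
import sqlite3
import time

def connect_to_dr_database(dsn, retries=3, delay=0.1):
    """Open a connection to the disaster recovery (DR) database.

    In the patent's scenario this would target the off-site replica of the
    main cluster's scheduler database in IDC2; sqlite3 stands in here so
    the sketch stays runnable.
    """
    last_err = None
    for _ in range(retries):
        try:
            return sqlite3.connect(dsn)
        except sqlite3.Error as err:  # retry transient connection failures
            last_err = err
            time.sleep(delay)
    raise ConnectionError(f"DR database unreachable: {last_err}")

conn = connect_to_dr_database(":memory:")
```

A real deployment would use the driver for the replica's actual DBMS; the retry loop reflects the fact that the switchover happens while the primary site is failing.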
  • Step S20 reading the workflow executed by the preset main cluster through the communication connection
  • after the disaster recovery device establishes a communication connection with the disaster recovery database, it immediately reads, over that connection, the workflow the preset main cluster was executing when the disaster occurred.
  • the workflow being executed by the main scheduling system (Scheduler) of the main cluster is queried from the disaster recovery database through the communication connection, and the query result is returned in the form of a list.
  • the preset main cluster may be executing one or more workflows when the disaster occurs, or it may not be executing any data processing at that moment, in which case there is no in-progress workflow. Therefore, when the disaster recovery device queries for workflows in the "executing" state and returns the result as a list, the number of such workflows in the list may be 0 or N, where N is greater than or equal to 1.
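Reading the in-progress workflows as a list of 0..N entries can be sketched as below; the table and column names are assumptions (the patent does not specify the scheduler's schema), and sqlite3 again stands in for the replicated scheduler database:

```python
import sqlite3

def running_workflows(conn):
    """Return the workflows the main scheduler was executing at disaster
    time, as a (possibly empty) list."""
    rows = conn.execute(
        "SELECT workflow_id FROM workflow WHERE status = 'RUNNING'"
    ).fetchall()
    return [r[0] for r in rows]

# Stand-in for the replicated scheduler database, for runnability.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE workflow (workflow_id TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO workflow VALUES (?, ?)",
    [("wf_daily_etl", "RUNNING"), ("wf_report", "FINISHED")],
)
print(running_workflows(conn))  # ['wf_daily_etl']
```

An empty list simply means no workflow was in flight, in which case (as the text notes later) no task node needs re-execution.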
  • Step S30: obtaining task parameters of each task node in the workflow according to a preset relationship chain model, wherein the relationship chain model is constructed based on the lineage relationships between data and data processing tasks;
  • after the disaster recovery device reads the workflow being executed by the preset main cluster at disaster time, it uses the relationship chain model, constructed from the lineage relationships between data and data processing tasks, to obtain the task parameters of each task node in the workflow.
  • the disaster recovery device may also construct the relationship chain model from the lineage relationships between data and data processing tasks before a disaster occurs in the main cluster. That way, when a disaster does occur in the preset main cluster, the device can directly load the model to obtain the task parameters of each task node in the workflow the preset main cluster was executing at disaster time.
  • specifically, the disaster recovery device constructs in advance, from the lineage relationships between data and data processing tasks shown in Figure 9, a relationship chain model used to determine the task nodes that need to be re-executed during the disaster recovery switchover. The device then uses this model to determine all the task nodes of each workflow and the task parameters of each task node.
  • the task parameters include the input data and output data of the task node; the above step S30 may include:
  • Step S301 determining each task node of the workflow
  • after the disaster recovery device acquires the workflows being executed by the preset main cluster at disaster time, it determines all the task nodes of the one or more workflows.
  • for example, if the list the disaster recovery device obtains by querying the disaster recovery database of the preset main cluster contains only one workflow, the device further determines all task nodes in that workflow using a standard breadth-first traversal.
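The breadth-first enumeration of a workflow's task nodes can be sketched as follows; the DAG shape and node names are hypothetical:

```python
from collections import deque

def all_task_nodes(dag, roots):
    """Breadth-first traversal of a workflow DAG: `dag` maps a task node to
    its downstream nodes; returns every node reachable from the roots, in
    BFS order."""
    seen, queue = set(roots), deque(roots)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in dag.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

# Hypothetical workflow: Job1 feeds Job2 and Job3; Job2 also feeds Job3.
dag = {"Job1": ["Job2", "Job3"], "Job2": ["Job3"], "Job3": []}
print(all_task_nodes(dag, ["Job1"]))  # ['Job1', 'Job2', 'Job3']
```

The `seen` set guards against visiting a node twice when it has multiple upstream dependencies, which is common in data processing DAGs.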
  • Step S302 constructing a query statement according to each of the task nodes, and indexing the respective input data and output data of each of the task nodes from the relationship chain model according to the query statement.
  • based on the determined task nodes, the disaster recovery device constructs corresponding query statements and uses them to index, from the relationship chain model constructed from the lineage relationships between data and data processing tasks, the input data and output data of each task node.
  • the relationship chain model constructed from the lineage relationships between data and data processing tasks is the lineage map between data and data processing tasks shown in Figure 9, and the model can be stored in the graph database configured under the disaster recovery cluster. Therefore, the query statement the disaster recovery device constructs from a task node can be a graph data query statement.
  • for example, the disaster recovery device (FindGap) obtains, from the disaster recovery database of the preset main cluster where the disaster occurred, the list of workflows that the main cluster's scheduling system (Scheduler) was executing, and gathers all task nodes of each workflow breadth-first. It then calls a preset query template for indexing relational data in the lineage graph, such as an SQL (Structured Query Language) statement, and fills each task node in turn into the template as the input condition, thereby constructing a graph data query statement for that node. The device immediately executes the statement against the relationship chain model stored in the graph database (Graph DB), which records the lineage between data processing tasks and data, and parses out the direct input data and output data of each task node.
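The per-node lookup can be sketched as below; the patent stores the lineage in a graph database, but an edge table with the same information is used here (task, direction, table), so the names and schema are assumptions:

```python
import sqlite3

# Stand-in lineage store: in the patent this is a Graph DB holding the
# relationship chain model; a plain edge table keeps the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lineage (task TEXT, direction TEXT, tbl TEXT)")
conn.executemany("INSERT INTO lineage VALUES (?, ?, ?)", [
    ("Job1", "input", "Table1"), ("Job1", "output", "Table2"),
    ("Job2", "input", "Table2"), ("Job2", "output", "Table5"),
])

def io_tables(conn, task_node):
    """Fill the task node into a query template and index its direct
    input and output data from the lineage model."""
    rows = conn.execute(
        "SELECT direction, tbl FROM lineage WHERE task = ?", (task_node,)
    )
    io = {"input": [], "output": []}
    for direction, tbl in rows:
        io[direction].append(tbl)
    return io

print(io_tables(conn, "Job1"))  # {'input': ['Table1'], 'output': ['Table2']}
```

The parameterized `?` placeholder mirrors the patent's idea of a query template into which each task node is substituted in turn.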
  • Step S40: detecting the synchronization state of the task parameters, so as to determine, according to the synchronization state, the target node to be re-executed among the task nodes, and triggering the disaster recovery mechanism to execute the target node.
  • after the disaster recovery device obtains the task parameters of each task node in the workflow being executed by the preset main cluster at disaster time, it further detects the synchronization status of those task parameters, determines from the detected status the target nodes that must be re-executed to complete the disaster recovery switchover, and finally triggers the preset disaster recovery mechanism to re-execute them.
  • based on the lineage between data processing tasks and data recorded in the relationship chain model stored in the graph database (Graph DB), the disaster recovery device (FindGap) parses out each task node's direct input and output data, then calls the synchronization service configured under the disaster recovery cluster to check how far that data has been synchronized (see the disaster recovery processing flow in Figure 11).
  • FindGap then traverses the DAG (Directed Acyclic Graph) of the workflow each task node belongs to, again breadth-first (see the example workflow in Figure 6), and from the synchronization status of each node's input and output data determines the target nodes that must be re-executed among all task nodes of the workflow. It then triggers the preset disaster recovery mechanism so that the scheduling system under the disaster recovery cluster, Scheduler (Backup), schedules the target nodes for re-execution and feeds the status results back to the disaster recovery device after re-execution.
  • based on the status results, the disaster recovery device determines that the disaster recovery switchover is complete.
  • if, based on the synchronization status, the disaster recovery device determines that none of the task nodes in the workflow needs to be re-executed, it does not need to trigger the disaster recovery mechanism, and the switchover completes directly.
  • for example, suppose the input and output data of task nodes Job1, Job2, and Job3 are Table1, Table2, Table3, Table4, Table5, and Table6, and the disaster recovery device detects that Table1, Table2, Table3, and Table4 have completed disaster recovery synchronization and passed consistency verification. The device then determines that the only target nodes in the workflow that need to be re-executed in the disaster recovery cluster are Job2 and Job3, so it triggers the disaster recovery mechanism to have the scheduling system schedule only Job2 and Job3 for re-execution, speeding up disaster recovery.
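The selection rule can be sketched as follows. The assignment of Table1..Table6 to the three jobs is an assumption (the source does not say which tables belong to which job), chosen so the sketch reproduces the example's outcome:

```python
def nodes_to_rerun(job_io, synced):
    """A node can be skipped only if all of its input AND output tables have
    already reached the disaster recovery cluster and passed verification;
    otherwise it is a target node that must be re-executed."""
    return [
        job for job, io in job_io.items()
        if not set(io["input"] + io["output"]) <= synced
    ]

# Hypothetical mapping mirroring the Job1..Job3 / Table1..Table6 example:
# Table1-4 are synchronized and verified, Table5-6 are not.
job_io = {
    "Job1": {"input": ["Table1"], "output": ["Table2"]},
    "Job2": {"input": ["Table3"], "output": ["Table5"]},
    "Job3": {"input": ["Table4", "Table5"], "output": ["Table6"]},
}
synced = {"Table1", "Table2", "Table3", "Table4"}
print(nodes_to_rerun(job_io, synced))  # ['Job2', 'Job3']
```

Requiring the outputs (not just inputs) to be synchronized is what lets already-completed work like Job1 be skipped entirely.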
  • in this embodiment, the disaster recovery device first establishes a communication connection with the disaster recovery database of the preset main cluster where the disaster occurred; based on that connection, it immediately reads the workflow the main cluster was executing at disaster time; using the relationship chain model constructed from the lineage relationships between data and data processing tasks, it further obtains the task parameters of each task node in that workflow; finally, it detects the synchronization status of each node's task parameters, determines from the detected status the target nodes that must be re-executed to complete the disaster recovery switchover, and triggers the preset disaster recovery mechanism to re-execute them.
  • this application uses a complete relationship chain model constructed in advance from the lineage relationships between data and data processing tasks, combined with the synchronization status of the task parameters of the task nodes in the workflow.
  • the disaster recovery operation of switching to the disaster recovery cluster when a disaster occurs in the main cluster can thus achieve rapid switchover and fast recovery of the task nodes to be re-executed, realizing fast and fine-grained disaster recovery switching and thereby improving disaster recovery efficiency.
  • the data disaster recovery method of this application may further include:
  • Step S50: constructing a relationship chain model based on the lineage relationships between data and data processing tasks.
  • before establishing a communication connection with the disaster recovery database of the preset main cluster where a disaster occurs, the disaster recovery device builds a relationship chain model based on the data processing tasks being executed by the preset main cluster and the lineage relationships between the data.
  • step S50 may include:
  • Step S501: acquiring lineage data from the preset main cluster to establish a first lineage relationship between data processing tasks and data;
  • in the process of building the relationship chain model, the disaster recovery device first obtains lineage data from the preset main cluster that is executing data processing tasks, to establish the first lineage relationship between the data processing tasks executed by the main cluster's scheduling system and the data.
  • specifically, a lineage acquisition hook configured under the preset main cluster (a hook being a mechanism that intercepts events as they occur) parses lineage data from the data components (such as relational databases) and writes that data, as lineage logs, into the file system of the data integration tool. The scheduling system of the preset main cluster then periodically triggers the lineage data integration task, so that the data integration tool reads the lineage logs from the file system to obtain the lineage data and writes it to the big data platform (Hive or Spark).
  • the scheduling system of the preset main cluster periodically triggers the data processing task, so that the big data platform (Hive or Spark) processes and integrates the written lineage data to form the first lineage relationship between data processing tasks and data, as shown in Figure 4, and writes the first lineage relationship into the graph database system as lineage graph data.
  • the graph database system actively reports the write status of the lineage graph data to the scheduling system of the preset main cluster, so that the scheduling system can confirm that the first lineage data has been constructed.
  • the lineage acquisition hook under the preset main cluster implements a corresponding Lineage Hook mechanism for each data system and data transmission tool. Each time a data system executes a SQL statement, the hook captures the raw lineage data, encapsulates it into a lineage log, and writes it to the log system of the data integration tool.
• a Hive Lineage Hook is used for the Hive data system (asynchronously capturing the SQL statements executed by Hive and calling a self-implemented Hive execution-behavior analysis API to obtain each statement's input data information, output data information, and associated task information), a Spark-SQL Lineage Hook is used for the Spark data system (asynchronously obtaining the SQL statements executed by Spark-SQL and calling a self-implemented Spark SQL execution-behavior analysis API to obtain each statement's input data information, output data information, and associated task information), and a Sqoop Lineage Hook (asynchronously capturing Sqoop commands and analyzing their parameters to obtain each command's input data, output data, and associated task information) is used to capture lineage data.
• the Lineage Hooks corresponding to Hive and Spark-SQL are used to obtain the lineage between data tables inside the big data platform;
• the Sqoop Lineage Hook is used to capture the lineage between tables on the big data platform and tables in traditional relational databases.
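As a rough illustration of what such a Lineage Hook captures, the sketch below parses a SQL statement with simple regular expressions to recover input and output tables and packages them into a lineage-log record. A real hook would read the engine's own execution plan (as the Hive and Spark-SQL hooks above do via their analysis APIs); the function and field names here are assumptions for illustration, not part of the original design.

```python
import json
import re
import time

def extract_lineage(sql: str, job_id: str) -> dict:
    """Very rough lineage extraction from one SQL statement.

    A real Lineage Hook reads the engine's execution plan instead of using
    regexes; this sketch only shows the shape of the captured record.
    """
    outputs = re.findall(r"insert\s+(?:overwrite|into)\s+table\s+(\w+)", sql, re.I)
    inputs = re.findall(r"(?:from|join)\s+(\w+)", sql, re.I)
    return {
        "job_id": job_id,
        "inputs": sorted(set(inputs) - set(outputs)),
        "outputs": sorted(set(outputs)),
        "captured_at": int(time.time()),
    }

def write_lineage_log(record: dict, log_path: str) -> None:
    # Append one JSON line per executed statement, as the hooks do when
    # writing to the data integration tool's log system.
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

record = extract_lineage(
    "INSERT OVERWRITE TABLE table4 SELECT a.k FROM table1 a JOIN table2 b ON a.k = b.k",
    job_id="job1",
)
```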
• Step S502: parsing the JSON (JavaScript Object Notation) file to establish a second lineage relationship between the data processing execution task and the data processing task;
• after the disaster recovery device establishes the first lineage relationship between the data and the data processing execution tasks executed by the scheduling system of the preset main cluster, it further establishes the second lineage relationship between the data processing execution task and the data processing task by parsing the JSON file.
• the disaster recovery device reads the workflow JSON file of each data processing task in the scheduling system under the preset main cluster through a preset task lineage analysis program, and then parses the JSON file to obtain the second lineage relationship, shown in Figure 5, between the data processing execution task (Executed Job) and the data processing task.
• before the disaster recovery device parses the workflow JSON file in the scheduling system through the task lineage analysis program, the big data task scheduling system first reads the task-relationship JSON file and the task execution records from the data integration tool, and the data integration tool writes the lineage data obtained by parsing them directly to the big data platform (Hive or Spark). Then, the scheduling system of the preset main cluster periodically triggers the data processing task, so that the big data platform processes and integrates the written lineage data to form the second lineage relationship, shown in Figure 5, between the data processing execution task (Executed Job) and the data processing task, and writes it into the graph database system as lineage graph data. Finally, the graph database system actively reports the write status of the lineage graph data to the scheduling system of the preset main cluster, so that the scheduling system can confirm that the construction of the second lineage relationship is complete.
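The second lineage relationship can be sketched as a parse of the workflow JSON. The JSON schema below is hypothetical (the text does not specify it), but it shows how (Executed Job, Job) edges might be extracted from the task-relationship file.

```python
import json

# Hypothetical workflow JSON; the patent does not specify the actual schema.
workflow_json = json.dumps({
    "workflow": "daily_etl",
    "nodes": [
        {"job": "job1", "executed_job": "job1_run_20210730"},
        {"job": "job2", "executed_job": "job2_run_20210730"},
    ],
})

def second_lineage(raw: str) -> list:
    """Return (executed_job, job) edges: the second lineage relationship
    between data processing execution tasks and data processing tasks."""
    doc = json.loads(raw)
    return [(n["executed_job"], n["job"]) for n in doc["nodes"]]

edges = second_lineage(workflow_json)
```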
• Step S503: fusing the first lineage relationship and the second lineage relationship to determine the lineage relationship between the data and the data processing tasks, so as to construct a relationship chain model.
• after the disaster recovery device establishes the first lineage relationship between the data processing execution task and the data, and the second lineage relationship between the data processing execution task and the data processing task, it fuses the two relationships to determine the lineage relationship between the data processing tasks and the data, so as to construct a relationship chain model.
• after the disaster recovery device constructs the first lineage relationship between the data processing execution task and the data, as shown in Figure 4, and the second lineage relationship between the data processing execution task and the data processing task, as shown in Figure 5, it periodically triggers the data fusion processing task through the scheduling system of the preset main cluster, and analyzes the respective relationship graphs of the first and second lineage relationships. The data processing execution task (Executed Job) in the first lineage relationship is replaced with its corresponding data processing task (Job), so that the two relationship graphs are fused into the lineage relationship between the data and the data processing tasks shown in Figure 9.
• the disaster recovery device builds a relationship chain model in the form of graph data based on the lineage relationship between the data and the data processing tasks.
• based on the relationship chain model, the disaster recovery device can determine the input data and output data of each task node in the workflow. As shown in Figure 9, the input data of task node Job1 is Table1 and Table2, and its output data is Table4. In this way, when performing a disaster recovery switchover, the scheduling system in the disaster recovery cluster can use the relationship chain model to determine from which task node the workflow should be rerun.
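A minimal stand-in for the fused relationship chain model might look as follows; the dict-based storage replaces the graph database purely for illustration, with node and table names taken from the Figure 9 example.

```python
# Minimal in-memory stand-in for the graph database that holds the fused
# relationship chain model. Node and table names follow the Figure 9 example;
# the dict-based storage is an assumption for illustration only.
relationship_chain = {
    "Job1": {"inputs": ["Table1", "Table2"], "outputs": ["Table4"]},
    "Job2": {"inputs": ["Table3"], "outputs": ["Table5"]},
    "Job3": {"inputs": ["Table4", "Table5"], "outputs": ["Table6"]},
}

def task_io(job: str):
    """Index a task node's input and output data from the model."""
    node = relationship_chain[job]
    return node["inputs"], node["outputs"]

inputs, outputs = task_io("Job1")
```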
• the disaster recovery device first obtains lineage data from the preset master cluster that is executing data processing tasks, to establish the first lineage relationship between the data and the data processing execution tasks executed by the scheduling system of the preset main cluster; after establishing the first lineage relationship, it further establishes the second lineage relationship between the data processing execution task and the data processing task by parsing the JSON file; having established both the first and the second lineage relationships, it fuses them to determine the lineage relationship between the data processing tasks and the data, thus constructing the relationship chain model.
• in the process of disaster recovery switching, the disaster recovery device can use the relationship chain model constructed in advance from the lineage relationships between data and data processing tasks, combined with the synchronization status of the task parameters of the task nodes in the workflow, to carry out the disaster recovery operation of switching to the disaster recovery cluster when a disaster occurs in the main cluster. This enables a fast switchover and fast recovery of the task nodes to be re-executed, achieving fast and fine-grained disaster recovery switching and thereby improving disaster recovery efficiency.
  • the data disaster recovery method of this application may further include:
  • Step S60 executing a preset data synchronization task to make the database of the preset primary cluster synchronize data to the disaster recovery database.
• before the disaster recovery device establishes a communication connection with the disaster recovery database of the preset primary cluster where a disaster occurs, it first performs a data synchronization task, so that while the database of the preset primary cluster provides services and performs data processing tasks, its data is synchronized to the disaster recovery database in the disaster recovery cluster for a subsequent rapid disaster recovery switchover.
• the data synchronization task is generated based on staff configuration to perform data synchronization management. It should be understood that, depending on the design requirements of the actual application, the configuration method and specific content of the data synchronization task may differ across feasible implementations; the data disaster recovery method of this application does not limit the specific content of the data synchronization task.
  • step S60 may include:
  • Step S601 receiving the data synchronization task, and reading the metadata to be synchronized pointed to by the data synchronization task from the database of the preset master cluster;
• in the process of synchronizing data from the database of the preset main cluster to the disaster recovery database, the disaster recovery device first receives the data synchronization task generated based on the staff configuration, and then analyzes the task to determine the metadata to be synchronized, which needs to be read from the database of the preset main cluster and synchronized to the disaster recovery database.
  • Step S602 executing the data synchronization task to pull the metadata to be synchronized into the disaster recovery database for storage;
• after the disaster recovery device determines that the metadata to be synchronized needs to be read from the database of the preset primary cluster and synchronized to the disaster recovery database, it executes the data synchronization task to pull the metadata to be synchronized into the corresponding storage path in the disaster recovery database.
  • step S602 may include:
  • Step S6021 obtaining the first storage path of the metadata to be synchronized in the database of the preset main cluster
• when the disaster recovery device synchronizes the metadata to be synchronized in the database of the preset primary cluster to the disaster recovery database, upon determining the metadata to be synchronized by parsing the data synchronization task, it also obtains the first storage path of that metadata in the database of the preset master cluster.
• specifically, the disaster recovery device can query the resource manager YARN (Yet Another Resource Negotiator, also known as Apache Hadoop YARN, the Hadoop resource manager) of the preset master cluster for the data information managed for the data to be synchronized (specifically, the data of a Hive table), such as data size, storage time, update time, and storage path, so as to obtain the first storage path of the metadata to be synchronized in the database of the preset primary cluster.
  • Step S6022 determining a second storage path corresponding to the first storage path in the disaster recovery database
• after the disaster recovery device obtains the first storage path of the metadata to be synchronized in the database of the preset primary cluster, it can determine, based on the first storage path, the corresponding second storage path in the disaster recovery database.
• specifically, the disaster recovery device can rely on a pre-built association between the database of the preset main cluster and the disaster recovery database for synchronizing data. The association can be established in advance as a relation table; using the first storage path as a key in this table, the device looks up and determines, in the disaster recovery database, the second storage path corresponding to the first storage path, under which the metadata to be synchronized will be stored.
• alternatively, after the disaster recovery device acquires the first storage path of the metadata to be synchronized in the database of the preset primary cluster, it can also, based on the current free storage space of the disaster recovery database, immediately generate a storage path, establish a correspondence between that path and the first storage path, and then use the generated path as the second storage path in the disaster recovery database for the metadata to be synchronized under the first storage path.
  • Step S6023 storing the metadata to be synchronized in the disaster recovery database according to the second storage path.
• after the disaster recovery device determines the second storage path corresponding to the first storage path in the disaster recovery database, it can pull the data stored under the first storage path in the database of the preset main cluster and store it under the second storage path in the disaster recovery database.
• specifically, after the disaster recovery device determines the second storage path corresponding to the first storage path in the disaster recovery database, it can first pull the metadata to be synchronized from the database of the preset primary cluster according to the first storage path, and then hand the metadata to the resource manager YARN of the disaster recovery database, so that YARN stores the metadata according to the second storage path.
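Steps S6021 to S6023 can be sketched as a path-mapping lookup plus copy. The relation table and the fallback path generation below are illustrative assumptions; in the text, the actual transfer is performed by a Distcp job submitted to YARN.

```python
# Relation table mapping first (primary cluster) storage paths to second
# (disaster recovery) storage paths; the concrete paths are illustrative.
path_relation = {
    "/idc1/warehouse/db1/table_a": "/idc2/warehouse/db1/table_a",
    "/idc1/warehouse/db1/table_b": "/idc2/warehouse/db1/table_b",
}

def second_path_for(first_path: str) -> str:
    """Look up the second storage path for a first storage path; if no
    mapping exists yet, generate one and record the correspondence,
    mirroring the on-the-fly alternative described in the text."""
    if first_path not in path_relation:
        path_relation[first_path] = first_path.replace("/idc1/", "/idc2/", 1)
    return path_relation[first_path]

dest = second_path_for("/idc1/warehouse/db1/table_c")
```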
  • Step S603 monitoring the execution status of the data synchronization task and performing consistency verification on the data stored in the database of the preset primary cluster and the disaster recovery database respectively.
• the disaster recovery device continuously monitors the execution status of the data synchronization task, so that when each data synchronization task is completed it can further verify the consistency of the data stored in the database of the preset main cluster and in the disaster recovery database, ensuring that the metadata to be synchronized is completely synchronized to the disaster recovery database.
• specifically, the disaster recovery device generates a data synchronization task from the configuration data entered by the staff. Then, based on the scheduling system (Transports), the disaster recovery device schedules the data synchronization task for execution, reading metadata via multiple threads from the database (MySQL) that stores metadata in the preset main cluster (IDC1), and writing the metadata read, as the metadata to be synchronized pointed to by the data synchronization task, into the disaster recovery database of the disaster recovery cluster (IDC2) that stores metadata.
• the disaster recovery device further submits the disaster recovery data synchronization task (a Distcp job) to YARN on the disaster recovery cluster (IDC2) based on the scheduling system (Transports), and executes the Distcp job as scheduled, so that the data of the Hive tables that need to be synchronized in the preset master cluster (IDC1) is pulled to the storage paths (HDFS directories) corresponding to those Hive tables in the disaster recovery cluster (IDC2).
• the disaster recovery device also monitors the execution status of the scheduled data synchronization task by polling through the scheduling system (Transports), and collects the synchronization statistics stored on the database src-HDFS of the preset primary cluster (IDC1) and on the disaster recovery database dest-HDFS in the disaster recovery cluster, so as to verify whether the data on the two sides, the preset primary cluster (IDC1) and the disaster recovery cluster (IDC2), are consistent.
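The consistency verification of step S603 can be sketched as a comparison of per-side storage statistics. The particular statistics compared here (file count and total bytes) are an assumption; a production check might also compare checksums.

```python
def collect_stats(listing: dict) -> tuple:
    """listing maps file path -> size in bytes; return (file count, total bytes)."""
    return len(listing), sum(listing.values())

def consistent(src: dict, dest: dict) -> bool:
    # The two sides are considered consistent when their collected
    # synchronization statistics match.
    return collect_stats(src) == collect_stats(dest)

src_hdfs = {"/t/part-0": 1024, "/t/part-1": 2048}    # src-HDFS, IDC1
dest_hdfs = {"/t/part-0": 1024, "/t/part-1": 2048}   # dest-HDFS, IDC2
ok = consistent(src_hdfs, dest_hdfs)
```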
• before the disaster recovery device executes the data synchronization task to make the database of the preset primary cluster synchronize data to the disaster recovery database, it also predefines the data disaster recovery synchronization rules. Therefore, when the disaster recovery device executes the data synchronization task based on the scheduling system to synchronize data between the preset primary cluster and the disaster recovery cluster, it does so according to the data disaster recovery synchronization rules.
• specifically, the disaster recovery device defines the clusters, databases, and data tables of the preset primary cluster and the disaster recovery cluster that need data synchronization, together with the time and strategy for data synchronization, to form the data disaster recovery synchronization rules.
  • the data disaster recovery synchronization rules defined by the disaster recovery device are shown in the following table:
  • the source cluster and the target cluster are respectively the preset primary cluster and disaster recovery cluster described in this embodiment.
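Since the rule table itself is not reproduced here, the record below is a hypothetical illustration of the fields the text describes: source and target cluster, database, table, synchronization time, strategy, and synchronization status. All field names and values are assumptions.

```python
# Hypothetical data disaster recovery synchronization rule record; the field
# names are assumptions based on the rule contents the text describes.
sync_rule = {
    "source_cluster": "IDC1",         # preset primary cluster
    "target_cluster": "IDC2",         # disaster recovery cluster
    "database": "db1",
    "table": "table_a",
    "sync_time": "02:00",
    "strategy": "incremental",        # e.g. incremental vs. full copy
    "sync_status": "unsynchronized",  # or "synchronization completed"
}

def is_synced(rule: dict) -> bool:
    """True once the rule's data has been fully synchronized."""
    return rule["sync_status"] == "synchronization completed"
```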
• in this embodiment, the synchronization status includes: synchronization completed and unsynchronized; accordingly, "determining, according to the synchronization status, the target node to be re-executed among the task nodes" may include:
  • Step S401 if the synchronization state of the task parameters of the first task node among the task nodes is not synchronized, then determine that the first task node is the target node to be re-executed;
• after the disaster recovery device obtains the task parameters of each task node in the workflow being executed by the preset master cluster when the disaster occurs, it further detects whether the synchronization status of each task node's task parameters is synchronization completed or unsynchronized. Thus, when it detects that the synchronization status of the task parameters of a first task node is unsynchronized, it directly determines the first task node as a target node that needs to be rerun under the disaster recovery cluster.
  • Step S402 if the parent node of the first task node is a node to be re-executed, then determine that the first task node is the target node;
• while detecting whether the synchronization status of each task node's task parameters is synchronized, the disaster recovery device also detects whether the parent node of each task node has been determined to be a target node to be rerun in the disaster recovery cluster. In this way, when it detects that the parent node of the first task node is a target node to be re-executed, it directly determines the first task node as a target node to be rerun in the disaster recovery cluster.
• specifically, the disaster recovery device traverses, in a depth-first manner, each workflow being executed by the scheduling system of the acquired preset primary cluster. If, for the first task node in the current workflow, all parent nodes have been marked as not needing a rerun (disable execute), and the synchronization status of the first task node's task parameters (input data and output data) is synchronized, the disaster recovery device determines that the first task node is not a target node that needs to be rescheduled for execution, and marks it as not needing a rerun (disable execute).
• conversely, if during the traversal a parent node of the first task node of the current workflow is marked for re-execution (enable execute), or the synchronization status of the first task node's task parameters (input data and output data) is unsynchronized, the disaster recovery device directly determines that the first task node is a target node that needs to be rescheduled for execution, and marks it for re-execution (enable execute).
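The depth-first rerun decision above can be sketched as follows: the workflow is treated as a DAG, and a node is marked for re-execution (enable execute) if any of its task parameters is unsynchronized or any parent is itself marked for rerun. The data structures are assumptions for illustration.

```python
from typing import Dict, List

def mark_rerun(parents: Dict[str, List[str]],
               synchronized: Dict[str, bool]) -> Dict[str, bool]:
    """Depth-first marking: a node gets enable execute (True) if its task
    parameters are unsynchronized, or if any parent must itself be rerun.
    Assumes the workflow is a DAG (no cycles)."""
    enable: Dict[str, bool] = {}

    def visit(node: str) -> bool:
        if node not in enable:
            parent_rerun = any(visit(p) for p in parents.get(node, []))
            enable[node] = parent_rerun or not synchronized[node]
        return enable[node]

    for node in synchronized:
        visit(node)
    return enable

# Job1's data is unsynchronized, so Job1 and its child Job3 must be rerun;
# Job2 is fully synchronized with no rerun parents, so it is skipped.
flags = mark_rerun(
    parents={"Job3": ["Job1", "Job2"]},
    synchronized={"Job1": False, "Job2": True, "Job3": True},
)
```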
  • the embodiment of the present application provides a method for data disaster recovery and recovery.
• before establishing a communication connection with the disaster recovery database of the preset primary cluster where a disaster occurs, the disaster recovery device first executes a data synchronization task, so that while the preset primary cluster provides services and performs data processing tasks, its data is synchronized to the disaster recovery database in the disaster recovery cluster for a subsequent rapid disaster recovery switchover.
• in the process of synchronizing data from the database of the preset main cluster to the disaster recovery database, the disaster recovery device first receives the data synchronization task generated based on the staff configuration, and then analyzes the task to determine the metadata to be synchronized, which needs to be read from the database of the preset main cluster and synchronized to the disaster recovery database.
• once the disaster recovery device determines that the metadata to be synchronized needs to be read from the database of the preset primary cluster and synchronized to the disaster recovery database, it executes the data synchronization task to pull the metadata to be synchronized to the corresponding storage path in the disaster recovery database for storage.
• the disaster recovery device continuously monitors the execution status of the data synchronization task, so that when each data synchronization task is completed, it further performs consistency verification on the data stored in the database of the preset main cluster and in the disaster recovery database, ensuring that the metadata to be synchronized is completely synchronized to the disaster recovery database.
• in this way, in the process of disaster recovery switching, the disaster recovery device can use the relationship chain model constructed in advance from the lineage relationships between data and data processing tasks, combined with the synchronization status of the task parameters of the task nodes in the workflow, to carry out the disaster recovery operation of switching to the disaster recovery cluster, quickly performing the switchover and quickly recovering the task nodes to be re-executed, thereby achieving fast and fine-grained disaster recovery switching and improving disaster recovery efficiency.
  • FIG. 13 is a schematic diagram of functional modules of an embodiment of the data disaster recovery system of the present application.
  • the disaster recovery system for the application data includes:
  • connection module 10 is used to establish a communication connection with the disaster recovery database of the preset main cluster
  • a workflow reading module 20 configured to read the workflow executed by the preset main cluster through the communication connection;
• an acquisition module 30, configured to acquire task parameters of each task node in the workflow according to a preset relationship chain model, wherein the relationship chain model is constructed based on the lineage relationships between data and data processing tasks;
  • the recovery module 40 is configured to detect the synchronization state of the task parameters, determine the target node to be re-executed among the task nodes according to the synchronization state, and trigger a disaster recovery mechanism to execute the target node.
  • the disaster recovery system for data in this application also includes:
• a relationship chain building module, configured to build the relationship chain model based on the lineage relationships between data and data processing tasks.
• the relationship chain building module includes:
• a first construction unit, configured to acquire lineage data from the preset main cluster and establish the first lineage relationship between the data processing execution task and the data;
• a second construction unit, configured to parse the JSON file and establish the second lineage relationship between the data processing execution task and the data processing task;
• a third construction unit, configured to fuse the first lineage relationship and the second lineage relationship to determine the lineage relationship between the data and the data processing task, so as to construct a relationship chain model.
  • the task parameters include input data and output data of the task node
  • the acquisition module 30 includes:
  • a determining unit configured to determine each of the task nodes of the workflow
  • the acquisition unit is configured to respectively construct query statements according to each of the task nodes and index the respective input data and output data of each of the task nodes from the relationship chain model.
  • the disaster recovery system for data in this application also includes:
  • the data synchronization module is configured to execute a preset data synchronization task to make the database of the preset main cluster synchronize data to the disaster recovery database.
  • the data synchronization module includes:
  • a receiving unit configured to receive the data synchronization task, and read the metadata to be synchronized pointed to by the data synchronization task from the database of the preset master cluster;
  • a task execution unit configured to execute the data synchronization task to pull the metadata to be synchronized into the disaster recovery database for storage
• a verification unit, configured to monitor the execution status of the data synchronization task and perform consistency verification on the data stored in the database of the preset primary cluster and in the disaster recovery database.
  • the task execution unit includes:
  • a path obtaining subunit configured to obtain a first storage path of the metadata to be synchronized in the database of the preset primary cluster; and determine a second storage path corresponding to the first storage path in the disaster recovery database Storage path;
  • the data storage subunit is configured to store the metadata to be synchronized in the disaster recovery database according to the second storage path.
  • the synchronization status includes: synchronization completed and unsynchronized
  • the recovery module 40 includes:
• a first rerun node determining unit, configured to determine that a first task node among the task nodes is a target node to be re-executed if the synchronization status of the first task node's task parameters is unsynchronized;
• a second rerun node determining unit, configured to determine the first task node as a target node if the parent node of the first task node is a node to be re-executed.
• each module of the above data disaster recovery system corresponds to a step in the above embodiments of the data disaster recovery method, and their functions and implementation processes are not repeated here.
• the present application also provides a computer storage medium on which a data disaster recovery program is stored; when the data disaster recovery program is executed by a processor, the steps of the data disaster recovery method described in any one of the above embodiments are implemented.
• the present application also provides a computer program product; the computer program product includes a computer program, and when the computer program is executed by a processor, the steps of the data disaster recovery method described in any one of the above embodiments are implemented.
• the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
• the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions that enable a terminal device (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods described in the various embodiments of the present application.


Abstract

The present application relates to the technical field of financial technology (fintech), and discloses a data disaster recovery method and system, a terminal device, and a computer storage medium. The data disaster recovery method comprises: establishing, by means of a data disaster recovery device, a communication connection with a disaster recovery database of a preset main cluster; reading, by means of the communication connection, the workflow executed by the preset main cluster; acquiring task parameters of the task nodes in the workflow according to a preset relationship chain model, the relationship chain model being constructed on the basis of the lineage relationships between data and data processing tasks; and detecting the synchronization status of the task parameters so as to determine, according to the synchronization status, the target node to be re-executed among the task nodes, and triggering a disaster recovery mechanism to execute the target node.

Description

数据的容灾恢复方法、系统、终端设备及计算机存储介质Data disaster recovery method, system, terminal equipment and computer storage medium
优先权信息priority information
本申请要求于2021年7月30日申请的、申请号为202110874019.9的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims priority to a Chinese patent application with application number 202110874019.9 filed on July 30, 2021, the entire contents of which are incorporated herein by reference.
技术领域technical field
本申请涉及金融科技(Fintech)技术领域,尤其涉及一种数据的容灾恢复方法、系统、终端设备以及计算机存储介质。The present application relates to the technical field of financial technology (Fintech), and in particular to a data disaster recovery method, system, terminal equipment, and computer storage medium.
背景技术Background technique
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually transforming into financial technology. However, the security, real-time performance, and stability requirements of the financial industry also place higher demands on these technologies.
At present, in the scenario of remote disaster recovery for big data, the main cluster and the backup cluster run in two different equipment rooms, each running an independent account system and using an independent operation and maintenance management system. Current remote disaster recovery solutions for big data only consider data disaster recovery on the offline side, and the basic components involved are mainly Hadoop (Apache Hadoop, an open-source software framework that supports data-intensive distributed applications, released under the Apache 2.0 license), Hive (Apache Hive, a data warehouse tool built on Hadoop), and a big data platform job scheduling system.
The existing disaster recovery strategy for big data clusters is to synchronize the data of the main cluster that changes every day to the disaster recovery cluster through a cross-equipment-room data synchronization tool, so that when the main cluster becomes unavailable, services are switched to the disaster recovery cluster. However, in existing solutions, after switching to the disaster recovery environment, the entire flow of importing business data, processing it, and exporting it back to the business system must be re-run in the disaster recovery environment before the switchover is complete. As a result, the disaster recovery switchover takes a long time and cannot be completed quickly and efficiently.
Summary
The main purpose of the present application is to provide a data disaster recovery method, system, terminal device, and computer storage medium, aiming to achieve fast and fine-grained disaster recovery switching when the main cluster suffers a disaster and can no longer provide services, thereby improving disaster recovery efficiency.
To achieve the above purpose, the present application provides a data disaster recovery method applied to a data disaster recovery device, the method comprising:
establishing a communication connection with a disaster recovery database of a preset main cluster;
reading, through the communication connection, the workflow executed by the preset main cluster;
acquiring task parameters of each task node in the workflow according to a preset relationship chain model, wherein the relationship chain model is constructed based on the lineage relationship between data and data processing tasks; and
detecting the synchronization state of the task parameters, so as to determine, according to the synchronization state, a target node to be re-executed among the task nodes, and triggering a disaster recovery mechanism to execute the target node.
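As a hedged sketch of how the four steps above fit together, the following illustration wires them into one flow. All function and variable names here are hypothetical stand-ins, not part of the claimed implementation:

```python
def recover(read_workflows, lineage_model, sync_states, schedule):
    """Orchestrate the four steps above; every collaborator is a plain
    callable or dict so the control flow stays visible."""
    workflows = read_workflows()                     # steps 1-2: connect and read
    targets = []
    for workflow in workflows:
        for node in workflow:
            inputs, outputs = lineage_model[node]    # step 3: task parameters
            # step 4: re-execute a node unless all of its data finished syncing
            if not all(sync_states.get(t) == "synced" for t in inputs + outputs):
                targets.append(node)
    for node in targets:
        schedule(node)                               # trigger the recovery mechanism
    return targets

# Minimal illustrative run with two task nodes in one workflow
ran = []
targets = recover(
    read_workflows=lambda: [["extract", "transform"]],
    lineage_model={"extract": (["src"], ["ods"]), "transform": (["ods"], ["dw"])},
    sync_states={"src": "synced", "ods": "synced", "dw": "pending"},
    schedule=ran.append,
)
print(targets)  # → ['transform']
```

Only the node whose data did not finish synchronizing is handed to the scheduler, which is the point of the fine-grained switchover.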
In addition, to achieve the above purpose, the present application further provides a data disaster recovery system, comprising:
a connection module, configured to establish a communication connection with a disaster recovery database of a preset main cluster;
a workflow reading module, configured to read, through the communication connection, the workflow executed by the preset main cluster;
an acquisition module, configured to acquire task parameters of each task node in the workflow according to a preset relationship chain model, wherein the relationship chain model is constructed based on the lineage relationship between data and data processing tasks; and
a recovery module, configured to detect the synchronization state of the task parameters, determine, according to the synchronization state, a target node to be re-executed among the task nodes, and trigger a disaster recovery mechanism to execute the target node.
Each functional module of the data disaster recovery system of the present application, when running, implements the steps of the data disaster recovery method described above.
In addition, to achieve the above purpose, the present application further provides a terminal device, comprising: a memory, a processor, and a data disaster recovery program stored in the memory and executable on the processor, wherein the data disaster recovery program, when executed by the processor, implements the steps of the data disaster recovery method described above.
In addition, to achieve the above purpose, the present application further provides a computer storage medium on which a data disaster recovery program is stored, wherein the data disaster recovery program, when executed by a processor, implements the steps of the data disaster recovery method described above.
In addition, to achieve the above purpose, the present application further provides a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the data disaster recovery method described above.
The present application provides a data disaster recovery method, system, terminal device, computer storage medium, and computer program product. A data disaster recovery device establishes a communication connection with a disaster recovery database of a preset main cluster; reads, through the communication connection, the workflow executed by the preset main cluster; acquires task parameters of each task node in the workflow according to a preset relationship chain model, wherein the relationship chain model is constructed based on the lineage relationship between data and data processing tasks; and detects the synchronization state of the task parameters, so as to determine, according to the synchronization state, a target node to be re-executed among the task nodes, and triggers a disaster recovery mechanism to execute the target node.
When a disaster occurs in the main cluster so that it can no longer provide services, and a disaster recovery switchover is required so that the disaster recovery cluster provides services in its place, the data disaster recovery device in the disaster recovery cluster establishes a communication connection with the disaster recovery database of the preset main cluster, and reads, through this connection, the workflow that the preset main cluster was executing when the disaster occurred. The data disaster recovery device then acquires the task parameters of each task node in the workflow according to the relationship chain model constructed from the lineage relationship between data and data processing tasks. Finally, it detects the synchronization state of the task parameters of each task node, determines, according to the synchronization state, the target nodes among the task nodes that need to be re-executed, and, once the target nodes are determined, triggers the disaster recovery mechanism to re-execute them.
Compared with conventional big data cluster disaster recovery solutions, the present application performs the disaster recovery switchover, when a disaster occurs in the main cluster, using a complete relationship chain model constructed in advance from the lineage relationship between data and data processing tasks, combined with the synchronization state of the task parameters of each task node in the workflow. There is no need to re-run, in the disaster recovery environment, all the business data tasks that were in progress when the disaster occurred; only the task nodes identified as requiring re-execution, based on the relationship chain model and the synchronization states, are re-run. In this way, the disaster recovery switchover and the recovery of the task nodes to be re-executed are both fast, achieving fast and fine-grained disaster recovery switching and thereby improving disaster recovery efficiency.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of the hardware operating environment of a terminal device involved in the solutions of the embodiments of the present application;
FIG. 2 is a schematic flowchart of an embodiment of the data disaster recovery method of the present application;
FIG. 3 shows the lineage data acquisition and processing flow involved in an embodiment of the data disaster recovery method of the present application;
FIG. 4 shows the first lineage relationship between data processing execution tasks and data involved in an embodiment of the data disaster recovery method of the present application;
FIG. 5 shows the second lineage relationship between data processing execution tasks and data processing tasks involved in an embodiment of the data disaster recovery method of the present application;
FIG. 6 is an example of a data processing workflow involved in an embodiment of the data disaster recovery method of the present application;
FIG. 7 shows the processing flow of the second lineage relationship involved in an embodiment of the data disaster recovery method of the present application;
FIG. 8 shows the relationship between data processing tasks and task execution IDs involved in an embodiment of the data disaster recovery method of the present application;
FIG. 9 shows the lineage relationship between data and data processing tasks involved in an embodiment of the data disaster recovery method of the present application;
FIG. 10 shows the data synchronization flow involved in an embodiment of the data disaster recovery method of the present application;
FIG. 11 shows the disaster recovery processing flow involved in an embodiment of the data disaster recovery method of the present application;
FIG. 12 is a schematic diagram of a disaster recovery scenario involved in an embodiment of the data disaster recovery method of the present application;
FIG. 13 is a schematic diagram of the functional modules of an embodiment of the data disaster recovery system of the present application.
The realization of the objectives, functional features, and advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of the hardware operating environment of a terminal device involved in the solutions of the embodiments of the present application.
The terminal device in the embodiments of the present application may be a data disaster recovery device configured in a disaster recovery cluster to perform disaster recovery when a disaster occurs in the main cluster and it can no longer provide services. The data disaster recovery device may be a smartphone, a PC (Personal Computer), a tablet computer, a portable computer, or the like.
As shown in FIG. 1, the terminal device may include: a processor 1001, such as a CPU; a communication bus 1002; a user interface 1003; a network interface 1004; and a memory 1005. The communication bus 1002 is used to realize connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard, and optionally may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory, or a stable memory (non-volatile memory), such as a disk memory. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
Those skilled in the art will understand that the terminal device structure shown in FIG. 1 does not constitute a limitation on the terminal device, which may include more or fewer components than shown, or combine certain components, or arrange the components differently.
As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a data disaster recovery program.
In the terminal shown in FIG. 1, the network interface 1004 is mainly used to connect to a backend server and communicate data with it; the user interface 1003 is mainly used to connect to a client and communicate data with it; and the processor 1001 may be used to call the data disaster recovery program stored in the memory 1005 and perform the operations described in the following embodiments of the data disaster recovery method of the present application.
Based on the above hardware structure, various embodiments of the data disaster recovery method of the present application are proposed.
It should be noted that, in the scenario of remote disaster recovery for big data, the main and backup clusters (also called the main cluster and the disaster recovery cluster) run in two different equipment rooms, each running an independent account system and using an independent operation and maintenance management system. When the clusters are delivered, the main and backup clusters are delivered separately.
A typical data processing flow of a big data platform is as follows:
First, data extraction: the data of the business system is pulled from a relational database into Hive through Sqoop (Apache Sqoop, an open-source tool mainly used to transfer data between Hadoop (Hive) and traditional databases such as MySQL, PostgreSQL, and Oracle).
Second, data processing: the data in Hive is processed programmatically through Hive SQL, Spark SQL, Python, Shell, and the like, and is finally written into another Hive table.
Third, data export: the processed Hive data (for example, daily statistics reports, single-day revenue calculations, and the like) is exported back into a relational database using Sqoop.
The entire flow of data extraction, data processing, and data export above is triggered and executed on a schedule by the task scheduling system of the big data platform.
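The extract-process-export flow can be mimicked in miniature. This is a sketch only: a real deployment would invoke Sqoop commands and Hive SQL under the scheduler, whereas the three stand-in functions below merely show how the stages chain:

```python
def extract(business_rows):
    """Stand-in for Sqoop pulling rows from a relational database into Hive."""
    return list(business_rows)

def process(hive_rows):
    """Stand-in for Hive SQL/Spark SQL processing, here a daily revenue total."""
    return {"daily_revenue": sum(row["amount"] for row in hive_rows)}

def export(report):
    """Stand-in for Sqoop exporting the processed Hive table back out."""
    return dict(report)

# The platform's scheduler would trigger these three stages on a timer;
# they are chained by hand here for illustration.
source = [{"amount": 120}, {"amount": 80}]
result = export(process(extract(source)))
print(result)  # → {'daily_revenue': 200}
```

The point of the sketch is the shape of the pipeline: each stage's output table is the next stage's input, which is exactly the lineage that the relationship chain model later records.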
The disaster recovery strategy for big data clusters is to synchronize the data of the main cluster that changes every day to the disaster recovery cluster through a cross-equipment-room data synchronization tool, and to switch to the disaster recovery cluster when the main cluster is unavailable.
In the existing strategy, however, after switching to the disaster recovery environment, the entire flow described above of importing business data, processing it, and exporting it back to the business system must be re-run, which takes a long time and prevents the disaster recovery switchover from being completed quickly and efficiently.
In view of the above, the present application provides a data disaster recovery method. Referring to FIG. 2, FIG. 2 is a schematic flowchart of a first embodiment of the data disaster recovery method of the present application. In this embodiment, the method is applied to the above data disaster recovery device configured in the disaster recovery cluster to perform disaster recovery when a disaster occurs in the main cluster and it can no longer provide services (for ease of description, hereinafter referred to as the disaster recovery device). The data disaster recovery method of the present application comprises:
Step S10: establishing a communication connection with a disaster recovery database of a preset main cluster;
In the process of disaster recovery, the disaster recovery device first establishes a communication connection with the disaster recovery database of the preset main cluster in which the disaster occurred.
It should be noted that, in this embodiment, the preset main cluster is the cluster, in the scenario of remote disaster recovery for big data, in which the big data platform performing the series of data extraction, data processing, and data export flows is located. The disaster recovery database of the preset main cluster is a remote backup of the database of the scheduling system of the preset main cluster.
Further, in a feasible embodiment, step S10 may include:
Step S101: when the service of the preset main cluster is unavailable, establishing a communication connection with the disaster recovery database of the preset main cluster.
It should be noted that, in this embodiment, the disaster recovery process performed by the disaster recovery device takes place when a disaster occurs in the preset main cluster so that it can no longer provide services for the data processing flow.
When a disaster occurs in the preset main cluster currently performing data processing, so that the preset main cluster can no longer provide services or the services it provides become unavailable, the disaster recovery device immediately establishes a communication connection with the disaster recovery database of the preset main cluster.
Specifically, for example, assume that the preset main cluster currently performing data processing is in IDC1, and the standby disaster recovery cluster is in IDC2 at a remote site. When a disaster occurs in the preset main cluster in IDC1, so that the services provided by the preset main cluster become unavailable (that is, the data processing flow cannot be completed), or the preset main cluster cannot continue to provide services at all, the disaster recovery device in the disaster recovery cluster in the remote IDC2 begins to establish a communication connection with the disaster recovery database of the preset main cluster.
Step S20: reading, through the communication connection, the workflow executed by the preset main cluster;
Immediately after establishing the communication connection with the disaster recovery database, the disaster recovery device reads, based on the connection, the workflow that the preset main cluster was executing when the disaster occurred.
Specifically, for example, referring to the disaster recovery processing flow shown in FIG. 11: after establishing the communication connection with the disaster recovery database, the disaster recovery device (assumed to be FindGap shown in FIG. 11) queries the disaster recovery database through the connection for the workflows that the main scheduling system (Scheduler) of the preset main cluster was executing when the disaster occurred, and obtains the query result as a list.
It should be noted that, in this embodiment, there may be one or more workflows being executed by the preset main cluster when the disaster occurred; alternatively, the preset main cluster may not have been providing services for the data processing flow at that time, in which case no workflow was being executed. Therefore, when the disaster recovery device queries for workflows whose status is "executing" and the query result is returned as a list, the number of such workflows in the list may be 0 or N, where N is greater than or equal to 1.
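Querying the disaster recovery database for the workflows that were executing at failure time might look as follows. This is a sketch using SQLite in place of the real replicated scheduler database; the table and column names are assumptions made for illustration:

```python
import sqlite3

# Stand-in for the replicated scheduler database of the failed main cluster.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE workflow (id INTEGER, name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO workflow VALUES (?, ?, ?)",
    [(1, "daily_etl", "RUNNING"),
     (2, "hourly_sync", "SUCCEEDED"),
     (3, "report_export", "RUNNING")],
)

# The recovery device lists the workflows whose status was 'RUNNING' at
# disaster time; the list may contain zero entries or N >= 1 entries.
running = [row[0] for row in conn.execute(
    "SELECT name FROM workflow WHERE status = 'RUNNING' ORDER BY id")]
print(running)  # → ['daily_etl', 'report_export']
```

An empty `running` list corresponds to the case where no workflow was executing when the disaster occurred, and no recovery re-execution is needed.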
Step S30: acquiring task parameters of each task node in the workflow according to a preset relationship chain model, wherein the relationship chain model is constructed based on the lineage relationship between data and data processing tasks;
In one implementation, after reading the workflow that the preset main cluster in which the disaster occurred was executing at the time of the disaster, the disaster recovery device acquires the task parameters of each task node in the workflow according to the relationship chain model constructed from the lineage relationship between data and data processing tasks.
In another implementation, the disaster recovery device may begin constructing the relationship chain model from the lineage relationship between data and data processing tasks before any disaster occurs in the preset main cluster. In this way, when a disaster subsequently occurs in the preset main cluster, the disaster recovery device can directly retrieve the relationship chain model and acquire the task parameters of each task node in the workflow that the preset main cluster was executing when the disaster occurred.
It should be noted that, in this embodiment, the lineage relationship between data and data processing tasks is the relationship graph shown in FIG. 9, and the disaster recovery device can determine, based on this graph, the task nodes that need to be re-executed during the disaster recovery switchover.
Specifically, for example, assume that the disaster recovery device constructs in advance, based on the lineage relationship between data and data processing tasks shown in FIG. 9, a relationship chain model used to determine the task nodes that need to be re-executed during the disaster recovery switchover. The disaster recovery device then uses the relationship chain model to determine, for each of the currently obtained workflows that the main scheduling system of the preset main cluster was executing when the disaster occurred, all the task nodes of that workflow and the task parameters of each of these task nodes.
Further, in a feasible embodiment, the task parameters include the input data and output data of the task nodes, and step S30 may include:
Step S301: determining each task node of the workflow;
After obtaining the workflows that the preset main cluster in which the disaster occurred was executing at the time of the disaster, the disaster recovery device determines all the task nodes of each of the one or more workflows.
Specifically, for example, if the disaster recovery device queries the disaster recovery database of the preset main cluster in which the disaster occurred, and the returned list shows that the main scheduling system of the preset main cluster was executing only one workflow when the disaster occurred, the disaster recovery device further determines and obtains all the task nodes of that workflow in a well-established breadth-first manner.
Step S302: constructing query statements according to each of the task nodes, and indexing, according to the query statements, the input data and output data of each of the task nodes from the relationship chain model.
The disaster recovery device constructs a corresponding query statement for each determined task node, and, based on these query statements, indexes and queries the input data and output data of every task node from the relationship chain model constructed from the lineage relationship between data and data processing tasks.
It should be noted that, in this embodiment, since the relationship chain model constructed from the lineage relationship between data and data processing tasks is the lineage graph between data and data processing tasks shown in FIG. 9, and the relationship chain model may specifically be stored in a graph database configured in the disaster recovery cluster, the query statements constructed by the disaster recovery device from the task nodes may specifically be graph data query statements.
Specifically, for example, referring to the disaster recovery processing flow shown in FIG. 11: after the disaster recovery device (FindGap) queries, as a list, the disaster recovery database of the preset main cluster in which the disaster occurred, obtains the one workflow that the main scheduling system (Scheduler) of the preset main cluster was executing when the disaster occurred, and obtains all the task nodes of that workflow in a breadth-first manner, it further calls a preset query template for index-querying relational data within the relationship graph data, such as an SQL (Structured Query Language) statement, and uses each task node in turn as the input condition of the SQL statement, thereby constructing a graph data query statement for the relationships of that task node. The disaster recovery device then immediately executes the SQL statement, and from the lineage relationship between data processing tasks and data presented by the relationship chain model stored in the graph database (Graph DB) — the application and data lineage shown in the figure — analyzes and obtains the direct input data and output data of each task node.
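Constructing one lineage query per task node can be as simple as filling a parameterized template with the node as the input condition. The sketch below uses SQLite and an assumed `lineage_edge` table layout in place of the real graph database; all names are illustrative:

```python
import sqlite3

# Stand-in lineage store: one row per edge between a dataset and a task node.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineage_edge (dataset TEXT, direction TEXT, task_node TEXT)")
conn.executemany("INSERT INTO lineage_edge VALUES (?, ?, ?)", [
    ("Table1", "in", "Job1"), ("Table2", "out", "Job1"),
    ("Table2", "in", "Job2"), ("Table5", "out", "Job2"),
])

def node_io(node):
    """Index one task node's input and output datasets from the lineage
    store, using the node itself as the query condition."""
    rows = conn.execute(
        "SELECT dataset, direction FROM lineage_edge WHERE task_node = ?",
        (node,)).fetchall()
    inputs = [d for d, direction in rows if direction == "in"]
    outputs = [d for d, direction in rows if direction == "out"]
    return inputs, outputs

print(node_io("Job2"))  # → (['Table2'], ['Table5'])
```

Running `node_io` once per task node yields exactly the per-node input/output lists whose synchronization states step S40 goes on to check.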
Step S40: detecting the synchronization state of the task parameters, so as to determine, according to the synchronization state, a target node to be re-executed among the task nodes, and triggering a disaster recovery mechanism to execute the target node.
After obtaining the task parameters of each task node in the workflow that the preset main cluster was executing when the disaster occurred, the disaster recovery device further detects the synchronization state of the task parameters of each task node, determines, according to the detected synchronization states, the target nodes among the task nodes that must be re-executed to complete the disaster recovery switchover, and finally triggers the preset disaster recovery mechanism to re-execute the target nodes.
Specifically, for example, referring to the disaster recovery processing flow shown in FIG. 11: after analyzing and obtaining the direct input data and output data of each task node from the lineage relationship between data processing tasks and data presented by the relationship chain model stored in the graph database (Graph DB), the disaster recovery device (FindGap) further calls the database of the synchronization apparatus (Transport) for data synchronization, which is also configured in the disaster recovery cluster, and checks whether the synchronization state of the input data and output data of each task node is "synchronized", thereby obtaining the synchronization states of the input data and output data of all the task nodes.
Finally, the disaster recovery device (FindGap) once again traverses, in a well-established breadth-first manner, the DAG (Directed Acyclic Graph) of the workflow to which each task node belongs (see the workflow example in FIG. 6), and determines, based on the synchronization states of the input data and output data of the task nodes, the target nodes among all the task nodes of the workflow that need to be re-executed. It then triggers the preset disaster recovery mechanism so that the scheduling system of the disaster recovery cluster, Scheduler (Backup), schedules the target nodes for re-execution; after the re-execution is completed, a status result is fed back to the disaster recovery device for synchronization, and the disaster recovery device determines, based on the status result, that the disaster recovery switchover is complete.
Further, in another feasible embodiment, when the disaster recovery device determines, based on the synchronization status, that none of the workflow's task nodes needs to be re-executed, the switchover can be completed without triggering the disaster recovery mechanism at all.
Further, referring to the disaster recovery scenario shown in Figure 12, assume that in the workflow the primary cluster was executing when the disaster occurred, as obtained by the disaster recovery device through the relationship chain model, the task nodes are Job1, Job2 and Job3, and their input and output data are Table1, Table2, Table3, Table4, Table5 and Table6. The disaster recovery device further detects that Table1, Table2, Table3 and Table4 have completed disaster recovery synchronization and passed the consistency check. The disaster recovery device therefore determines that the only target nodes in the workflow that need to be re-executed in the disaster recovery cluster are Job2 and Job3, and triggers the disaster recovery mechanism so that the scheduling system of the disaster recovery cluster schedules only Job2 and Job3 for re-execution, thereby speeding up disaster recovery.
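The selection logic above can be sketched as follows. This is an illustrative sketch only: the exact DAG topology of Figure 12 is not given in the text, so the edges below, the function name, and the rule "a node is a target when any table it reads or writes has not finished disaster recovery synchronization" are assumptions made for the example, not the patented implementation.

```python
from collections import deque

def find_target_nodes(dag, node_io, synced):
    """Breadth-first walk of the workflow DAG; a node becomes a target for
    re-execution when any table it reads or writes is not yet synchronized."""
    # dag: adjacency list {node: [downstream nodes]}
    # node_io: {node: (input_tables, output_tables)}
    # synced: set of tables whose disaster-recovery copy is verified consistent
    indegree = {n: 0 for n in node_io}
    for succs in dag.values():
        for s in succs:
            indegree[s] += 1
    queue = deque(n for n in node_io if indegree[n] == 0)
    targets = []
    while queue:
        node = queue.popleft()
        inputs, outputs = node_io[node]
        if not set(inputs) | set(outputs) <= synced:
            targets.append(node)
        for s in dag.get(node, []):
            indegree[s] -= 1
            if indegree[s] == 0:
                queue.append(s)
    return targets

# Scenario of Figure 12 (topology assumed for illustration):
dag = {"Job1": ["Job2"], "Job2": ["Job3"], "Job3": []}
node_io = {
    "Job1": (["Table1", "Table2"], ["Table4"]),
    "Job2": (["Table3", "Table4"], ["Table5"]),
    "Job3": (["Table5"], ["Table6"]),
}
synced = {"Table1", "Table2", "Table3", "Table4"}
print(find_target_nodes(dag, node_io, synced))  # ['Job2', 'Job3']
```

With Table1 through Table4 synchronized, Job1's inputs and output are all covered and it is skipped, while Job2 and Job3 touch the unsynchronized Table5 and Table6 and are selected, matching the scenario described above.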
In this embodiment, during disaster recovery, the disaster recovery device first establishes a communication connection with the disaster recovery database of the preset primary cluster where the disaster occurred. Immediately after establishing that connection, the device reads, over the connection, the workflow the preset primary cluster was executing when the disaster occurred. Having read that workflow, the device then obtains the task parameters of each of its task nodes according to the relationship chain model constructed from the lineage relationships between data and data processing tasks. After obtaining those task parameters, the device further detects the synchronization status of each task node's task parameters, determines from that status the target nodes that must be re-executed to complete the disaster recovery switchover, and finally triggers the preset disaster recovery mechanism to re-execute those target nodes.
Compared with traditional big data cluster disaster recovery schemes, this application performs the failover from the primary cluster to the disaster recovery cluster by means of a complete relationship chain model constructed in advance from the lineage relationships between data and data processing tasks, combined with the synchronization status of the task parameters of the workflow's task nodes. This enables fast disaster recovery switchover and fast recovery of the task nodes that need re-execution, achieving fast and fine-grained switchover and thereby improving disaster recovery efficiency.
Further, based on the first embodiment of the data disaster recovery method of this application described above, a second embodiment of the method is proposed. The main difference between this embodiment and the first embodiment is that, in this embodiment, before step S10 above (establishing a communication connection with the disaster recovery database of the preset primary cluster), the data disaster recovery method of this application may further include:
Step S50: constructing a relationship chain model based on the lineage relationships between data and data processing tasks.
Before establishing the communication connection with the disaster recovery database of the preset primary cluster where the disaster occurred, the disaster recovery device begins constructing the relationship chain model based on the lineage relationships between the data and the data processing tasks being executed by the preset primary cluster.
Further, in a feasible embodiment, step S50 may include:
Step S501: obtaining lineage data from the preset primary cluster to establish a first lineage relationship between data processing tasks and data.
In constructing the relationship chain model based on the lineage relationships between the data and the data processing tasks being executed by the preset primary cluster, the disaster recovery device first obtains lineage data from the preset primary cluster that is executing the data processing tasks, establishing the first lineage relationship between the data and the data processing tasks executed by the scheduling system of the preset primary cluster.
Specifically, for example, referring to the lineage data acquisition and processing flow shown in Figure 3, through a lineage acquisition hook deployed in the preset primary cluster ("hook" being a system mechanism originally provided in Windows to replace "interrupts" under DOS, the Disk Operating System), the disaster recovery device parses lineage data out of data components (such as relational databases), obtains that lineage data, and writes it to the file system of a data integration tool. The scheduling system of the preset primary cluster then periodically triggers a lineage data integration task, causing the data integration tool to read the lineage logs from the file system to obtain the lineage data and write it further into the big data platform, Hive or Spark. Next, the scheduling system of the preset primary cluster periodically triggers a data processing task, causing Hive or Spark to process and integrate the written lineage data into the first lineage relationship between data processing tasks and data shown in Figure 4, which is then written into the graph database system as lineage graph data. Finally, the graph database system actively reports the write status of the lineage graph data to the scheduling system of the preset primary cluster, so that the scheduling system can confirm that the first lineage data has been constructed.
It should be noted that, in this embodiment, the lineage acquisition hooks in the preset primary cluster implement a corresponding Lineage Hook mechanism for each data system and data transfer tool. Every time a data system executes a SQL statement, these hooks capture the raw lineage data, wrap it into a lineage log, and write it to the log system of the data integration tool. Specifically, for example, lineage data is captured for the Hive and Spark data systems by, respectively, a Hive Lineage Hook (which asynchronously captures the SQL statements executed by Hive and calls a self-implemented Hive execution behaviour analysis API to obtain each statement's input data, output data, and associated task information), a Spark-SQL Lineage Hook (which asynchronously obtains the SQL statements executed by Spark-SQL and calls a self-implemented Spark SQL execution behaviour analysis API to obtain each statement's input data, output data, and associated task information), and a Sqoop Lineage Hook (which asynchronously captures Sqoop execution commands and analyzes their parameters to obtain the input and output data of each command and its associated task information). The Lineage Hooks for Hive and Spark-SQL obtain the lineage between data tables inside the big data platform, while the Sqoop Lineage Hook captures the lineage between big data platform tables and traditional relational database tables.
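The per-statement capture performed by such a Lineage Hook can be sketched as follows. The callback name, the log record layout, and the toy SQL analyzer below are all invented for illustration; a real hook relies on the engine's own semantic analysis of the statement, not on token matching.

```python
import json
import time

def on_statement_executed(sql, task_id, analyzer, log):
    """Invoked asynchronously after each SQL statement; `analyzer` stands in
    for the execution-behaviour analysis API described in the text."""
    inputs, outputs = analyzer(sql)
    record = {
        "task_id": task_id,   # the associated data processing task
        "inputs": inputs,     # tables the statement read
        "outputs": outputs,   # tables the statement wrote
        "sql": sql,
        "ts": int(time.time()),
    }
    # One lineage log entry, later read by the data integration tool.
    log.append(json.dumps(record))

def toy_analyzer(sql):
    """Crude token scan standing in for real semantic analysis."""
    tokens = sql.split()
    outs = [tokens[i + 2] for i, t in enumerate(tokens) if t.upper() == "INSERT"]
    ins = [tokens[i + 1] for i, t in enumerate(tokens) if t.upper() == "FROM"]
    return ins, outs

log = []
on_statement_executed("INSERT INTO Table4 SELECT * FROM Table1",
                      "Job1", toy_analyzer, log)
rec = json.loads(log[0])
print(rec["inputs"], rec["outputs"])  # ['Table1'] ['Table4']
```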
Step S502: parsing JSON (JavaScript Object Notation) files to establish a second lineage relationship between data processing execution tasks and data processing tasks.
It should be noted that, in this embodiment, as shown in the data processing workflow example of Figure 6, in big data platform task scheduling systems (such as Azkaban or Airflow) the dependencies among data processing tasks are mainly organized in the form of a DAG and stored in the database as JSON.
After establishing the first lineage relationship between data and the data processing execution tasks executed by the scheduling system of the preset primary cluster, the disaster recovery device further establishes the second lineage relationship between those data processing execution tasks and the data processing tasks by parsing JSON files.
Specifically, for example, the disaster recovery device uses a preset task lineage parsing program to read the JSON file of the workflow of each data processing task in the scheduling system of the preset primary cluster, and then parses that JSON file to obtain the second lineage relationship between data processing execution tasks (Executed Jobs) and data processing tasks shown in Figure 5.
It should be noted that, in this embodiment, referring to the relationship between data processing tasks and task execution IDs shown in Figure 8, while the task lineage parsing program reads and parses the workflow's JSON file, every execution of every data processing task is recorded in the database, and each execution is associated with an Executed Job ID. In this way, the second lineage relationship between data processing tasks and data processing execution tasks (Executed Jobs) is established.
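A minimal sketch of this parsing step follows, under an assumed JSON layout: the field names `jobs`, `dependsOn` and `executed_job_id` are hypothetical, since the actual schema depends on the scheduling system in use.

```python
import json

def parse_workflow_json(workflow_json, execution_records):
    """Extract the job dependency edges from a workflow JSON document and
    associate each job with its Executed Job IDs (the second lineage)."""
    wf = json.loads(workflow_json)
    edges = [(job["name"], dep)
             for job in wf["jobs"]
             for dep in job.get("dependsOn", [])]
    job_to_exec = {}
    for rec in execution_records:  # one row per execution, as in Figure 8
        job_to_exec.setdefault(rec["job"], []).append(rec["executed_job_id"])
    return edges, job_to_exec

workflow = json.dumps({
    "flow": "daily_etl",
    "jobs": [
        {"name": "Job1", "dependsOn": []},
        {"name": "Job2", "dependsOn": ["Job1"]},
    ],
})
records = [{"job": "Job1", "executed_job_id": 1001},
           {"job": "Job2", "executed_job_id": 1002}]
edges, mapping = parse_workflow_json(workflow, records)
print(edges)    # [('Job2', 'Job1')]
print(mapping)  # {'Job1': [1001], 'Job2': [1002]}
```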
Further, in this embodiment, referring to the flow shown in Figure 7, while the disaster recovery device parses the JSON files of the workflows in the scheduling system through the task lineage parsing program, the big data task scheduling system first reads the task relationship JSON files and the task execution records from the data integration tool, and the data integration tool directly writes the lineage data obtained by parsing them into the big data platform, Hive or Spark. The scheduling system of the preset primary cluster then periodically triggers a data processing task, causing Hive or Spark to process and integrate the written lineage data into the second lineage relationship between data processing execution tasks (Executed Jobs) and data processing tasks shown in Figure 5, which is then written into the graph database system as lineage graph data. Finally, the graph database system actively reports the write status of the lineage graph data to the scheduling system of the preset primary cluster, so that the scheduling system can confirm that the second lineage data has been constructed.
Step S503: fusing the first lineage relationship and the second lineage relationship to determine the lineage relationships between the data and the data processing tasks, thereby constructing the relationship chain model.
After establishing both the first lineage relationship, between data processing execution tasks and data, and the second lineage relationship, between data processing execution tasks and data processing tasks, the disaster recovery device fuses the two to determine the lineage relationships between the data processing tasks and the data, thereby constructing the relationship chain model.
Specifically, for example, the disaster recovery device constructs the first lineage relationship between data processing execution tasks and data shown in Figure 4, and the second lineage relationship between data processing execution tasks and data processing tasks shown in Figure 5. The scheduling system of the preset primary cluster then periodically triggers a data fusion task, which parses the relationship graphs of the first and second lineage relationships and, based on the correspondence between each data processing execution task (Executed Job) and its data processing task (Job), replaces every Executed Job in the first lineage relationship with its corresponding Job. The two relationship graphs are thereby fused into the lineage relationships between data and data processing tasks shown in Figure 9. The disaster recovery device then builds a relationship chain model in the form of graph data from these lineage relationships and stores it in the graph database for subsequent use.
It should be noted that, in this embodiment, the disaster recovery device can determine the input and output data of each task node in a workflow based on the relationship chain model. As shown in Figure 9, the input data of task node Job1 are Table1 and Table2, and its output data is Table4. In this way, when performing a disaster recovery switchover, the disaster recovery device can use the relationship chain model to let the scheduling system of the disaster recovery cluster determine from which task node the workflow should be rerun.
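The fusion of step S503 can be sketched as a vertex substitution over edge lists, under the assumption that each lineage graph is held as a set of directed edges and that Executed Job vertices carry resolvable IDs; all names are illustrative.

```python
def fuse_lineage(first_lineage, exec_to_job):
    """Replace every Executed Job vertex in the data-side lineage with the
    logical Job it belongs to, yielding edges between Jobs and tables."""
    fused = set()
    for src, dst in first_lineage:
        fused.add((exec_to_job.get(src, src), exec_to_job.get(dst, dst)))
    return fused

# First lineage (Executed Job <-> data) and the Executed Job -> Job mapping
# derived from the second lineage:
first = [("Table1", "ExecJob#1001"), ("Table2", "ExecJob#1001"),
         ("ExecJob#1001", "Table4")]
exec_to_job = {"ExecJob#1001": "Job1"}
print(sorted(fuse_lineage(first, exec_to_job)))
# [('Job1', 'Table4'), ('Table1', 'Job1'), ('Table2', 'Job1')]
```

The result reproduces the Figure 9 example: Job1 reads Table1 and Table2 and writes Table4.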
In this embodiment, in constructing the relationship chain model based on the lineage relationships between the data and the data processing tasks being executed by the preset primary cluster, the disaster recovery device first obtains lineage data from the preset primary cluster that is executing the data processing tasks, establishing the first lineage relationship between the data and the data processing tasks executed by the cluster's scheduling system; it then establishes, by parsing JSON files, the second lineage relationship between data processing execution tasks and data processing tasks; finally, it fuses the first and second lineage relationships to determine the lineage relationships between the data processing tasks and the data, thereby constructing the relationship chain model.
In this way, during a disaster recovery switchover, the disaster recovery device can perform the failover from the primary cluster to the disaster recovery cluster by means of the relationship chain model constructed in advance from the lineage relationships between data and data processing tasks, combined with the synchronization status of the task parameters of the workflow's task nodes. This enables fast switchover and fast recovery of the task nodes that need re-execution, achieving fast and fine-grained disaster recovery switchover and thereby improving disaster recovery efficiency.
Further, based on the first embodiment of the data disaster recovery method of this application described above, a third embodiment of the method is proposed. The main difference between this embodiment and the first embodiment is that, in this embodiment, before step S10 above (establishing a communication connection with the disaster recovery database of the preset primary cluster), the data disaster recovery method of this application may further include:
Step S60: executing a preset data synchronization task so that the database of the preset primary cluster synchronizes data to the disaster recovery database.
Before establishing the communication connection with the disaster recovery database of the preset primary cluster where the disaster occurred, the disaster recovery device first executes the data synchronization task so that, while the preset primary cluster is providing services and executing data processing tasks, its database synchronizes data to the disaster recovery database in the disaster recovery cluster, enabling a fast switchover later.
It should be noted that, in this embodiment, the data synchronization task is a data synchronization management task generated from operator configuration. It should be understood that, depending on the design requirements of the actual application, the way the data synchronization task is configured and generated, as well as its specific content, may differ across feasible implementations; the data disaster recovery method of this application does not limit the specific content of the data synchronization task.
Further, in a feasible embodiment, step S60 may include:
Step S601: receiving the data synchronization task and reading, from the database of the preset primary cluster, the metadata to be synchronized that the data synchronization task points to.
In the process of making the database of the preset primary cluster synchronize data to the disaster recovery database, the disaster recovery device first receives the data synchronization task generated from operator configuration, and parses it to determine the metadata to be synchronized that needs to be read from the database of the preset primary cluster and synchronized to the disaster recovery database.
Step S602: executing the data synchronization task to pull the metadata to be synchronized into the disaster recovery database for storage.
After determining the metadata to be synchronized that needs to be read from the database of the preset primary cluster and synchronized to the disaster recovery database, the disaster recovery device executes the data synchronization task to pull that metadata into the corresponding storage path in the disaster recovery database.
Further, in a feasible embodiment, step S602 may include:
Step S6021: obtaining the first storage path of the metadata to be synchronized in the database of the preset primary cluster.
In synchronizing the metadata to be synchronized from the database of the preset primary cluster to the disaster recovery database, the disaster recovery device, while parsing the data synchronization task to determine the metadata to be synchronized, also obtains that metadata's first storage path in the database of the preset primary cluster.
Specifically, for example, the disaster recovery device may obtain the first storage path of the metadata to be synchronized by inspecting, through the resource manager YARN (Yet Another Resource Negotiator, also known as Apache Hadoop YARN, a Hadoop resource manager) of the database of the preset primary cluster, the data information it manages for the data to be synchronized (which may be the data of a Hive table), such as its size, storage time, update time, and storage path.
Step S6022: determining, in the disaster recovery database, a second storage path corresponding to the first storage path.
After obtaining the first storage path of the metadata to be synchronized in the database of the preset primary cluster, the disaster recovery device can determine, based on that first storage path, a corresponding second storage path in the disaster recovery database.
Specifically, for example, based on a pre-built association between the database of the preset primary cluster and the disaster recovery database for synchronizing data (the association may take the form of a relation table built specifically for this purpose), the disaster recovery device can look up the first storage path directly in that relation table, detecting and determining the second storage path in the disaster recovery database that corresponds to the first storage path and stores the metadata to be synchronized under it.
In another feasible embodiment, after obtaining the first storage path of the metadata to be synchronized in the database of the preset primary cluster, the disaster recovery device may instead generate a storage path on the fly, based on the currently free storage space of the disaster recovery database, establish the correspondence between that path and the first storage path, and then use the generated path as the second storage path under which the metadata to be synchronized from the first storage path is stored in the disaster recovery database.
Step S6023: storing the metadata to be synchronized in the disaster recovery database according to the second storage path.
After determining the second storage path in the disaster recovery database that corresponds to the first storage path, the disaster recovery device can pull the data stored under the first storage path in the database of the preset primary cluster and store it, according to the second storage path, under that path in the disaster recovery database.
Specifically, for example, after determining the second storage path in the disaster recovery database that corresponds to the first storage path, the disaster recovery device can first pull the metadata to be synchronized out of the database of the preset primary cluster according to the first storage path, and then feed it into the resource manager YARN of the disaster recovery database, so that YARN stores the metadata to be synchronized according to the second storage path.
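The path mapping of steps S6021 through S6023 can be sketched as follows. The lookup-then-derive fallback mirrors the two embodiments above (a pre-built relation table, or a path generated on the fly); the `/dr` prefix rule and all path names are assumptions made for the example.

```python
def resolve_target_path(first_path, path_map, dr_root="/dr"):
    """Return the DR-side (second) storage path for a primary-side (first)
    path: use the pre-built association if one exists, otherwise derive a
    new path and record the association, as in the second embodiment."""
    if first_path in path_map:
        return path_map[first_path]
    second_path = dr_root + first_path
    path_map[first_path] = second_path  # persist the new correspondence
    return second_path

path_map = {"/warehouse/db1/table1": "/dr/warehouse/db1/table1"}
print(resolve_target_path("/warehouse/db1/table1", path_map))
# /dr/warehouse/db1/table1  (found in the relation table)
print(resolve_target_path("/warehouse/db1/table2", path_map))
# /dr/warehouse/db1/table2  (derived and recorded on the fly)
```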
Step S603: monitoring the execution status of the data synchronization task and performing consistency verification on the data stored in the database of the preset primary cluster and in the disaster recovery database.
The disaster recovery device continuously monitors the execution status of the data synchronization task so that, each time a data synchronization task finishes, it further verifies the consistency of the data stored in the database of the preset primary cluster and in the disaster recovery database, ensuring that the metadata to be synchronized is completely synchronized from the former to the latter.
Specifically, referring to the data synchronization flow shown in FIG. 10, the disaster recovery device generates a data synchronization task from configuration data supplied by operations staff. Then, based on the scheduling system Transports, the device schedules the task for execution, using multiple threads to read metadata from MySQL — the database in which the preset primary cluster (IDC1) stores metadata — and writing that metadata, as the to-be-synchronized metadata targeted by the task, into the disaster recovery database in which the disaster recovery cluster (IDC2) stores metadata. Next, again via Transports, the device submits a disaster recovery data synchronization task — a DistCp job — to YARN on the disaster recovery cluster (IDC2) and executes it on schedule, thereby pulling the data of the Hive tables that require synchronization on the primary cluster (IDC1) into the storage paths (HDFS directories) of the corresponding Hive tables on the disaster recovery cluster (IDC2). Finally, the device polls, via Transports, the execution status of the scheduled synchronization tasks and collects the synchronization statistics stored on src-HDFS of the preset primary cluster (IDC1) and on dest-HDFS of the disaster recovery cluster, thereby verifying whether the data on the two sides — the preset primary cluster (IDC1) and the disaster recovery cluster (IDC2) — are consistent.
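The flow above can be sketched in a few lines; this is a minimal illustration only, with in-memory dicts standing in for MySQL, HDFS, Transports, and the DistCp job (all names and structures here are assumptions, not taken from the patent):

```python
from concurrent.futures import ThreadPoolExecutor

def sync_metadata(src_db, dest_db, table_names, workers=4):
    """Copy metadata rows for each table from the primary store to the
    disaster recovery store using a thread pool, then compare per-table
    row counts on both sides as a simple consistency check."""
    def copy_table(name):
        rows = src_db[name]            # read from the primary cluster (IDC1)
        dest_db[name] = list(rows)     # write to the DR database (IDC2)
        return name, len(rows)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        stats = dict(pool.map(copy_table, table_names))

    # Consistency verification: statistics on the two sides must match.
    mismatches = [n for n in table_names
                  if len(src_db[n]) != len(dest_db.get(n, []))]
    return stats, mismatches

# Hypothetical metadata tables standing in for the real stores.
src = {"hive_tbl_a": [{"col": "x"}, {"col": "y"}],
       "hive_tbl_b": [{"col": "z"}]}
dest = {}
stats, bad = sync_metadata(src, dest, ["hive_tbl_a", "hive_tbl_b"])
```

In the real system the copy step is a DistCp job submitted to YARN rather than an in-process write; only the monitor-then-verify shape is the point here.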
Further, in another feasible embodiment, before executing the data synchronization task that causes the database of the preset primary cluster to synchronize data to the disaster recovery database, the disaster recovery device also predefines data disaster recovery synchronization rules. Thus, when the device executes the data synchronization task via the scheduling system to synchronize data between the preset primary cluster and the disaster recovery cluster, it executes the task according to those rules.
It should be noted that, in this embodiment, the disaster recovery device forms the data disaster recovery synchronization rules by defining the clusters, databases, and data tables of the preset primary cluster and the disaster recovery cluster that require data synchronization, together with the timing and strategy of that synchronization. Specifically, for example, the data disaster recovery synchronization rules defined by the disaster recovery device are shown in the following table:
Figure PCTCN2021132314-appb-000001
In the above table, the source cluster and the target cluster are, respectively, the preset primary cluster and the disaster recovery cluster described in this embodiment.
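The table itself is only available as an image in this record, but a synchronization rule of the kind described — source and target cluster, database, tables, timing, and strategy — might be represented as follows. Every field name and value below is an assumed illustration, not content recovered from the table:

```python
# Hypothetical shape of one data disaster recovery synchronization rule.
rule = {
    "source_cluster": "IDC1",          # preset primary cluster
    "target_cluster": "IDC2",          # disaster recovery cluster
    "database": "ods_db",              # database to synchronize (assumed name)
    "tables": ["user_tbl", "order_tbl"],
    "schedule": "0 2 * * *",           # synchronization time (cron-style, assumed)
    "strategy": "incremental",         # or "full"
}

def matches(rule, cluster, db, table):
    """Return True when a (cluster, database, table) triple is covered
    by the rule, i.e. the scheduler should synchronize it."""
    return (cluster == rule["source_cluster"]
            and db == rule["database"]
            and table in rule["tables"])
```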
Further, in a feasible embodiment, in step S40 of the first embodiment above, the synchronization state includes synchronized and unsynchronized, and the step of "determining, according to the synchronization state, the target nodes to be re-executed among the task nodes" may include:
Step S401: if the synchronization state of the task parameters of a first task node among the task nodes is unsynchronized, determining that the first task node is a target node to be re-executed.
After obtaining the task parameters of each task node in the workflows that the preset primary cluster was executing when the disaster occurred, the disaster recovery device further checks whether the synchronization state of each node's task parameters is synchronized or unsynchronized. Thus, upon detecting that the synchronization state of the task parameters of a first task node is unsynchronized, it directly determines that the first task node must be re-run on the disaster recovery cluster, i.e., that it is a target node to be re-executed.
Step S402: if the parent node of the first task node is a node to be re-executed, determining that the first task node is the target node.
While determining each task node by checking whether the synchronization state of its task parameters is synchronized or unsynchronized, the disaster recovery device also checks whether the node's parent has already been determined to be a target node that must be re-run on the disaster recovery cluster. Thus, upon detecting that the parent of the first task node is a target node to be re-executed, it directly determines that the first task node must likewise be re-run on the disaster recovery cluster as a target node to be re-executed.
Specifically, for example, the disaster recovery device traverses, in a depth-first manner, every workflow that the scheduling system of the preset primary cluster was executing when the disaster occurred. If the device finds that all parent nodes of the first task node in the current workflow have been marked as not requiring a re-run (disable execute), and that the task parameters of the first task node — its input data and output data — are in the synchronized state, it determines that the first task node is not a target node requiring a re-run and rescheduled execution, and likewise marks it as disable execute. If, however, the device finds that some parent of the first task node has been marked as requiring re-execution (enable execute), or that the synchronization state of the first task node's task parameters — its input data and output data — is unsynchronized, it directly determines that the first task node is a target node requiring a re-run and rescheduled execution, and marks it as enable execute.
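The marking rule above can be sketched as a small depth-first walk. This is a schematic reading of the embodiment, assuming each node carries only a parent list and a synchronized flag; the real task parameters are richer:

```python
def mark_targets(workflow):
    """Depth-first walk over a workflow DAG. A node is marked 'enable
    execute' (needs a re-run on the DR cluster) if any of its task
    parameters are unsynchronized, or if any parent is itself marked for
    re-execution; otherwise it is marked 'disable execute'.

    `workflow` maps node name -> {"parents": [...], "synced": bool}."""
    marks = {}

    def visit(name):
        if name in marks:                 # already decided on this path
            return marks[name]
        node = workflow[name]
        parent_needs_rerun = any(visit(p) for p in node["parents"])
        marks[name] = parent_needs_rerun or not node["synced"]
        return marks[name]

    for name in workflow:
        visit(name)
    return {n: ("enable execute" if v else "disable execute")
            for n, v in marks.items()}

# Hypothetical three-node workflow: load -> clean -> agg.
flow = {
    "load":  {"parents": [],        "synced": True},
    "clean": {"parents": ["load"],  "synced": False},  # unsynchronized
    "agg":   {"parents": ["clean"], "synced": True},   # parent re-runs
}
marks = mark_targets(flow)
```

Note how `agg` is re-run only because its parent is, exactly the propagation that step S402 describes.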
An embodiment of the present application provides a data disaster recovery method. Before the disaster recovery device establishes a communication connection with the disaster recovery database for a preset primary cluster in which a disaster occurs, it first executes a data synchronization task so that, while the preset primary cluster is providing services and executing data processing tasks, the cluster's database synchronizes data to the disaster recovery database in the disaster recovery cluster, enabling subsequent rapid disaster recovery switching. In the course of making the primary cluster's database synchronize data to the disaster recovery database, the device first receives a data synchronization task generated from staff configuration and parses it to determine the to-be-synchronized metadata that must be read from the primary cluster's database and synchronized to the disaster recovery database. Having determined that metadata, the device executes the synchronization task to pull it into the corresponding storage path in the disaster recovery database for storage. The device then continuously monitors the execution status of the data synchronization task so that, when each task completes, it further verifies the consistency of the data stored in the primary cluster's database against the data stored in the disaster recovery database, thereby ensuring that the to-be-synchronized metadata in the primary cluster's database is synchronized to the disaster recovery database in full.
In this way, during disaster recovery switching, the disaster recovery device can use the relationship chain model built in advance from the lineage relationship between data and data processing tasks, combined with the synchronization state of the task parameters of the task nodes in each workflow, to perform the recovery operation of switching to the disaster recovery cluster when a disaster strikes the primary cluster. This enables fast disaster recovery switching and fast recovery of the task nodes to be re-executed, achieving fast and fine-grained switching and thereby improving disaster recovery efficiency.
Further, the present application also provides a data disaster recovery system. Please refer to FIG. 13, which is a schematic diagram of the functional modules of an embodiment of the data disaster recovery system of the present application. As shown in FIG. 13, the data disaster recovery system of the present application includes:
a connection module 10, configured to establish a communication connection with the disaster recovery database of the preset primary cluster;
a workflow reading module 20, configured to read, through the communication connection, the workflows executed by the preset primary cluster;
an acquisition module 30, configured to acquire the task parameters of each task node in the workflows according to a preset relationship chain model, wherein the relationship chain model is constructed based on the lineage relationship between data and data processing tasks; and
a recovery module 40, configured to detect the synchronization state of the task parameters, determine, according to the synchronization state, the target nodes to be re-executed among the task nodes, and trigger a disaster recovery mechanism to execute the target nodes.
Further, the data disaster recovery system of the present application also includes:
a relationship chain construction module, configured to construct a relationship chain model based on the lineage relationship between data and data processing tasks.
Further, the relationship chain construction module includes:
a first construction unit, configured to acquire lineage data from the preset primary cluster to establish a first lineage relationship between data processing execution tasks and data;
a second construction unit, configured to parse JSON (object notation) files to establish a second lineage relationship between the data processing execution tasks and data processing tasks; and
a third construction unit, configured to fuse the first lineage relationship and the second lineage relationship to determine the lineage relationship between the data and the data processing tasks, so as to construct the relationship chain model.
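The fusion step performed by the three construction units can be sketched as joining the two lineage relations on the execution-task identifier. The record shapes and identifiers below are illustrative assumptions, not the patent's actual schema:

```python
def build_chain_model(exec_to_data, exec_to_task):
    """Fuse two lineage relations into a relationship chain model:
    - exec_to_data: execution task -> {"inputs": [...], "outputs": [...]}
      (first lineage relationship, taken from the primary cluster)
    - exec_to_task: execution task -> logical data processing task
      (second lineage relationship, e.g. parsed from job definition files)
    Returns: task -> {"inputs": set, "outputs": set}."""
    model = {}
    for exec_id, io in exec_to_data.items():
        task = exec_to_task.get(exec_id)
        if task is None:
            continue                      # execution with no known task
        entry = model.setdefault(task, {"inputs": set(), "outputs": set()})
        entry["inputs"].update(io["inputs"])
        entry["outputs"].update(io["outputs"])
    return model

# Hypothetical lineage records.
exec_to_data = {
    "run-1": {"inputs": ["db.raw"], "outputs": ["db.clean"]},
    "run-2": {"inputs": ["db.clean"], "outputs": ["db.report"]},
}
exec_to_task = {"run-1": "clean_task", "run-2": "report_task"}
model = build_chain_model(exec_to_data, exec_to_task)
```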
Further, the task parameters include the input data and output data of the task nodes, and the acquisition module 30 includes:
a determining unit, configured to determine each task node of the workflow; and
an acquisition unit, configured to construct a query statement for each task node and to index, from the relationship chain model, the input data and output data of each task node.
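Given a chain model of the shape sketched earlier, the acquisition unit's lookup reduces to a trivial indexed query; this is a schematic stand-in for whatever real query language the system uses:

```python
def query_task_io(model, task_node):
    """'Query' the relationship chain model for one task node and return
    its (input data, output data), or empty sets for an unknown node."""
    entry = model.get(task_node, {"inputs": set(), "outputs": set()})
    return entry["inputs"], entry["outputs"]

# Hypothetical single-entry model.
model = {"clean_task": {"inputs": {"db.raw"}, "outputs": {"db.clean"}}}
ins, outs = query_task_io(model, "clean_task")
```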
Further, the data disaster recovery system of the present application also includes:
a data synchronization module, configured to execute a preset data synchronization task to make the database of the preset primary cluster synchronize data to the disaster recovery database.
Further, the data synchronization module includes:
a receiving unit, configured to receive the data synchronization task and read, from the database of the preset primary cluster, the to-be-synchronized metadata targeted by the data synchronization task;
a task execution unit, configured to execute the data synchronization task to pull the to-be-synchronized metadata into the disaster recovery database for storage; and
a verification unit, configured to monitor the execution status of the data synchronization task and to verify the consistency of the data stored in the database of the preset primary cluster against the data stored in the disaster recovery database.
Further, the task execution unit includes:
a path acquisition subunit, configured to acquire the first storage path of the to-be-synchronized metadata in the database of the preset primary cluster, and to determine the second storage path in the disaster recovery database corresponding to the first storage path; and
a data storage subunit, configured to store the to-be-synchronized metadata in the disaster recovery database according to the second storage path.
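The path mapping the two subunits perform might look like a prefix swap between the two clusters' storage roots. The root paths here are invented for illustration and are not taken from the patent:

```python
def map_storage_path(first_path,
                     src_root="/idc1/warehouse",
                     dest_root="/idc2/warehouse"):
    """Derive the second storage path (in the DR database) from the first
    storage path (in the primary cluster's database) by swapping the
    cluster-specific root prefix."""
    if not first_path.startswith(src_root):
        raise ValueError(f"unexpected source path: {first_path}")
    return dest_root + first_path[len(src_root):]

second = map_storage_path("/idc1/warehouse/ods_db/user_tbl")
```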
Further, the synchronization state includes synchronized and unsynchronized, and the recovery module 40 includes:
a first re-run node determining unit, configured to determine, if the synchronization state of the task parameters of a first task node among the task nodes is unsynchronized, that the first task node is a target node to be re-executed; and
a second re-run node determining unit, configured to determine, if the parent node of the first task node is a node to be re-executed, that the first task node is the target node.
The functional implementation of each module of the task scheduling node in the above data disaster recovery system corresponds to the steps of the above embodiments of the data disaster recovery method; their functions and implementation processes are not repeated here.
The present application also provides a computer storage medium on which a data disaster recovery program is stored; when the data disaster recovery program is executed by a processor, the steps of the data disaster recovery method described in any of the above embodiments are implemented.
The specific embodiments of the computer storage medium of the present application are substantially the same as the embodiments of the above data disaster recovery method and are not repeated here.
The present application also provides a computer program product comprising a computer program; when the computer program is executed by a processor, the steps of the data disaster recovery method described in any of the above embodiments are implemented.
The specific embodiments of the computer program product of the present application are substantially the same as the embodiments of the above data disaster recovery method and are not repeated here.
It should be noted that, as used herein, the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a..." does not preclude the existence of additional identical elements in the process, method, article, or system that includes the element.
The serial numbers of the above embodiments of the present application are for description only and do not represent the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application — in essence, or the part that contributes to the prior art — can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, magnetic disk, or optical disc) and includes instructions for causing a terminal device (which may be a mobile phone, computer, server, network device, or the like) to execute the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (10)

  1. A data disaster recovery method, wherein the data disaster recovery method is applied to a data disaster recovery device and comprises:
    establishing a communication connection with a disaster recovery database of a preset primary cluster;
    reading, through the communication connection, a workflow executed by the preset primary cluster;
    acquiring task parameters of each task node in the workflow according to a preset relationship chain model, wherein the relationship chain model is constructed based on a lineage relationship between data and data processing tasks; and
    detecting a synchronization state of the task parameters, so as to determine, according to the synchronization state, a target node to be re-executed among the task nodes, and triggering a disaster recovery mechanism to execute the target node.
  2. The data disaster recovery method according to claim 1, wherein the method further comprises:
    constructing a relationship chain model based on the lineage relationship between data and data processing tasks;
    wherein the step of constructing the relationship chain model based on the lineage relationship between data and data processing tasks comprises:
    acquiring lineage data from the preset primary cluster to establish a first lineage relationship between data processing execution tasks and data;
    parsing a JSON (object notation) file to establish a second lineage relationship between the data processing execution tasks and the data processing tasks; and
    fusing the first lineage relationship and the second lineage relationship to determine the lineage relationship between the data and the data processing tasks, so as to construct the relationship chain model.
  3. The data disaster recovery method according to claim 1, wherein the task parameters comprise input data and output data of the task nodes, and the step of acquiring the task parameters of each task node in the workflow according to the preset relationship chain model comprises:
    determining each task node of the workflow; and
    constructing a query statement for each task node, and indexing, from the relationship chain model according to the query statement, the input data and output data of each task node.
  4. The data disaster recovery method according to claim 1, wherein, before the step of establishing the communication connection with the disaster recovery database of the preset primary cluster, the method further comprises:
    executing a preset data synchronization task to make the database of the preset primary cluster synchronize data to the disaster recovery database.
  5. The data disaster recovery method according to claim 4, wherein the step of executing the preset data synchronization task to make the database of the preset primary cluster synchronize data to the disaster recovery database comprises:
    receiving the data synchronization task, and reading, from the database of the preset primary cluster, the to-be-synchronized metadata targeted by the data synchronization task;
    executing the data synchronization task to pull the to-be-synchronized metadata into the disaster recovery database for storage; and
    monitoring the execution status of the data synchronization task, and verifying the consistency of the data stored in the database of the preset primary cluster against the data stored in the disaster recovery database.
  6. The data disaster recovery method according to claim 5, wherein the step of pulling the to-be-synchronized metadata into the disaster recovery database for storage comprises:
    acquiring a first storage path of the to-be-synchronized metadata in the database of the preset primary cluster;
    determining a second storage path in the disaster recovery database corresponding to the first storage path; and
    storing the to-be-synchronized metadata in the disaster recovery database according to the second storage path.
  7. The data disaster recovery method according to any one of claims 1 to 6, wherein the synchronization state comprises synchronized and unsynchronized, and the step of determining, according to the synchronization state, the target node to be re-executed among the task nodes comprises:
    if the synchronization state of the task parameters of a first task node among the task nodes is unsynchronized, determining that the first task node is the target node to be re-executed; and/or
    if the parent node of the first task node is a node to be re-executed, determining that the first task node is the target node.
  8. A data disaster recovery system, wherein the data disaster recovery system comprises:
    a connection module, configured to establish a communication connection with a disaster recovery database of a preset primary cluster;
    a workflow reading module, configured to read, through the communication connection, a workflow executed by the preset primary cluster;
    an acquisition module, configured to acquire task parameters of each task node in the workflow according to a preset relationship chain model, wherein the relationship chain model is constructed based on a lineage relationship between data and data processing tasks; and
    a recovery module, configured to detect a synchronization state of the task parameters, determine, according to the synchronization state, a target node to be re-executed among the task nodes, and trigger a disaster recovery mechanism to execute the target node.
  9. A terminal device, wherein the terminal device comprises: a memory, a processor, and a data disaster recovery program stored on the memory and executable on the processor, wherein the data disaster recovery program, when executed by the processor, implements the steps of the data disaster recovery method according to any one of claims 1 to 7.
  10. A computer storage medium, wherein a data disaster recovery program is stored on the computer storage medium, and the data disaster recovery program, when executed by a processor, implements the steps of the data disaster recovery method according to any one of claims 1 to 7.
PCT/CN2021/132314 2021-07-30 2021-11-23 Disaster recovery method and system for data, terminal device and computer storage medium WO2023005075A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110874019.9 2021-07-30
CN202110874019.9A CN113590386B (en) 2021-07-30 2021-07-30 Disaster recovery method, system, terminal device and computer storage medium for data

Publications (1)

Publication Number Publication Date
WO2023005075A1 true WO2023005075A1 (en) 2023-02-02

Family

ID=78252890

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/132314 WO2023005075A1 (en) 2021-07-30 2021-11-23 Disaster recovery method and system for data, terminal device and computer storage medium

Country Status (2)

Country Link
CN (1) CN113590386B (en)
WO (1) WO2023005075A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113590386B (en) * 2021-07-30 2023-03-03 深圳前海微众银行股份有限公司 Disaster recovery method, system, terminal device and computer storage medium for data
CN114584458B (en) * 2022-03-03 2023-06-06 平安科技(深圳)有限公司 Cluster disaster recovery management method, system, equipment and storage medium based on ETCD
CN114546731B (en) * 2022-03-09 2024-04-05 北京有生博大软件股份有限公司 Workflow data recovery method and data recovery system
CN115174364A (en) * 2022-06-30 2022-10-11 济南浪潮数据技术有限公司 Data recovery method, device and medium in disaster tolerance scene
CN117170983B (en) * 2023-11-02 2024-03-01 卓望数码技术(深圳)有限公司 Disaster recovery switching method, system, computer equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102411520A (en) * 2011-09-21 2012-04-11 电子科技大学 Data-unit-based disaster recovery method for seismic data
CN106776153A (en) * 2015-11-25 2017-05-31 华为技术有限公司 job control method and server
US20190220342A1 (en) * 2018-01-12 2019-07-18 International Business Machines Corporation Traffic and geography based cognitive disaster recovery
CN111026568A (en) * 2019-12-04 2020-04-17 深圳前海环融联易信息科技服务有限公司 Data and task relation construction method and device, computer equipment and storage medium
CN111858065A (en) * 2020-07-28 2020-10-30 中国平安财产保险股份有限公司 Data processing method, device, storage medium and device
CN112527484A (en) * 2020-12-17 2021-03-19 平安银行股份有限公司 Workflow breakpoint continuous running method and device, computer equipment and readable storage medium
CN113157491A (en) * 2021-04-01 2021-07-23 深圳依时货拉拉科技有限公司 Data backup method and device, communication equipment and storage medium
CN113590386A (en) * 2021-07-30 2021-11-02 深圳前海微众银行股份有限公司 Disaster recovery method, system, terminal device and computer storage medium for data

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101414277B (en) * 2008-11-06 2010-06-09 清华大学 Need-based increment recovery disaster-tolerable system and method based on virtual machine
CN110196888B (en) * 2019-05-27 2024-05-10 深圳前海微众银行股份有限公司 Hadoop-based data updating method, device, system and medium
CN112463451B (en) * 2020-12-02 2024-01-26 中国工商银行股份有限公司 Buffer disaster recovery cluster switching method and soft load balancing cluster device

Also Published As

Publication number Publication date
CN113590386B (en) 2023-03-03
CN113590386A (en) 2021-11-02

Similar Documents

Publication Publication Date Title
WO2023005075A1 (en) Disaster recovery method and system for data, terminal device and computer storage medium
US11829360B2 (en) Database workload capture and replay
US10554771B2 (en) Parallelized replay of captured database workload
US10817501B1 (en) Systems and methods for using a reaction-based approach to managing shared state storage associated with a distributed database
US7702741B2 (en) Configuring or reconfiguring a multi-master information sharing environment
US8938421B2 (en) Method and a system for synchronizing data
US10621049B1 (en) Consistent backups based on local node clock
WO2021169272A1 (en) Database table changing method and apparatus, computer device, and storage medium
JP6266630B2 (en) Managing continuous queries with archived relations
WO2018233364A1 (en) Index updating method and system, and related device
WO2017063520A1 (en) Method and apparatus for operating database
EP3722973B1 (en) Data processing method and device for distributed database, storage medium, and electronic device
CN104657497A (en) Mass electricity information concurrent computation system and method based on distributed computation
CN115374102A (en) Data processing method and system
CN111177254B (en) Method and device for data synchronization between heterogeneous relational databases
US9489423B1 (en) Query data acquisition and analysis
CN112084206A (en) Database transaction request processing method, related device and storage medium
CN110908793A (en) Long-time task execution method, device, equipment and readable storage medium
US7899785B2 (en) Reconfiguring propagation streams in distributed information sharing
CN112685499A (en) Method, device and equipment for synchronizing process data of work service flow
WO2023082681A1 (en) Data processing method and apparatus based on batch-stream integration, computer device, and medium
WO2017157111A1 (en) Method, device and system for preventing memory data loss
WO2023109286A1 (en) Data synchronization method and apparatus
CN112818021B (en) Data request processing method, device, computer equipment and storage medium
CN114020368A (en) Information processing method and device based on state machine and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21951647

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE