CN112764907B - Task processing method and device, electronic equipment and storage medium - Google Patents

Task processing method and device, electronic equipment and storage medium

Info

Publication number
CN112764907B
Authority
CN
China
Prior art keywords
task
tasks
source
downstream
freezing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110101553.6A
Other languages
Chinese (zh)
Other versions
CN112764907A (en)
Inventor
余利华
郭忆
李卓豪
陈苏安
汪源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd
Priority to CN202110101553.6A
Publication of CN112764907A
Application granted
Publication of CN112764907B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485 Task life-cycle, e.g. stopping, restarting, resuming execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/48 Indexing scheme relating to G06F9/48
    • G06F2209/483 Multiproc

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Retry When Errors Occur (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a task processing method, a device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring at least one initial source task; generating corresponding dependency relationship information for each initial source task in the at least one initial source task; merging the at least one initial source task according to the dependency relationship information to obtain at least one final source task; and freezing the at least one final source task and all corresponding downstream tasks to stop the operation of the at least one final source task and all corresponding downstream tasks. The invention helps recover quickly from data faults, greatly reduces the time needed to repair data, and efficiently ensures the correctness of the repaired data.

Description

Task processing method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of databases, and in particular relates to a task processing method, a task processing device, electronic equipment and a storage medium.
Background
Currently, data processing is performed based on the data warehouse technology ETL (Extract-Transform-Load), and one data processing flow is generally divided into a plurality of task steps. There are complex dependencies between the tasks: a task can run only after its upstream tasks have succeeded and its own scheduled run time has been reached.
In the related technology, when a large number of data errors occur, the one or more most upstream tasks with data errors are found, a directed acyclic graph (Directed Acyclic Graph, DAG) is obtained from each such task and the tasks downstream of it that have already run, and the data is repaired by running the DAG once again from upstream to downstream.
However, while the data is being repaired under the above scheme, a downstream task that has not yet run may reach its scheduled run time and then run on the still-unrepaired data of its upstream task, which causes new data errors.
Disclosure of Invention
The embodiment of the invention provides a task processing method, a device, electronic equipment and a storage medium, which fully freeze the tasks involved in a data repair, so as to solve the problems of incorrect data output by downstream tasks and low data repair efficiency during the repair.
In a first aspect, an embodiment of the present invention provides a task processing method, where the method includes:
acquiring at least one initial source task;
generating corresponding dependency relationship information for each initial source task in the at least one initial source task;
merging the at least one initial source task according to the dependency relationship information to obtain at least one final source task;
and freezing the at least one final source task and all corresponding downstream tasks to stop the operation of the at least one final source task and all corresponding downstream tasks.
In a possible implementation manner, merging the at least one initial source task according to the dependency relationship information includes:
determining, according to the dependency relationship information, a plurality of initial source tasks that have upstream-downstream relationships with one another;
taking the most upstream initial source task among the plurality of initial source tasks, together with any initial source task that has no upstream-downstream relationship with another initial source task, as the final source tasks.
The task processing method provided by the embodiment of the invention carries out merging processing on the initial source tasks so as to avoid repeated freezing processing on the same downstream tasks.
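The merging step can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes the dependency relationship information is available as an adjacency mapping from each task to its direct downstream tasks, and the names `downstream_of`, `merge_source_tasks` and the example graph are hypothetical.

```python
from collections import deque

def downstream_of(graph, task):
    """Return every task reachable from `task` (excluding itself) via BFS."""
    seen, queue = set(), deque([task])
    while queue:
        for child in graph.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def merge_source_tasks(graph, initial_sources):
    """Keep only the initial sources that are not downstream of another initial source."""
    initial = set(initial_sources)
    covered = set()
    for src in initial:
        covered |= downstream_of(graph, src) & initial
    return sorted(initial - covered)

# Hypothetical graph: A -> B -> C, while D -> E is unrelated to the others.
graph = {"A": ["B"], "B": ["C"], "D": ["E"]}
final_sources = merge_source_tasks(graph, ["A", "B", "D"])
print(final_sources)  # ['A', 'D']
```

Here B is dropped because it is already downstream of A, so freezing A and its downstream tasks covers it, while D is kept because it has no upstream-downstream relationship with the other initial source tasks.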
In one possible embodiment, before the freezing processing is performed on the at least one final source task, the method further includes:
merging the common downstream tasks when it is determined, according to the dependency relationship information, that different final source tasks of the at least one final source task share common downstream tasks.
The task processing method provided by the embodiment of the invention merges a downstream task shared by several final source tasks into a single downstream task, avoiding repeated processing of the same downstream task.
In one possible implementation manner, generating the corresponding dependency information for each initial source task includes:
using a DFS/BFS algorithm to determine a DAG task instance graph corresponding to each initial source task;
adding, on the basis of the DAG task instance graph corresponding to each initial source task, the downstream tasks of that initial source task that have not yet reached their execution time, to obtain the dependency relationship information.
In the task processing method provided by the embodiment of the invention, the dependency relationship information covers not only the downstream tasks that have already generated task instances but also the downstream tasks that have not yet reached their execution time, so that a downstream task that later reaches its execution time does not output incorrect data because the upstream task has not been fully repaired.
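A rough sketch of this step, using BFS, is given below. It assumes a task-level adjacency mapping and a set of tasks that have already generated task instances; the function name `build_dependency_info`, the result keys and the example data are all hypothetical and are not taken from the patent.

```python
from collections import deque

def build_dependency_info(graph, has_instance, source):
    """Walk downstream from `source` (BFS) and split the tasks by instance status."""
    with_instance, not_yet_due = [], []
    seen, queue = {source}, deque([source])
    while queue:
        task = queue.popleft()
        # Tasks with an instance form the DAG task instance graph;
        # the others are the downstream tasks that have not reached execution time.
        (with_instance if task in has_instance else not_yet_due).append(task)
        for child in graph.get(task, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return {"source": source, "with_instance": with_instance, "not_yet_due": not_yet_due}

# Hypothetical data: t1 and t2 have already run today, t3 and t4 are scheduled later.
graph = {"t1": ["t2", "t3"], "t2": ["t4"], "t3": ["t4"]}
info = build_dependency_info(graph, {"t1", "t2"}, "t1")
print(info)
```

Both lists together form the set of tasks to freeze, which is what distinguishes this approach from a graph that contains only the task instances that already exist.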
In one possible embodiment, the method further comprises:
in response to a task thawing instruction, thawing the tasks from upstream to downstream, starting from the at least one final source task, in such a manner that the thawing of a downstream task is performed only after the thawing of all of its upstream tasks has been completed, so as to restore the scheduled operation of the thawed tasks.
According to the task processing method provided by the embodiment of the invention, the thawing of a downstream task depends on all of its upstream tasks having been thawed, so that thawing can be controlled to take place once the upstream tasks have been repaired, after which the downstream tasks are run again, which solves the problem of incorrect data output by downstream tasks.
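One way to obtain such an order is a topological sort restricted to the frozen tasks. The sketch below uses Kahn's algorithm, which is a choice made for this illustration (the patent does not name an algorithm for this step); `thaw_order` and the example edges are hypothetical.

```python
from collections import deque

def thaw_order(graph, frozen):
    """Order frozen tasks so that each task follows all of its frozen upstream tasks."""
    indegree = {task: 0 for task in frozen}
    for parent in frozen:
        for child in graph.get(parent, ()):
            if child in indegree:
                indegree[child] += 1
    # Tasks with no frozen upstream task (the final source tasks) are thawed first.
    ready = deque(sorted(task for task, degree in indegree.items() if degree == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in graph.get(task, ()):
            if child in indegree:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    return order

# Hypothetical edges: 2-2 depends on both 1-1 and 1-2.
graph = {"1-1": ["2-1", "2-2"], "1-2": ["2-2"], "2-2": ["3-1"]}
frozen = {"1-1", "1-2", "2-1", "2-2", "3-1"}
order = thaw_order(graph, frozen)
print(order)  # ['1-1', '1-2', '2-1', '2-2', '3-1']
```

Task 2-2 only becomes ready after both 1-1 and 1-2 have been processed, which mirrors the requirement that a downstream task is thawed only when all of its upstream tasks are thawed.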
In one possible embodiment, a task to be thawed is thawed in the following manner:
if the task to be thawed has generated a task instance, rerunning the task instance of the task to be thawed, and determining that the thawing of the task is complete after the rerun succeeds; or
if the task to be thawed has not generated a task instance, directly releasing the freezing of the task to be thawed.
According to the task processing method provided by the embodiment of the invention, a task that has generated a task instance is rerun when it is thawed, and the thawing is completed only once the rerun of the erroneous task succeeds, which solves the problem of incorrect data output by downstream tasks.
In one possible implementation manner, after rerunning the task instance of the task to be thawed, the method further includes:
issuing a thawing failure prompt when the rerun is determined to have failed;
releasing, when a forced thawing instruction is received, the freezing of the task to be thawed indicated by the forced thawing instruction.
The task processing method provided by the embodiment of the invention supports repairing a task whose rerun has failed: the task can be repaired according to the thawing failure prompt and, after the repair, thawed by a forced thawing instruction.
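The per-task thawing rules, including the failure prompt and the forced thaw, can be sketched as below. The function `thaw_task`, its parameters and the `rerun` callback are hypothetical stand-ins for whatever the scheduling system actually provides; the prompt is simply printed here.

```python
def thaw_task(task, has_instance, rerun, frozen, force=False):
    """Try to thaw `task`; return True if it was removed from the frozen set."""
    if force or task not in has_instance:
        # Forced thaw, or no task instance was ever generated: release directly.
        frozen.discard(task)
        return True
    if rerun(task):
        # The existing task instance was rerun successfully.
        frozen.discard(task)
        return True
    print(f"thawing failed for {task}: rerun did not succeed")
    return False

frozen = {"a", "b"}
has_instance = {"a"}
always_fail = lambda task: False  # hypothetical rerun callback that always fails

first_try = thaw_task("a", has_instance, always_fail, frozen)            # prompt, stays frozen
forced = thaw_task("a", has_instance, always_fail, frozen, force=True)   # released
no_instance = thaw_task("b", has_instance, always_fail, frozen)          # released directly
```

In this sketch the forced thaw skips the rerun entirely, on the assumption that the operator has already repaired the task by other means.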
In one possible embodiment, the method further comprises:
identifying the state of a freezing pool used for freezing tasks as generating, and adding the at least one final source task and all corresponding downstream tasks to the freezing pool;
identifying the state of the freezing pool as frozen when the freezing processing of all tasks in the freezing pool is completed.
In one possible implementation, when the tasks in the freezing pool are being thawed, the state of the freezing pool is identified as thawing;
when all tasks in the freezing pool have been thawed, the state of the freezing pool is identified as thawed.
The task processing method provided by the embodiment of the invention provides a scheme in which frozen tasks are added to a freezing pool for management, so that the progress of the freezing processing can be determined from the state of the freezing pool.
In one possible embodiment, the method further comprises:
determining, in response to a pause thawing instruction, the task to be thawed that is currently being thawed;
suspending the thawing of the task to be thawed, and identifying the state of the freezing pool as paused.
The task processing method provided by the embodiment of the invention provides a pause thawing function to meet the need to halt data repair during operation and maintenance.
In one possible embodiment, the method further comprises:
determining, in response to a resume thawing instruction, at least one task to be thawed that has not been thawed and is furthest upstream in the paused freezing pool;
thawing the tasks, starting from the at least one task to be thawed, in such a manner that the thawing of a downstream task is performed only after the thawing of all of its upstream tasks has been completed.
The task processing method provided by the embodiment of the invention provides a resume thawing function to meet the need to resume data repair during operation and maintenance.
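The freezing pool states mentioned so far (generating, frozen, thawing, paused, thawed) can be read as a small state machine. The sketch below encodes one plausible reading of the transitions described in the text; the exact transition set, the `PoolState` enum and the `transition` helper are assumptions made for illustration.

```python
from enum import Enum

class PoolState(Enum):
    GENERATING = "generating"
    FROZEN = "frozen"
    THAWING = "thawing"
    PAUSED = "paused"
    THAWED = "thawed"

# Assumed transitions: pausing is only possible while thawing, and resuming returns to thawing.
ALLOWED = {
    PoolState.GENERATING: {PoolState.FROZEN},
    PoolState.FROZEN: {PoolState.THAWING},
    PoolState.THAWING: {PoolState.PAUSED, PoolState.THAWED},
    PoolState.PAUSED: {PoolState.THAWING},
    PoolState.THAWED: set(),
}

def transition(current, target):
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target

state = PoolState.GENERATING
for nxt in (PoolState.FROZEN, PoolState.THAWING, PoolState.PAUSED,
            PoolState.THAWING, PoolState.THAWED):
    state = transition(state, nxt)
print(state.value)  # thawed
```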
In one possible implementation manner, after suspending the thawing of the task to be thawed, the method further includes:
determining, in response to a task deleting instruction, the source task indicated by the task deleting instruction;
thawing and deleting the indicated source task and all of its corresponding downstream tasks when it is determined that none of them depends on any final source task other than the indicated source task;
otherwise, retaining the tasks that depend on other final source tasks, and thawing and deleting the remaining tasks among the indicated source task and all of its downstream tasks.
The task processing method provided by the embodiment of the invention provides a function for deleting tasks from the freezing pool, so that the freezing pool can be updated.
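The deletion rule amounts to a set difference: the tasks removed are those covered by the indicated source task but by no other final source task. A minimal sketch under that reading follows; `closure`, `delete_source` and the example edges are hypothetical.

```python
from collections import deque

def closure(graph, source):
    """Return `source` together with every task downstream of it."""
    seen, queue = {source}, deque([source])
    while queue:
        for child in graph.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def delete_source(graph, final_sources, target):
    """Return (tasks to thaw and delete, tasks retained in the freezing pool)."""
    retained = set()
    for src in final_sources:
        if src != target:
            retained |= closure(graph, src)
    removed = closure(graph, target) - retained
    return removed, retained

# Hypothetical edges: 2-2 and 3-1 are also downstream of the other final source 1-2.
graph = {"1-1": ["2-1", "2-2"], "1-2": ["2-2"], "2-2": ["3-1"]}
removed, retained = delete_source(graph, ["1-1", "1-2"], "1-1")
print(sorted(removed))   # ['1-1', '2-1']
print(sorted(retained))  # ['1-2', '2-2', '3-1']
```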
In one possible implementation manner, after suspending the thawing of the task to be thawed, the method further includes:
determining, in response to a task adding instruction, the source task indicated by the task adding instruction;
when it is determined that the indicated source task and all of its downstream tasks have no task in common with the at least one final source task and all of its corresponding downstream tasks, freezing the indicated source task and all of its downstream tasks so as to stop their operation.
The task processing method provided by the embodiment of the invention provides a way of adding tasks that do not intersect with the tasks in the freezing pool, so that the freezing pool can be updated.
In a possible implementation manner, when it is determined that the indicated source task and all of its downstream tasks have a task in common with the at least one final source task and all of its corresponding downstream tasks, the method further comprises:
if the at least one final source task and all of its corresponding downstream tasks include a task identical to the indicated source task, freezing that identical task and all of its corresponding downstream tasks so as to stop their operation;
if the at least one final source task and all of its corresponding downstream tasks include a task identical to a downstream task of the indicated source task, freezing that identical task and all of its corresponding downstream tasks so as to stop their operation, taking the indicated source task as a new final source task, and freezing the new final source task and all of its corresponding downstream tasks so as to stop their operation.
The task processing method provided by the embodiment of the invention provides a way of adding tasks that intersect with the tasks in the freezing pool, so that the freezing pool can be updated.
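The three situations for adding a source task (no intersection, the indicated source task already in the pool, or only its downstream tasks intersecting the pool) can be distinguished as in the sketch below. The function `add_source`, the case labels and the example graph are hypothetical, and the sketch only tracks which tasks end up frozen and which tasks count as final source tasks.

```python
from collections import deque

def closure(graph, source):
    """Return `source` together with every task downstream of it."""
    seen, queue = {source}, deque([source])
    while queue:
        for child in graph.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def add_source(graph, final_sources, new_source):
    """Return (final sources, frozen tasks, case label) after adding `new_source`."""
    pool = set()
    for src in final_sources:
        pool |= closure(graph, src)
    added = closure(graph, new_source)
    if new_source in pool:
        # The indicated source task is already frozen as part of the pool.
        case, finals = "source_already_in_pool", set(final_sources)
    elif added & pool:
        # Only downstream tasks overlap: the indicated task becomes a new final source task.
        case, finals = "downstream_overlap", set(final_sources) | {new_source}
    else:
        case, finals = "disjoint", set(final_sources) | {new_source}
    return sorted(finals), pool | added, case

# Hypothetical graph: 0 -> 1-1 -> 2-1 -> 3-1, and 9 -> 10 is unrelated.
graph = {"0": ["1-1"], "1-1": ["2-1"], "2-1": ["3-1"], "9": ["10"]}
print(add_source(graph, ["1-1"], "9")[2])    # disjoint
print(add_source(graph, ["1-1"], "2-1")[2])  # source_already_in_pool
print(add_source(graph, ["1-1"], "0")[2])    # downstream_overlap
```

In the last case the sketch keeps the earlier final source task 1-1 in the list alongside the new one, since the text does not state that it is dropped.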
In a second aspect, an embodiment of the present invention provides a task processing device, including:
the source task acquisition module is used for acquiring at least one initial source task;
The dependency relation generation module is used for generating corresponding dependency relation information for each initial source task in the at least one initial source task;
The source task merging module is used for merging the at least one initial source task according to the dependency relationship information to obtain at least one final source task;
and the freezing processing module is used for freezing the at least one final source task and all corresponding downstream tasks so as to stop the operation of the at least one final source task and all corresponding downstream tasks.
In a possible implementation manner, the source task merging module merging the at least one initial source task according to the dependency relationship information includes:
determining, according to the dependency relationship information, a plurality of initial source tasks that have upstream-downstream relationships with one another;
taking the most upstream initial source task among the plurality of initial source tasks, together with any initial source task that has no upstream-downstream relationship with another initial source task, as the final source tasks.
In one possible embodiment, before performing the freezing processing on the at least one final source task, the freezing processing module is further configured to:
merge the common downstream tasks when it is determined, according to the dependency relationship information, that different final source tasks of the at least one final source task share common downstream tasks.
In one possible implementation manner, the dependency relationship generating module generates corresponding dependency relationship information for each initial source task, including:
using a DFS/BFS algorithm to determine a DAG task instance graph corresponding to each initial source task;
adding, on the basis of the DAG task instance graph corresponding to each initial source task, the downstream tasks of that initial source task that have not yet reached their execution time, to obtain the dependency relationship information.
In one possible embodiment, the apparatus further comprises:
a task thawing module, used for thawing the tasks from upstream to downstream in response to a task thawing instruction, starting from the at least one final source task, in such a manner that the thawing of a downstream task is performed only after the thawing of all of its upstream tasks has been completed, so as to restore the scheduled operation of the thawed tasks.
In one possible implementation, the task thawing module thaws a task to be thawed in the following manner:
if the task to be thawed has generated a task instance, rerunning the task instance of the task to be thawed, and determining that the thawing of the task is complete after the rerun succeeds; or
if the task to be thawed has not generated a task instance, directly releasing the freezing of the task to be thawed.
In one possible implementation manner, after rerunning the task instance of the task to be thawed, the task thawing module is further configured to:
issue a thawing failure prompt when the rerun is determined to have failed;
release, when a forced thawing instruction is received, the freezing of the task to be thawed indicated by the forced thawing instruction.
In one possible embodiment, the device further comprises:
a freezing pool processing module, used for identifying the state of a freezing pool used for freezing tasks as generating and adding the at least one final source task and all corresponding downstream tasks to the freezing pool; and for identifying the state of the freezing pool as frozen when the freezing processing of all tasks in the freezing pool is completed.
In one possible embodiment, the freezing pool processing module is further configured to:
identify the state of the freezing pool as thawing when the tasks in the freezing pool are being thawed;
identify the state of the freezing pool as thawed when all tasks in the freezing pool have been thawed.
In one possible embodiment, the device further comprises:
a pause thawing module, used for determining, in response to a pause thawing instruction, the task to be thawed that is currently being thawed; and for suspending the thawing of that task and identifying the state of the freezing pool as paused.
In one possible embodiment, the device further comprises:
a resume thawing module, used for determining, in response to a resume thawing instruction, at least one task to be thawed that has not been thawed and is furthest upstream in the paused freezing pool; and for thawing the tasks, starting from the at least one task to be thawed, in such a manner that the thawing of a downstream task is performed only after the thawing of all of its upstream tasks has been completed.
As a possible implementation manner, after suspending the thawing of the task to be thawed, the pause thawing module is further configured to:
determine, in response to a task deleting instruction, the source task indicated by the task deleting instruction;
thaw and delete the indicated source task and all of its corresponding downstream tasks when it is determined that none of them depends on any final source task other than the indicated source task;
otherwise, retain the tasks that depend on other final source tasks, and thaw and delete the remaining tasks among the indicated source task and all of its downstream tasks.
As a possible implementation manner, after suspending the thawing of the task to be thawed, the pause thawing module is further configured to:
determine, in response to a task adding instruction, the source task indicated by the task adding instruction;
when it is determined that the indicated source task and all of its downstream tasks have no task in common with the at least one final source task and all of its corresponding downstream tasks, freeze the indicated source task and all of its downstream tasks so as to stop their operation.
As a possible implementation manner, when it is determined that the indicated source task and all of its downstream tasks have a task in common with the at least one final source task and all of its corresponding downstream tasks, the pause thawing module is further configured to:
if the at least one final source task and all of its corresponding downstream tasks include a task identical to the indicated source task, freeze that identical task and all of its corresponding downstream tasks so as to stop their operation;
if the at least one final source task and all of its corresponding downstream tasks include a task identical to a downstream task of the indicated source task, freeze that identical task and all of its corresponding downstream tasks so as to stop their operation, take the indicated source task as a new final source task, and freeze the new final source task and all of its corresponding downstream tasks so as to stop their operation.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including:
one or more processors, and a memory for storing instructions executable by the processors;
wherein the processor is configured to execute the instructions to implement any of the task processing methods provided in the first aspect above.
In a fifth aspect, an embodiment of the present invention provides a storage medium, where a computer program is stored, where the computer program, when executed by a processor, implements any one of the task processing methods provided in the first aspect.
The task processing method, the device, the electronic equipment and the storage medium provided by the embodiment of the invention have the following beneficial effects:
A plurality of initial source tasks are selected for processing, and related tasks among them that could cause task instances to be generated repeatedly are merged to obtain the final source tasks; the final source tasks and all of their downstream tasks are then frozen. This overcomes the limitation that a data complement repair scheme can select only one source task, so that data can be repaired quickly and the efficiency of data repair is improved; it avoids the repeated generation of task instances that occurs when data repair is carried out separately for several source tasks; and it avoids incorrect data output by downstream tasks during the data repair.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention and do not constitute an undue limitation on the invention.
FIG. 1 is a DAG graph illustrating multiple source tasks according to an exemplary embodiment;
FIG. 2 is a DAG diagram illustrating a complement operation for a single source task, according to an exemplary embodiment;
FIG. 3 is a DAG diagram illustrating the addition of an idle upstream task, according to an exemplary embodiment;
FIG. 4 is a DAG graph generated according to the related technique, shown according to an exemplary embodiment;
FIG. 5 is a DAG graph illustrating data errors caused by upstream data not being repaired correctly, according to an exemplary embodiment;
FIG. 6 is a flowchart illustrating a method of task processing according to an exemplary embodiment;
FIG. 7 is a schematic diagram showing the state transitions of a freezing pool according to an exemplary embodiment;
FIG. 8 is a DAG graph illustrating a freezing pool thawing operation according to an exemplary embodiment;
FIG. 9 is a DAG graph illustrating a freezing pool pause/resume operation according to an exemplary embodiment;
FIG. 10 is a DAG diagram of a freezing pool before tasks are deleted, shown in accordance with an exemplary embodiment;
FIG. 11 is a DAG diagram of the freezing pool after tasks are deleted and added, shown in accordance with an exemplary embodiment;
FIG. 12 is a schematic diagram of a task processing device shown according to an example embodiment;
FIG. 13 is a schematic diagram of an electronic device structure shown in accordance with an exemplary embodiment;
FIG. 14 is a schematic diagram of a program product shown according to an exemplary embodiment.
Detailed Description
The principles and spirit of the present invention will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable those skilled in the art to better understand and practice the invention and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the invention may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the following forms, namely: complete hardware, complete software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
In this document, it should be understood that any number of elements in the drawings is for illustration and not limitation, and that any naming is used only for distinction and not for any limitation.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments thereof.
In the following, some terms in the embodiments of the present invention are explained for easy understanding by those skilled in the art.
(1) ETL (Extract-Transform-Load), a data warehouse technology, describes the process of extracting data from a source, transforming it, and loading it into a destination.
(2) Directed acyclic graph (DAG): in graph theory, a directed graph is a directed acyclic graph if, starting from any vertex, it is impossible to return to that vertex by following any sequence of edges.
(3) Task/task instance, the relationship of a task to a task instance is similar to the relationship of code to a process. The task is a definition part of a task instance, and the task instance is one real operation of the task. The task can set corresponding scheduling time on the scheduling system, and when the designated scheduling time is reached, the scheduling system can generate and operate a corresponding task instance according to the content of the task.
(4) Task dependencies, there are dependencies between tasks, and these dependencies are finally reflected on task instances, i.e. if a downstream task instance needs to run, the precondition is that an upstream task instance has run successfully.
(5) Freezing/thawing: freezing refers to the action of moving a task from the normal state to the frozen state, and thawing refers to the action of moving a task from the frozen state to the normal state.
(6) Normal state/frozen state: when a task is in the normal state, the scheduling system generates a corresponding task instance once the designated scheduling time arrives. After a task is frozen it is in the frozen state, and the scheduling system does not generate task instances for it.
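The relationship between tasks, task instances and the frozen state in terms (3) to (6) can be pictured with a toy scheduler. Everything in this sketch (the `Task` and `TaskInstance` classes, the hour-based schedule and the `tick` function) is hypothetical and only illustrates the definitions above.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    hour: int  # hypothetical scheduling time, as an hour of the day

@dataclass
class TaskInstance:
    task: Task
    run_at: int

def tick(tasks, frozen, hour):
    """Create one instance per task scheduled at `hour`, skipping frozen tasks."""
    return [TaskInstance(task, hour) for task in tasks
            if task.hour == hour and task.name not in frozen]

tasks = [Task("a", 1), Task("b", 1), Task("c", 2)]
created = tick(tasks, {"b"}, 1)
print([instance.task.name for instance in created])  # ['a']
```

Task b is in the frozen state, so no instance is generated for it at hour 1 even though its scheduling time has arrived.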
Summary of The Invention
The inventors found that the basic procedure of the data complement function is as follows: find the one or more most upstream tasks, obtain the corresponding DAG from each such task and the tasks downstream of it that have already run, and rerun the DAG from upstream to downstream. Its main use case is backtracking data for newly launched tasks; it was not designed for recovering from data error faults. As a result, at least the following problems often arise when the data complement function is used for fault recovery.
1) Numerous source tasks
Because there is often more than one erroneous final source task, performing a data complement once for each final source task generates a large number of repeatedly run downstream task instances, and the correctness of the data of those downstream task instances cannot be guaranteed.
As shown in FIG. 1, there are two source tasks, task 1-1 and task 1-2, respectively, which both have a common downstream task, specifically downstream task 2-2, task 3-1, task 3-2. If the data complement operations are performed on task 1-1 and task 1-2, respectively, as shown in FIG. 2, two DAG execution graphs are generated. Task 2-2, task 3-1, task 3-2 would generate two task instances, respectively, which would undoubtedly result in a waste of computing resources.
Meanwhile, the task 2-2 needs to rely on the task 1-1 and the task 1-2, if only one of the task 1-1 and the task 1-2 is subjected to the complement operation, the complement operation is directly performed on the task 2-2, and the correctness of the output data of the task 2-2 cannot be guaranteed obviously. Similarly, the correctness of the output data of the tasks 3-1 and 3-2 is difficult to be ensured.
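The duplication described above can be illustrated with a short sketch. The edge set below is an assumed reconstruction of FIG. 1 from the surrounding text (the figure itself is not reproduced here), and `downstream_of` is a hypothetical helper rather than part of any scheduler.

```python
from collections import Counter, deque

# Assumed reconstruction of the FIG. 1 DAG: task -> direct downstream tasks.
EDGES = {
    "1-1": ["2-1", "2-2"],
    "1-2": ["2-2", "2-3"],
    "2-1": ["3-1"],
    "2-2": ["3-1", "3-2"],
    "2-3": ["3-2"],
    "3-1": [],
    "3-2": [],
}

def downstream_of(source, edges):
    """Return the source plus every task reachable from it (BFS)."""
    seen, queue = {source}, deque([source])
    while queue:
        for child in edges[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

# One data complement run per source task: count instances generated per task.
instances = Counter()
for source in ("1-1", "1-2"):
    instances.update(downstream_of(source, EDGES))

# Tasks that would be run more than once.
duplicated = sorted(task for task, count in instances.items() if count > 1)
print(duplicated)
```

Under the assumed edges, the tasks counted twice are exactly the common downstream tasks named in the text.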
To solve the above problem, as shown in FIG. 3, an idle-running upstream task 0 can be added, task 1-1 and task 1-2 are made to depend on task 0, and the data complement operation is performed directly on task 0. This avoids running downstream tasks multiple times when several source tasks share the same downstream tasks, and it also keeps the dependencies of the downstream tasks correct. For example, with the original approach task 2-2 depends on task 1-1 and task 1-2 in two separate DAGs, whereas in the new DAG task 2-2 depends on task 1-1 and task 1-2 at the same time. Therefore, when the new DAG repairs the data of task 2-2, task 1-1 and task 1-2 have already completed their data repair, which ensures the correctness of the overall data repair.
Although the problem can be solved by adding a task and setting task dependencies, the added task changes the DAG of the ETL tasks and increases the subsequent operation and maintenance cost. In addition, setting the dependencies is cumbersome: when the number of source tasks is large, a great deal of time is needed to make every source task depend on the most-upstream idle-running task.
2) Incorrect downstream task yield data
Before the data complement operation, the data complement function draws a DAG graph of the task instances to be re-executed, based on the DAG of the ETL tasks. The DAG graph includes only task instances whose scheduled execution time is less than or equal to the current time. If the scheduled execution time of a task instance is later than the current time, that task instance is not added to the DAG graph.
Taking the DAG shown in fig. 1 as an example, assuming that the tasks 2-3 and 3-2 have not reached the specified scheduled execution time when repairing data by the complement data function, the DAG execution diagram of the task instance drawn by the complement data is the part shown by the solid line in fig. 4.
Data repair itself takes time, and the time taken differs from task to task. Therefore, while data is being repaired through the data complement function, the scheduled execution time of tasks 2-3 and 3-2 may be reached before their upstream data has been repaired correctly.
Taking fig. 5 as an example, the solid unfilled portion represents a task for which data has been repaired correctly, and the solid filled portion represents a task for which data has not been repaired correctly. It can be seen that task 2-3 is running, and that task 1-2 data has been repaired correctly, so that the data of the upstream task is correct when task 2-3 is running. When task 2-3 is completed, task 3-2 begins to run. However, the data of the task 2-2 is not repaired correctly at this time, so that when the task 3-2 is running, the data upstream of the task still has partial errors, and finally, the data produced by the task 3-2 is still erroneous.
Errors of this type are often difficult to detect, and if there are many other tasks downstream of task 3-2, the errors propagate and lead to a further data fault.
In view of the above, the embodiment of the invention provides a task processing method that selects a plurality of initial source tasks, merges those initial source tasks that could cause task instances to be generated repeatedly so as to obtain the final source tasks, and freezes the final source tasks and all of their downstream tasks. This solves the problem that the data complement repair scheme can select only one source task, or must add an idle-running task upstream of multiple source tasks, and thus improves the efficiency of data repair; it also solves the problem that task instances and downstream tasks produce incorrect data while the data is being repaired.
Having described the basic principles of the present invention, various non-limiting embodiments of the invention are described in detail below.
Exemplary method
FIG. 6 is a flowchart illustrating a method of task processing, according to an exemplary embodiment, the method including the steps of:
In step S601, at least one initial source task is obtained;
In practice, when a data error occurs, the one or more most-upstream erroneous tasks are obtained as initial source tasks.
In step S602, corresponding dependency information is generated for each of the at least one initial source task;
The dependency relationship information of each initial source task comprises all downstream tasks that have the initial source task as their most-upstream task, together with the upstream and downstream dependency relationships among these tasks. The downstream tasks include both tasks that have reached the specified scheduled running time and tasks that have not.
In step S603, the at least one initial source task is merged according to the dependency relationship information, so as to obtain at least one final source task;
Among the obtained initial source tasks, an initial source task may have an upstream-downstream dependency on some of the other initial source tasks, or may have no upstream-downstream dependency on any other initial source task. One possible form of upstream-downstream dependency is that one initial source task is a downstream task of another initial source task.
As an optional implementation manner, to avoid that one task generates multiple task instances, the merging processing is performed on the at least one initial source task according to the dependency relationship information, including:
Determining a plurality of initial source tasks with upstream and downstream task relationships according to the dependency relationship information;
And taking the initial source task at the most upstream in the plurality of initial source tasks and the initial source task without the relation between the upstream and downstream tasks as final source tasks.
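The merging rule above can be sketched as follows: an initial source task is kept as a final source task only if it is not downstream of another initial source task. The function names and the example DAG (an assumed reconstruction of FIG. 1) are illustrative only.

```python
from collections import deque

def strictly_downstream(source, edges):
    """Return every task reachable from `source`, excluding `source` itself."""
    seen, queue = set(), deque([source])
    while queue:
        for child in edges.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def merge_initial_sources(initial_sources, edges):
    """Keep only initial sources that are not downstream of another one."""
    covered = set()
    for source in initial_sources:
        covered |= strictly_downstream(source, edges)
    return [s for s in initial_sources if s not in covered]

# Assumed example DAG: task -> direct downstream tasks.
EDGES = {
    "1-1": ["2-1", "2-2"],
    "1-2": ["2-2", "2-3"],
    "2-1": ["3-1"],
    "2-2": ["3-1", "3-2"],
    "2-3": ["3-2"],
    "3-1": [],
    "3-2": [],
}

# Task 2-2 is downstream of both 1-1 and 1-2, so it is merged away.
final_sources = merge_initial_sources(["1-1", "2-2", "1-2"], EDGES)
print(final_sources)
```

An initial source task with no relationship to the others is never in the covered set, so it is retained unchanged, which matches the second half of the rule.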
The final source tasks obtained after the merging processing may still share common downstream tasks.
As an alternative embodiment, before the freezing treatment is performed on the at least one final source task, in order to avoid that one task generates multiple task instances, the method further includes:
And merging the common downstream tasks when determining that different upstream tasks of the at least one final source correspond to the common downstream tasks according to the dependency relationship information.
In step S604, the at least one final source task and all corresponding downstream tasks are frozen, so as to stop the operation of the at least one final source task and all corresponding downstream tasks.
According to the task processing method provided by the embodiment of the invention, a plurality of initial source tasks are obtained at the same time, the initial source tasks that could cause task instances to be generated repeatedly are merged, and the final source tasks and all of their downstream tasks are frozen. On one hand, this avoids the repeated generation of task instances that occurs when data repair is performed separately for each source task, and removes the limitation that the data complement repair scheme can select only one source task, so data repair can be performed quickly and efficiently. On the other hand, it prevents a downstream task that reaches its specified scheduled running time from running on erroneous data before the data repair of its upstream tasks is complete, which solves the problem of incorrect downstream data output during data repair.
As an optional implementation manner, the embodiment of the present invention may generate the corresponding dependency information for each initial source task in the following manner:
Determining a DAG task instance graph corresponding to each initial source task by using a DFS (depth-first search) or BFS (breadth-first search) algorithm;
And adding, on the basis of the DAG task instance graph corresponding to each initial source task, the downstream tasks of that initial source task that have not yet reached their execution time, to obtain the dependency relationship information.
In implementation, the DAG task instance graph corresponding to each acquired initial source task is computed one by one through the DFS/BFS algorithm. Each DAG task instance graph contains only task instances that have reached the specified scheduled running time and does not contain task instances that have not. After all the initial source tasks have been traversed, the downstream tasks that have not reached their execution time are added to the DAG task instance graph of each initial source task to obtain the dependency relationship information. A merging operation and a de-duplication operation are then performed on the DAG task instance graphs of the initial source tasks. In the merging operation, if a downstream task in one DAG task instance graph is the initial source task of another DAG, the source task of the former graph is taken as the common final source task. In the de-duplication operation, common downstream tasks are merged so that repeated dependency relationships are removed.
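A minimal sketch of generating the dependency relationship information for one initial source task is given below, using BFS. The representation of scheduled times as minutes since midnight, the field names of the returned dictionary and the example values are all assumptions made for illustration.

```python
from collections import deque

def dependency_info(source, edges, scheduled_at, now):
    """BFS from `source`; split reachable tasks into due and not-yet-due."""
    order, seen, queue = [], {source}, deque([source])
    while queue:
        task = queue.popleft()
        order.append(task)
        for child in edges.get(task, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return {
        # Upstream/downstream relationships among the reachable tasks.
        "edges": {task: list(edges.get(task, [])) for task in order},
        # Tasks whose instances belong to the DAG task instance graph.
        "due": [t for t in order if scheduled_at[t] <= now],
        # Downstream tasks added afterwards: scheduled time not yet reached.
        "pending": [t for t in order if scheduled_at[t] > now],
    }

# Assumed example: part of FIG. 1 reachable from task 1-2.
EDGES = {
    "1-2": ["2-2", "2-3"],
    "2-2": ["3-1", "3-2"],
    "2-3": ["3-2"],
    "3-1": [],
    "3-2": [],
}
# Hypothetical scheduled times, in minutes since midnight.
SCHEDULED_AT = {"1-2": 480, "2-2": 540, "3-1": 600, "2-3": 720, "3-2": 780}

info = dependency_info("1-2", EDGES, SCHEDULED_AT, now=660)
print(info["due"], info["pending"])
```

Keeping the pending tasks in the result is what allows them to be frozen together with the tasks that have already run.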
As an optional implementation manner, after the at least one final source task and all corresponding downstream tasks have been frozen and the data has been repaired, a thawing operation may further be performed. Specifically, thawing may be performed in the following manner:
In step S604, in response to a task thawing instruction, thawing starts from the at least one final source task and proceeds from upstream tasks to downstream tasks: the thawing process of a downstream task is executed only when all of its upstream tasks have completed their thawing process. After a task has completed the thawing process, it is in the thawed state, i.e., the normal state, and can be scheduled to run by the scheduling system.
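The upstream-to-downstream thawing order corresponds to a topological order of the frozen DAG. The sketch below uses Kahn's algorithm, which is one common way to obtain such an order and is not stated in the source; the edge set is an assumed reconstruction of FIG. 1.

```python
from collections import deque

def thaw_order(frozen_edges):
    """Return an order in which every task follows all of its upstream tasks."""
    indegree = {task: 0 for task in frozen_edges}
    for children in frozen_edges.values():
        for child in children:
            indegree[child] += 1
    # Final source tasks have no frozen upstream task and are thawed first.
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in frozen_edges[task]:
            indegree[child] -= 1
            if indegree[child] == 0:  # all upstream tasks are thawed
                ready.append(child)
    if len(order) != len(frozen_edges):
        raise ValueError("dependency graph contains a cycle")
    return order

# Assumed example DAG: task -> direct downstream tasks.
EDGES = {
    "1-1": ["2-1", "2-2"],
    "1-2": ["2-2", "2-3"],
    "2-1": ["3-1"],
    "2-2": ["3-1", "3-2"],
    "2-3": ["3-2"],
    "3-1": [],
    "3-2": [],
}

order = thaw_order(EDGES)
print(order)
```

In the resulting order task 2-2 appears only after both task 1-1 and task 1-2.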
As an alternative implementation manner, the embodiment of the invention uses the freezing pool to freeze the task, and a possible implementation manner is given below.
1) Freezing function of the freezing pool
As shown in FIG. 7, when the freezing pool is created and the task freezing process starts, the state of the freezing pool is marked as generating, and the at least one final source task and all corresponding downstream tasks are added to the freezing pool. Specifically, the tasks are frozen one by one in the DAG order given by the dependency relationships. When all tasks in the freezing pool have been frozen, the state of the freezing pool is marked as frozen.
A task has a normal state and a frozen state. In the normal state, the task starts to run when it reaches its specified scheduled running time; in the frozen state, the task stops running, or does not run when it reaches its specified scheduled running time. The embodiment of the invention can store a freezing table in the database to record the tasks added to the freezing pool, and the task state transition is carried out by the scheduling system: if a task needs to be frozen at a certain time point, the task information and the corresponding time point are recorded in the freezing table. As shown in Table 1, once a task is recorded in the freezing table, its state changes from the normal state to the frozen state.
Table 1 freeze table
By adding at least one end-source task and all corresponding downstream tasks to the freeze pool, simultaneous selection of multiple end-source tasks for data repair may be supported. Taking fig. 1 as an example, tasks 1-1 and 1-2 may be selected as source tasks in the freezing pool, and then tasks 2-1, 2-2, 2-3, 3-1 and 3-2 may be added to the freezing pool as downstream tasks.
Tasks added to the freezing pool are frozen, and the frozen tasks are in a frozen state. Once a task is in a frozen state, the task will not generate a corresponding task instance even if it is up to the specified scheduled execution time.
When the freezing process is executed for a task that has already generated a task instance, it is determined whether that task instance is still running, and if so, the task instance is terminated. Because the upstream data is wrong, the output would be wrong even if the run succeeded; by freezing the task, the embodiment of the invention prevents downstream tasks from continuing to produce incorrect data.
2) Thawing function of the freezing pool
The embodiment of the invention provides a thawing operation for a freezing pool in the frozen state; the thawing operation changes the state of the freezing pool to thawing. A freezing pool in this state thaws its tasks one by one in the order of the DAG given by the dependency relationship information. For example, with reference to FIG. 1, both task 1-1 and task 1-2 must be in the thawed state before task 2-2 is thawed.
As shown in FIG. 7, while the tasks in the freezing pool are being thawed, the state of the freezing pool is marked as thawing. When all tasks in the freezing pool have been thawed, the state of the freezing pool is marked as thawed, indicating that the data has been repaired correctly. If an unexpected error occurs during thawing, the state of the freezing pool changes from thawing to failed.
The thawing operation actually deletes the corresponding freeze record of Table 1 from the database. Once the freeze record of a task is deleted from Table 1, the task state changes from the frozen state to the normal state, and the next time the scheduling system checks the task in the normal state, a corresponding task instance is generated.
When the scheduling system is about to generate a task instance for a task, it first determines whether the task is in the freezing table (by task information and scheduled execution time). If so, the instance is not generated for the time being (delayed generation) and is generated after thawing.
For example, suppose the scheduled execution time of task 2-2 is 09:00 every day. When 09:00 arrives, the scheduling system determines whether the 09:00 instance of task 2-2 is in the freezing table. The Azkaban system can uniquely locate each task through project_id and flow_name, and can uniquely locate each task instance through project_id, flow_name and schedule_exec_time. Determining whether task 2-2 is frozen at 09:00 therefore amounts to looking up the corresponding record in the freezing table by project_id, flow_name and schedule_exec_time. If the record is found, the 09:00 task instance of task 2-2 is frozen and is not generated; it is generated only after the corresponding record in the freezing table has been deleted.
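The lookup described above can be sketched with an in-memory stand-in for the freezing table. The `Scheduler` class and its methods are hypothetical and are not Azkaban APIs; only the key fields project_id, flow_name and schedule_exec_time come from the text, and the example values are invented.

```python
from dataclasses import dataclass, field

@dataclass
class Scheduler:
    """Toy scheduler: a set stands in for the freezing table in the database."""
    freeze_table: set = field(default_factory=set)
    delayed: list = field(default_factory=list)    # instances waiting for thaw
    generated: list = field(default_factory=list)  # instances actually created

    def on_schedule(self, project_id, flow_name, schedule_exec_time):
        """Called when a task reaches its scheduled execution time."""
        key = (project_id, flow_name, schedule_exec_time)
        if key in self.freeze_table:
            self.delayed.append(key)      # frozen: delay instance generation
        else:
            self.generated.append(key)    # normal: generate the instance

    def thaw(self, key):
        """Delete the freeze record, then generate any delayed instance."""
        self.freeze_table.discard(key)
        if key in self.delayed:
            self.delayed.remove(key)
            self.generated.append(key)

# Hypothetical identifiers for the 09:00 instance of task 2-2.
key = ("demo_project", "task_2_2", "09:00")
scheduler = Scheduler()
scheduler.freeze_table.add(key)
scheduler.on_schedule(*key)
generated_while_frozen = list(scheduler.generated)
scheduler.thaw(key)
print(generated_while_frozen, scheduler.generated)
```

Keying the record on the scheduled execution time as well as the task means that freezing one scheduled run does not by itself block other runs of the same task.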
As an alternative embodiment, the task to be thawed for performing the unfreezing process is thawed in the following manner:
Determining whether the task to be thawed has generated a task instance. If so, the task instance of the task to be thawed is re-run, and the task is determined to have completed the thawing process after the re-run succeeds. If the task to be thawed has not generated a task instance, the scheduling system is directly requested to release the freeze on the task to be thawed.
A task whose re-run fails requires a data repair operation. For this case the freezing pool also provides a forced thawing operation, which gives data developers an opportunity for manual handling.
As an optional implementation manner, after re-running the task instance of the task to be thawed, the method further includes:
Issuing a thawing failure prompt when the re-run is determined to have failed, so as to prompt a data developer to perform a data repair operation;
When a forced thawing instruction is received, releasing the freeze on the task to be thawed indicated by the instruction; the forced thawing instruction may be triggered after the data repair is completed.
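The per-task thawing logic, including the failure and forced-thaw paths, can be sketched as below. The function names, the `rerun` callback and the use of Python sets for the freezing table and the generated instances are assumptions for illustration.

```python
def thaw_task(task, freeze_table, tasks_with_instance, rerun):
    """Thaw one task; return "thawed" on success or "failed" if the re-run fails."""
    if task in tasks_with_instance and not rerun(task):
        # Re-run failed: keep the task frozen and prompt for manual data repair.
        return "failed"
    # Either the re-run succeeded or no instance existed: release the freeze.
    freeze_table.discard(task)
    return "thawed"

def force_thaw(task, freeze_table):
    """Release the freeze without re-running, after manual data repair."""
    freeze_table.discard(task)
    return "thawed"

# Task 2-2 has generated an instance; task 2-3 has not.
freeze_table = {"2-2", "2-3"}
tasks_with_instance = {"2-2"}

def always_fails(task):
    """Stand-in for a re-run that fails."""
    return False

first = thaw_task("2-2", freeze_table, tasks_with_instance, always_fails)
second = thaw_task("2-3", freeze_table, tasks_with_instance, always_fails)
still_frozen = sorted(freeze_table)
third = force_thaw("2-2", freeze_table)
print(first, second, still_frozen, third)
```

Task 2-3 is thawed without invoking the re-run callback because it has no instance to re-run.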
Taking FIG. 8 as an example, tasks 1-1, 1-2, 2-1, 2-2 and 3-1 have generated task instances, while tasks 2-3 and 3-2 have not. When tasks 1-1, 1-2, 2-1, 2-2 and 3-1 are thawed, the corresponding task instances are first re-run, and if a re-run instance succeeds, the state of the corresponding task is set to thawed. Tasks 2-3 and 3-2 are set directly to the thawed state because no instance has been generated; once they are in the thawed state and their scheduled execution time is reached, the corresponding task instances are generated in the scheduling system.
As an alternative embodiment, as shown in Table 2, a task table for the freezing pool may be created on the basis of Table 1, and the state of each task is recorded in the status field of the task table: status is set to frozen once the freezing process has been performed on the task, to thawing while the thawing process is being performed on the task, and to thawed once the thawing process has been completed.
Table 2 task table in freeze pool
Typically, the freezing pool itself modifies the status of tasks by performing the thawing operation, i.e., by thawing the tasks one by one until the state transition is complete. If the thawing operation of a task fails, for example because the task re-run fails, manual intervention is needed: a failure prompt is output on a page, and an entry point is provided for modifying the status of the task, limited to changing it from thawing to thawed.
3) Pause and resume functions of the freezing pool
In the related art, if data repair takes a long time, then to keep the repair from affecting data output in the subsequent time period, the data complement function has to be turned off and all running task instances terminated before the tasks of the subsequent time period start to run. However, the data complement function does not support resuming after a pause, so it can only be stopped completely. When the next repair time window arrives, the downstream tasks that had been running must then be re-run manually. If many downstream tasks need manual re-running, they have to be made dependent on a common idle-running task, and the fault data repair process is carried out again according to the previous data repair scheme.
The embodiment of the invention provides a pause function for a freezing processing task, which specifically comprises the following steps:
In response to a pause-thawing instruction, determining the task that is currently being thawed;
Suspending the thawing of that task, and marking the state of the freezing pool as paused.
When the freezing pool is used to freeze tasks, a freezing pool in the thawing state also provides a pause function. As shown in FIG. 7, the pause operation changes the state of the freezing pool from thawing to paused. Because the status of each task in the freezing pool is recorded as described in the above embodiment, the thawing progress of the freezing pool can be determined while the pool is paused.
The embodiment of the invention provides a function for recovering a freezing processing task, which specifically comprises the following steps:
In response to a resume-thawing instruction, determining the at least one most-upstream task in the paused freezing pool that has not been thawed; and, starting from the at least one task to be thawed, thawing in such a manner that the thawing process of a downstream task is executed only when all of its upstream tasks have completed their thawing process.
When the freezing pool is used to freeze tasks, as shown in FIG. 7, a freezing pool in the paused state can perform a resume operation, which changes the state of the freezing pool from paused to thawing. Specifically, the tasks whose status is frozen are identified from the status of the tasks in the freezing pool, and the thawing operation continues from the at least one most-upstream task among them.
After the pause operation is triggered, the freezing pool does not continue to thaw downstream tasks. A task that is in the thawing state is reset to the frozen state; if an instance of that task was re-run as part of thawing and the instance is still running, the instance is terminated.
Taking FIG. 9 as an example, tasks 1-1, 1-2, 2-1 and 2-3 are already in the thawed state before the pause operation is triggered, and task 2-2 is in the thawing state. After the pause operation is triggered, task 2-2 is reset to frozen, the re-run task instance of task 2-2 is terminated, and tasks 3-1 and 3-2 are not thawed.
After the resume operation of the freezing pool is triggered, task 2-2 continues to be thawed, and the task instance of task 2-2 is re-run. After task 2-2 has been thawed, tasks 3-1 and 3-2 are thawed in turn. When all tasks are in the thawed state, the state of the freezing pool changes from thawing to thawed.
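The FIG. 9 walkthrough can be sketched as two small functions: one applies the pause rule to the per-task status, and one finds the most-upstream tasks from which thawing resumes. The function names and the edge set (an assumed reconstruction of the figure) are illustrative.

```python
def pause(status):
    """Reset any task that is mid-thaw back to frozen; leave the rest unchanged."""
    return {t: ("frozen" if s == "thawing" else s) for t, s in status.items()}

def resume_frontier(edges, status):
    """Return un-thawed tasks whose upstream tasks are all thawed."""
    parents = {task: [] for task in edges}
    for task, children in edges.items():
        for child in children:
            parents[child].append(task)
    return sorted(
        task for task in edges
        if status[task] != "thawed"
        and all(status[p] == "thawed" for p in parents[task])
    )

# Assumed example DAG: task -> direct downstream tasks.
EDGES = {
    "1-1": ["2-1", "2-2"],
    "1-2": ["2-2", "2-3"],
    "2-1": ["3-1"],
    "2-2": ["3-1", "3-2"],
    "2-3": ["3-2"],
    "3-1": [],
    "3-2": [],
}
# State just before the pause operation in the FIG. 9 example.
status = {
    "1-1": "thawed", "1-2": "thawed", "2-1": "thawed", "2-3": "thawed",
    "2-2": "thawing", "3-1": "frozen", "3-2": "frozen",
}

paused_status = pause(status)
frontier = resume_frontier(EDGES, paused_status)
print(paused_status["2-2"], frontier)
```

Because the frontier is computed from the stored status alone, no additional bookkeeping is needed to resume where thawing stopped.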
The pause and resume operations of the freezing pool provided by the embodiment of the invention allow thawing to be interrupted temporarily as needed without starting over when it is later resumed, which improves data repair efficiency and leaves time for data repair. For example, when a re-run task instance fails, the thawing of the task to be thawed is suspended and the state of the freezing pool is marked as paused. After the data has been repaired correctly and the corresponding task has been forcibly thawed, the thawing operation of the freezing pool is resumed.
4) Updating function of the freezing pool
The embodiment of the invention provides a function for updating frozen tasks, which particularly can comprise a function for deleting a source task, and when deleting the source task, the source task and the downstream tasks thereof need to be completely thawed and removed from the original DAG. The downstream tasks that are defrosted and removed are required to be independent of other source tasks. As an optional implementation manner, after suspending the thawing of the task to be thawed, the method further includes:
responding to a task deleting instruction, and determining a source task indicated by the task deleting instruction;
When the indicated source task and all corresponding downstream tasks are determined to be independent of other final source tasks except the source task, thawing and deleting the indicated final source task and all corresponding downstream tasks;
Otherwise, reserving tasks which depend on other final source tasks, and thawing and deleting other tasks except the reserved tasks in the indicated source tasks and all downstream tasks.
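The deletion rule can be sketched as a set computation: tasks reachable from the deleted source task are removed unless they are also reachable from another final source task. The helper names and the edge set (an assumed reconstruction of FIG. 1) are illustrative.

```python
from collections import deque

def with_downstream(source, edges):
    """Return the source plus every task reachable from it."""
    seen, queue = {source}, deque([source])
    while queue:
        for child in edges.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def delete_source(target, final_sources, edges):
    """Split the target's subtree into tasks to remove and tasks to retain."""
    affected = with_downstream(target, edges)
    still_needed = set()
    for source in final_sources:
        if source != target:
            still_needed |= with_downstream(source, edges)
    removed = sorted(affected - still_needed)   # thawed and deleted
    retained = sorted(affected & still_needed)  # depend on other sources
    return removed, retained

# Assumed example DAG: task -> direct downstream tasks.
EDGES = {
    "1-1": ["2-1", "2-2"],
    "1-2": ["2-2", "2-3"],
    "2-1": ["3-1"],
    "2-2": ["3-1", "3-2"],
    "2-3": ["3-2"],
    "3-1": [],
    "3-2": [],
}

removed, retained = delete_source("1-1", ["1-1", "1-2"], EDGES)
print(removed, retained)
```

If the deleted source task shares nothing with the other final source tasks, the retained list is empty and the whole subtree is removed.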
The embodiment of the invention provides a function of adding a freezing task, and after suspending the task to be thawed, the method further comprises the following steps:
Responding to a task adding instruction, and determining a source task indicated by the task adding instruction;
And if the indicated source task and all downstream tasks do not have the same task with the at least one final source task and all downstream tasks corresponding to the at least one final source task, freezing the indicated source task and all downstream tasks so as to stop the operation of the indicated source task and all downstream tasks.
When the freezing pool is applied to freeze the tasks, determining the source tasks to be added into the freezing pool, and if the newly added source tasks and the downstream tasks do not coincide with the DAG of the original freezing pool, the original tasks are not affected, and the indicated source tasks and the corresponding downstream tasks can be directly frozen into the freezing pool.
As an optional implementation manner, when the indicated source task and all downstream tasks exist the same task as the at least one final source task and all downstream tasks corresponding to the source task, the following processing manner may be adopted:
if the at least one final source task and all the corresponding downstream tasks exist the same tasks as the indicated source task, freezing the same tasks and all the corresponding downstream tasks so as to stop the operation of the same tasks and all the downstream tasks;
And if the at least one final source task and all the corresponding downstream tasks exist in the same tasks as the indicated downstream tasks of the source task, freezing the same tasks and all the corresponding downstream tasks to stop the operation of the same tasks and all the downstream tasks, taking the indicated source task as a new final source task, and freezing the new final source task and all the corresponding downstream tasks to stop the operation of the new final source task and all the corresponding downstream tasks.
When the freezing pool is used to freeze tasks and a source task to be added to the freezing pool is determined, the newly added source task and its downstream tasks may overlap with the DAG of the original freezing pool. When the overlapping task is the indicated source task itself, the overlapping task is taken as the starting point, the starting task and all of its downstream tasks are re-frozen, the indicated source task and its corresponding downstream tasks are frozen and added to the freezing pool, and the overlapping tasks are merged. When the overlapping task is a downstream task of the indicated source task, the overlapping task is taken as the starting point, the starting task and all of its downstream tasks are re-frozen, the indicated source task is taken as a new final source task, the indicated source task and its corresponding downstream tasks are frozen and added to the freezing pool, and the overlapping tasks are merged. That is, if a task is newly added upstream of an existing task, the existing task must be re-run regardless of whether it was previously thawed.
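The effect of adding a source task on the per-task status can be sketched as follows. Every task reachable from the new source task ends up frozen, which covers both newly added tasks and overlapping tasks that were already thawed. The function names, the edge set and the starting status are assumptions loosely modeled on the FIG. 10 and FIG. 11 example.

```python
from collections import deque

def with_downstream(source, edges):
    """Return the source plus every task reachable from it."""
    seen, queue = {source}, deque([source])
    while queue:
        for child in edges.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def add_source(new_source, edges, pool_status):
    """Return (updated status, tasks that were already in the pool)."""
    incoming = with_downstream(new_source, edges)
    overlap = sorted(incoming & set(pool_status))
    updated = dict(pool_status)
    for task in incoming:
        updated[task] = "frozen"  # new tasks frozen, overlapping ones re-frozen
    return updated, overlap

# Assumed edges after task 1-3 and its downstream tasks are introduced.
EDGES = {
    "1-2": ["2-2", "2-3"],
    "1-3": ["2-3", "2-4"],
    "2-2": ["3-1", "3-2"],
    "2-3": ["3-2"],
    "2-4": ["3-3"],
    "3-1": [], "3-2": [], "3-3": [],
}
# Hypothetical pool state before the addition: task 2-3 is already thawed.
pool = {"1-2": "thawed", "2-2": "frozen", "2-3": "thawed",
        "3-1": "frozen", "3-2": "frozen"}

updated, overlap = add_source("1-3", EDGES, pool)
print(overlap, updated["2-3"], updated["1-2"])
```

In this sketch the overlap also lists task 3-2, because it lies downstream of the overlapping task 2-3 and is therefore re-frozen along with it.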
When the freezing pool is used to freeze tasks, a freezing pool in the thawing state also provides a pause function, and the pause operation changes the state of the freezing pool to paused. A freezing pool in the paused state can be updated, and after the update operation is finished the state of the freezing pool can change from paused back to thawing.
FIG. 10 is a DAG execution diagram of an example freezing pool, in which solid lines indicate tasks that have been thawed and dashed lines indicate tasks in the frozen state. In response to a task deletion instruction indicating task 1-1, it is determined that task 2-2, task 3-1 and task 3-2 downstream of task 1-1 also depend on another source task, task 1-2, so task 2-2, task 3-1 and task 3-2 are retained, and task 1-1 and task 2-1, which do not depend on any other source task, are deleted. Task 1-3 is then newly added together with its downstream tasks 2-3, 2-4 and 3-3. The overlapping task is task 2-3, which had already been thawed; because the newly added task 1-3 is an upstream task of task 2-3, task 2-3 is re-frozen, and task 1-3 and its downstream tasks 2-4 and 3-3 are added to the DAG graph. The DAG of the freezing pool thus changes from FIG. 10 to FIG. 11.
5) State transitions of the freezing pool
When the freezing pool is used to freeze tasks, the state of the freezing pool can be generating, frozen, thawing, paused, thawed or failed, depending on the operation performed. A freezing pool in the paused state can also be abandoned: abandoning a freezing pool thaws all of its tasks, after which the state of the freezing pool is changed to abandoned. A freezing pool table can be used to maintain the state of the freezing pool, as shown in Table 2:
Table 2 Freezing pool table structure
Field | Type | Description
id | int | Primary key, auto-increment
name | varchar | Freezing pool name
status | varchar | Freezing pool state
creator | varchar | Creator
create_time | bigint | Creation time
mofifier | varchar | Modifier
modify_time | bigint | Modification time
service_time | bigint | Modification time
version | int | Freezing pool version
State transitions of the freezing pool are implemented by modifying the status field of the freezing pool table. The status field takes the values generating, frozen, thawing, thawed, failed, abandoned and paused. The transitions between these states have been described above and are not repeated here.
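The transitions described in the preceding subsections can be collected into a small state machine. The English state names, the `ALLOWED` table and the `FreezePool` class are an illustrative rendering assembled from the text, not a structure given in the source.

```python
# Allowed transitions, as described for FIG. 7 in the text.
ALLOWED = {
    "generating": {"frozen"},
    "frozen": {"thawing"},
    "thawing": {"thawed", "paused", "failed"},
    "paused": {"thawing", "abandoned"},
    "thawed": set(),
    "failed": set(),
    "abandoned": set(),
}

class FreezePool:
    """Toy freezing pool whose status field is guarded by ALLOWED."""

    def __init__(self, name):
        self.name = name
        self.status = "generating"

    def transition(self, new_status):
        """Move to `new_status`, rejecting transitions the text does not describe."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot go from {self.status} to {new_status}")
        self.status = new_status

pool = FreezePool("demo")
for step in ("frozen", "thawing", "paused", "thawing", "thawed"):
    pool.transition(step)
print(pool.status)
```

Restricting updates of the status field to a table of this kind makes an accidental jump, such as from frozen directly to thawed, fail loudly instead of silently corrupting the pool state.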
The function of the freezing pool provided by the embodiment of the invention can help to quickly recover the data fault, greatly reduce the time for repairing the data, and also efficiently ensure the correctness of the repairing data.
Exemplary apparatus
Having described the task processing method of an exemplary embodiment of the present invention, a task processing device of an exemplary embodiment of the present invention is described next with reference to FIG. 12.
As shown in fig. 12, based on the same inventive concept, an embodiment of the present invention further provides a task processing device, including:
The source task acquisition module 1201 is configured to acquire at least one initial source task;
A dependency relationship generating module 1202, configured to generate corresponding dependency relationship information for each of the at least one initial source task;
The source task merging module 1203 is configured to merge the at least one initial source task according to the dependency relationship information to obtain at least one final source task;
And the freezing processing module 1204 is configured to perform freezing processing on the at least one final source task and all corresponding downstream tasks, so as to stop the operation of the at least one final source task and all corresponding downstream tasks.
In a possible implementation manner, the source task merging module 1203 merges the at least one initial source task according to the dependency relationship information, including:
Determining a plurality of initial source tasks with upstream and downstream task relationships according to the dependency relationship information;
Taking, as the final source tasks, the most upstream initial source task among the plurality of initial source tasks and each initial source task that has no upstream-downstream task relationship with any other initial source task.
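As an illustration only, the merging step could be sketched in Python as follows. The sketch assumes that the dependency relationship information is available as a dictionary mapping each task to its direct downstream tasks; the names `graph`, `downstream_of`, and `merge_source_tasks` are hypothetical and do not appear in the embodiment.

```python
from collections import deque


def downstream_of(graph, task):
    """Return every task reachable from `task`, excluding `task` itself."""
    seen = set()
    queue = deque(graph.get(task, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, ()))
    return seen


def merge_source_tasks(graph, initial_sources):
    """Keep only the initial source tasks that are not downstream of another one.

    Assumes `graph` is acyclic, so a task is never downstream of itself.
    """
    initial = list(dict.fromkeys(initial_sources))  # de-duplicate, keep order
    covered = set()
    for source in initial:
        covered |= downstream_of(graph, source)
    return [source for source in initial if source not in covered]
```

For a graph in which A feeds B, B and D feed C, and E is isolated, merging the initial source tasks B, A, E, and D would keep A, E, and D, since B lies downstream of A.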
In one possible implementation, before performing the freezing processing on the at least one final source task, the freezing processing module 1204 is further configured to:
Merge the common downstream tasks when it is determined, according to the dependency relationship information, that different upstream tasks of the at least one final source task correspond to common downstream tasks.
In one possible implementation, the dependency generation module 1202 generates the corresponding dependency information for each of the initial source tasks, including:
Adopting a DFS/BFS algorithm to determine a DAG task instance graph corresponding to each initial source task;
Adding, on the basis of the DAG task instance graph corresponding to each initial source task, the downstream tasks of that initial source task whose execution time has not yet arrived, to obtain the dependency relationship information.
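A minimal sketch of this step, under stated assumptions, is given below. It assumes two hypothetical inputs: `instance_edges`, the edges between task instances that have already been generated, and `definition_edges`, the configured downstream relations, which also cover tasks whose execution time has not yet arrived. A single BFS pass from the initial source task merges both; the embodiment does not prescribe this data layout.

```python
from collections import deque


def build_dependency_info(source, instance_edges, definition_edges):
    """BFS from `source`; return {task: sorted list of direct downstream tasks}.

    instance_edges:   edges among already generated task instances (assumed input)
    definition_edges: configured edges, including not-yet-due tasks (assumed input)
    """
    dependency_info = {}
    queue = deque([source])
    while queue:
        task = queue.popleft()
        if task in dependency_info:
            continue  # already expanded via another upstream task
        children = set(instance_edges.get(task, ())) | set(definition_edges.get(task, ()))
        dependency_info[task] = sorted(children)
        queue.extend(children)
    return dependency_info
```

Because each task is expanded only once, a downstream task shared by several upstream tasks appears a single time in the result.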
In one possible embodiment, the apparatus further comprises:
A task thawing module 1205, configured to, in response to a task thawing instruction, thaw tasks starting from the at least one final source task and proceeding from upstream tasks to downstream tasks, where the thawing of a downstream task is executed only after the thawing of all of its upstream tasks has completed, so as to resume the scheduled operation of the thawed tasks.
In one possible implementation, the task thawing module 1205 thaws the task to be thawed that performs the unfreezing process in the following manner:
If the task to be thawed has generated a task instance, re-running the task instance of the task to be thawed, and determining that the task to be thawed has completed thawing once the re-run succeeds; or
If the task to be thawed has not generated a task instance, directly releasing the freeze on the task to be thawed.
In one possible implementation, after re-running the task instance of the task to be thawed, the task thawing module 1205 is further configured to:
issue a thawing failure prompt when the re-run is determined to have failed;
release, when a forced thawing instruction is received, the freeze on the task to be thawed indicated by the forced thawing instruction.
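The upstream-to-downstream thawing described above can be sketched as a topological traversal. This is only an illustration: `has_instance` and `rerun` are hypothetical callbacks standing in for the scheduler, and the forced thawing instruction is simplified to a set of task names (`force`) supplied in advance, whereas the embodiment receives it after a failure prompt.

```python
from collections import deque


def thaw(graph, frozen, has_instance, rerun, force=frozenset()):
    """Thaw `frozen` tasks upstream-first; return (thawed_order, failed_tasks)."""
    frozen = set(frozen)
    # For each frozen task, count the frozen upstream tasks it still waits on.
    waiting = {task: 0 for task in frozen}
    for parent in frozen:
        for child in graph.get(parent, ()):
            if child in frozen:
                waiting[child] += 1
    ready = deque(sorted(task for task in frozen if waiting[task] == 0))
    thawed, failed = [], []
    while ready:
        task = ready.popleft()
        succeeded = True
        if has_instance(task) and task not in force:
            succeeded = rerun(task)  # re-run the existing task instance
        if not succeeded:
            failed.append(task)  # thawing failure prompt; downstream stays frozen
            continue
        thawed.append(task)
        for child in sorted(graph.get(task, ())):
            if child in frozen:
                waiting[child] -= 1
                if waiting[child] == 0:
                    ready.append(child)
    return thawed, failed
```

A task without a task instance is released directly, and a task whose re-run fails blocks its downstream tasks until it is re-run successfully or force-thawed.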
In one possible embodiment, the apparatus further comprises:
A freezing pool processing module, configured to mark the state of the freezing pool used for freezing tasks as generating, and add the at least one final source task and all corresponding downstream tasks to the freezing pool; and, when the freezing processing of all tasks in the freezing pool is completed, mark the state of the freezing pool as frozen.
In one possible embodiment, the freeze pool processing module is further configured to:
When tasks in the freezing pool are being thawed, mark the state of the freezing pool as thawing;
When all tasks in the freezing pool have been thawed, mark the state of the freezing pool as thawed.
In one possible embodiment, the apparatus further comprises:
A pause thawing module, configured to determine, in response to a pause thawing instruction, the task to be thawed that is currently being thawed; and to pause the thawing of that task and mark the state of the freezing pool as paused.
In one possible embodiment, the apparatus further comprises:
A thawing recovery module, configured to determine, in response to a resume thawing instruction, at least one task to be thawed in the paused freezing pool that has not been thawed and is the most upstream; and, starting from the at least one task to be thawed, to thaw tasks such that the thawing of a downstream task is executed only after the thawing of its upstream tasks has completed.
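The freezing pool states used by the modules above can be modeled as a small state machine. The following Python sketch is illustrative only: the class and member names are hypothetical, the return from paused to thawing on a resume instruction is an assumption, and the failed and abandoned states are omitted because their transitions are not described in this section.

```python
from enum import Enum


class PoolStatus(Enum):
    GENERATING = "generating"
    FROZEN = "frozen"
    THAWING = "thawing"
    THAWED = "thawed"
    PAUSED = "paused"


# Transitions inferred from the module descriptions (an assumption, not a spec).
ALLOWED = {
    PoolStatus.GENERATING: {PoolStatus.FROZEN},
    PoolStatus.FROZEN: {PoolStatus.THAWING},
    PoolStatus.THAWING: {PoolStatus.THAWED, PoolStatus.PAUSED},
    PoolStatus.PAUSED: {PoolStatus.THAWING},
    PoolStatus.THAWED: set(),
}


class FreezePool:
    """Holds the status field of one freezing pool."""

    def __init__(self):
        self.status = PoolStatus.GENERATING

    def transition(self, new_status):
        """Move to `new_status`, rejecting transitions not listed in ALLOWED."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"illegal transition: {self.status.value} -> {new_status.value}")
        self.status = new_status
```

Keeping the allowed transitions in one table makes it straightforward to reject, for example, an attempt to mark a thawed pool as frozen again.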
As a possible implementation manner, after the pause thawing module pauses thawing the task to be thawed, the pause thawing module is further configured to:
Determine, in response to a task deleting instruction, the source task indicated by the task deleting instruction;
When it is determined that the indicated source task and all of its downstream tasks do not depend on any final source task other than the indicated source task, thaw and delete the indicated source task and all of its downstream tasks;
Otherwise, retain the tasks that depend on other final source tasks, and thaw and delete the remaining tasks among the indicated source task and all of its downstream tasks.
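The deletion rule can be illustrated with a short Python sketch. It assumes, hypothetically, that the dependency relationship information is a dictionary from each task to its direct downstream tasks; `reachable` and `remove_source` are illustrative names only.

```python
def reachable(graph, start):
    """Return all tasks downstream of `start` (iterative depth-first search)."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def remove_source(graph, final_sources, target):
    """Return (tasks_to_thaw_and_delete, tasks_to_retain) for deleting `target`."""
    still_needed = set()
    for source in final_sources:
        if source != target:
            # Everything under another final source must stay in the pool.
            still_needed |= {source} | reachable(graph, source)
    candidates = {target} | reachable(graph, target)
    return candidates - still_needed, candidates & still_needed
```

With final source tasks A and B, where A feeds C and B feeds C and D, deleting B would thaw and delete B and D while retaining C, because C still depends on A.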
As a possible implementation manner, after the pause thawing module pauses thawing the task to be thawed, the pause thawing module is further configured to:
Determine, in response to a task adding instruction, the source task indicated by the task adding instruction;
When it is determined that the indicated source task and all of its downstream tasks share no task with the at least one final source task and all of its corresponding downstream tasks, freeze the indicated source task and all of its downstream tasks, so as to stop the operation of the indicated source task and all of its downstream tasks.
As a possible implementation manner, when it is determined that the indicated source task and all of its downstream tasks share a task with the at least one final source task and all of its corresponding downstream tasks, the pause thawing module is further configured to:
if a task among the at least one final source task and all of its corresponding downstream tasks is the same as the indicated source task, freeze that same task and all of its downstream tasks, so as to stop their operation;
if a task among the at least one final source task and all of its corresponding downstream tasks is the same as a downstream task of the indicated source task, freeze that same task and all of its downstream tasks so as to stop their operation, take the indicated source task as a new final source task, and freeze the new final source task and all of its corresponding downstream tasks so as to stop their operation.
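One way to read the task adding cases together is sketched below in Python. This is an interpretation rather than the embodiment's implementation: the function and parameter names are hypothetical, and treating the indicated source task as a new final source task in the no-overlap case is an assumption, since the text states this only for the downstream-overlap case.

```python
def reachable(graph, start):
    """Return all tasks downstream of `start` (iterative depth-first search)."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def add_source(graph, final_sources, pool_tasks, new_source):
    """Return (final_sources, pool_tasks, tasks_to_freeze) after adding `new_source`."""
    pool = set(pool_tasks)
    # In every case the indicated source task and all its downstream tasks are frozen.
    to_freeze = {new_source} | reachable(graph, new_source)
    sources = list(final_sources)
    if new_source not in pool:
        # No overlap (assumption) or overlap only among downstream tasks:
        # the indicated source task becomes a new final source task.
        sources.append(new_source)
    return sources, pool | to_freeze, to_freeze
```

If the indicated source task is already in the pool, the list of final source tasks is left unchanged and only the freeze on that task and its downstream tasks is applied again.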
An electronic device 130 according to this embodiment of the present invention is described below with reference to fig. 13. The electronic device shown in fig. 13 is merely an example, and should not be construed as limiting the functionality and scope of use of the embodiments of the present invention.
As shown in fig. 13, the electronic device 130 may be in the form of a general purpose computing device, which may be a terminal device, for example. Components of electronic device 130 may include, but are not limited to: at least one processor 131, at least one memory 132, and a bus 133 connecting the various system components, including the memory 132 and the processor 131. The processor 131 is configured to execute the instructions to implement the task processing method provided by the above-described embodiment of the present invention.
Bus 133 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a processor or local bus using any of a variety of bus architectures.
Memory 132 may include readable media in the form of volatile memory such as Random Access Memory (RAM) 1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323.
Memory 132 may also include a program/utility 1325 having a set (at least one) of program modules 1324, such program modules 1324 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Electronic device 130 may also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.), one or more devices that enable a user to interact with electronic device 130, and/or any device (e.g., router, modem, etc.) that enables electronic device 130 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 135. Also, the electronic device 130 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through a network adapter 136. As shown, network adapter 136 communicates with other modules of electronic device 130 over bus 133. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 130, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
Exemplary program product
In some possible embodiments, the aspects of the present invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps of the task processing method according to the various exemplary embodiments of the invention as described in the "exemplary method" section of this specification, when the program product is run on the terminal device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As shown in fig. 14, a program product 140 according to an embodiment of the present invention is depicted, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
It should be noted that while several modules or sub-modules of the system are mentioned in the detailed description above, such partitioning is merely exemplary and not mandatory. Indeed, the features and functions of two or more modules described above may be embodied in one module in accordance with embodiments of the present invention. Conversely, the features and functions of one module described above may be further divided into a plurality of modules to be embodied.
Furthermore, while the operations of the various modules of the inventive system are depicted in a particular order in the drawings, this is not required to either imply that the operations must be performed in that particular order or that all of the illustrated operations be performed to achieve desirable results. Additionally or alternatively, certain operations may be omitted, multiple operations combined into one operation execution, and/or one operation decomposed into multiple operation executions.
While the spirit and principles of the present invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, nor does the division into aspects imply that features of those aspects cannot be used in combination to advantage; that division is made only for convenience of description. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (28)

1. A method of task processing, comprising:
acquiring at least one initial source task;
generating corresponding dependency relationship information for each initial source task in the at least one initial source task;
determining a plurality of initial source tasks with upstream and downstream task relationships according to the dependency relationship information;
taking the most upstream initial source task among the plurality of initial source tasks, and each initial source task which does not have an upstream-downstream task relationship with any other initial source task, as final source tasks to obtain at least one final source task;
and freezing the at least one final source task and all corresponding downstream tasks to stop the operation of the at least one final source task and all corresponding downstream tasks.
2. The method of claim 1, further comprising, prior to freezing the at least one final source task:
merging the common downstream tasks when it is determined, according to the dependency relationship information, that different upstream tasks of the at least one final source task correspond to common downstream tasks.
3. The method of claim 1, wherein generating the corresponding dependency information for each of the initial source tasks comprises:
adopting a DFS/BFS algorithm to determine a DAG task instance graph corresponding to each initial source task;
adding, on the basis of the DAG task instance graph corresponding to each initial source task, the downstream tasks of that initial source task whose execution time has not yet arrived, to obtain the dependency relationship information.
4. The method as recited in claim 1, further comprising:
in response to a task thawing instruction, thawing tasks starting from the at least one final source task and proceeding from upstream tasks to downstream tasks, wherein the thawing of a downstream task is executed only after the thawing of all of its upstream tasks has completed, so as to resume the scheduled operation of the thawed tasks.
5. The method according to claim 4, wherein the task to be thawed for performing the unfreezing process is thawed by:
if the task to be thawed has generated a task instance, re-running the task instance of the task to be thawed, and determining that the task to be thawed has completed thawing once the re-run succeeds; or
if the task to be thawed has not generated a task instance, directly releasing the freeze on the task to be thawed.
6. The method of claim 5, further comprising, after re-running the task instance of the task to be thawed:
issuing a thawing failure prompt when the re-run is determined to have failed;
releasing, when a forced thawing instruction is received, the freeze on the task to be thawed indicated by the forced thawing instruction.
7. The method according to any one of claims 1 to 4, further comprising:
marking the state of the freezing pool used for freezing tasks as generating, and adding the at least one final source task and all corresponding downstream tasks to the freezing pool;
when the freezing processing of all tasks in the freezing pool is completed, marking the state of the freezing pool as frozen.
8. The method of claim 7, further comprising:
when tasks in the freezing pool are being thawed, marking the state of the freezing pool as thawing;
when all tasks in the freezing pool have been thawed, marking the state of the freezing pool as thawed.
9. The method as recited in claim 8, further comprising:
determining, in response to a pause thawing instruction, the task to be thawed that is currently being thawed;
pausing the thawing of the task to be thawed, and marking the state of the freezing pool as paused.
10. The method as recited in claim 9, further comprising:
in response to a resume thawing instruction, determining at least one task to be thawed in the paused freezing pool that has not been thawed and is the most upstream;
starting from the at least one task to be thawed, thawing tasks such that the thawing of a downstream task is executed only after the thawing of its upstream tasks has completed.
11. The method of claim 9, wherein after suspending thawing the task to be thawed, further comprising:
responding to a task deleting instruction, and determining a source task indicated by the task deleting instruction;
when it is determined that the indicated source task and all of its downstream tasks do not depend on any final source task other than the indicated source task, thawing and deleting the indicated source task and all of its downstream tasks;
otherwise, retaining the tasks that depend on other final source tasks, and thawing and deleting the remaining tasks among the indicated source task and all of its downstream tasks.
12. The method of claim 9, wherein after suspending thawing the task to be thawed, further comprising:
determining, in response to a task adding instruction, the source task indicated by the task adding instruction;
when it is determined that the indicated source task and all of its downstream tasks share no task with the at least one final source task and all of its corresponding downstream tasks, freezing the indicated source task and all of its downstream tasks, so as to stop the operation of the indicated source task and all of its downstream tasks.
13. The method of claim 12, further comprising, when it is determined that the indicated source task and all of its downstream tasks share a task with the at least one final source task and all of its corresponding downstream tasks:
if a task among the at least one final source task and all of its corresponding downstream tasks is the same as the indicated source task, freezing that same task and all of its downstream tasks, so as to stop their operation;
if a task among the at least one final source task and all of its corresponding downstream tasks is the same as a downstream task of the indicated source task, freezing that same task and all of its downstream tasks so as to stop their operation, taking the indicated source task as a new final source task, and freezing the new final source task and all of its corresponding downstream tasks so as to stop their operation.
14. A task processing device, the device comprising:
a source task acquisition module, configured to acquire at least one initial source task;
a dependency relationship generation module, configured to generate corresponding dependency relationship information for each initial source task in the at least one initial source task;
a source task merging module, configured to determine a plurality of initial source tasks with upstream and downstream task relationships according to the dependency relationship information; and to take the most upstream initial source task among the plurality of initial source tasks, and each initial source task which does not have an upstream-downstream task relationship with any other initial source task, as final source tasks to obtain at least one final source task;
a freezing processing module, configured to freeze the at least one final source task and all corresponding downstream tasks, so as to stop the operation of the at least one final source task and all corresponding downstream tasks.
15. The apparatus of claim 14, wherein the freezing processing module, prior to freezing the at least one final source task, is further configured to:
merge the common downstream tasks when it is determined, according to the dependency relationship information, that different upstream tasks of the at least one final source task correspond to common downstream tasks.
16. The apparatus of claim 14, wherein the dependency generation module generates corresponding dependency information for each of the initial source tasks, comprising:
adopting a DFS/BFS algorithm to determine a DAG task instance graph corresponding to each initial source task;
adding, on the basis of the DAG task instance graph corresponding to each initial source task, the downstream tasks of that initial source task whose execution time has not yet arrived, to obtain the dependency relationship information.
17. The apparatus as recited in claim 14, further comprising:
a task thawing module, configured to, in response to a task thawing instruction, thaw tasks starting from the at least one final source task and proceeding from upstream tasks to downstream tasks, wherein the thawing of a downstream task is executed only after the thawing of all of its upstream tasks has completed, so as to resume the scheduled operation of the thawed tasks.
18. The apparatus of claim 17, wherein the task thawing module thaws the task to be thawed that performs the unfreezing process by:
if the task to be thawed has generated a task instance, re-running the task instance of the task to be thawed, and determining that the task to be thawed has completed thawing once the re-run succeeds; or
if the task to be thawed has not generated a task instance, directly releasing the freeze on the task to be thawed.
19. The apparatus of claim 18, wherein the task thawing module, after re-running the task instance of the task to be thawed, is further configured to:
issue a thawing failure prompt when the re-run is determined to have failed;
release, when a forced thawing instruction is received, the freeze on the task to be thawed indicated by the forced thawing instruction.
20. The apparatus according to any one of claims 14 to 17, further comprising:
a freezing pool processing module, configured to mark the state of the freezing pool used for freezing tasks as generating, and add the at least one final source task and all corresponding downstream tasks to the freezing pool; and, when the freezing processing of all tasks in the freezing pool is completed, mark the state of the freezing pool as frozen.
21. The apparatus of claim 20, wherein the freeze pool processing module is further configured to:
when tasks in the freezing pool are being thawed, mark the state of the freezing pool as thawing;
when all tasks in the freezing pool have been thawed, mark the state of the freezing pool as thawed.
22. The apparatus as recited in claim 21, further comprising:
a pause thawing module, configured to determine, in response to a pause thawing instruction, the task to be thawed that is currently being thawed; and to pause the thawing of that task and mark the state of the freezing pool as paused.
23. The apparatus as recited in claim 22, further comprising:
a thawing recovery module, configured to determine, in response to a resume thawing instruction, at least one task to be thawed in the paused freezing pool that has not been thawed and is the most upstream; and, starting from the at least one task to be thawed, to thaw tasks such that the thawing of a downstream task is executed only after the thawing of its upstream tasks has completed.
24. The apparatus of claim 22, wherein the pause thawing module is further configured to, after pausing thawing the task to be thawed:
responding to a task deleting instruction, and determining a source task indicated by the task deleting instruction;
when it is determined that the indicated source task and all of its downstream tasks do not depend on any final source task other than the indicated source task, thawing and deleting the indicated source task and all of its downstream tasks;
otherwise, retaining the tasks that depend on other final source tasks, and thawing and deleting the remaining tasks among the indicated source task and all of its downstream tasks.
25. The apparatus of claim 22, wherein the pause thawing module is further configured to, after pausing thawing the task to be thawed:
determining, in response to a task adding instruction, the source task indicated by the task adding instruction;
when it is determined that the indicated source task and all of its downstream tasks share no task with the at least one final source task and all of its corresponding downstream tasks, freezing the indicated source task and all of its downstream tasks, so as to stop the operation of the indicated source task and all of its downstream tasks.
26. The apparatus of claim 25, wherein the pause thawing module, when it is determined that the indicated source task and all of its downstream tasks share a task with the at least one final source task and all of its corresponding downstream tasks, is further configured to:
if a task among the at least one final source task and all of its corresponding downstream tasks is the same as the indicated source task, freeze that same task and all of its downstream tasks, so as to stop their operation;
if a task among the at least one final source task and all of its corresponding downstream tasks is the same as a downstream task of the indicated source task, freeze that same task and all of its downstream tasks so as to stop their operation, take the indicated source task as a new final source task, and freeze the new final source task and all of its corresponding downstream tasks so as to stop their operation.
27. An electronic device comprising one or more processors and memory for storing instructions executable by the processors;
wherein the processor is configured to execute the instructions to implement the task processing method of any one of claims 1 to 13.
28. A storage medium having stored therein a computer program which, when executed by a processor, implements the task processing method according to any one of claims 1 to 13.
CN202110101553.6A 2021-01-26 2021-01-26 Task processing method and device, electronic equipment and storage medium Active CN112764907B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110101553.6A CN112764907B (en) 2021-01-26 2021-01-26 Task processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112764907A CN112764907A (en) 2021-05-07
CN112764907B true CN112764907B (en) 2024-05-10

Family

ID=75707400

Country Status (1)

Country Link
CN (1) CN112764907B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168275B (en) * 2021-10-28 2022-10-18 厦门国际银行股份有限公司 Task scheduling method, system, terminal device and storage medium
CN117076095B (en) * 2023-10-16 2024-02-09 华芯巨数(杭州)微电子有限公司 Task scheduling method, system, electronic equipment and storage medium based on DAG

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8578389B1 (en) * 2004-05-04 2013-11-05 Oracle America, Inc. Method and system for merging directed acyclic graphs representing data flow codes
CN103761111A (en) * 2014-02-19 2014-04-30 中国科学院软件研究所 Method and system for constructing data-intensive workflow engine based on BPEL language
CN108388474A (en) * 2018-02-06 2018-08-10 北京易沃特科技有限公司 Intelligent distributed management of computing system and method based on DAG
CN108984284A (en) * 2018-06-26 2018-12-11 杭州比智科技有限公司 DAG method for scheduling task and device based on off-line calculation platform
CN110019144A (en) * 2018-06-19 2019-07-16 杭州数澜科技有限公司 A kind of method and system of big data platform data O&M
CN110347708A (en) * 2019-06-28 2019-10-18 深圳市元征科技股份有限公司 A kind of data processing method and relevant device
CN110737542A (en) * 2018-07-19 2020-01-31 慧与发展有限责任合伙企业 Freezing and unfreezing upstream and downstream rolls
CN111061551A (en) * 2019-12-06 2020-04-24 深圳前海微众银行股份有限公司 Node merging and scheduling method, device, equipment and storage medium
CN112052077A (en) * 2019-06-06 2020-12-08 北京字节跳动网络技术有限公司 Method, device, equipment and medium for software task management
CN112100019A (en) * 2019-09-12 2020-12-18 无锡江南计算技术研究所 Multi-source fault collaborative analysis positioning method for large-scale system
CN112148455A (en) * 2020-09-29 2020-12-29 星环信息科技(上海)有限公司 Task processing method, device and medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11182207B2 (en) * 2019-06-24 2021-11-23 Nvidia Corporation Pre-fetching task descriptors of dependent tasks

Also Published As

Publication number Publication date
CN112764907A (en) 2021-05-07

Similar Documents

Publication Publication Date Title
US10261869B2 (en) Transaction processing using torn write detection
US7386752B1 (en) Using asset dependencies to identify the recovery set and optionally automate and/or optimize the recovery
CN112764907B (en) Task processing method and device, electronic equipment and storage medium
EP2904501B1 (en) Creating validated database snapshots for provisioning virtual databases
US8495635B2 (en) Mechanism to enable and ensure failover integrity and high availability of batch processing
US20110246823A1 (en) Task-oriented node-centric checkpointing (toncc)
US20120260053A1 (en) Cascade ordering
US9652491B2 (en) Out-of-order execution of strictly-ordered transactional workloads
US20130246358A1 (en) Online verification of a standby database in log shipping physical replication environments
CN109086425B (en) Data processing method and device for database
US20170235641A1 (en) Runtime file system consistency checking during backup operations
Li et al. Tachyon: Memory throughput i/o for cluster computing frameworks
CN113312114B (en) On-orbit reconstruction method, device, equipment and storage medium of satellite-borne software
US8032618B2 (en) Asynchronous update of virtualized applications
WO2016127557A1 (en) Method for re-establishing standby database, and apparatus thereof
US9372855B1 (en) Transactional control of RDBMS database definition language operations
WO2020040958A1 (en) Providing consistent database recovery after database failure for distributed databases with non-durable storage leveraging background synchronization point
US8856070B2 (en) Consistent replication of transactional updates
US12093139B2 (en) Rolling back a database transaction
JP6327028B2 (en) Object storage system, control method thereof, and control program thereof
CN108733704B (en) Multi-database data processing method and device, storage medium and electronic equipment
CN113672277B (en) Code synchronization method, system, computer device and storage medium
Borisov et al. Warding off the dangers of data corruption with Amulet
Suárez-Otero González et al. An integrated approach for column-oriented database application evolution using conceptual models
CN116541089A (en) Real-time editing of configuration management execution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant