CN110647402A - Processing method for multiple predecessors and multiple successors in Oozie working process - Google Patents
Processing method for multiple predecessors and multiple successors in Oozie working process
- Publication number
- CN110647402A (application number CN201910940924.2A)
- Authority
- CN
- China
- Prior art keywords
- task
- node
- processing
- oozie
- nodes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/508—Monitor
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to a method for processing multiple predecessors and multiple successors in an Oozie workflow. In this method, a multi-successor task processing node is placed after each main task node, and the sub-task nodes of that main task node are placed after the multi-successor task processing node; a multi-predecessor task processing node is placed before the end node, and every sub-task node to be merged is connected to the same multi-predecessor task processing node. The method ensures that multi-predecessor and multi-successor task nodes execute successfully while remaining compatible with Oozie's original functions, so that existing Oozie workflows still execute normally. It thereby solves the problem that multi-predecessor and multi-successor task nodes in a workflow built with Oozie cannot be executed successfully.
Description
Technical Field
The invention relates to the technical field of big data processing, and in particular to a method for processing multiple predecessors and multiple successors in an Oozie workflow.
Background
Oozie is a service component built on a workflow engine and designed specifically to execute Hadoop Map/Reduce tasks or Pig jobs through workflow orchestration. Oozie organizes many executable tasks (Map/Reduce, Pig, etc.) into a workflow in the form of a DAG and executes them as a flow.
An Oozie workflow is defined in hPDL (an XML process definition language similar to JBoss jBPM). When Oozie executes a task, it submits the task to a remote system (such as a Hadoop cluster) for execution; after the task finishes, Oozie obtains the execution result through a callback from the remote system and then proceeds to the next task in the workflow.
There are two types of nodes in an Oozie workflow: control nodes, which control the flow, and action nodes, which perform tasks.
Control nodes define the start and the end of the workflow and provide functions such as branch decisions and Fork/Join.
Action nodes trigger the execution of computation or processing tasks. In addition to the Map/Reduce and Pig tasks mentioned above, Oozie natively provides many executable task types, including SSH, HTTP, Spark and Hive2, and it offers a standard extension interface for adding further task types.
Although Oozie makes big data workflow processing more convenient, its DAG (Directed Acyclic Graph) design strategy is not applicable in some scenarios.
Among Oozie's control nodes, Fork/Join is a widely used pair of control nodes for executing tasks in parallel. Oozie imposes several restrictions on Fork/Join nodes. For example, Fork and Join must appear in pairs, a constraint that is applied not only when a submitted workflow is validated, and Fork/Join pairs cannot cross one another. The scenarios they support are therefore relatively limited.
When a Fork/Join node has multiple predecessor and successor task nodes, the Fork/Join node is triggered as soon as one predecessor or successor task node finishes executing (succeeded or killed), and the end node (end) is reached; all other predecessor or successor nodes that have not yet executed are then killed.
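The failure mode described above can be illustrated with a toy model. The sketch below is not Oozie code: the function `run_until_first_arrival` and the task names are hypothetical and only mimic the described behaviour, in which the first task to finish triggers the join and every other task is killed.

```python
def run_until_first_arrival(finish_order, all_tasks):
    """Toy model: the first task to finish triggers the join, the end node
    is reached, and every task that has not finished yet is killed."""
    first = finish_order[0]
    # Everything is assumed killed unless it is the first task to finish.
    status = {task: "KILLED" for task in all_tasks}
    status[first] = "OK"
    return status


tasks = ["B1", "B2", "B3", "B4"]
status = run_until_first_arrival(["B2", "B1", "B4", "B3"], tasks)
print(status)
```

Only one of the four tasks ends in the `OK` state, which is the situation the invention sets out to avoid.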
To solve the multi-predecessor, multi-successor problem for task nodes and allow Oozie DAG workflows to be constructed freely, the invention provides a method for processing multiple predecessor and multiple successor nodes in an Oozie workflow.
Disclosure of Invention
To remedy the defects of the prior art, the invention provides a simple and efficient method for processing multiple predecessors and multiple successors in an Oozie workflow.
The invention is realized by the following technical scheme:
A method for processing multiple predecessors and multiple successors in an Oozie workflow, characterized by comprising the following steps:
First, a multi-successor task processing node is placed after each main task node, and the sub-task nodes of each main task node are placed after its multi-successor task processing node; a multi-predecessor task processing node is placed before the end node, and every sub-task node to be merged is connected to the same multi-predecessor task processing node.
Second, after a main task node finishes, the multi-successor task processing node triggers and starts all sub-task nodes of that main task node, ensuring that every sub-task node is started and executed.
Third, after all sub-task nodes connected to the same multi-predecessor task processing node have finished executing, the multi-predecessor task processing node triggers the end node, so that the workflow ends only after all sub-task nodes have executed.
In the second step, the multi-successor task processing node adds all connected sub-task nodes to a processing queue. When the Oozie DAG detects that the currently running node is a sub-task node, the sub-task nodes in the processing queue are started automatically, and each started sub-task node is deleted from the queue. When all sub-task nodes in the processing queue have been executed, the task of the multi-successor task processing node is complete.
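The queue handling in the second step might be sketched as follows. This is a minimal in-memory model under stated assumptions, not the patented Java implementation: the class `MultiNextNode`, its methods and the `running` set are hypothetical names introduced for illustration.

```python
from collections import deque


class MultiNextNode:
    """Toy model of a multi-successor task processing node (hypothetical)."""

    def __init__(self, sub_tasks):
        self.queue = deque(sub_tasks)  # sub-task nodes waiting to be started
        self.running = set()           # started but not yet finished
        self.finished = []             # finished sub-task nodes, in order

    def start_pending(self):
        """Start every queued sub-task and delete it from the queue."""
        started = []
        while self.queue:
            task = self.queue.popleft()
            self.running.add(task)
            started.append(task)
        return started

    def on_finished(self, task):
        """Record that a started sub-task has finished executing."""
        self.running.remove(task)
        self.finished.append(task)

    @property
    def done(self):
        """The node's own task is complete once nothing is queued or running."""
        return not self.queue and not self.running


node = MultiNextNode(["B1", "B2"])
started = node.start_pending()
node.on_finished("B1")
still_waiting = not node.done
node.on_finished("B2")
```

A `deque` is used here only because it gives cheap removal from the front; any queue structure would serve the same purpose.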
In the third step, the multi-predecessor task processing node holds a counter that records the number of sub-task nodes that have not yet finished executing. When the counter reaches 0, all sub-task nodes have been executed; the multi-predecessor task processing node then triggers the end node, and the Oozie workflow ends.
After a sub-task node finishes executing, its access path to the multi-predecessor task processing node is deleted, so the multi-predecessor task processing node can no longer obtain information about that sub-task node, and the counter is decremented by 1.
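The counter logic of the third step can be modelled in a few lines. The sketch below assumes an in-memory set of access paths; the class `MultiPrevNode` and its attributes are hypothetical and do not reproduce the patent's Java code.

```python
class MultiPrevNode:
    """Toy model of a multi-predecessor task processing node (hypothetical)."""

    def __init__(self, predecessors):
        self.paths = set(predecessors)   # access paths from sub-task nodes
        self.counter = len(self.paths)   # number of unfinished sub-task nodes
        self.end_triggered = False

    def on_finished(self, task):
        """Delete the finished sub-task's access path and decrement the counter.

        Returns True once the end node has been triggered."""
        if task in self.paths:
            self.paths.remove(task)
            self.counter -= 1
        if self.counter == 0:
            self.end_triggered = True
        return self.end_triggered


merge = MultiPrevNode(["B1", "B2", "B3", "B4"])
results = [merge.on_finished(t) for t in ["B3", "B1", "B4", "B2"]]
print(results)
```

The end node is triggered only on the last call, regardless of the order in which the sub-tasks finish.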
Both the multi-predecessor and the multi-successor task processing nodes are implemented with lightweight code intrusion, so they remain compatible with Oozie's original functions, and existing Oozie workflows continue to execute normally.
MultiNextNodeDef and MultiPrevNodeDef are defined in the oozie-core project of Oozie, where MultiNextNodeDef is the multi-successor task processing node and MultiPrevNodeDef is the multi-predecessor task processing node.
An executor is defined in MultiNextNodeDef, and the internal class SignalXCommand is used to add each newly started sub-task node to the processing queue, monitor the execution status of the sub-task nodes in real time, and delete completed sub-task nodes from the processing queue.
An executor, MultiPrevActionExecutor, is defined in MultiPrevNodeDef; it obtains the number N of unexecuted tail-end task nodes and assigns N to the counter.
The beneficial effects of the invention are as follows: the method ensures that multi-predecessor and multi-successor task nodes execute successfully while remaining compatible with Oozie's original functions, so that existing Oozie workflows still execute normally. It thereby solves the problem that multi-predecessor and multi-successor task nodes in a workflow built with Oozie cannot be executed successfully.
Drawings
FIG. 1 is a schematic flow chart of a conventional Oozie workflow.
FIG. 2 is a flow chart of the method of the present invention for processing multiple predecessor and multiple successor nodes in an Oozie workflow.
Detailed Description
To make the technical problem to be solved, the technical solution and the advantageous effects of the present invention clearer, the invention is described in detail below with reference to the embodiments. The specific embodiments described here only explain the invention and do not limit it.
Example 1
This example uses Oozie 4.3.1. Element types are added to the XSD of oozie-client, a sub-project of Oozie. Because Oozie ships many XSD versions, oozie-workflow-0.4.xsd is chosen here.
MultiNextNodeDef and MultiPrevNodeDef are defined in the oozie-core project of Oozie, where MultiNextNodeDef is the multi-successor task processing node and MultiPrevNodeDef is the multi-predecessor task processing node.
(1) Define the element types for MULTI_PREV and MULTI_NEXT.
Code 1: definition of the element type MIX_IN
(2) Declare the MULTI_PREV and MULTI_NEXT elements.
Code 2: declaration of the MULTI_PREV and MULTI_NEXT elements
<xs:element name="mix-in" type="workflow:MIX_IN"/>
<xs:element name="mix-out" type="workflow:MIX_OUT"/>
(3) Finally, add mix-in to the workflow-app, as shown in Code 3.
Code 3: adding the MULTI_PREV and MULTI_NEXT elements to workflow-app
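As a quick sanity check, the two element declarations from Code 2 can be parsed to confirm that they are well-formed and to list the declared names and types. The sketch below wraps them in a minimal `xs:schema` root; that wrapper, and the assumption that the `workflow` prefix is bound to the 0.4 workflow namespace, are illustrative and are not taken from the patent's listings.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Minimal wrapper schema (assumed) around the two declarations from Code 2.
snippet = f"""
<xs:schema xmlns:xs="{XS}" xmlns:workflow="uri:oozie:workflow:0.4">
  <xs:element name="mix-in" type="workflow:MIX_IN"/>
  <xs:element name="mix-out" type="workflow:MIX_OUT"/>
</xs:schema>
""".strip()

root = ET.fromstring(snippet)  # raises ParseError if the XML is malformed

# Map each declared element name to its type attribute.
declared = {
    el.get("name"): el.get("type")
    for el in root.findall(f"{{{XS}}}element")
}
print(declared)
```

Note that attributes in XML must be separated by whitespace, so a declaration written as `name="mix-in"type="..."` would be rejected by the parser.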
An executor is defined in MultiNextNodeDef, and the internal class SignalXCommand is used to add each newly started sub-task node to the processing queue, monitor the execution status of the sub-task nodes in real time, and delete completed sub-task nodes from the processing queue.
Code 4: MultiNextNodeDef processing logic
Code 5: judgment logic added to SignalXCommand
An executor, MultiPrevActionExecutor, is defined in MultiPrevNodeDef; it obtains the number N of unexecuted tail-end task nodes and assigns N to the counter.
Code 6: MultiPrevNodeDef processing logic
Example 2
This embodiment takes two main task nodes (Task A1 and Task A2), each with two sub-task nodes (Task B1 to Task B4), as an example and compares a conventional Oozie workflow with a workflow that uses the method for processing multiple predecessors and multiple successors.
FIG. 1 is a schematic flow diagram of a conventional Oozie workflow. In the figure, Fork and Join appear in pairs, but Task A1 is split into two tasks, of which Task B1 merges into Join J1 and Task B2 merges into Join J2. The end node is triggered as soon as any one of Task B1 to Task B4 finishes executing, so this workflow cannot guarantee that all of Task B1 to Task B4 execute successfully.
FIG. 2 is a schematic workflow diagram of the method for processing multiple predecessor and multiple successor nodes in an Oozie workflow. In the figure, a MULTI-NEXT node replaces the Fork node and a MULTI-PREV node replaces the Join node. The multi-successor task processing node (MULTI-NEXT node) ensures that Task B1 to Task B4 are all started, and the multi-predecessor task processing node (MULTI-PREV node) triggers the end node only after Task B1 to Task B4 have all finished executing, which ensures that Task B1 to Task B4 all execute successfully.
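The topology of this example can be written down as a predecessor map and checked with a topological sort. The graph below is an assumed reading of the textual description, since the figure itself is not reproduced here; the node labels `MULTI-NEXT-1` and `MULTI-NEXT-2` are hypothetical, and the sketch relies on `graphlib`, which is part of the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter

# node -> set of predecessors (assumed reading of the described workflow)
predecessors = {
    "MULTI-NEXT-1": {"Task A1"},
    "MULTI-NEXT-2": {"Task A2"},
    "Task B1": {"MULTI-NEXT-1"},
    "Task B2": {"MULTI-NEXT-1"},
    "Task B3": {"MULTI-NEXT-2"},
    "Task B4": {"MULTI-NEXT-2"},
    # A single MULTI-PREV node has four predecessors.
    "MULTI-PREV": {"Task B1", "Task B2", "Task B3", "Task B4"},
    "end": {"MULTI-PREV"},
}

# static_order() raises CycleError if the graph is not a DAG.
order = list(TopologicalSorter(predecessors).static_order())
position = {node: i for i, node in enumerate(order)}

sub_tasks = ["Task B1", "Task B2", "Task B3", "Task B4"]
all_before_merge = all(position[t] < position["MULTI-PREV"] for t in sub_tasks)
print(order[-1], all_before_merge)
```

Any valid execution order places all four sub-tasks before the MULTI-PREV node and the end node last, which is the guarantee the example sets out to demonstrate.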
The embodiment described above is only one specific embodiment of the present invention; ordinary changes and substitutions made by those skilled in the art within the technical scope of the invention fall within its scope of protection.
Claims (8)
1. A method for processing multiple predecessors and multiple successors in an Oozie workflow, characterized by comprising the following steps:
first, a multi-successor task processing node is placed after each main task node, and the sub-task nodes of each main task node are placed after its multi-successor task processing node; a multi-predecessor task processing node is placed before the end node, and every sub-task node to be merged is connected to the same multi-predecessor task processing node;
second, after a main task node finishes, the multi-successor task processing node triggers and starts all sub-task nodes of that main task node, ensuring that every sub-task node is started and executed;
third, after all sub-task nodes connected to the same multi-predecessor task processing node have finished executing, the multi-predecessor task processing node triggers the end node, so that the workflow ends only after all sub-task nodes have executed.
2. The method for processing multiple predecessors and multiple successors in an Oozie workflow of claim 1, wherein: in the second step, the multi-successor task processing node adds all connected sub-task nodes to a processing queue; when the Oozie DAG detects that the currently running node is a sub-task node, the sub-task nodes in the processing queue are started automatically, and each started sub-task node is deleted from the processing queue; and when all sub-task nodes in the processing queue have been executed, the task of the multi-successor task processing node is complete.
3. The method for processing multiple predecessors and multiple successors in an Oozie workflow of claim 1, wherein: in the third step, the multi-predecessor task processing node holds a counter that records the number of sub-task nodes that have not yet finished executing; when the counter reaches 0, all sub-task nodes have been executed, the multi-predecessor task processing node triggers the end node, and the Oozie workflow ends.
4. The method for processing multiple predecessors and multiple successors in an Oozie workflow of claim 3, wherein: after a sub-task node finishes executing, its access path to the multi-predecessor task processing node is deleted, so that the multi-predecessor task processing node can no longer obtain information about that sub-task node, and the counter is decremented by 1.
5. The method for processing multiple predecessors and multiple successors in an Oozie workflow according to any one of claims 1-4, wherein: both the multi-predecessor and the multi-successor task processing nodes are implemented with lightweight code intrusion, so that they remain compatible with Oozie's original functions and existing Oozie workflows continue to execute normally.
6. The method for processing multiple predecessors and multiple successors in an Oozie workflow of claim 5, wherein: MultiNextNodeDef and MultiPrevNodeDef are defined in the oozie-core project of Oozie, where MultiNextNodeDef is the multi-successor task processing node and MultiPrevNodeDef is the multi-predecessor task processing node.
7. The method for processing multiple predecessors and multiple successors in an Oozie workflow of claim 6, wherein: an executor is defined in MultiNextNodeDef, and the internal class SignalXCommand is used to add each newly started sub-task node to the processing queue, monitor the execution status of the sub-task nodes in real time, and delete completed sub-task nodes from the processing queue.
8. The method for processing multiple predecessors and multiple successors in an Oozie workflow of claim 6, wherein: an executor, MultiPrevActionExecutor, is defined in MultiPrevNodeDef; it obtains the number N of unexecuted tail-end task nodes and assigns N to the counter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910940924.2A CN110647402A (en) | 2019-09-30 | 2019-09-30 | Processing method for multiple predecessors and multiple successors in Oozie working process |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110647402A (en) | 2020-01-03 |
Family
ID=69012035
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201910940924.2A (CN110647402A) | Pending | 2019-09-30 | 2019-09-30 | Processing method for multiple predecessors and multiple successors in Oozie working process |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110647402A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100281241A1 (en) * | 2009-05-04 | 2010-11-04 | International Business Machines Corporation | Method and system for synchronizing inclusive decision branches |
CN101963908A (en) * | 2010-09-30 | 2011-02-02 | 山东中创软件工程股份有限公司 | Method and device for realizing dynamic merging of branches in unstructured process |
CN103426045A (en) * | 2012-05-22 | 2013-12-04 | 阿里巴巴集团控股有限公司 | Take method and device of process virtual machine |
Non-Patent Citations (3)
Title |
---|
LIU YUAN: "Scheduling of fork-join tasks on multi-core processors to avoid communication conflict", 《TENCON 2015 - 2015 IEEE REGION 10 CONFERENCE》 * |
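LIU SHIDONG et al.: "Design and Implementation of Workflow Based on the Domino Platform", 《Computer Engineering》 * |
LIU LINLIN: "Research and Implementation of a Web-Based Custom Workflow Model for Approval Services", 《China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology》 * |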
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||