CN110647402A - Processing method for multiple predecessors and multiple successors in Oozie working process

Info

Publication number
CN110647402A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910940924.2A
Other languages
Chinese (zh)
Inventor
张旭
赵志宏
周庆勇
王建华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Co Ltd
Original Assignee
Inspur Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-09-30
Filing date
2019-09-30
Publication date
2020-01-03

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/508 Monitor

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a method for processing multiple predecessors and multiple successors in an Oozie workflow. In this method, a multi-successor task processing node is placed after each main task node, and the sub-task nodes of each main task node are placed after that multi-successor task processing node; a multi-predecessor task processing node is placed before the end node, and every sub-task node to be merged is connected to the same multi-predecessor task processing node. The method ensures that multi-predecessor and multi-successor task nodes execute successfully while remaining compatible with Oozie's original functions, so existing Oozie workflows still execute normally. It thereby solves the problem that multi-predecessor and multi-successor task nodes in a workflow built with Oozie cannot be executed successfully.

Description

Processing method for multiple predecessors and multiple successors in Oozie working process
Technical Field
The invention relates to the technical field of big data processing, in particular to a processing method of multiple predecessors and multiple successors in an Oozie workflow.
Background
Oozie is a workflow-engine-based service component designed to run Hadoop Map/Reduce tasks and Pig jobs through workflow orchestration. Oozie arranges executable tasks (Map/Reduce, Pig, etc.) into a workflow in the form of a DAG (Directed Acyclic Graph) and executes them according to that graph.
An Oozie workflow is defined in hPDL, an XML process definition language similar to JBoss jBPM. When Oozie executes a task, it submits the task to a remote system (such as a Hadoop cluster); after the task finishes, Oozie obtains its execution status through a callback from the remote system and then proceeds to the next task in the workflow.
There are two types of nodes in an Oozie workflow: control nodes, which control the flow, and action nodes, which perform tasks.
Control nodes define the start and the end of the workflow and provide functions such as branch decisions and Fork/Join.
Action nodes trigger the execution of computation or processing tasks. In addition to the Map/Reduce and Pig tasks mentioned above, Oozie natively provides many executable task types, including SSH, HTTP, Spark and Hive2, and it offers a standard extension interface for adding further task types.
Although Oozie makes big data workflow processing more convenient, the DAG design strategy it provides does not suit every scenario.
Among Oozie's control nodes, Fork and Join are a widely used pair for running tasks in parallel. Oozie places restrictions on them: Fork and Join must appear in pairs, a restriction that applies beyond the validation of a submitted workflow, and Fork/Join pairs cannot cross one another. The scenarios they can support are therefore relatively limited.
When a Fork/Join node has multiple predecessor and successor task nodes, the Fork/Join node is triggered as soon as one predecessor or successor task node finishes (succeeded or killed), and the end node (end) is reached; all other predecessor or successor nodes that have not yet executed are then killed.
To solve the multi-predecessor, multi-successor problem for task nodes and to allow Oozie DAG workflows to be constructed freely, the invention provides a processing method for multi-predecessor and multi-successor nodes in an Oozie workflow.
Disclosure of Invention
To remedy the shortcomings of the prior art, the invention provides a simple and efficient method for processing multiple predecessors and multiple successors in an Oozie workflow.
The invention is realized by the following technical scheme:
A processing method for multiple predecessors and multiple successors in an Oozie workflow, characterized by comprising the following steps:
First, a multi-successor task processing node is placed after each main task node, and the sub-task nodes of each main task node are placed after that multi-successor task processing node; a multi-predecessor task processing node is placed before the end node, and every sub-task node to be merged is connected to the same multi-predecessor task processing node.
Second, after a main task node finishes, its multi-successor task processing node triggers and starts all of that main task node's sub-task nodes, ensuring that every sub-task node is started and executed.
Third, after all the sub-task nodes connected to the same multi-predecessor task processing node have executed, the multi-predecessor task processing node triggers the end node, so that the workflow ends only after every sub-task node has executed.
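The graph layout produced by the first step can be sketched in a few lines of Python. This is only an illustrative model, not Oozie code: the function name `insert_processing_nodes`, the node names `multi-next-<main>` and `multi-prev`, and the dictionary-of-successors representation are all assumptions made for the example.

```python
from typing import Dict, List

END = "end"  # the workflow's end node


def insert_processing_nodes(main_to_subs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return each node's successor list after the processing nodes are added.

    main_to_subs maps every main task node to its sub-task nodes.
    All node names generated here are hypothetical.
    """
    edges: Dict[str, List[str]] = {}
    multi_prev = "multi-prev"  # one shared multi-predecessor processing node
    for main, subs in main_to_subs.items():
        multi_next = f"multi-next-{main}"  # one multi-successor node per main task
        edges[main] = [multi_next]         # main task -> its multi-successor node
        edges[multi_next] = list(subs)     # multi-successor node -> sub-tasks
        for sub in subs:
            edges[sub] = [multi_prev]      # every sub-task merges into multi-prev
    edges[multi_prev] = [END]              # multi-prev sits directly before end
    return edges


if __name__ == "__main__":
    graph = insert_processing_nodes({"A1": ["B1", "B2"], "A2": ["B3", "B4"]})
    for node, successors in graph.items():
        print(node, "->", successors)
```

The point of the sketch is that main task nodes never connect to their sub-task nodes directly, and sub-task nodes never connect to the end node directly; the two processing nodes always sit in between.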
In the second step, the multi-successor task processing node adds all connected sub-task nodes to a processing queue. When Oozie's DAG detects that the currently running node is a sub-task node, the sub-task node in the processing queue is started automatically and is then deleted from the queue. When all sub-task nodes in the processing queue have been executed, the task of the multi-successor task processing node is complete.
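The queue behaviour described above can be modelled as follows. This is a minimal sketch under stated assumptions: the class `MultiNextNode` and its methods are hypothetical names for illustration and do not reproduce the patent's MultiNextNodeDef or any Oozie API; "starting" a sub-task is reduced to recording its name.

```python
from collections import deque
from typing import Deque, Iterable, List


class MultiNextNode:
    """Toy model of a multi-successor task processing node."""

    def __init__(self, sub_tasks: Iterable[str]) -> None:
        # All connected sub-task nodes are added to the processing queue.
        self.queue: Deque[str] = deque(sub_tasks)
        self.started: List[str] = []

    def start_next(self) -> str:
        """Start the next queued sub-task and delete it from the queue."""
        task = self.queue.popleft()
        self.started.append(task)  # stands in for actually launching the task
        return task

    def start_all(self) -> List[str]:
        """Drain the queue so that every sub-task is started."""
        while self.queue:
            self.start_next()
        return list(self.started)

    @property
    def done(self) -> bool:
        """The node's own task is complete once the queue is empty."""
        return not self.queue


if __name__ == "__main__":
    node = MultiNextNode(["B1", "B2"])
    print(node.start_all(), node.done)
```

Because a sub-task leaves the queue only when it is started, an empty queue is a simple signal that no sub-task was skipped.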
In the third step, a counter in the multi-predecessor task processing node counts the sub-task nodes that have not yet executed. When the counter reaches 0, all sub-task nodes have executed, the multi-predecessor task processing node triggers the end node, and the Oozie workflow ends.
After a sub-task node has executed, the access path connecting it to the multi-predecessor task processing node is deleted, so the multi-predecessor task processing node can no longer obtain information about that sub-task node, and the counter is decreased by 1.
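The counter mechanism can be sketched as below. The class `MultiPrevNode`, its method `on_sub_task_finished`, and the use of a set to stand for the access paths are assumptions made for illustration; they are not the patent's MultiPrevNodeDef or MultiPrevActionExecutor.

```python
from typing import Iterable, Set


class MultiPrevNode:
    """Toy model of a multi-predecessor task processing node."""

    def __init__(self, sub_tasks: Iterable[str]) -> None:
        # Each entry stands for the access path from one sub-task node.
        self.paths: Set[str] = set(sub_tasks)
        # The counter holds the number of sub-task nodes not yet executed.
        self.counter: int = len(self.paths)
        self.end_triggered: bool = False

    def on_sub_task_finished(self, name: str) -> bool:
        """Record one finished sub-task; return True once the end node fires."""
        if name not in self.paths:
            # Unknown or already-counted sub-task: leave the counter alone.
            return self.end_triggered
        self.paths.remove(name)  # delete the access path
        self.counter -= 1        # one fewer sub-task outstanding
        if self.counter == 0:
            self.end_triggered = True  # all sub-tasks done: trigger end
        return self.end_triggered


if __name__ == "__main__":
    node = MultiPrevNode(["B1", "B2", "B3", "B4"])
    for finished in ["B3", "B1", "B4", "B2"]:
        print(finished, node.on_sub_task_finished(finished), node.counter)
```

Deleting the path before decrementing means a repeated completion report for the same sub-task cannot drive the counter to 0 early, which is one plausible reason for pairing the two actions.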
Both the multi-predecessor task processing node and the multi-successor task processing node are added with lightweight, minimally intrusive code changes, so they remain compatible with Oozie's original functions and existing Oozie workflows continue to execute normally.
MultiNextNodeDef and MultiPrevNodeDef are defined in the oozie-core project of Oozie, where MultiNextNodeDef is the multi-successor task processing node and MultiPrevNodeDef is the multi-predecessor task processing node.
An executor and an internal class SingalXCommand are defined in MultiNextNodeDef. SingalXCommand adds newly started sub-task nodes to the processing queue, monitors their execution in real time, and deletes completed sub-task nodes from the processing queue.
An executor, MultiPrevActionExecutor, is defined in MultiPrevNodeDef. It obtains the number N of end task nodes that have not yet executed and assigns N to the counter.
The beneficial effects of the invention are as follows: the method ensures that multi-predecessor and multi-successor task nodes execute successfully while remaining compatible with Oozie's original functions, so existing Oozie workflows still execute normally, and it solves the problem that multi-predecessor and multi-successor task nodes in a workflow built with Oozie cannot be executed successfully.
Drawings
FIG. 1 is a schematic flow chart of a conventional Oozie workflow.
FIG. 2 is a schematic flow chart of the processing method for multi-predecessor and multi-successor nodes in an Oozie workflow according to the invention.
Detailed Description
To make the technical problems to be solved, the technical solutions and the advantageous effects of the invention clearer, the invention is described in detail below with reference to the embodiments. The specific embodiments described here serve only to explain the invention and do not limit it.
The processing method for multi-predecessor and multi-successor nodes in an Oozie workflow comprises the following steps:
First, a multi-successor task processing node is placed after each main task node, and the sub-task nodes of each main task node are placed after that multi-successor task processing node; a multi-predecessor task processing node is placed before the end node, and every sub-task node to be merged is connected to the same multi-predecessor task processing node.
Second, after a main task node finishes, its multi-successor task processing node triggers and starts all of that main task node's sub-task nodes, ensuring that every sub-task node is started and executed.
Third, after all the sub-task nodes connected to the same multi-predecessor task processing node have executed, the multi-predecessor task processing node triggers the end node, so that the workflow ends only after every sub-task node has executed.
In the second step, the multi-successor task processing node adds all connected sub-task nodes to a processing queue. When Oozie's DAG detects that the currently running node is a sub-task node, the sub-task node in the processing queue is started automatically and is then deleted from the queue. When all sub-task nodes in the processing queue have been executed, the task of the multi-successor task processing node is complete.
In the third step, a counter in the multi-predecessor task processing node counts the sub-task nodes that have not yet executed. When the counter reaches 0, all sub-task nodes have executed, the multi-predecessor task processing node triggers the end node, and the Oozie workflow ends.
After a sub-task node has executed, the access path connecting it to the multi-predecessor task processing node is deleted, so the multi-predecessor task processing node can no longer obtain information about that sub-task node, and the counter is decreased by 1.
Both the multi-predecessor task processing node and the multi-successor task processing node are added with lightweight, minimally intrusive code changes, so they remain compatible with Oozie's original functions and existing Oozie workflows continue to execute normally.
Example 1
This example uses Oozie 4.3.1. New element types are added to the XSD in oozie-client, a submodule of Oozie. Because Oozie ships many XSD versions, oozie-workflow-0.4.xsd is chosen here.
MultiNextNodeDef and MultiPrevNodeDef are defined in the oozie-core project of Oozie, where MultiNextNodeDef is the multi-successor task processing node and MultiPrevNodeDef is the multi-predecessor task processing node.
(1) Code 1 defines the element types for MULTI_PREV and MULTI_NEXT (the MIX_IN and MIX_OUT types in the schema)
Figure BDA0002222874540000041
Figure BDA0002222874540000051
(2) Code 2 declares the MULTI_PREV and MULTI_NEXT elements
<xs:element name="mix-in" type="workflow:MIX_IN"/>
<xs:element name="mix-out" type="workflow:MIX_OUT"/>
(3) Finally, the new elements are added to the workflow-app, as shown in Code 3.
Code 3 adds the MULTI_PREV and MULTI_NEXT elements to the workflow-app
Figure BDA0002222874540000052
Figure BDA0002222874540000061
An executor and an internal class SingalXCommand are defined in MultiNextNodeDef. SingalXCommand adds newly started sub-task nodes to the processing queue, monitors their execution in real time, and deletes completed sub-task nodes from the processing queue.
Code 4: MultiNextNodeDef processing logic
Figure BDA0002222874540000062
Code 5: judgment logic added to SingalXCommand
Figure BDA0002222874540000063
Figure BDA0002222874540000071
An executor, MultiPrevActionExecutor, is defined in MultiPrevNodeDef. It obtains the number N of end task nodes that have not yet executed and assigns N to the counter.
Code 6: MultiPrevNodeDef processing logic
Figure BDA0002222874540000072
Figure BDA0002222874540000081
Example 2
This embodiment takes two main task nodes (Task A1 and Task A2), each with two sub-task nodes (Task B1 to Task B4 in total), as an example and compares a conventional Oozie workflow with a workflow that uses the processing method for multiple predecessors and multiple successors.
FIG. 1 is a schematic flow diagram of a conventional Oozie workflow. In the figure, Fork and Join appear in pairs, but Task A1 is split into two tasks, of which Task B1 merges into Join J1 and Task B2 merges into Join J2. The end node is triggered as soon as any one of Task B1 to Task B4 finishes, so this workflow cannot guarantee that Task B1 to Task B4 all execute successfully.
FIG. 2 is a schematic workflow diagram of the processing method for multi-predecessor and multi-successor nodes in an Oozie workflow. In the figure, a MULTI-NEXT node replaces each Fork node and a MULTI-PREV node replaces each Join node. The multi-successor task processing node (MULTI-NEXT node) ensures that Task B1 to Task B4 are all started, and the multi-predecessor task processing node (MULTI-PREV node) triggers the end node only after Task B1 to Task B4 have all finished executing, which ensures that Task B1 to Task B4 all execute successfully.
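The contrast between the two figures can be illustrated with a small simulation. It is a sketch under stated assumptions, not a reproduction of Oozie's scheduler: the function `simulate` and its `wait_for_all` flag are hypothetical, the order in which sub-tasks finish is supplied by the caller, and the pairing of Task B3 and Task B4 with Task A2 is assumed because the text only states the pairing for Task A1.

```python
from typing import Dict, List, Tuple

# Assumed pairing of main tasks and sub-tasks for Example 2.
SUBS: Dict[str, List[str]] = {
    "Task A1": ["Task B1", "Task B2"],
    "Task A2": ["Task B3", "Task B4"],
}


def simulate(finish_order: List[str], wait_for_all: bool) -> Tuple[List[str], List[str]]:
    """Return (finished, killed) sub-tasks at the moment the end node fires.

    wait_for_all=False models the behaviour described for FIG. 1: the end
    node fires when the first sub-task finishes and the rest are killed.
    wait_for_all=True models FIG. 2: a counter delays the end node until
    every sub-task has finished.
    """
    remaining = len(finish_order)
    finished: List[str] = []
    for task in finish_order:
        finished.append(task)
        remaining -= 1
        if not wait_for_all or remaining == 0:
            break  # the end node is triggered here
    killed = [t for t in finish_order if t not in finished]
    return finished, killed


if __name__ == "__main__":
    all_subs = [sub for subs in SUBS.values() for sub in subs]
    order = [all_subs[1], all_subs[3], all_subs[0], all_subs[2]]  # arbitrary order
    print("FIG. 1 style:", simulate(order, wait_for_all=False))
    print("FIG. 2 style:", simulate(order, wait_for_all=True))
```

In the first call only the earliest finisher completes and the other three sub-tasks are reported as killed; in the second call all four complete and nothing is killed.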
The embodiment described above is only one specific embodiment of the invention; ordinary changes and substitutions made by those skilled in the art within the technical scope of the invention fall within its scope of protection.

Claims (8)

1. A processing method for multiple predecessors and multiple successors in an Oozie workflow, characterized by comprising the following steps:
first, a multi-successor task processing node is placed after each main task node, and the sub-task nodes of each main task node are placed after that multi-successor task processing node; a multi-predecessor task processing node is placed before the end node, and every sub-task node to be merged is connected to the same multi-predecessor task processing node;
second, after a main task node finishes, its multi-successor task processing node triggers and starts all of that main task node's sub-task nodes, ensuring that every sub-task node is started and executed;
third, after all the sub-task nodes connected to the same multi-predecessor task processing node have executed, the multi-predecessor task processing node triggers the end node, so that the workflow ends only after every sub-task node has executed.
2. The method for processing multiple predecessors and multiple successors in an Oozie workflow of claim 1, wherein: in the second step, the multi-successor task processing node adds all connected sub-task nodes to a processing queue; when Oozie's DAG detects that the currently running node is a sub-task node, the sub-task node in the processing queue is started automatically and is then deleted from the processing queue; and when all sub-task nodes in the processing queue have been executed, the task of the multi-successor task processing node is complete.
3. The method for processing multiple predecessors and multiple successors in an Oozie workflow of claim 1, wherein: in the third step, a counter in the multi-predecessor task processing node counts the sub-task nodes that have not yet executed; when the counter reaches 0, all sub-task nodes have executed, the multi-predecessor task processing node triggers the end node, and the Oozie workflow ends.
4. The method for processing multiple predecessors and multiple successors in an Oozie workflow of claim 3, wherein: after a sub-task node has executed, the access path connecting it to the multi-predecessor task processing node is deleted, so the multi-predecessor task processing node can no longer obtain information about that sub-task node, and the counter is decreased by 1.
5. The method for processing multiple predecessors and multiple successors in an Oozie workflow according to any one of claims 1-4, wherein: both the multi-predecessor task processing node and the multi-successor task processing node are added with lightweight, minimally intrusive code changes, so they remain compatible with Oozie's original functions and existing Oozie workflows continue to execute normally.
6. The method for processing multiple predecessors and multiple successors in an Oozie workflow of claim 5, wherein: MultiNextNodeDef and MultiPrevNodeDef are defined in the oozie-core project of Oozie, MultiNextNodeDef being the multi-successor task processing node and MultiPrevNodeDef being the multi-predecessor task processing node.
7. The method for processing multiple predecessors and multiple successors in an Oozie workflow of claim 6, wherein: an executor and an internal class SingalXCommand are defined in MultiNextNodeDef, SingalXCommand being used to add newly started sub-task nodes to the processing queue, monitor their execution in real time, and delete completed sub-task nodes from the processing queue.
8. The method for processing multiple predecessors and multiple successors in an Oozie workflow of claim 6, wherein: an executor, MultiPrevActionExecutor, is defined in MultiPrevNodeDef; it obtains the number N of end task nodes that have not yet executed and assigns N to the counter.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910940924.2A CN110647402A (en) 2019-09-30 2019-09-30 Processing method for multiple predecessors and multiple successors in Oozie working process

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910940924.2A CN110647402A (en) 2019-09-30 2019-09-30 Processing method for multiple predecessors and multiple successors in Oozie working process

Publications (1)

Publication Number Publication Date
CN110647402A true CN110647402A (en) 2020-01-03

Family

ID=69012035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910940924.2A Pending CN110647402A (en) 2019-09-30 2019-09-30 Processing method for multiple predecessors and multiple successors in Oozie working process

Country Status (1)

Country Link
CN (1) CN110647402A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100281241A1 (en) * 2009-05-04 2010-11-04 International Business Machines Corporation Method and system for synchronizing inclusive decision branches
CN101963908A (en) * 2010-09-30 2011-02-02 山东中创软件工程股份有限公司 Method and device for realizing dynamic merging of branches in unstructured process
CN103426045A (en) * 2012-05-22 2013-12-04 阿里巴巴集团控股有限公司 Take method and device of process virtual machine

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIU YUAN: "Scheduling of fork-join tasks on multi-core processors to avoid communication conflict", 《TENCON 2015 - 2015 IEEE REGION 10 CONFERENCE》 *
刘士冬等: "基于Domino平台的工作流设计与实现", 《计算机工程》 *
刘林林: "面向审批业务的Web自定义工作流模型研究与实现", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination