CN117634866A - Method, device, equipment and medium for processing data among nodes of workflow scheduling engine


Info

Publication number
CN117634866A
CN117634866A (application number CN202410105768.9A)
Authority
CN
China
Prior art keywords
node
data
workflow
parameter
execution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410105768.9A
Other languages
Chinese (zh)
Other versions
CN117634866B (en)
Inventor
王东辉
张为华
马帅超
高经伟
周奇
武泽平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority: CN202410105768.9A
Publication of CN117634866A
Application granted
Publication of CN117634866B
Legal status: Active
Anticipated expiration: tracked


Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The method designs a node data transmission information table for each computing node participating in computation in a workflow. The table dynamically stores all input and output data stream information that the user has associated with the node, and is saved to a database when the workflow's node tasks are stored. In addition, a pre-execution preparation stage is designed for each node execution: in this stage, the target node of a data stream collects the data required for its own execution according to its node data transmission information table and applies the collected data in the subsequent execution steps. This reduces the coupling between workflow and data stream functions in the system, efficiently realizes cross-node data transfer within the workflow engine's directed acyclic graph, and reduces system complexity.

Description

Method, device, equipment and medium for processing data among nodes of workflow scheduling engine
Technical Field
The invention belongs to the technical field of data processing, and relates to a method, a device, equipment and a medium for processing data among nodes of a workflow scheduling engine.
Background
An optimal design flow engine is a design-assistance system commonly equipped with a multidisciplinary optimal design flow construction and execution engine, which may also be called a workflow scheduling engine; it completes the solution of an optimization problem by executing the computing nodes in the flow, in parallel or one by one, according to the order designed by the user. After computation finishes, each node in the flow needs to transfer its computation result to subsequent nodes to solve the optimization problem. Each optimization-task computing node has a default input/output data list and can either run independently or receive external input that overrides its inputs. In general, the node computation scheduling of a flow scheduling engine proceeds sequentially, and data transfer between adjacent nodes is possible; however, for data transfer between arbitrary nodes in the flow, conventional methods are inefficient and complex.
Disclosure of Invention
Aiming at the problems of the traditional methods, the invention provides an inter-node data processing method for a workflow scheduling engine, an inter-node data processing apparatus for a workflow scheduling engine, a computer device and a computer-readable storage medium, which can efficiently realize cross-node data transfer in the directed acyclic graph of a workflow engine.
In order to achieve the above object, the embodiment of the present invention adopts the following technical scheme:
in one aspect, a method for processing data between nodes of a workflow scheduling engine is provided, including the steps of:
when the multidisciplinary optimization design workflow starts executing, scheduling the computing nodes to be executed in the directed acyclic graph corresponding to the workflow layer by layer according to the design order of the workflow, and adding each computing node to a node completion cache queue after its execution completes;
before any computing node is executed, starting an execution preparation stage of executing the computing node; the execution preparation stage is used for actively collecting all data stream information required by the computing node when the computing node starts to execute according to the node data transmission information table of the computing node;
after the execution preparation stage is completed, starting to execute the computing node and storing the data to be output into an output data list in a node data transmission information table;
and repeating the scheduling execution of each computing node until all computing nodes of the workflow are executed.
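The layer-by-layer scheduling described in the steps above can be sketched as a topological-level traversal of the directed acyclic graph. This is a minimal illustrative sketch, not the patented implementation; the function and data-structure names (`schedule_workflow`, `execute_node`, the edge-list encoding) are assumptions introduced for illustration.

```python
from collections import deque

def schedule_workflow(nodes, edges, execute_node):
    """Schedule a DAG layer by layer (topological levels).

    nodes: iterable of node ids; edges: list of (src, dst) pairs.
    execute_node(node_id) runs one computing node.
    Returns the node completion cache queue (in execution order).
    """
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        indegree[dst] += 1
        successors[src].append(dst)

    completion_queue = deque()            # node completion cache queue
    ready = [n for n in nodes if indegree[n] == 0]
    while ready:
        next_layer = []
        for node in ready:                # every node of the current layer
            execute_node(node)
            completion_queue.append(node)  # outputs now visible to successors
            for succ in successors[node]:
                indegree[succ] -= 1
                if indegree[succ] == 0:
                    next_layer.append(succ)
        ready = next_layer
    return completion_queue
```

A node is appended to the completion queue as soon as it finishes, which is what later nodes consult during their preparation stage.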
In one embodiment, the method further comprises the steps of:
constructing a workflow in a front-end interface of a workflow scheduling engine, and setting data transfer relation and data transfer direction among nodes in the workflow one by one according to a multidisciplinary optimization design task execution file;
and saving the node information of the workflow after the setting to a database of a workflow scheduling engine.
In one embodiment, a process for starting an execution preparation phase of a computing node includes:
taking out the node data transmission information table of the computing node and parsing the node input data list;
cyclically traversing the node input data list, and obtaining the data-item source node of the computing node from the node completion cache queue;
reading and parsing the output data list of the data-item source node to obtain the matched source address data of the data stream required by the computing node at the start of execution;
overwriting the default parameter list of the computing node with the obtained data stream source address data;
and returning to the steps of cyclically traversing the node input data list and obtaining the data-item source node from the node completion cache queue, until all data stream information required by the computing node at the start of execution has been collected.
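The preparation-stage steps above can be sketched as follows. This is a hedged sketch under assumed data shapes: the dictionary layout of `node`, `transfer_table`, and `completion_queue` is invented for illustration and does not come from the patent.

```python
def run_preparation_phase(node, completion_queue):
    """Execution preparation stage for one computing node (sketch).

    node: dict with a "transfer_table" holding an "inputs" list, and a
    "params" default parameter list keyed by (namespace, param_id).
    completion_queue: mapping of finished node id -> its output data list.
    """
    inputs = node["transfer_table"]["inputs"]        # parse the input data list
    for item in inputs:                              # cyclically traverse it
        source = completion_queue[item["source_node"]]  # locate the source node
        # match the source's output entry addressed to this node/parameter
        for out in source:
            if (out["target_node"] == node["id"]
                    and out["namespace"] == item["namespace"]
                    and out["param_id"] == item["param_id"]):
                # overwrite the default parameter list with the collected
                # data-stream value
                node["params"][(item["namespace"], item["param_id"])] = out["value"]
    return node
```

The key inversion is that the node *pulls* its inputs from the completion queue rather than having predecessors push data to it.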
In one embodiment, the input data list of the node data transmission information table includes a data stream type, an input node unique identifier, a name space to which the input parameter belongs, and an id identifier of the input data parameter in the node, the output data list of the node data transmission information table includes a target node unique identifier, a name space to which the target parameter belongs, and an id identifier of the target data parameter in the node, and the default parameter list of the computing node includes a parameter name, a parameter name space, a parameter id identifier, a parameter type, parameter meta information, a parameter unit, a parameter data value, and a number of data streams in which the parameter participates.
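The fields enumerated in this embodiment can be pictured as plain record types. A minimal sketch, assuming Python dataclasses; all class and attribute names are illustrative stand-ins for the table fields named above.

```python
from dataclasses import dataclass, field

@dataclass
class InputEntry:                  # one row of the input data list
    stream_type: str               # data stream type (node/system/user input)
    source_node_id: str            # input node unique identifier
    namespace: str                 # namespace the input parameter belongs to
    param_id: str                  # id of the input data parameter in the node

@dataclass
class OutputEntry:                 # one row of the output data list
    target_node_id: str            # target node unique identifier
    namespace: str                 # namespace the target parameter belongs to
    param_id: str                  # id of the target data parameter in the node
    value: object = None           # data stored here after execution

@dataclass
class Parameter:                   # one entry of the default parameter list
    name: str
    namespace: str
    param_id: str
    param_type: str                # "input" or "output"
    meta: dict = field(default_factory=dict)  # parameter meta information
    unit: str = ""
    value: object = None           # parameter data value
    stream_count: int = 0          # number of data streams the parameter joins
```

The (namespace, param_id) pair is what lets an output entry be matched unambiguously to a target node's parameter.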
In another aspect, there is also provided an inter-node data processing apparatus of a workflow scheduling engine, including:
the node scheduling module is used for scheduling the computing nodes to be executed in the directed acyclic graph corresponding to the workflow layer by layer according to the design order of the workflow when the multidisciplinary optimization design workflow starts executing, and adding each computing node to a node completion cache queue after its execution completes;
the execution preparation module is used for starting an execution preparation stage of the execution computing node before any computing node is executed; the execution preparation stage is used for actively collecting all data stream information required by the computing node when the computing node starts to execute according to the node data transmission information table of the computing node;
the node execution module is used for starting to execute the computing node and storing the data to be output into an output data list in the node data transmission information table after the execution preparation stage is completed;
and the scheduling relay module is used for triggering and repeating the scheduling execution of each computing node until all computing nodes of the workflow are executed.
In one embodiment, the system further comprises:
the workflow configuration module is used for constructing a workflow in a front-end interface of the workflow scheduling engine and setting data transfer relation and data transfer direction among nodes in the workflow one by one according to the multidisciplinary optimization design task execution file;
and the setting storage module is used for storing the node information of the workflow after the setting to a database of the workflow scheduling engine.
In one embodiment, a process for starting an execution preparation phase of a computing node includes:
taking out the node data transmission information table of the computing node and parsing the node input data list;
cyclically traversing the node input data list, and obtaining the data-item source node of the computing node from the node completion cache queue;
reading and parsing the output data list of the data-item source node to obtain the matched source address data of the data stream required by the computing node at the start of execution;
overwriting the default parameter list of the computing node with the obtained data stream source address data;
and returning to the steps of cyclically traversing the node input data list and obtaining the data-item source node from the node completion cache queue, until all data stream information required by the computing node at the start of execution has been collected.
In one embodiment, the input data list of the node data transmission information table includes a data stream type, an input node unique identifier, a name space to which the input parameter belongs, and an id identifier of the input data parameter in the node, the output data list of the node data transmission information table includes a target node unique identifier, a name space to which the target parameter belongs, and an id identifier of the target data parameter in the node, and the default parameter list of the computing node includes a parameter name, a parameter name space, a parameter id identifier, a parameter type, parameter meta information, a parameter unit, a parameter data value, and a number of data streams in which the parameter participates.
In yet another aspect, a computer device is provided, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the method for processing data between nodes of a workflow scheduling engine described above when the processor executes the computer program.
In yet another aspect, there is also provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for processing data between nodes of a workflow scheduling engine described above.
One of the above technical solutions has the following advantages and beneficial effects:
according to the method, the device, the equipment and the medium for processing the data among the nodes of the workflow scheduling engine, the node data transmission information table is developed and designed for each computing node participating in computation in the workflow, all the input and output data flow information related to the node set by a user is dynamically stored, and the input and output data flow information is stored in a database along with the warehousing of the tasks of the workflow nodes and is persisted in the database. In addition, a pre-execution preparation stage is designed for each node execution, and the target node of the data flow can collect data required by the execution of the target node according to the node data transmission information table at the stage and apply the collected data to a subsequent execution step. Because the data flow information required by the nodes is converted into the node active collection, the problem of data cross-node transmission in the directed acyclic graph is converted into the problem of data directional collection in a simpler flow, and the problems of data cross-node transmission and data flow multi-to-one transmission can be effectively solved. Meanwhile, the preparation stage before the node execution is an independent stage independent of the workflow scheduling program and can be independently developed and realized, so that the coupling degree of the workflow and the data flow function in the system is reduced while a stronger data flow transmission mechanism is introduced into the workflow, the complexity of the system is reduced, and the development and realization of the system are facilitated.
Drawings
In order to more clearly illustrate the technical solutions of embodiments or conventional techniques of the present application, the drawings required for the descriptions of the embodiments or conventional techniques will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a flow diagram of a method of processing data between nodes of a workflow scheduling engine in one embodiment;
FIG. 2 is a schematic diagram of a node data transmission information table configuration in one embodiment;
FIG. 3 is a flow chart of a method of processing data between nodes of a workflow scheduling engine according to another embodiment;
FIG. 4 is a schematic diagram of a data stream transport mechanism in one embodiment;
FIG. 5 is a flow diagram of a preparation phase performed in one embodiment;
FIG. 6 is a general flow diagram of a data streaming mechanism in one embodiment;
FIG. 7 is a block diagram of an inter-node data processing apparatus for a workflow scheduling engine in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
It is noted that reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
Those skilled in the art will appreciate that the embodiments described herein may be combined with other embodiments. The term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
A common current workflow-engine data transmission method proceeds as follows: the flow is constructed, and a data transfer relation is set for each pair of adjacent nodes as required; the flow starts executing, and each node outputs its own data after finishing execution, i.e., actively pushes the data to its successor nodes; once all the data required by a successor node is ready, that node can begin execution directly.
Workflow nodes can also be executed by jumping, which enables one-to-one cross-node data transfer, with the following specific steps: after the flow is constructed, individual nodes can be set to a jump execution mode, i.e., some nodes in the flow are skipped and execution proceeds directly to a later node in the flow or in a sub-flow; the jump execution of a node can carry the necessary data, so one-to-one data transfer across nodes can be realized; the target node reached by the jump starts execution directly, which is equivalent to a function call.
The existing data transmission methods between workflow-engine nodes have the following shortcomings. Data transmission between adjacent nodes proceeds smoothly, but because nodes passively receive data from other nodes, the execution requirements of the target node are ignored, leading to redundant data transmission or possibly missing transmissions. Data transmission between non-adjacent nodes is limited and cannot target specific node parameters, so data stream transfer is not fine-grained. The path for transmitting data from multiple nodes to one target node is obstructed; the target node's data may be set multiple times or overwritten, with a risk of data transmission errors. Finally, the data flow is coupled with the workflow scheduler and cannot exist on its own, which limits the flexibility of system development, increases its difficulty, and prevents introduction into existing workflow engines at low cost.
Aiming at the problem that non-adjacent nodes in current workflow scheduling engines cannot transfer data efficiently, the invention transforms the task computing node entity structure in the flow and adds a pre-execution preparation stage for each node, so that each node can obtain the data and other information it needs from other related nodes before execution. This solves the problem that data interaction between non-adjacent workflow nodes is complex and that a node to be executed has difficulty obtaining data from other nodes in the flow; at the same time, a cross-node data transfer function can be added to current directed-acyclic-graph workflow scheduling engines at relatively low cost.
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Referring to fig. 1, in one embodiment, a method for processing data between nodes of a workflow scheduling engine is provided, including the following processing steps S12 to S18:
S12, when the multidisciplinary optimization design workflow starts executing, scheduling the computing nodes to be executed in the directed acyclic graph corresponding to the workflow layer by layer according to the design order of the workflow, and adding each computing node to a node completion cache queue after its execution completes;
s14, before any computing node is executed, starting to execute an execution preparation stage of the computing node; the execution preparation stage is used for actively collecting all data stream information required by the computing node when the computing node starts to execute according to the node data transmission information table of the computing node;
s16, after the execution preparation stage is completed, the execution of the computing nodes is started, and data to be output are stored in an output data list in the node data transmission information table.
S18, repeating the scheduling execution of each computing node until all computing nodes of the workflow are executed.
It can be understood that the multidisciplinary optimization design workflow may be a workflow built in advance according to the disciplinary optimization design task, or a workflow to be executed that is constructed online according to the task. A node execution preparation module is designed and developed to carry out the execution preparation stage of a node: before the node execution program runs a computing node, the module lets the node actively collect, according to its configured node data transmission information table, all the data stream information it needs at the start of execution. The data stream information the node must output after it finishes is likewise stored in the output data list of its own node data transmission information table, so that other computing nodes can obtain it when they actively collect data stream information. In this way, every computing node can actively collect all required data stream information before execution, so that execution starts only after all required data stream information has been obtained, realizing efficient inter-node data transfer and many-to-one transfer.
According to the method, data stream description information is designed and integrated for each computing node in the workflow scheduling engine, including node input data list information and node output data list information. Using the nodes' data stream description information, an execution preparation stage before node execution is designed, realizing inter-node data stream transfer and many-to-one node data stream transfer. The detailed design of the node data stream description information enables accurate transfer of data stream (parameter) information, so that a more powerful and efficient data stream transmission mechanism is realized in the workflow scheduling engine independently and with low coupling.
Specifically, when the multidisciplinary optimization design workflow starts executing, the workflow scheduling engine schedules each computing node to be executed in the directed acyclic graph corresponding to the workflow, layer by layer, according to the design order of the workflow. During execution, each executed computing node is added to the node completion cache queue so that subsequent computing nodes can actively obtain the source-node information of their own data items. For any computing node, before executing it, the node execution program of the workflow scheduling engine starts the flow of the execution preparation stage of the computing node according to the data streams set for it in the flow (from preceding nodes or from the current workflow execution), so as to actively collect all the input information the computing node requires.
When all the input information required by the computing node, i.e., all the data stream information required to start execution, has been collected, the execution preparation stage of the computing node is complete. The engine's node execution program then starts executing the current computing node, and obtains the output data list by parsing the node data transmission information table of the computing node. After execution finishes, the post-execution processing stage of the computing node begins: the data the computing node currently needs to output is stored in its output data list and made available for subsequent nodes to collect and scan. The scheduling execution of each node is repeated until all nodes have been executed.
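The execution and post-execution processing stages described above can be sketched as follows. This is an illustrative sketch only; `execute_and_publish`, `compute`, and the dictionary layout of the node and its output data list are assumed names and shapes, not the patent's actual entities.

```python
def execute_and_publish(node, compute):
    """Run a node after its preparation stage and publish its outputs (sketch).

    compute(params) -> dict mapping (namespace, param_id) -> produced value.
    Each produced value is written into the matching entry of the node's
    output data list so that subsequent nodes can collect it.
    """
    results = compute(node["params"])                # node execution
    for out in node["transfer_table"]["outputs"]:    # post-execution processing
        key = (out["namespace"], out["param_id"])
        if key in results:
            out["value"] = results[key]              # store data to be output
    return node
```

Writing results into the output data list, rather than pushing them to successors, is what keeps the data-flow mechanism decoupled from the scheduler.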
According to the above inter-node data processing method for a workflow scheduling engine, a node data transmission information table is designed and developed for each computing node participating in computation in the workflow; it dynamically stores all input and output data stream information that the user has associated with the node, and is persisted to the database when the workflow node tasks are stored. In addition, a pre-execution preparation stage is designed for each node execution: in this stage, the target node of a data stream collects the data required for its own execution according to the node data transmission information table, and applies the collected data in the subsequent execution steps. Because the gathering of required data stream information is converted into active collection by the node itself, the problem of cross-node data transfer in the directed acyclic graph is converted into the simpler problem of directed data collection within the flow, which effectively solves both cross-node data transfer and many-to-one data stream transfer. Meanwhile, the pre-execution preparation stage is an independent stage, separate from the workflow scheduling program, that can be developed and realized on its own; therefore, while a stronger data stream transmission mechanism is introduced into the workflow, the coupling between workflow and data stream functions in the system is reduced, the complexity of the system is reduced, and development of the system is facilitated.
In one embodiment, the node data transmission information table may include an input data list covering the existing attribute parameters of each data item in the workflow; for example, the input data list includes a data stream type, an input node unique identifier, the namespace the input parameter belongs to, and the id of the input data parameter in the node. The output data list of the node data transmission information table includes a target node unique identifier, the namespace the target parameter belongs to, and the id of the target data parameter in the node. Parameters 1 to n in the default parameter list of the computing node may include a parameter name, a parameter namespace, a parameter id, a parameter type (input or output), parameter meta information, a parameter unit, a parameter data value, and the number of data streams in which the parameter participates, where n is the number of parameters.
It will be understood that fig. 2 shows a schematic configuration of the node data transmission information table of a node. The node data field of a computing node contains the node's other existing data, the node data transmission information table, and the node's own default parameter list, which stores the node's various parameter information. The node data transmission information table mechanism designed for task nodes in the workflow records the input data table information of the current node's parameter data streams and the to-be-output data of the data streams the current node participates in. The default parameter list mechanism owned by each node in the workflow is the basic mechanism that allows a node to run independently, and also serves as the selectable input/output list available to the user when setting up the node's data streams. The node execution preparation module (stage), designed to run before each node of the workflow scheduling engine executes, parses the node data stream information table, collects and obtains the data for each item of the node input data list, and dynamically overwrites the default parameter list; this establishes a convenient and efficient data stream transmission mechanism that realizes cross-node data transfer and many-to-one data stream transfer. Thanks to the design of the node data transmission information table, data transfer among nodes is more accurate; the improved data stream mechanism has low coupling with other functions of the system, and can be introduced into current mainstream workflow scheduling engines at relatively low cost, making inter-node data processing more efficient.
In fig. 2, the data stream types may include types of node input, system input, user input, and the like, the start node code may include a unique input node identifier, the start parameter name space may include a name space to which the input parameter belongs, the start parameter property may include an id identifier of the input data parameter in the node, the end node code may include a unique target node (i.e., a current node) identifier, the end parameter name space may include a name space to which the target parameter belongs, and the end parameter property may include an id identifier of the target data parameter in the node.
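The data stream types enumerated here (node input, system input, user input) can be modeled as an enumeration that decides how an input entry is resolved. A hedged sketch: `StreamType`, `resolve_input`, and the entry dictionary shape are assumptions for illustration, not identifiers from the patent.

```python
from enum import Enum

class StreamType(Enum):
    NODE_INPUT = "node"      # value produced by a preceding node
    SYSTEM_INPUT = "system"  # value injected from another system source
    USER_INPUT = "user"      # value entered by the user at configuration time

def resolve_input(entry, completion_queue):
    """Return the value that one input-list entry supplies to the current node."""
    if entry["type"] is StreamType.NODE_INPUT:
        # node inputs are looked up in the source node's output data list
        for out in completion_queue[entry["source_node"]]:
            if out["param_id"] == entry["param_id"]:
                return out["value"]
        raise KeyError(entry["param_id"])
    # user/system inputs are recorded directly in the entry as a special
    # kind of data stream information
    return entry["value"]
```

Treating user and system inputs as just another stream type keeps all overrides of the default parameter list in one code path.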
Further, as shown in fig. 3, the method for processing data between nodes of the workflow scheduling engine may further include the following pre-steps:
s10, constructing a workflow in a front end interface of a workflow scheduling engine, and setting data transfer relation and data transfer direction among nodes in the workflow one by one according to a multidisciplinary optimization design task execution file;
and S11, saving the node information of the workflow after the setting to a database of a workflow scheduling engine.
It can be understood that, through the front-end interface of the engine, a user can construct a multidisciplinary optimization design workflow according to a given multidisciplinary optimization design task execution file, set the data transfer relations and data transfer directions among nodes according to the workflow processing requirements, and complete the setting of node data streams one by one as needed. The front-end interactive program only displays each node's predecessor nodes, so the data transferred in the data streams is always the latest computed data. The node information in the workflow, including the node data transmission information table in each node, is saved to the database along with the save operation of the flow.
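Persisting each node's information together with its data transmission table, as described above, might look like the following. A minimal sketch using SQLite and JSON serialization; the table name, column layout, and function name are assumptions, and the patent does not specify a storage schema.

```python
import json
import sqlite3

def save_workflow_nodes(db_path, workflow_id, nodes):
    """Persist each node's info, including its data transmission table (sketch)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS workflow_node ("
        "workflow_id TEXT, node_id TEXT, transfer_table TEXT, "
        "PRIMARY KEY (workflow_id, node_id))"
    )
    for node in nodes:
        # the node data transmission information table is stored alongside
        # the node task, so it survives into later executions
        conn.execute(
            "INSERT OR REPLACE INTO workflow_node VALUES (?, ?, ?)",
            (workflow_id, node["id"], json.dumps(node["transfer_table"])),
        )
    conn.commit()
    conn.close()
```

Saving the table with the flow means the preparation stage can later be driven entirely from the database, without the scheduler's involvement.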
Taking a multidisciplinary optimization design flow engine as a concrete implementation of the workflow scheduling engine of the present invention, the data flow transmission mechanism can be illustrated as in fig. 4. It supports cross-node data transmission between multiple pairs of nodes and allows the specific parameter entries of a transfer to be designated; for example, parameter 6 of node a is transmitted directly to parameter 5 of node b, skipping the other preceding nodes. In addition, the default parameter entries of each node can also be overwritten by user input or by input from other sources in the system; in the present invention, such source input is recorded in the node data transmission information table as a special kind of data stream information.
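For illustration only, the connections a user draws in such a front end might be serialized along the following lines before being saved to the engine database. The key names and structure are hypothetical assumptions, not the engine's actual format:

```python
# Hypothetical serialization of a user-configured workflow: execution
# dependencies plus per-node data streams, including a cross-node
# transfer and a user-input overwrite recorded as a special stream.
workflow_config = {
    "nodes": ["a", "c", "b"],            # design order of the workflow
    "edges": [("a", "c"), ("c", "b")],   # execution dependencies (DAG)
    "data_streams": {
        "b": [
            # parameter 6 of node "a" skips node "c" and feeds parameter 5 of "b"
            {"type": "node_input", "from": ("a", "6"), "to": ("b", "5")},
            # a user-supplied value overwriting a default parameter of "b"
            {"type": "user_input", "value": 3.14, "to": ("b", "2")},
        ]
    },
}

cross_node = [s for s in workflow_config["data_streams"]["b"]
              if s["type"] == "node_input"]
print(cross_node[0]["to"])  # -> ('b', '5')
```

Keeping the data-stream entries per target node, as here, matches the design in which each node's own transmission information table drives its preparation phase.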
In the front-end interface for node data flow setting in the multidisciplinary optimization design flow engine, a user can inspect a node's parameters through the node default parameter list and connect the parameters of different nodes as required, thereby configuring the data transmission between nodes. The front-end interface may include at least the source data list of a data stream and the data list of the data stream's target node (the computing node currently being configured).
Through these pre-steps, the parameters and processing flow of the workflow can be configured online according to the task requirements, making system development and configuration more flexible and convenient and greatly improving node data processing efficiency.
In one embodiment, as shown in fig. 5, regarding the process of starting to execute the execution preparation phase of the computing node in the above step S14, the following processing steps may be specifically included:
s141, retrieving the node data transmission information table of the computing node and parsing out the node input data list;
s143, circularly traversing the node input data list, and obtaining the source node of each data item of the computing node from the node completion cache queue;
s145, reading the output data list of the data item source node and parsing it to obtain the source address data of the data stream required by the computing node when it starts execution;
s147, overwriting the default parameter list of the computing node with the obtained data stream source address data;
s149, returning to step S143 until all data stream information required by the computing node to start execution has been collected.
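A minimal sketch of steps S141 to S149, under the assumption that the outputs of finished nodes are available in a completion cache keyed by node code; the function and field names are illustrative, not the patent's implementation:

```python
# Sketch of the execution preparation phase: resolve every entry of the
# node input data list from the completion cache and overwrite the
# node's default parameter list with the collected values.
def prepare_node(input_list, completion_cache, default_params):
    params = dict(default_params)          # copy; unwired items keep defaults
    for item in input_list:                # S143: traverse the input data list
        source = completion_cache[item["from_node"]]    # finished source node
        value = source["outputs"][item["from_param"]]   # S145: read its output list
        params[item["to_param"]] = value   # S147: overwrite the default entry
    return params                          # S149: done once all items collected

completion_cache = {"a": {"outputs": {"6": 42.0}}}
input_list = [{"from_node": "a", "from_param": "6", "to_param": "5"}]
ready = prepare_node(input_list, completion_cache, {"5": 0.0, "7": 1.0})
print(ready)  # -> {'5': 42.0, '7': 1.0}
```

Because the target node pulls its own inputs, many-to-one transfers fall out naturally: several input-list entries may write into different parameters of the same node.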
It will be appreciated that fig. 6 shows a general flow diagram of the data stream transmission mechanism; in the node execution preparation phase, the specific steps may be as follows:
a) Enter the node execution preparation module to prepare the node for operation: retrieve the node data transmission information table from the node and parse out the node input data list. The node input data list records which node each data item of the current node comes from.
b) Circularly traverse the node input data list obtained in the previous step, and obtain the information of each data item's source node from the node completion cache queue.
c) Request the flow dispatcher of the workflow scheduling engine to locate the data item source node of the current computing node, read its output data list, and parse it to obtain the detailed data matching the currently required data item.
d) Overwrite the default parameter list of the current computing node with the obtained data items, so that the obtained data stream source address data is set into the default input list of the current node; repeat the above steps until all input information required by the current node has been collected.
This design of the execution preparation phase realizes the parsing of the node data flow information table, the collection and acquisition of each item of cross-node data in the node input data list, and the dynamic overwriting of the default input list.
It should be understood that, although the steps in the flowcharts of figs. 1, 3, 5, and 6 are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in figs. 1, 3, 5, and 6 may include multiple sub-steps or phases that are not necessarily performed at the same moment but may be performed at different times, and whose order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or phases of other steps.
The invention has been trialed in a large-scale multidisciplinary optimization design flow construction and scheduling system, where it successfully realized the transmission of complex data flows in the optimization flow and guaranteed the correct computation and analysis of the optimization problem represented by the flow; it is a core implementation component of the multidisciplinary optimization design flow engine.
In one embodiment, as shown in fig. 7, an inter-node data processing apparatus 100 of a workflow scheduling engine is provided, including a node scheduling module 11, an execution preparation module 13, a node execution module 15, and a scheduling relay module 17. The node scheduling module 11 is configured to schedule the computing nodes to be executed in the directed acyclic graph corresponding to the workflow layer by layer according to the design sequence of the workflow when a multidisciplinary optimization design workflow starts to be executed, and to add each computing node whose execution has finished to the node completion cache queue. The execution preparation module 13 is configured to start the execution preparation phase of any computing node before that node is executed; in the execution preparation phase, all data stream information required by the computing node to start execution is actively collected according to the node data transmission information table of the computing node. The node execution module 15 is configured to start executing the computing node after the execution preparation phase is completed and to save the data to be output into the output data list of the node data transmission information table. The scheduling relay module 17 is configured to trigger and repeat the scheduled execution of each computing node until all computing nodes of the workflow have been executed.
The inter-node data processing apparatus 100 of the workflow scheduling engine designs a node data transmission information table for each computing node participating in computation in the workflow; the table dynamically stores all input and output data stream information that the user has set for the node and is persisted in the database when the workflow node task is stored. In addition, a preparation phase is designed before each node's execution, in which the target node of a data stream collects the data required for its own execution according to the node data transmission information table and applies the collected data in the subsequent execution steps. Because the data stream information required by a node is collected actively by the node itself, the problem of transmitting data across nodes in a directed acyclic graph is converted into the simpler problem of directed data collection within the flow, which effectively solves both cross-node data transmission and many-to-one data stream transmission. Meanwhile, the preparation phase before node execution is an independent stage, separate from the workflow scheduling program, and can be developed and implemented independently; a strong data flow transmission mechanism is thus introduced into the workflow while the coupling between the workflow and data flow functions of the system is reduced, which lowers system complexity and facilitates development and implementation.
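The layer-by-layer scheduling performed by the node scheduling module can be sketched with Kahn's algorithm over the workflow DAG. The `run_node` callback and the graph representation below are assumptions for illustration, not the patent's implementation:

```python
from collections import deque

# Layer-by-layer DAG scheduling: execute every ready node in the
# current layer, append it to the completion cache queue, then move
# to the nodes whose dependencies have all finished.
def schedule(dag, run_node):
    """dag: {node: [successor, ...]}. Returns the completion order."""
    indegree = {n: 0 for n in dag}
    for succs in dag.values():
        for s in succs:
            indegree[s] += 1
    layer = deque(n for n, d in indegree.items() if d == 0)
    completed = []                          # node completion cache queue
    while layer:
        next_layer = deque()
        for node in layer:                  # every node in this layer is ready
            run_node(node)                  # preparation phase + execution
            completed.append(node)
            for s in dag[node]:
                indegree[s] -= 1
                if indegree[s] == 0:        # all predecessors finished
                    next_layer.append(s)
        layer = next_layer
    return completed

order = schedule({"a": ["c"], "c": ["b"], "b": []}, run_node=lambda n: None)
print(order)  # -> ['a', 'c', 'b']
```

Keeping the completion queue separate from the scheduler is what lets the preparation phase look up finished source nodes without coupling to the scheduling loop itself.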
In one embodiment, the inter-node data processing apparatus 100 of the workflow scheduling engine may further include a workflow configuration module and a setting and saving module. The workflow configuration module is configured to construct a workflow in a front-end interface of the workflow scheduling engine and to set the data transfer relations and data transfer directions between the nodes in the workflow one by one according to a multidisciplinary optimization design task execution file. The setting and saving module is configured to save the node information of the configured workflow to a database of the workflow scheduling engine.
In one embodiment, a process for starting an execution preparation phase of a computing node includes:
the node data transmission information table of the computing node is retrieved, and the node input data list is parsed out;
circularly traversing the node input data list, and obtaining the source node of each data item of the computing node from the node completion cache queue;
reading the output data list of the data item source node and parsing it to obtain the source address data of the data stream required by the computing node when it starts execution;
overwriting the default parameter list of the computing node with the obtained data stream source address data;
and returning to the step of circularly traversing the node input data list and obtaining the data item source node of the computing node from the node completion cache queue, until all data stream information required by the computing node to start execution has been collected.
In one embodiment, the input data list of the node data transmission information table includes a data stream type, a unique input node identifier, the namespace to which the input parameter belongs, and the id identifier of the input data parameter within its node; the output data list of the node data transmission information table includes a unique target node identifier, the namespace to which the target parameter belongs, and the id identifier of the target data parameter within its node; and the default parameter list of the computing node includes a parameter name, a parameter namespace, a parameter id identifier, a parameter type, parameter meta information, a parameter unit, a parameter data value, and the number of data streams in which the parameter participates.
For specific limitations of the inter-node data processing apparatus 100 of the workflow scheduling engine, reference may be made to the corresponding limitations of the inter-node data processing method of the workflow scheduling engine, which are not repeated here. The modules in the inter-node data processing apparatus 100 of the workflow scheduling engine described above may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in hardware in, or independent of, a device with a data processing function, or stored in software in a memory of the device, so that the processor can invoke and execute the operations corresponding to the modules; the device may be, but is not limited to, any of the various data computing and processing devices existing in the art.
In one embodiment, there is also provided a computer device including a memory and a processor, the memory storing a computer program, the processor implementing the following processing steps when executing the computer program: when the multidisciplinary optimization design workflow starts to be executed, scheduling computing nodes to be executed in the directed acyclic graph corresponding to the workflow layer by layer according to the design sequence of the workflow; adding the computing node after the execution into a node completion buffer queue; before any computing node is executed, starting an execution preparation stage of executing the computing node; the execution preparation stage is used for actively collecting all data stream information required by the computing node when the computing node starts to execute according to the node data transmission information table of the computing node; after the execution preparation stage is completed, starting to execute the computing node and storing the data to be output into an output data list in a node data transmission information table; and repeating the scheduling execution of each computing node until all computing nodes of the workflow are executed.
It will be appreciated that, in addition to the memory and processor described above, the computer device may include other software and hardware components not listed in this specification; these are determined by the model of the specific computer device in different application scenarios and are not enumerated in detail here.
In one embodiment, the processor may further implement the steps or sub-steps added in the embodiments of the method for processing data between nodes of the workflow scheduling engine when executing the computer program.
In one embodiment, there is also provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the following processing steps: when the multidisciplinary optimization design workflow starts to be executed, scheduling computing nodes to be executed in the directed acyclic graph corresponding to the workflow layer by layer according to the design sequence of the workflow; adding the computing node after the execution into a node completion buffer queue; before any computing node is executed, starting an execution preparation stage of executing the computing node; the execution preparation stage is used for actively collecting all data stream information required by the computing node when the computing node starts to execute according to the node data transmission information table of the computing node; after the execution preparation stage is completed, starting to execute the computing node and storing the data to be output into an output data list in a node data transmission information table; and repeating the scheduling execution of each computing node until all computing nodes of the workflow are executed.
In one embodiment, the computer program when executed by the processor may further implement the steps or sub-steps added in the embodiments of the method for processing data between nodes of the workflow scheduling engine.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program, which may be stored on a non-transitory computer readable storage medium and which, when executed, may comprise the steps of the above-described method embodiments. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus dynamic random access memory (RDRAM), and direct Rambus dynamic random access memory (DRDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of them that contains no contradiction should be considered within the scope of this description.
The foregoing examples represent only a few embodiments of the present application; although they are described in relative detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art may make various modifications and improvements without departing from the spirit of the present application, all of which fall within the protection scope of the present application. The protection scope of the patent is therefore defined by the appended claims.

Claims (10)

1. The data processing method between the nodes of the workflow scheduling engine is characterized by comprising the following steps:
when a multidisciplinary optimization design workflow starts to be executed, scheduling computing nodes to be executed in a directed acyclic graph corresponding to the workflow layer by layer according to the design sequence of the workflow; adding each computing node whose execution has finished into a node completion cache queue;
before any computing node is executed, starting to execute an execution preparation phase of the computing node; the execution preparation stage is used for actively collecting all data flow information required by the computing node when the computing node starts to execute according to a node data transmission information table of the computing node;
after the execution preparation stage is finished, starting to execute the computing node and storing data to be output into an output data list in the node data transmission information table;
and repeating the scheduling execution of each computing node until all computing nodes of the workflow are executed.
2. The method of inter-node data processing for a workflow scheduling engine of claim 1, further comprising the steps of:
constructing the workflow in a front-end interface of the workflow scheduling engine, and setting data transfer relation and data transfer direction among nodes in the workflow one by one according to a multidisciplinary optimization design task execution file;
and saving the node information of the workflow after the setting is finished to a database of the workflow scheduling engine.
3. The method of inter-node data processing of a workflow scheduling engine of claim 2, wherein starting the process of executing the execution preparation phase of the computing node comprises:
the node data transmission information table of the computing node is taken out, and a node input data list is analyzed;
circularly traversing the node input data list, and obtaining a data item source node of the computing node from the node completion cache queue;
reading an output data list of the data item source node and analyzing to obtain address data of a data stream source required by the computing node when the computing node starts to execute;
the obtained data stream source address data are set in a default parameter list of the computing node in a covering mode;
and returning to the step of circularly traversing the node input data list and obtaining the data item source node of the computing node from the node completion cache queue, until all data stream information required by the computing node when starting execution is collected.
4. A method of processing data between nodes of a workflow scheduling engine according to any one of claims 1 to 3, wherein the input data list of the node data transmission information table comprises a data stream type, an input node unique identifier, a namespace to which an input parameter belongs, and an id identifier of the input data parameter in a node; the output data list of the node data transmission information table comprises a target node unique identifier, a namespace to which a target parameter belongs, and an id identifier of the target data parameter in a node; and the default parameter list of the computing node comprises a parameter name, a parameter namespace, a parameter id identifier, a parameter type, parameter meta information, a parameter unit, a parameter data value, and a number of data streams in which the parameter participates.
5. A workflow scheduling engine inter-node data processing apparatus, comprising:
the node scheduling module is used for scheduling computing nodes to be executed in the directed acyclic graph corresponding to the workflow layer by layer according to the design sequence of the workflow when the multidisciplinary optimization design workflow starts to be executed; adding each computing node whose execution has finished into a node completion cache queue;
an execution preparation module, configured to start executing an execution preparation phase of any computing node before executing the computing node; the execution preparation stage is used for actively collecting all data flow information required by the computing node when the computing node starts to execute according to a node data transmission information table of the computing node;
the node execution module is used for starting to execute the computing node and storing the data to be output into an output data list in the node data transmission information table after the execution preparation stage is finished;
and the scheduling relay module is used for triggering and repeating the scheduling execution of each computing node until the execution is completed on all computing nodes of the workflow.
6. The workflow scheduling engine inter-node data processing apparatus of claim 5, further comprising:
the workflow configuration module is used for constructing the workflow in a front end interface of the workflow scheduling engine and setting data transfer relation and data transfer direction among nodes in the workflow one by one according to the multidisciplinary optimal design task execution file;
and the setting storage module is used for storing the node information of the workflow after the setting is finished to a database of the workflow scheduling engine.
7. The workflow scheduling engine inter-node data processing apparatus of claim 6, wherein the process of starting execution of the execution preparation phase of the computing node comprises:
the node data transmission information table of the computing node is taken out, and a node input data list is analyzed;
circularly traversing the node input data list, and obtaining a data item source node of the computing node from the node completion cache queue;
reading an output data list of the data item source node and analyzing to obtain address data of a data stream source required by the computing node when the computing node starts to execute;
the obtained data stream source address data are set in a default parameter list of the computing node in a covering mode;
and returning to the step of circularly traversing the node input data list and obtaining the data item source node of the computing node from the node completion cache queue, until all data stream information required by the computing node when starting execution is collected.
8. The apparatus according to any one of claims 5 to 7, wherein the input data list of the node data transmission information table includes a data stream type, an input node unique identification, a namespace to which an input parameter belongs, and an id identification of the input data parameter in the node; the output data list of the node data transmission information table includes a target node unique identification, a namespace to which a target parameter belongs, and an id identification of the target data parameter in the node; and the default parameter list of the computing node includes a parameter name, a parameter namespace, a parameter id identification, a parameter type, parameter meta information, a parameter unit, a parameter data value, and a number of data streams in which the parameter participates.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of data processing between workflow scheduling engine nodes of any one of claims 1 to 4.
10. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the method of data processing between workflow scheduling engine nodes of any one of claims 1 to 4.
CN202410105768.9A 2024-01-25 2024-01-25 Method, device, equipment and medium for processing data among nodes of workflow scheduling engine Active CN117634866B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410105768.9A CN117634866B (en) 2024-01-25 2024-01-25 Method, device, equipment and medium for processing data among nodes of workflow scheduling engine


Publications (2)

Publication Number Publication Date
CN117634866A true CN117634866A (en) 2024-03-01
CN117634866B CN117634866B (en) 2024-04-19

Family

ID=90025528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410105768.9A Active CN117634866B (en) 2024-01-25 2024-01-25 Method, device, equipment and medium for processing data among nodes of workflow scheduling engine

Country Status (1)

Country Link
CN (1) CN117634866B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107924502A (en) * 2016-03-01 2018-04-17 华为技术有限公司 Multistage high-effect Business Process Management engine
US20180349183A1 (en) * 2017-06-02 2018-12-06 Milos Popovic Systems and methods for scheduling jobs from computational workflows
CN109766196A (en) * 2018-12-18 2019-05-17 深圳云天励飞技术有限公司 A kind of method for scheduling task, device and equipment
CN110058932A (en) * 2019-04-19 2019-07-26 中国科学院深圳先进技术研究院 A kind of storage method and storage system calculated for data flow driven
CN110209629A (en) * 2019-07-15 2019-09-06 北京一流科技有限公司 Data flowing acceleration means and its method in the data handling path of coprocessor
CN114169801A (en) * 2021-12-27 2022-03-11 中国建设银行股份有限公司 Workflow scheduling method and device
CN114281573A (en) * 2021-12-28 2022-04-05 城云科技(中国)有限公司 Workflow data interaction method and device, electronic device and readable storage medium
WO2022135079A1 (en) * 2020-12-25 2022-06-30 北京有竹居网络技术有限公司 Data processing method for task flow engine, and task flow engine, device and medium
CN115114333A (en) * 2022-06-23 2022-09-27 北京元年科技股份有限公司 Multi-engine visual data stream implementation method, device, equipment and storage medium
CN115658270A (en) * 2022-11-02 2023-01-31 广州市易鸿智能装备有限公司 Workflow execution method, device, equipment and storage medium of visual system
CN116521744A (en) * 2023-06-30 2023-08-01 杭州拓数派科技发展有限公司 Full duplex metadata transmission method, device, system and computer equipment
US11815943B1 (en) * 2020-06-05 2023-11-14 State Farm Mutual Automobile Insurance Company Systems and methods for processing using directed acyclic graphs


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI Shen; SHAO Xiaodong; CHANG Jiantao: "Dynamic workflow modeling method oriented to the product design process", Computer Integrated Manufacturing Systems, no. 06, 15 June 2012 (2012-06-15) *

Also Published As

Publication number Publication date
CN117634866B (en) 2024-04-19

Similar Documents

Publication Publication Date Title
CN104423953B (en) A kind of SCADA timings data processing script execution system and method
AU2007253862A1 (en) Managing computing resources in graph-based computations
CN107016094B (en) Project shared file multi-person collaborative development method, device and system
JPWO2013065687A1 (en) Processor system and accelerator
CN112835714B (en) Container arrangement method, system and medium for CPU heterogeneous clusters in cloud edge environment
CN112214289A (en) Task scheduling method and device, server and storage medium
Bürgy et al. Optimal job insertion in the no-wait job shop
CN117634866B (en) Method, device, equipment and medium for processing data among nodes of workflow scheduling engine
US20090158263A1 (en) Device and method for automatically optimizing composite applications having orchestrated activities
CN117610320A (en) Directed acyclic graph workflow engine cyclic scheduling method, device and equipment
CN114444715A (en) Graph data processing method, device and system, electronic equipment and readable storage medium
Skillicorn et al. Optimising data-parallel programs using the BSP cost model
CN110035103A (en) A kind of transferable distributed scheduling system of internodal data
Weiß et al. Rewinding and repeating scientific choreographies
Hewett et al. Scalable optimized composition of web services with complexity analysis
JP2008090541A (en) Parallelization program generation method, parallelization program generation device, and parallelization program generation program
Subramaniam et al. Improving process models by discovering decision points
CN106445403B (en) Distributed storage method and system for paired storage of mass data
US20080147221A1 (en) Grid modeling tool
JP2001092647A (en) Method for converting process definition information into flow control program
CN107526573B (en) Method for processing remote sensing image by adopting parallel pipeline
CN117076095B (en) Task scheduling method, system, electronic equipment and storage medium based on DAG
AU2018206850A1 (en) Computer implemented technologies configured to enable efficient processing of data in a transportation network based on generation of directed graph data derived from transportation timetable data
CN116661978B (en) Distributed flow processing method and device and distributed business flow engine
Ren Single machine batch scheduling with non-increasing time slot costs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant