CN112529438B - Workflow processing method and device for distributed scheduling system, computer equipment and storage medium - Google Patents

Publication number
CN112529438B
Authority
CN
China
Prior art keywords
scheduling
message
workflow
node
execution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011500926.9A
Other languages
Chinese (zh)
Other versions
CN112529438A (en)
Inventor
杨真
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Bank Co Ltd
Priority to CN202011500926.9A
Publication of CN112529438A
Application granted
Publication of CN112529438B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a workflow processing method and device for a distributed scheduling system, computer equipment and a storage medium, and belongs to the technical field of computer communication. While the workflow execution engine schedules the execution clusters associated with the nodes in a workflow to run according to service information, the method monitors four messages in each scheduling path: a first scheduling message, recording that the workflow execution engine has sent a scheduling instruction according to a node in the workflow; a second scheduling message, recording that the execution cluster associated with the node has received the scheduling instruction; a first feedback message, recording that the execution cluster has output feedback data; and a second feedback message, recording that the workflow execution engine has received the feedback data. The working states of the nodes and the execution clusters in each scheduling path can then be determined from these four messages, so that when an abnormality occurs its position can be fed back quickly, which facilitates timely alarming or remediation.

Description

Workflow processing method and device for distributed scheduling system, computer equipment and storage medium
Technical Field
The present invention relates to the field of computer communications technologies, and in particular, to a method and apparatus for processing a workflow in a distributed scheduling system, a computer device, and a storage medium.
Background
Workflow technology is a core technology for enterprise business process modeling, simulation analysis, optimization, management and integration, and for business process automation. The workflow execution engine is the core of workflow management: it is a software service, or "engine", that provides an execution environment for workflow instances, and it acts as the task scheduler of the enterprise business process and, to some extent, as the allocator of enterprise resources. How to manage the workflow is therefore a core issue of workflow management.
In recent years, with the development of the internet and the growth of data volume, data application scenarios have become increasingly complex, and the requirements on workflow management have risen accordingly. In an existing distributed scheduling system, the workflow is processed as follows: according to the operation logic of the nodes in the workflow, the workflow execution engine sends a scheduling instruction over a working link to the corresponding job execution component, so as to control that component to execute the corresponding operation. Whether a scheduling instruction can be transmitted reliably, and whether an abnormality can be discovered in time when it occurs, are important measures of a distributed scheduling system. The transmission (sending and receiving) of a scheduling instruction passes through a plurality of components, so problems may arise in any component or in the transmission network between components. Existing distributed scheduling systems cannot effectively track and monitor the transmission of scheduling instructions; in particular, when the service volume is very large, they cannot quickly and effectively locate the abnormal position in the transmission link of a scheduling instruction, which hinders the normal operation of the workflow.
Disclosure of Invention
To address the problem that existing distributed scheduling systems cannot quickly locate the abnormal position in the transmission of a scheduling instruction, the present invention provides a workflow processing method and device for a distributed scheduling system, computer equipment and a storage medium, which track the transmission of scheduling instructions so that an abnormal position can be located quickly.
In order to achieve the above object, the present invention provides a workflow processing method for a distributed scheduling system, including:
acquiring service information of a distributed scheduling system, and selecting a corresponding workflow according to the service information;
scheduling, by a workflow execution engine, execution clusters associated with nodes in the workflow to run according to the service information;
monitoring a first scheduling message of a scheduling instruction sent by the workflow execution engine according to a node in the workflow, a second scheduling message of the scheduling instruction received by the execution cluster associated with the node, a first feedback message of feedback data output by the execution cluster and a second feedback message of the feedback data received by the workflow execution engine;
and identifying the working states of the node and the execution cluster in the scheduling path according to the first scheduling message, the second scheduling message, the first feedback message and the second feedback message monitored in the scheduling path.
Optionally, the distributed scheduling system includes a plurality of workflows, and the service information includes a service ID, and the service ID is associated with the corresponding workflow.
Optionally, before the workflow execution engine schedules the execution cluster associated with the node in the workflow to run according to the service information, the method further includes:
triggering execution clusters associated with nodes in the workflow to run by adopting the workflow execution engine based on a preset instruction according to the logic sequence of the nodes in the workflow, and recording the state of each node and the state of the execution clusters until the workflow is completed;
when the states of all the nodes in the workflow and the states of the execution clusters are in a normal state, scheduling the execution clusters associated with the nodes in the workflow to run through the workflow execution engine according to the service information; if not, generating a message triggering the path abnormality.
Optionally, the preset instruction is a null instruction.
Optionally, the workflow execution engine schedules execution cluster operation associated with the nodes in the workflow according to the service information, including:
Inquiring the node which accords with the scheduling condition in the current workflow according to the scheduling data in the service information, determining the node which needs to be operated by the workflow execution engine according to the state of the node and the operation logic of the node in the workflow, and sending the scheduling instruction to the associated execution cluster according to the node to execute the corresponding operation.
Optionally, the nodes in the workflow include working nodes, and each working node is associated with a corresponding execution cluster; or
the nodes in the workflow include rule nodes and working nodes, the rule nodes include a plurality of working nodes, and each working node is associated with a corresponding execution cluster.
Optionally, identifying the working states of the node and the execution cluster in the scheduling path according to the first scheduling message, the second scheduling message, the first feedback message and the second feedback message monitored in the scheduling path includes:
after the first scheduling message is monitored, judging whether the second scheduling message is received within a first preset time range or not based on the first scheduling message, if so, indicating that the execution cluster associated with the node is in a normal state, and if not, indicating that an abnormality exists in a transmission message path between the node and the execution cluster associated with the node;
And after the first feedback message is monitored, judging whether the second feedback message is received within a second preset time range or not based on the first feedback message, if so, indicating that the execution cluster associated with the node is in a normal state, and if not, indicating that the feedback message path between the execution cluster and the associated node is abnormal.
In order to achieve the above object, the present invention further provides a workflow processing apparatus of a distributed scheduling system, including:
an acquisition unit, configured to acquire service information of a distributed scheduling system and to select a corresponding workflow according to the service information;
a scheduling unit, configured to schedule, through a workflow execution engine, the execution clusters associated with the nodes in the workflow to run according to the service information;
a monitoring unit, configured to monitor, in each scheduling path, a first scheduling message of a scheduling instruction sent by the workflow execution engine according to a node in the workflow, a second scheduling message of the scheduling instruction received by the execution cluster associated with the node, a first feedback message of feedback data output by the execution cluster, and a second feedback message of the feedback data received by the workflow execution engine;
and an identification unit, configured to identify the working states of the node and the execution cluster in the scheduling path according to the first scheduling message, the second scheduling message, the first feedback message and the second feedback message monitored in the scheduling path.
To achieve the above object, the present invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
To achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a computer program characterized in that: the computer program realizes the steps of the above method when being executed by a processor.
With the workflow processing method and device for a distributed scheduling system, the computer equipment and the storage medium provided by the invention, the following can be monitored in each scheduling path while the workflow execution engine schedules the execution clusters associated with the nodes in the workflow to run according to the service information: the first scheduling message of the scheduling instruction sent by the workflow execution engine according to a node in the workflow, the second scheduling message of the scheduling instruction received by the execution cluster associated with the node, the first feedback message of the feedback data output by the execution cluster, and the second feedback message of the feedback data received by the workflow execution engine. From these four messages the working states of the nodes and the execution clusters in each scheduling path can be determined, and when a node, an execution cluster or a transmission path is abnormal, the abnormal position can be fed back quickly, which facilitates timely alarming or remediation and ensures the high availability of scheduling instructions.
Drawings
FIG. 1 is a flow chart of one embodiment of a distributed scheduling system workflow processing method according to the present invention;
FIG. 2 is a flow chart of another embodiment of a workflow processing method of a distributed scheduling system according to the present invention;
FIG. 3 is a block diagram of one embodiment of a distributed scheduling system workflow processing apparatus according to the present invention;
FIG. 4 is a block diagram of another embodiment of a workflow processing apparatus of a distributed scheduling system according to the present invention;
FIG. 5 is a hardware architecture diagram of one embodiment of a computer device of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
The workflow processing method and device for a distributed scheduling system, the computer equipment and the storage medium provided by the invention can be applied to the financial field or to enterprise management (for example, attendance statistics). While the workflow execution engine schedules the execution clusters associated with the nodes in the workflow to run according to the service information, the invention monitors, in each scheduling path, the first scheduling message of the scheduling instruction sent by the workflow execution engine according to a node in the workflow, the second scheduling message of the scheduling instruction received by the execution cluster associated with the node, the first feedback message of the feedback data output by the execution cluster, and the second feedback message of the feedback data received by the workflow execution engine. From these four messages the working states of the nodes and the execution clusters in each scheduling path can be determined, and when a node, an execution cluster or a transmission path is abnormal, the abnormal position can be fed back quickly, which facilitates timely alarming or remediation and ensures the high availability of scheduling instructions.
The workflow related to the embodiment of the invention is the abstraction, generalization and description of the complex workflow in the distributed system and the business rules between the operation steps. Namely: the workflow may utilize a computer to automatically communicate documents, information, or tasks according to certain predetermined rules to achieve a certain business objective.
Example 1
Referring to fig. 1, a workflow processing method of a distributed scheduling system of the present embodiment includes the following steps:
S1, acquiring service information of a distributed scheduling system, and selecting a corresponding workflow according to the service information.
It is emphasized that to further guarantee the privacy and security of the workflow, the workflow may also be stored in a blockchain node.
Specifically, the distributed scheduling system is a collection of a large number of servers that communicate and share resources, providing higher computing performance and availability so as to meet users' service processing needs. The local server obtains the service information, which may be, but is not limited to, a service such as a cloud hard disk copy creation service or a clone image service.
In this embodiment, the distributed scheduling system includes a plurality of workflows, and the service information includes a service ID, where the service ID is associated with the corresponding workflow. The nodes in the workflow comprise working nodes, and each working node is associated with a corresponding execution cluster; or the nodes in the workflow comprise rule nodes and working nodes, the rule nodes comprise a plurality of working nodes, and each working node is associated with a corresponding execution cluster.
The workflow may include a job orchestration graph, which comprises working nodes and rule nodes together with the connections between nodes; these connections may be serial or concurrent. A rule node is associated with a sub-orchestration graph embedded in the job orchestration graph, and the sub-orchestration graph may include a plurality of working nodes. The execution cluster corresponding to a working node may include one or more execution machines (execution components). Each working node corresponds to a batch task, and its runnable instance is associated with the physical deployment.
Further, step S1 is to select the workflow associated with the service ID according to the service ID in the service information.
In practical application, when the workflow is scheduled based on service information, the messages transmitted in the entire scheduling link all include service IDs, and different service information is distinguished by the service IDs.
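Step S1 can be illustrated with a minimal sketch. The class names `WorkNode`, `Workflow` and `WorkflowRegistry`, the `service_id` key and the sample identifiers are all hypothetical; the text only states that a service ID is associated with a workflow and that each working node is associated with an execution cluster.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WorkNode:
    node_id: str
    cluster: str  # name of the execution cluster associated with this node


@dataclass
class Workflow:
    workflow_id: str
    nodes: list[WorkNode] = field(default_factory=list)


class WorkflowRegistry:
    """Hypothetical lookup table from service ID to its associated workflow."""

    def __init__(self) -> None:
        self._by_service_id: dict[str, Workflow] = {}

    def register(self, service_id: str, workflow: Workflow) -> None:
        self._by_service_id[service_id] = workflow

    def select(self, service_info: dict) -> Workflow:
        # Step S1: the service ID carried in the service information
        # determines which workflow is selected.
        service_id = service_info["service_id"]
        try:
            return self._by_service_id[service_id]
        except KeyError:
            raise LookupError(
                f"no workflow associated with service ID {service_id!r}"
            ) from None


registry = WorkflowRegistry()
registry.register("svc-attendance", Workflow("wf-1", [WorkNode("n1", "cluster-a")]))
selected = registry.select({"service_id": "svc-attendance", "payload": {}})
```

Because every message in the scheduling link carries the service ID, the same key could also be used to correlate monitored messages with the selected workflow.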
By way of example and not limitation, the distributed scheduling system workflow processing method may be applied in a Horae (being the first decentralized token investment management platform worldwide) system.
S2, scheduling the execution clusters associated with the nodes in the workflow to run according to the service information through a workflow execution engine.
Further, step S2 may include: inquiring the node which accords with the scheduling condition in the current workflow according to the scheduling data in the service information, determining the node which needs to be operated by the workflow execution engine according to the state of the node and the operation logic of the node in the workflow, and sending the scheduling instruction to the associated execution cluster according to the node to execute the corresponding operation.
The states of a node, in order of priority from high to low, are: stop (stop), error (failed), pause (used), wait (await), timeout (timeout), run (running), ready (ready), complete (skip), and so on. Based on the states of the working nodes that have already run in the job orchestration graph and on the operation logic, the workflow execution engine determines the working node that currently needs to run, and generates a scheduling instruction according to the ID of that working node. The scheduling instruction is sent through a transmission component to the corresponding execution cluster, which executes the corresponding task according to the received instruction. After execution is completed, the execution cluster sends feedback data to the workflow execution engine through the transmission component, and the state of the corresponding working node is changed accordingly, forming a closed call loop. The workflow execution engine then redetermines the working node that needs to run according to the changed node states, until the whole workflow is completed.
A job orchestration graph of a workflow may include thousands or tens of thousands of nodes. At that scale various problems often arise: the job orchestration graph may be incorrect, a node may be misconfigured, a node on the graph may be disabled, or an execution cluster may be abnormal. Because each node processes real batch data, if such a problem occurs while the workflow is being processed, the batch breaks off partway through the run, which reduces task processing efficiency and makes iteration slow for operation and maintenance personnel.
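The closed loop described above can be sketched as follows. The state identifiers are copied from the text; everything else is an assumption made for illustration: the `upstream` dependency map, the `execute` callback standing in for dispatch plus feedback, and the use of the priority ordering to aggregate several node states into one (the text gives the ordering but not how it is applied).

```python
# State identifiers copied from the text, highest priority first.
STATE_PRIORITY = ["stop", "failed", "used", "await", "timeout", "running", "ready", "skip"]
RANK = {state: i for i, state in enumerate(STATE_PRIORITY)}
DONE = "skip"  # the text labels the "complete" state with the identifier `skip`


def aggregate_state(states):
    """Assumed use of the ordering: the highest-priority state wins."""
    return min(states, key=RANK.__getitem__)


def runnable_nodes(upstream, state):
    """Nodes that are ready and whose upstream nodes have all completed."""
    return [
        node
        for node, parents in upstream.items()
        if state[node] == "ready" and all(state[p] == DONE for p in parents)
    ]


def run_workflow(upstream, execute):
    """Dispatch runnable nodes, apply feedback, and re-evaluate until nothing is left."""
    state = {node: "ready" for node in upstream}
    order = []
    while True:
        batch = runnable_nodes(upstream, state)
        if not batch:
            break
        for node in batch:
            state[node] = "running"
            succeeded = execute(node)  # stands in for scheduling instruction + feedback data
            state[node] = DONE if succeeded else "failed"
            order.append(node)
    return state, order


graph = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
ok_state, ok_order = run_workflow(graph, lambda node: True)
bad_state, bad_order = run_workflow(graph, lambda node: node != "b")
```

In the second run node `d` stays in `ready` because its upstream node `b` failed, so the loop ends without dispatching it.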
Based on the above, the detection of the workflow and the execution clusters may be performed in advance before the workflow processing, i.e., before the execution of step S2.
In a preferred embodiment, referring to fig. 2, before performing step S2, the method may further include:
A1. Triggering, by the workflow execution engine and based on a preset instruction, the execution clusters associated with the nodes in the workflow to run in the logical order of the nodes in the workflow, and recording the state of each node and the state of each execution cluster until the workflow is completed.
The preset instruction is a null instruction, such as cmd= "kp".
A2. Judging whether the states of all the nodes in the workflow and the states of the execution clusters are in a normal state or not, if so, executing the step S2; if not, generating a message triggering the path abnormality.
In this embodiment, when a working node in the workflow is triggered, the workflow execution engine sends a null instruction through the transmission component to an execution unit, which includes a Zookeeper (an open-source distributed application coordination service) and an execution cluster. The Zookeeper in the execution unit runs the slicing step, splitting the received task into slices so that the corresponding execution cluster can execute the corresponding task. It should be noted that, during this process, the service logic is not actually executed; instead, the state of the execution cluster is returned directly to the workflow execution engine through the transmission component. In other words, the trigger instruction runs through the whole link and the scheduling logic without executing any specific service. The state of an execution cluster may be a completion state (normal state) or a failure state (abnormal state); when the execution cluster is in the failure state, the cause of the failure may be returned and the operation of the subsequent nodes may be stopped.
For a workflow with tens of thousands of nodes, thousands or tens of thousands of nodes are scheduled, and a run on real data typically takes more than several hours. If a problem is discovered only during the run, the workflow has to be adjusted and re-run, and work efficiency is extremely low. By adding a workflow dry-run (idle running) function, this embodiment allows problems in the workflow and in the physical link to be verified in advance, so that operation and maintenance personnel or developers can make adjustments in time, which greatly improves work efficiency.
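The dry run of steps A1 and A2 might look like the sketch below. Only the null instruction value `kp` comes from the text; the `probe` callback (which stands in for sending the instruction through the transmission component and receiving the cluster state), the report layout and the sample node and cluster names are hypothetical.

```python
NULL_INSTRUCTION = {"cmd": "kp"}  # null instruction value given in the text


def dry_run(ordered_nodes, cluster_of, probe):
    """Step A1: walk the nodes in logical order and send the null instruction.

    `probe(cluster, instruction)` is assumed to return `(ok, reason)` without
    executing any service logic. The walk stops at the first failure, so the
    nodes after it are never triggered.
    """
    report = {}
    for node in ordered_nodes:
        ok, reason = probe(cluster_of[node], NULL_INSTRUCTION)
        report[node] = ("normal", None) if ok else ("abnormal", reason)
        if not ok:
            break
    return report


def path_is_healthy(report, ordered_nodes):
    """Step A2: proceed to S2 only if every node was reached and reported normal."""
    return len(report) == len(ordered_nodes) and all(
        status == "normal" for status, _ in report.values()
    )


nodes = ["n1", "n2", "n3"]
clusters = {"n1": "cluster-a", "n2": "cluster-b", "n3": "cluster-c"}


def fake_probe(cluster, instruction):
    # Simulated environment in which cluster-b cannot be reached.
    if cluster == "cluster-b":
        return False, "unreachable"
    return True, None


report = dry_run(nodes, clusters, fake_probe)
healthy = path_is_healthy(report, nodes)
```

Here `report` contains entries for `n1` and `n2` only, and `healthy` is false, which corresponds to generating the path-abnormality message instead of executing step S2.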
S3, monitoring a first scheduling message of a scheduling instruction sent by the workflow execution engine according to a node in the workflow, a second scheduling message of the scheduling instruction received by the execution cluster associated with the node, a first feedback message of feedback data output by the execution cluster and a second feedback message of the feedback data received by the workflow execution engine in each scheduling path.
In this embodiment, the scheduling path of the scheduling instruction and the monitoring of the message are independent from each other.
An instruction tracking component tracks and monitors the messages at the key stages of the scheduling path, so that the message transmission status in the scheduling path can be obtained in real time. The key stages may include: the stage in which the workflow execution engine sends a scheduling instruction, the stage in which the workflow execution engine receives feedback data, the stage in which the execution cluster receives the scheduling instruction, and the stage in which the execution cluster outputs feedback data.
In practical application, the stage in which the execution cluster receives the scheduling instruction can be monitored by monitoring the receipt of the scheduling instruction by the Zookeeper in the execution unit, and the stage in which the execution cluster outputs feedback data can be monitored by detecting that the Zookeeper in the execution unit sends the feedback data to the workflow execution engine. All messages are returned in a certain order; to improve throughput, message middleware keyed by message ID can be used to increase parallelism. Each workflow corresponds to one storage partition, the state of each node in the workflow is recorded through that partition, and the messages of the same workflow are stored in the same partition.
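The partitioning idea can be sketched in memory as follows. This is an illustration under stated assumptions, not the middleware used by the patent: the `PartitionedLog` class and the message fields are hypothetical, messages are keyed by workflow ID so that one workflow always lands in one partition, and a hash is used for the mapping, which means several workflows may share a partition here whereas the text assigns each workflow its own.

```python
import zlib


class PartitionedLog:
    """Hypothetical in-memory stand-in for partitioned message middleware."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, workflow_id):
        # crc32 is stable across processes, unlike the built-in hash() for str.
        return zlib.crc32(workflow_id.encode("utf-8")) % len(self.partitions)

    def append(self, message):
        index = self.partition_for(message["workflow_id"])
        self.partitions[index].append(message)  # order is preserved per partition
        return index

    def node_states(self, workflow_id):
        """Replay the workflow's partition in order; the latest state of each node wins."""
        states = {}
        for message in self.partitions[self.partition_for(workflow_id)]:
            if message["workflow_id"] == workflow_id:
                states[message["node_id"]] = message["state"]
        return states


log = PartitionedLog(num_partitions=4)
first = log.append({"workflow_id": "wf-1", "node_id": "n1", "state": "running"})
second = log.append({"workflow_id": "wf-1", "node_id": "n1", "state": "skip"})
states = log.node_states("wf-1")
```

Different partitions can be consumed in parallel, while messages of one workflow are still read in the order in which they were written.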
S4, identifying working states of the node and the execution cluster in the scheduling path according to the first scheduling message, the second scheduling message, the first feedback message and the second feedback message monitored in the scheduling path.
Further, step S4 may include:
and after the first scheduling message is monitored, judging whether the second scheduling message is received within a first preset time range or not based on the first scheduling message, if so, indicating that the execution cluster associated with the node is in a normal state, and if not, indicating that an abnormality exists in a transmission message path between the node and the execution cluster associated with the node.
In this embodiment, the first scheduling message records message sending information, which may include a message ID, a working node ID, a service ID and the timestamp at which the message was sent; the second scheduling message records message receiving information, which may include a message ID, a working node ID, a service ID and the timestamp at which the message was received. After the first scheduling message, with its sending timestamp, is monitored, it is judged whether the second scheduling message is received within a first preset time range (for example, 30 seconds). If so, the working node and the execution cluster associated with it are in a normal state; if not, the transmission of the instruction between the working node and the execution cluster is abnormal, and the working node can be triggered to reissue the scheduling instruction.
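A minimal sketch of this check is given below. The record fields follow the text (message ID, working node ID, service ID, timestamp) and the 30-second default is the example value from the text; the `TraceMessage` and `DispatchMonitor` names, the matching by message ID and the returned status strings are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceMessage:
    message_id: str
    node_id: str
    service_id: str
    timestamp: float  # seconds


class DispatchMonitor:
    """Pairs first and second scheduling messages by message ID."""

    def __init__(self, timeout_s=30.0):
        self.timeout_s = timeout_s
        self._pending = {}  # message_id -> first scheduling message

    def on_first_scheduling_message(self, msg):
        self._pending[msg.message_id] = msg

    def on_second_scheduling_message(self, msg):
        sent = self._pending.pop(msg.message_id, None)
        if sent is None:
            return "unmatched"
        elapsed = msg.timestamp - sent.timestamp
        return "normal" if elapsed <= self.timeout_s else "abnormal"

    def overdue(self, now):
        """Node IDs whose instruction is still unconfirmed after the timeout.

        These are the candidates for reissuing the scheduling instruction.
        """
        return [
            m.node_id
            for m in self._pending.values()
            if now - m.timestamp > self.timeout_s
        ]


monitor = DispatchMonitor()
monitor.on_first_scheduling_message(TraceMessage("m1", "n1", "svc", 100.0))
monitor.on_first_scheduling_message(TraceMessage("m2", "n2", "svc", 100.0))
status = monitor.on_second_scheduling_message(TraceMessage("m1", "n1", "svc", 105.0))
```

Calling `monitor.overdue(now)` periodically would then surface `n2` once more than 30 seconds have passed without a matching second scheduling message.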
Further, step S4 may further include:
and after the first feedback message is monitored, judging whether the second feedback message is received within a second preset time range or not based on the first feedback message, if so, indicating that the execution cluster associated with the node is in a normal state, and if not, indicating that the feedback message path between the execution cluster and the associated node is abnormal.
In this embodiment, the first feedback message is used to record transmission information of feedback data, where the transmission information may include information such as a message ID, a working node ID, a service ID, and a timestamp of the transmission message; the second feedback message is used for recording the receiving information of the feedback data, and the sending information can comprise information such as a message ID, a working node ID, a service ID, a timestamp of the receiving message and the like. After the timestamp of the sent message is monitored, judging whether a second feedback message is received within a second preset time range (for example, 15 seconds), if so, indicating that the executive cluster and the associated working node are in a normal state; if not, the abnormal transmission of the instruction between the execution cluster and the working node is indicated, and the working node can be triggered to reissue the scheduling instruction. If the first feedback message is still not monitored within the third preset time range (5 minutes) after the second scheduling message is monitored, the situation that the execution cluster is abnormal is indicated, whether the task runs on the execution cluster or not needs to be judged, and whether the task is lost or not does not need to be judged.
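Taken together, the three time ranges allow the abnormal position to be narrowed down to a segment of the scheduling path. The function below is a sketch of that classification; the default values are the examples from the text (30 seconds, 15 seconds, 5 minutes), while the function name, the parameter names and the returned labels are hypothetical.

```python
from typing import Optional


def locate_anomaly(
    t_sent: float,
    t_received: Optional[float],
    t_feedback_out: Optional[float],
    t_feedback_in: Optional[float],
    now: float,
    dispatch_timeout: float = 30.0,
    feedback_timeout: float = 15.0,
    execution_timeout: float = 300.0,
) -> str:
    """Classify one scheduling path from the timestamps of its four messages.

    A timestamp of None means the corresponding message has not been monitored yet.
    """
    # Segment 1: workflow execution engine -> execution cluster.
    if t_received is None:
        return "dispatch_path" if now - t_sent > dispatch_timeout else "in_progress"
    if t_received - t_sent > dispatch_timeout:
        return "dispatch_path"

    # Segment 2: execution on the cluster.
    if t_feedback_out is None:
        return "execution_cluster" if now - t_received > execution_timeout else "in_progress"
    if t_feedback_out - t_received > execution_timeout:
        return "execution_cluster"

    # Segment 3: execution cluster -> workflow execution engine.
    if t_feedback_in is None:
        return "feedback_path" if now - t_feedback_out > feedback_timeout else "in_progress"
    if t_feedback_in - t_feedback_out > feedback_timeout:
        return "feedback_path"

    return "normal"


results = [
    locate_anomaly(0.0, 1.0, 10.0, 11.0, now=20.0),
    locate_anomaly(0.0, None, None, None, now=31.0),
    locate_anomaly(0.0, 1.0, None, None, now=302.0),
    locate_anomaly(0.0, 1.0, 10.0, None, now=26.0),
    locate_anomaly(0.0, 1.0, 10.0, None, now=20.0),
]
```

The label returned for an abnormal path indicates which segment an alarm or remedial action would target.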
In this embodiment, while the workflow execution engine schedules the execution clusters associated with the nodes in the workflow according to the service information, the workflow processing method of the distributed scheduling system monitors four messages in each scheduling path: the first scheduling message, recorded when the workflow execution engine sends a scheduling instruction according to a node in the workflow; the second scheduling message, recorded when the execution cluster associated with the node receives the scheduling instruction; the first feedback message, recorded when the execution cluster outputs feedback data; and the second feedback message, recorded when the workflow execution engine receives the feedback data. From these four messages, the working states of the nodes and execution clusters along each scheduling path can be determined, and it can be identified whether a node, an execution cluster, or a transmission path is abnormal. The location of the abnormality can therefore be fed back quickly so that an alarm or remedy can be issued in time, which ensures high availability of the scheduling instructions and provides a basis for rapid service recovery.
Example two
Referring to fig. 3, a workflow processing apparatus 1 of a distributed scheduling system of the present embodiment includes: an acquisition unit 11, a scheduling unit 12, a monitoring unit 13 and an identification unit 14.
The acquisition unit 11 is configured to acquire service information of the distributed scheduling system and to select a corresponding workflow according to the service information.
It is emphasized that to further guarantee the privacy and security of the workflow, the workflow may also be stored in a blockchain node.
Specifically, the distributed scheduling system is a collection of a large number of servers that communicate and share resources to provide higher computing performance and availability, thereby meeting users' service processing needs. The local server obtains the service information, which may relate to, for example and without limitation, a cloud hard disk copy creation service or an image cloning service.
In this embodiment, the distributed scheduling system includes a plurality of workflows, and the service information includes a service ID, where the service ID is associated with the corresponding workflow. The nodes in the workflow comprise working nodes, and each working node is associated with a corresponding execution cluster; or the nodes in the workflow comprise rule nodes and working nodes, the rule nodes comprise a plurality of working nodes, and each working node is associated with a corresponding execution cluster.
The workflow may include a job orchestration graph comprising working nodes, rule nodes, and the connections between nodes, which may be serial or concurrent. A rule node is associated with a sub-orchestration graph embedded in the job orchestration graph, and the sub-orchestration graph may include a plurality of working nodes. The execution cluster corresponding to a working node may include one or more execution machines (execution components). Each working node corresponds to a batch task, and its runnable instance is associated with the physical deployment.
Further, the acquisition unit 11 is configured to select the workflow associated with the service ID in the service information.
In practical application, when the workflow is scheduled based on service information, every message transmitted along the scheduling link includes the service ID, and different service information is distinguished by service ID.
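As a small illustration of this lookup, the sketch below maps a service ID to a workflow and stamps the service ID onto every outgoing message. The registry contents, the key `service_id`, and the function names are hypothetical and chosen only for this example.

```python
# Hypothetical registry: service ID -> workflow ID (values invented for illustration).
WORKFLOWS_BY_SERVICE_ID = {
    "svc-disk-copy": "wf-create-cloud-disk-copy",
    "svc-clone-image": "wf-clone-image",
}


def select_workflow(service_info: dict) -> str:
    """Pick the workflow associated with the service ID carried in the service information."""
    service_id = service_info["service_id"]
    try:
        return WORKFLOWS_BY_SERVICE_ID[service_id]
    except KeyError:
        raise LookupError(f"no workflow registered for service ID {service_id!r}") from None


def tag_message(service_info: dict, payload: dict) -> dict:
    """Attach the service ID so that messages of different services can be told apart."""
    return {"service_id": service_info["service_id"], **payload}
```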
By way of example and not limitation, the distributed scheduling system workflow processing method may be applied in a Horae (being the first decentralized token investment management platform worldwide) system.
The scheduling unit 12 is configured to schedule, through a workflow execution engine, the execution clusters associated with the nodes in the workflow to run according to the service information.
Further, the scheduling unit 12 may query the nodes in the workflow that meet the scheduling condition according to the scheduling data in the service information; the workflow execution engine determines the nodes to be run according to the states of the nodes and the run logic of the nodes in the workflow, and sends the scheduling instruction for each such node to its associated execution cluster, which performs the corresponding operation.
The node states are prioritized, from high to low, as follows: stop (stop), error (failed), pause (paused), wait (await), timeout (timeout), run (running), ready (ready), complete (skip), etc. Based on the states of the working nodes already run in the job orchestration graph and on the run logic, the workflow execution engine determines the working node that currently needs to run, generates a scheduling instruction from the ID of that working node, and sends the instruction through a transmission component to the corresponding execution cluster, which executes the corresponding task. After execution completes, the execution cluster sends feedback data through the transmission component to the workflow execution engine, which updates the state of the corresponding working node, forming a closed call loop. The workflow execution engine then re-determines the working nodes that need to run from the updated states, until the whole workflow is completed.
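The closed loop described above can be sketched in a few lines. Several things here are assumptions made for illustration: the graph is given as a mapping from each node to its upstream nodes, the callback `execute` stands in for the round trip through the transmission component and the execution cluster, and the state priority is used only to summarize a group of node states, since the text does not say how the ordering is applied.

```python
# English state names from the text, ordered from highest to lowest priority.
STATE_PRIORITY = ["stop", "error", "pause", "wait", "timeout", "run", "ready", "complete"]


def dominant_state(states):
    """Assumed use of the ordering: the highest-priority state summarizes a group."""
    return min(states, key=STATE_PRIORITY.index)


def run_workflow(upstream, execute):
    """Drive the workflow until no further node can run.

    upstream: dict mapping each node to the list of nodes it depends on.
    execute:  stand-in for dispatch plus feedback; returns True on success.
    """
    state = {node: "wait" for node in upstream}
    order = []
    while True:
        # Re-determine runnable nodes from the current states (the closed loop).
        ready = sorted(n for n, deps in upstream.items()
                       if state[n] == "wait" and all(state[d] == "complete" for d in deps))
        if not ready:
            break
        for node in ready:
            state[node] = "run"  # scheduling instruction sent
            order.append(node)
            # Feedback data changes the node state.
            state[node] = "complete" if execute(node) else "error"
    return order, state
```

In this sketch a failed node simply blocks its downstream nodes, which remain in the wait state; a real engine would presumably also apply retry or alarm handling at that point.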
The monitoring unit 13 is configured to monitor, in each scheduling path, a first scheduling message of a scheduling instruction sent by the workflow execution engine according to a node in the workflow, a second scheduling message of the scheduling instruction received by the execution cluster associated with the node, a first feedback message of feedback data output by the execution cluster, and a second feedback message of the feedback data received by the workflow execution engine.
In this embodiment, the scheduling path of the scheduling instruction and the monitoring of the message are independent from each other.
An instruction tracking component tracks and monitors messages at the key stages of the scheduling path, so that message transmission in the scheduling path can be observed in real time. The key stages may include: the stage in which the workflow execution engine sends the scheduling instruction, the stage in which the workflow execution engine receives the feedback data, the stage in which the execution cluster receives the scheduling instruction, and the stage in which the execution cluster outputs the feedback data.
In practical application, the stage in which the execution cluster receives the scheduling instruction can be monitored by observing that the Zookeeper in the execution unit receives the scheduling instruction, and the stage in which the execution cluster outputs feedback data can be monitored by detecting that the Zookeeper in the execution unit sends the feedback data to the workflow execution engine. All messages must be returned in a defined order; to improve throughput under that constraint, message middleware can be used to increase parallelism according to the message ID. Each workflow corresponds to one storage partition, the state of each node in the workflow is recorded in that partition, and the messages of the same workflow are stored in the same partition.
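One common way to keep per-workflow ordering while processing different workflows in parallel is to route messages by a stable hash of the workflow ID. The sketch below assumes that approach; the text does not name the middleware or the partitioning function, and the field name `workflow_id` is hypothetical. Note that hashing may place several workflows in one partition, which is a simplification of the one-partition-per-workflow arrangement described above.

```python
import zlib
from collections import defaultdict


def partition_for(workflow_id: str, num_partitions: int) -> int:
    """Stable partition index: the same workflow always maps to the same partition."""
    return zlib.crc32(workflow_id.encode("utf-8")) % num_partitions


def route(messages, num_partitions: int):
    """Append each message to its workflow's partition, preserving arrival order."""
    partitions = defaultdict(list)
    for msg in messages:
        partitions[partition_for(msg["workflow_id"], num_partitions)].append(msg)
    return partitions
```

Because messages of one workflow never span partitions, each partition can be consumed by an independent worker without reordering the state changes of any single workflow.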
The identification unit 14 is configured to identify the working states of the node and the execution cluster in the scheduling path according to the first scheduling message, the second scheduling message, the first feedback message, and the second feedback message monitored in the scheduling path.
Further, after the first scheduling message is monitored, the identification unit 14 may determine, based on the first scheduling message, whether the second scheduling message is received within a first preset time range. If so, the execution cluster associated with the node is in a normal state; if not, the message transmission path between the node and its associated execution cluster is abnormal.
In this embodiment, the first scheduling message records message sending information, which may include a message ID, a working node ID, a service ID, and the timestamp at which the message was sent; the second scheduling message records message receiving information, which may include a message ID, a working node ID, a service ID, and the timestamp at which the message was received. Once the sending timestamp has been observed, it is determined whether the second scheduling message is received within a first preset time range (for example, 30 seconds). If so, the working node and the execution cluster associated with the node are in a normal state; if not, instruction transmission between the working node and the execution cluster is abnormal, and the working node can be triggered to reissue the scheduling instruction.
Further, after the first feedback message is monitored, the identification unit 14 may determine, based on the first feedback message, whether the second feedback message is received within a second preset time range. If so, the execution cluster associated with the node is in a normal state; if not, the feedback message path between the execution cluster and the associated node is abnormal.
In this embodiment, the first feedback message records the sending information of the feedback data, which may include a message ID, a working node ID, a service ID, and the timestamp at which the message was sent; the second feedback message records the receiving information of the feedback data, which may include a message ID, a working node ID, a service ID, and the timestamp at which the message was received. Once the sending timestamp has been observed, it is determined whether the second feedback message is received within a second preset time range (for example, 15 seconds). If so, the execution cluster and the associated working node are in a normal state; if not, transmission between the execution cluster and the working node is abnormal, and the working node can be triggered to reissue the scheduling instruction. If the first feedback message has still not been observed within a third preset time range (for example, 5 minutes) after the second scheduling message was observed, the execution cluster is abnormal, and it must be determined whether the task is still running on the execution cluster or has been lost.
In this embodiment, while the workflow execution engine schedules the execution clusters associated with the nodes in the workflow according to the service information, the monitoring unit 13 monitors four messages in each scheduling path: the first scheduling message, recorded when the workflow execution engine sends a scheduling instruction according to a node in the workflow; the second scheduling message, recorded when the execution cluster associated with the node receives the scheduling instruction; the first feedback message, recorded when the execution cluster outputs feedback data; and the second feedback message, recorded when the workflow execution engine receives the feedback data. From these four messages, the identification unit 14 determines the working states of the nodes and execution clusters along each scheduling path and identifies whether a node, an execution cluster, or a transmission path is abnormal. The location of the abnormality can therefore be fed back quickly so that an alarm or remedy can be issued in time, which ensures high availability of the scheduling instructions and provides a basis for rapid service recovery.
The job orchestration graph of a workflow can include thousands or tens of thousands of nodes, and at that scale various problems often arise: the job orchestration graph may be incorrect, a node may be misconfigured, a node on the graph may be disabled, an execution cluster may be abnormal, and so on. Because each node processes real batch data, such a problem can cause a batch to break off partway through a run, which reduces task processing efficiency and makes iteration slow for operation and maintenance personnel.
Based on the above, in a preferred embodiment and referring to fig. 4, the distributed scheduling system workflow processing apparatus 1 may further include: a triggering unit 15 and a judging unit 16.
The triggering unit 15 and the judging unit 16 may be used to pre-detect the workflow and the execution clusters before the workflow processing.
The triggering unit 15 is configured to use the workflow execution engine to trigger, based on a preset instruction and in the logical order of the nodes in the workflow, the execution clusters associated with the nodes to run, and to record the state of each node and of each execution cluster until the workflow is completed.
The preset instruction is a null instruction, for example cmd="kp".
The judging unit 16 is configured to judge whether the states of all nodes in the workflow and of the execution clusters are normal. If so, the scheduling unit 12 schedules, through the workflow execution engine, the execution clusters associated with the nodes in the workflow to run according to the service information; if not, a message reporting an abnormality in the triggered path is generated.
In this embodiment, when a working node on the workflow is triggered, the workflow execution engine sends the null instruction through a transmission component to an execution unit (comprising a Zookeeper and an execution cluster); the Zookeeper running in the execution unit shards the received task so that the corresponding execution cluster executes it. The states of the execution cluster may include a completion state (normal) and a failure state (abnormal); when the execution cluster is in the failure state, the cause of the failure may be returned and the subsequent nodes are not run.
In this embodiment, a workflow with tens of thousands of nodes schedules thousands or tens of thousands of nodes, and a run on real data typically takes several hours or more. If a problem is discovered only partway through such a run, the workflow has to be adjusted and re-run, which is extremely inefficient. By adding a workflow dry-run (idle-run) function, this embodiment allows problems in the workflow and in the physical links to be verified in advance, so that operation and maintenance personnel or developers can make adjustments in time, greatly improving work efficiency.
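The dry run can be pictured as walking the graph in dependency order with the null instruction and stopping at the first failure. The sketch below makes several assumptions: the graph is a mapping from node to upstream nodes, `send` is a hypothetical stand-in for the transmission component that returns a success flag and a failure reason, and `graphlib` from the Python standard library (3.9 or later) supplies the ordering. Only the example value `cmd="kp"` comes from the text.

```python
from graphlib import TopologicalSorter

NULL_INSTRUCTION = {"cmd": "kp"}  # example null instruction from the text


def dry_run(upstream, send):
    """Trigger every node with the null instruction before any real data is processed.

    upstream: dict mapping each node to the nodes it depends on.
    send:     hypothetical transport; send(node, instruction) -> (ok, reason).
    Returns (all_normal, recorded_states, failure_description_or_None).
    """
    states = {}
    for node in TopologicalSorter(upstream).static_order():
        ok, reason = send(node, NULL_INSTRUCTION)
        states[node] = "complete" if ok else "failed"
        if not ok:
            # Return the cause and do not run the following nodes.
            return False, states, f"{node}: {reason}"
    return True, states, None
```

Only when `dry_run` reports that every node is normal would the real scheduling run be started; otherwise the returned description points at the node whose path needs attention.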
Example III
To achieve the above objective, the present invention further provides a computer device 2. There may be a plurality of computer devices 2, and the components of the distributed scheduling system workflow processing apparatus 1 of the second embodiment may be distributed across different computer devices 2. The computer device 2 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack-mounted server, a blade server, or a tower server (including a stand-alone server or a server cluster formed by a plurality of servers) that executes a program, or the like. The computer device 2 of this embodiment includes at least, but is not limited to, a memory 21, a processor 23, a network interface 22, and the distributed scheduling system workflow processing apparatus 1, which can be communicatively connected to each other through a system bus (refer to fig. 5). It should be noted that fig. 5 only shows a computer device 2 having these components; not all of the illustrated components are required, and more or fewer components may be implemented instead.
In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or memory of the computer device 2. In other embodiments, the memory 21 may be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 2. The memory 21 may also comprise both an internal storage unit of the computer device 2 and an external storage device. In this embodiment, the memory 21 is generally used to store the operating system and the various application software installed on the computer device 2, such as the program code of the distributed scheduling system workflow processing method of the first embodiment. Further, the memory 21 may be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 23 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 23 is typically used to control the overall operation of the computer device 2, for example to perform control and processing related to data interaction or communication with the computer device 2. In this embodiment, the processor 23 is configured to run the program code or process the data stored in the memory 21, for example to run the distributed scheduling system workflow processing apparatus 1.
The network interface 22 may comprise a wireless network interface or a wired network interface, and is typically used to establish a communication connection between the computer device 2 and other computer devices 2. For example, the network interface 22 is used to connect the computer device 2 to an external terminal through a network and to establish a data transmission channel and a communication connection between the computer device 2 and the external terminal. The network may be an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
It is noted that fig. 5 only shows a computer device 2 having components 21-23, but it is understood that not all of the illustrated components are required to be implemented, and that more or fewer components may alternatively be implemented.
In this embodiment, the distributed scheduling system workflow processing apparatus 1 stored in the memory 21 may also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 23 in this embodiment) to implement the present invention.
Example IV
To achieve the above objective, the present invention also provides a computer-readable storage medium, which may be any of a plurality of storage media such as flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk, a server, an app store, etc., on which a computer program is stored that performs the corresponding functions when executed by the processor 23. The computer-readable storage medium of this embodiment is used to store the distributed scheduling system workflow processing apparatus 1, which, when executed by the processor 23, implements the distributed scheduling system workflow processing method of the first embodiment.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains information on a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the scope of the invention. Any equivalent structure or equivalent process derived from the content disclosed herein, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present invention.

Claims (8)

1. A workflow processing method for a distributed scheduling system, comprising:
acquiring service information of a distributed scheduling system, and selecting a corresponding workflow according to the service information;
scheduling, by a workflow execution engine, execution clusters associated with nodes in the workflow to run according to the service information;
monitoring a first scheduling message of a scheduling instruction sent by the workflow execution engine according to a node in the workflow, a second scheduling message of the scheduling instruction received by the execution cluster associated with the node, a first feedback message of feedback data output by the execution cluster and a second feedback message of the feedback data received by the workflow execution engine in each scheduling path;
identifying the working states of the node and the execution cluster in the scheduling path according to the first scheduling message, the second scheduling message, the first feedback message and the second feedback message monitored in the scheduling path;
wherein scheduling, by the workflow execution engine, the execution clusters associated with the nodes in the workflow to run according to the service information comprises:
inquiring the node which accords with a scheduling condition in the current workflow according to scheduling data in the service information, determining the node which needs to be operated by the workflow execution engine according to the state of the node and the operation logic of the node in the workflow, and sending the scheduling instruction to the associated execution cluster according to the node to execute corresponding operation;
wherein identifying the working states of the node and the execution cluster in the scheduling path according to the first scheduling message, the second scheduling message, the first feedback message and the second feedback message monitored in the scheduling path comprises:
after the first scheduling message is monitored, judging whether the second scheduling message is received within a first preset time range or not based on the first scheduling message, if so, indicating that the execution cluster associated with the node is in a normal state, and if not, indicating that an abnormality exists in a transmission message path between the node and the execution cluster associated with the node;
after the first feedback message is monitored, judging whether the second feedback message is received within a second preset time range or not based on the first feedback message, if so, indicating that the execution cluster associated with the node is in a normal state, and if not, indicating that a feedback message path between the execution cluster and the associated node is abnormal;
the first scheduling message is used for recording message sending information, and the sending information comprises a message ID, a working node ID, a service ID and a time stamp for sending the message; the second scheduling message is used for recording message receiving information, and the receiving information comprises a message ID, a working node ID, a service ID and a timestamp of the received message; the first feedback message is used for recording the sending information of the feedback data; the second feedback message is used for recording the receiving information of the feedback data;
wherein monitoring, in each scheduling path, the first scheduling message of the scheduling instruction sent by the workflow execution engine according to a node in the workflow comprises:
using an instruction tracking component to track and monitor messages at the key stages of the scheduling path;
wherein the stage in which the execution cluster receives the scheduling instruction is monitored by observing that the Zookeeper in the execution unit receives the scheduling instruction, and the stage in which the execution cluster outputs the feedback data is monitored by detecting that the Zookeeper in the execution unit sends the feedback data to the workflow execution engine.
2. The distributed scheduling system workflow processing method of claim 1, wherein the distributed scheduling system comprises a plurality of workflows, and the service information comprises a service ID associated with the corresponding workflow.
3. The distributed scheduling system workflow processing method of claim 1, wherein, before scheduling, by the workflow execution engine, the execution clusters associated with the nodes in the workflow to run according to the service information, the method further comprises:
using the workflow execution engine to trigger, based on a preset instruction and in the logical order of the nodes in the workflow, the execution clusters associated with the nodes in the workflow to run, and recording the state of each node and the state of the execution clusters until the workflow is completed;
when the states of all the nodes in the workflow and the states of the execution clusters are normal, scheduling, by the workflow execution engine, the execution clusters associated with the nodes in the workflow to run according to the service information; otherwise, generating a message reporting an abnormality in the triggered path.
4. The distributed scheduling system workflow processing method of claim 3, wherein the preset instruction is a null instruction.
5. The distributed scheduling system workflow processing method of claim 1, wherein the nodes in the workflow comprise working nodes, each working node being associated with a corresponding execution cluster; or
the nodes in the workflow comprise rule nodes and working nodes, the rule nodes comprise a plurality of working nodes, and each working node is associated with a corresponding execution cluster.
6. A distributed scheduling system workflow processing apparatus, comprising:
an acquisition unit, configured to acquire service information of a distributed scheduling system and select a corresponding workflow according to the service information;
a scheduling unit, configured to schedule, through a workflow execution engine, the execution clusters associated with the nodes in the workflow to run according to the service information;
a monitoring unit, configured to monitor, in each scheduling path, a first scheduling message of a scheduling instruction sent by the workflow execution engine according to a node in the workflow, a second scheduling message of the scheduling instruction received by the execution cluster associated with the node, a first feedback message of feedback data output by the execution cluster, and a second feedback message of the feedback data received by the workflow execution engine;
an identification unit, configured to identify the working states of the node and the execution cluster in the scheduling path according to the first scheduling message, the second scheduling message, the first feedback message and the second feedback message monitored in the scheduling path;
wherein the scheduling unit is further configured to:
inquiring the node which accords with a scheduling condition in the current workflow according to scheduling data in the service information, determining the node which needs to be operated by the workflow execution engine according to the state of the node and the operation logic of the node in the workflow, and sending the scheduling instruction to the associated execution cluster according to the node to execute corresponding operation;
wherein the identification unit is further configured to:
after the first scheduling message is monitored, judging whether the second scheduling message is received within a first preset time range or not based on the first scheduling message, if so, indicating that the execution cluster associated with the node is in a normal state, and if not, indicating that an abnormality exists in a transmission message path between the node and the execution cluster associated with the node;
after the first feedback message is monitored, judging whether the second feedback message is received within a second preset time range or not based on the first feedback message, if so, indicating that the execution cluster associated with the node is in a normal state, and if not, indicating that a feedback message path between the execution cluster and the associated node is abnormal;
the first scheduling message is used for recording message sending information, and the sending information comprises a message ID, a working node ID, a service ID and a time stamp for sending the message; the second scheduling message is used for recording message receiving information, and the receiving information comprises a message ID, a working node ID, a service ID and a timestamp of the received message; the first feedback message is used for recording the sending information of the feedback data; the second feedback message is used for recording the receiving information of the feedback data;
wherein monitoring, in each scheduling path, the first scheduling message of the scheduling instruction sent by the workflow execution engine according to a node in the workflow comprises:
using an instruction tracking component to track and monitor messages at the key stages of the scheduling path;
wherein the stage in which the execution cluster receives the scheduling instruction is monitored by observing that the Zookeeper in the execution unit receives the scheduling instruction, and the stage in which the execution cluster outputs the feedback data is monitored by detecting that the Zookeeper in the execution unit sends the feedback data to the workflow execution engine.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.
8. A computer readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
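
As an illustration only, the timeout check recited for the identification unit can be sketched in Python. The claims specify the record fields (message ID, working node ID, service ID, timestamp) and the "received within a preset time range" test; the names `TraceRecord`, `check_path` and `find_abnormal_paths`, the use of seconds as the time unit, and the pairing of records by message ID are assumptions made for this sketch and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TraceRecord:
    """One monitored message (hypothetical type; fields follow the claim)."""
    message_id: str
    node_id: str      # working node ID
    service_id: str
    timestamp: float  # assumed to be seconds; send or receive time


def check_path(sent: TraceRecord,
               received: Optional[TraceRecord],
               timeout_s: float) -> str:
    """Return 'normal' if the matching receive record falls within
    timeout_s of the send record, otherwise 'abnormal'."""
    if received is None or received.message_id != sent.message_id:
        return "abnormal"
    elapsed = received.timestamp - sent.timestamp
    return "normal" if 0 <= elapsed <= timeout_s else "abnormal"


def find_abnormal_paths(sent_records: List[TraceRecord],
                        received_records: List[TraceRecord],
                        timeout_s: float) -> List[str]:
    """Pair send and receive records by message ID (an assumption) and
    return the node IDs whose message path looks abnormal."""
    by_id: Dict[str, TraceRecord] = {r.message_id: r for r in received_records}
    return [s.node_id for s in sent_records
            if check_path(s, by_id.get(s.message_id), timeout_s) == "abnormal"]
```

The same function would serve both directions under these assumptions: the first and second scheduling messages with the first preset time range, and the first and second feedback messages with the second preset time range. The sketch evaluates records after the fact; a live monitor would instead run the check once the preset time range for each send record has elapsed.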
CN202011500926.9A 2020-12-18 2020-12-18 Workflow processing method and device for distributed scheduling system, computer equipment and storage medium Active CN112529438B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011500926.9A CN112529438B (en) 2020-12-18 2020-12-18 Workflow processing method and device for distributed scheduling system, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011500926.9A CN112529438B (en) 2020-12-18 2020-12-18 Workflow processing method and device for distributed scheduling system, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112529438A CN112529438A (en) 2021-03-19
CN112529438B CN112529438B (en) 2023-06-09

Family

ID=75001196

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011500926.9A Active CN112529438B (en) 2020-12-18 2020-12-18 Workflow processing method and device for distributed scheduling system, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112529438B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001069431A2 (en) * 2000-03-14 2001-09-20 Commerceroute, Inc. System and method for automating business processes and performing data interchange operations in a distributed computing environment
CN105630589A (en) * 2014-11-24 2016-06-01 航天恒星科技有限公司 Distributed process scheduling system and process scheduling and execution method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10467569B2 (en) * 2014-10-03 2019-11-05 Datameer, Inc. Apparatus and method for scheduling distributed workflow tasks
US10831633B2 (en) * 2018-09-28 2020-11-10 Optum Technology, Inc. Methods, apparatuses, and systems for workflow run-time prediction in a distributed computing system

Also Published As

Publication number Publication date
CN112529438A (en) 2021-03-19

Similar Documents

Publication Publication Date Title
US10061578B2 (en) System and method of configuring a data store for tracking and auditing real-time events across different software development tools in agile development environments
US8265980B2 (en) Workflow model for coordinating the recovery of IT outages based on integrated recovery plans
US10057285B2 (en) System and method for auditing governance, risk, and compliance using a pluggable correlation architecture
US20210279164A1 (en) Real Time Application Error Identification and Mitigation
US10073683B2 (en) System and method for providing software build violation detection and self-healing
CN105591821B (en) Monitoring system and service system
US20210133622A1 (en) Ml-based event handling
US9170821B1 (en) Automating workflow validation
US11805005B2 (en) Systems and methods for predictive assurance
CN110222535B (en) Processing device, method and storage medium for block chain configuration file
WO2015116064A1 (en) End user monitoring to automate issue tracking
CN112527484A (en) Workflow breakpoint continuous running method and device, computer equipment and readable storage medium
WO2019195482A9 (en) Database lock
US20050154734A1 (en) Method and system for monitoring and reporting backup results
US11068487B2 (en) Event-stream searching using compiled rule patterns
CN110275795A (en) A kind of O&M method and device based on alarm
CN112529438B (en) Workflow processing method and device for distributed scheduling system, computer equipment and storage medium
Yu Hard disk drive failure prediction challenges in machine learning for multi-variate time series
CN110069382B (en) Software monitoring method, server, terminal device, computer device and medium
Oditis et al. Asynchronous runtime verification of business processes
CN113592453B (en) Information system operation compliance examining method and system based on block chain
CN117667362B (en) Method, system, equipment and readable medium for scheduling process engine
CN117172975A (en) Big data information management system and method
CN113010424A (en) Interface automation test processing method, system, computer equipment and storage medium
CN116703317A (en) Method, system, equipment and medium for automatically processing Jira worksheet based on SOAR

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant