CN113127310B - Task processing method and device, electronic equipment and storage medium

Info

Publication number: CN113127310B
Application number: CN202110487754.4A
Authority: CN
Inventors: 张俊帆
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2021-04-30
Filing date: 2021-04-30
Publication date: 2023-09-01
Anticipated expiration: 2041-04-30
Also published as: CN113127310A

Abstract

The application relates to a task processing method and device, an electronic device, and a storage medium. The method comprises: using a first monitoring node to monitor the node task completion condition of a first processing node preconfigured in a first cluster; and, when completion of the node task of the first processing node is detected and no task completion notification sent by a second monitoring node has been received, sending a task completion notification to the second monitoring node so that the second monitoring node stops the node task of a second processing node. The second monitoring node is used to monitor the node task completion condition of the second processing node preconfigured in a second cluster, and the first processing node and the second processing node are used to process the same node task. The method and device avoid the situation in which, when all processing nodes of a workflow are deployed in the same cluster, the workflow cannot continue to run if that cluster goes down or otherwise fails.

Description

Task processing method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of workflow technologies, and in particular, to a task processing method, a device, an electronic device, and a storage medium.

Background

As big data is applied ever more deeply, the results of big data processing have become an essential basis for company operations and user analysis, and can even affect a company's long-term strategy to a certain extent.

At present, processing data in a certain mode (hereinafter referred to as a target processing mode) requires a plurality of nodes, each of which bears its own computing task. Only after the data has been processed by the computing tasks borne by these nodes is the processing in the target processing mode complete; together, the nodes form a workflow. Generally, a node in a workflow depends on a cluster to run. If the cluster on which a node depends is abnormal, for example because of downtime, congestion of cluster computing resources, or a cluster fault, the node cannot complete its data processing. The workflow in which the node is located then cannot run normally, so the data processing cannot be completed and its result cannot be obtained.

Disclosure of Invention

The application provides a task processing method and device, an electronic device, and a storage medium, which aim to solve, at least to a certain extent, the problem in the related art that a workflow cannot run normally to produce a final data processing result because a node cannot finish its data processing when the cluster on which the node depends is abnormal.

According to a first aspect of the present application, there is provided a task processing method applied to a first monitoring node, the method comprising:

monitoring the node task completion condition of a first processing node in a first cluster;

when the completion of the node task of the first processing node is monitored, and a task completion notification sent by a second monitoring node is not received, sending a task completion notification to the second monitoring node, wherein the task completion notification is used for indicating the second monitoring node to stop the node task of the second processing node;

the second monitoring node is configured to monitor a node task completion condition of the second processing node in the second cluster, where the first processing node and the second processing node are configured to process a same node task.
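The decision made by the first monitoring node in the first aspect can be illustrated with a minimal sketch. The class `MonitoringNode`, its fields, and its method names are hypothetical and are not taken from the application; the sketch only models the rule "notify the peer unless the peer has already notified us".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MonitoringNode:
    """Hypothetical stand-in for a monitoring node; all names are illustrative."""
    name: str
    notified_by_peer: bool = False                      # set once a peer's notice arrives
    stop_log: List[str] = field(default_factory=list)   # records local stop actions

    def on_local_task_completed(self, peer: "MonitoringNode") -> bool:
        """Called when the locally monitored processing node finishes its node task.

        Returns True if a task completion notification was sent to the peer.
        """
        if self.notified_by_peer:
            # The peer's processing node finished first, so nothing is sent.
            return False
        peer.receive_completion_notification(self.name)
        return True

    def receive_completion_notification(self, sender: str) -> None:
        """Record the notice and stop the local copy of the node task."""
        self.notified_by_peer = True
        self.stop_log.append(f"stopped local node task after notice from {sender}")


first = MonitoringNode("first-monitoring-node")
second = MonitoringNode("second-monitoring-node")
sent = first.on_local_task_completed(second)
```

In this sketch the second monitoring node, having been notified, would not send a notification of its own if its processing node were later reported as complete.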

According to a second aspect of the present application, there is provided a task processing method applied to a workflow configurator, the method comprising:

receiving a task processing request, wherein the task processing request is used for requesting high-availability configuration of a target workflow, and the target workflow comprises at least one target processing node;

performing high-availability configuration processing on a target processing node according to the task processing request to obtain a high-availability node group corresponding to the target processing node, wherein the high-availability node group at least comprises a first monitoring node, a second monitoring node, a first processing node and a second processing node;

replacing the target processing node in the target workflow with the high-availability node group corresponding to the target workflow;

wherein the target processing node is at least one processing node in the target workflow; the node tasks of the first processing node and the node tasks of the second processing node are the same as the corresponding node tasks of the target processing node, the first processing node and the first monitoring node operate in a first cluster, and the second processing node and the second monitoring node operate in a second cluster; the first monitoring node is used for monitoring the node task completion condition of a first processing node in the first cluster; the second monitoring node is used for monitoring the node task completion condition of the second processing node in the second cluster.

According to a third aspect of the present application, there is provided a task processing device, the device comprising:

the monitoring module is used for monitoring the node task completion condition of the first processing node in the first cluster;

the sending module is used for sending a task completion notification to the second monitoring node when the completion of the node task of the first processing node is monitored and the task completion notification sent by the second monitoring node is not received, wherein the task completion notification is used for indicating the second monitoring node to stop the node task of the second processing node;

According to a fourth aspect of the present application, there is provided a task processing device, the device comprising:

the receiving module is used for receiving a task processing request, wherein the task processing request is used for requesting high-availability configuration of a target workflow, and the target workflow comprises at least one target processing node;

the configuration module is used for carrying out high-availability configuration processing on the target processing node according to the task processing request to obtain a high-availability node group corresponding to the target processing node, wherein the high-availability node group at least comprises a first monitoring node, a second monitoring node, a first processing node and a second processing node;

the replacing module is used for replacing the target processing node in the target workflow with the high-availability node group corresponding to the target workflow;

According to a fifth aspect of the present application there is provided an electronic device comprising at least one processor and a memory;

the processor is configured to execute a task processing program stored in the memory, so as to implement the task processing method according to the first aspect or the second aspect of the present application.

According to a sixth aspect of the present application, there is provided a storage medium storing one or more programs which, when executed, implement the task processing method according to the first or second aspect of the present application.

The technical solution provided by the application may have the following beneficial effects. A first monitoring node monitors the node task completion condition of a first processing node preconfigured in a first cluster. When completion of the node task of the first processing node is detected and no task completion notification sent by a second monitoring node has been received, the first monitoring node sends a task completion notification to the second monitoring node so that the second monitoring node stops the node task of the second processing node. The second monitoring node monitors the node task completion condition of the second processing node preconfigured in a second cluster, and the first processing node and the second processing node process the same node task. Because processing nodes for the same node task are configured in both the first cluster and the second cluster, the first cluster is unaffected when the second cluster cannot run or runs slowly and the node task of the second processing node therefore cannot be completed promptly. The node task of the first processing node in the first cluster is processed as usual, so the overall task processing flow that contains this node task does not stall because of a problem in the second cluster.

In addition, when the node task of the first processing node finishes first, a task completion notification can be sent to the second monitoring node to instruct it to stop the node task of the second processing node. That is, once the node task of one processing node has finished, the same node task in the other clusters no longer needs to be processed, which effectively avoids repeated processing of the same node task and saves system computing power.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.

FIG. 1 is a schematic block diagram of task processing using a workflow in the related art;

FIG. 2 is a flow chart of a task processing method according to an embodiment of the present application;

FIG. 3 is a schematic block diagram of task processing with workflow provided by one embodiment of the present application;

FIG. 4 is a flow chart of a task processing method according to another embodiment of the present application;

FIG. 5 is a schematic flow chart of synchronizing node task data according to another embodiment of the present application;

FIG. 6 is a schematic flow chart of another way of synchronizing node task data according to another embodiment of the present application;

FIG. 7 is a schematic diagram of a task processing device according to another embodiment of the present application;

FIG. 8 is a schematic diagram of a task processing device according to another embodiment of the present application;

FIG. 9 is a schematic structural diagram of an electronic device according to another embodiment of the present application.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.

Referring to FIG. 1, FIG. 1 is a schematic block diagram of task processing using a workflow in the related art.

As shown in FIG. 1, one workflow in the related art has at least two processing nodes, one of which is Hive-1 and the other of which is Hive-2, and the two processing nodes are used for executing different node tasks in the workflow. The workflow will be deployed in a cluster, that is, both the node task corresponding to Hive-1 and the node task corresponding to Hive-2 will run in the cluster.

If the cluster goes down, or another condition arises in which node tasks cannot run, the node tasks corresponding to the processing nodes of the workflow deployed in the cluster cannot run, so the workflow cannot be completed.

To avoid the above problem, the application proposes deploying processing nodes that execute the same task in different clusters. That is, a high-availability configuration is provided so that, when one cluster fails, other clusters can still provide running threads for processing the node tasks of the processing nodes, ensuring that the node tasks corresponding to the processing nodes run normally.

Specifically, referring to FIG. 2, FIG. 2 is a schematic flow chart of a task processing method according to an embodiment of the present application.

The task processing method as shown in fig. 2 may be applied in a workflow configurator to implement a high availability configuration for a target workflow, and specifically includes:

Step S201, a task processing request is received, where the task processing request is used to request a high-availability configuration for a target workflow, and the target workflow includes at least one target processing node.

In this step, the task processing request is used to request a high-availability configuration of the target workflow. Specifically, the task processing request may include a workflow identifier of the target workflow. The request may include only the workflow identifier, in which case all processing nodes in the target workflow may be configured for high availability by default. Alternatively, the request may include configuration information in addition to the workflow identifier, where the configuration information may be used to customize the high-availability configuration of the target workflow.

It should be noted that, in the workflow platform, each workflow has a corresponding unique workflow identifier. For any workflow, the processing nodes it includes are generally fixed, and the processing nodes that need high-availability configuration are often fixed as well. Therefore, each workflow may have customized configuration information corresponding to it.

The configuration information may be preset by a developer, or may be generated according to requirements. In general, the configuration information needs to include the target processing nodes that require high-availability configuration, and the multiple target clusters corresponding to each target processing node.

When the configuration information is preset by a developer, the developer can manually select, as target processing nodes, one or more processing nodes in the target workflow that consume substantial computing power. For the target clusters, the developer can select several clusters that run most stably and/or have the strongest computing capability.

The developer may reserve, in the workflow configurator, the configuration information required for the high-availability configuration of each workflow.

The configuration information corresponding to any workflow may include a node identifier of at least one target processing node in the workflow, which needs to be configured with high availability, and a plurality of target clusters to which node tasks of the target processing node need to be configured.

It should be noted that a target cluster is a cluster in which a node task of a target processing node needs to be configured. Specifically, each target processing node corresponds to one set of target clusters. For example, target processing node A may correspond to target cluster A, target cluster B, and target cluster C, and target processing node B may correspond to target cluster C, target cluster D, target cluster E, and target cluster F. That is, the target clusters corresponding to different target processing nodes may be the same or different, and the number of target clusters may be the same or different.

In addition, taking the example in which target processing node A corresponds to target cluster A, target cluster B, and target cluster C, the node task of target processing node A is processed in parallel in target cluster A, target cluster B, and target cluster C.
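As an illustration of the mapping just described, the configuration information could be held as a dictionary from node identifier to a list of cluster identifiers. The key names (`node_A`, `cluster_C`, and so on) and the helper `nodes_per_cluster` are assumptions made for this sketch; the application does not prescribe a data format.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical layout: target processing node identifier -> target cluster identifiers.
config: Dict[str, List[str]] = {
    "node_A": ["cluster_A", "cluster_B", "cluster_C"],
    "node_B": ["cluster_C", "cluster_D", "cluster_E", "cluster_F"],
}


def nodes_per_cluster(cfg: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the mapping to see which node tasks each target cluster must run."""
    result: Dict[str, List[str]] = defaultdict(list)
    for node, clusters in cfg.items():
        for cluster in clusters:
            result[cluster].append(node)
    return dict(result)


by_cluster = nodes_per_cluster(config)
```

The inverted view shows that `cluster_C` is a target cluster for both nodes, while the other clusters each serve one node.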

So that the high-availability configuration can adapt to a system environment that may change, for example when clusters change, a developer may customize the configuration information reserved in the workflow configurator, such as the number of target clusters and the cluster identifier of each target cluster.

In addition, the task processing request can directly carry configuration information of the target workflow besides the workflow identifier of the target workflow. In that case, a developer does not need to reserve the corresponding configuration information in the workflow configurator, and the configuration information in the task processing request can be used directly to perform the high-availability configuration.

Of course, priorities may be set for the configuration information reserved in the workflow configurator and the configuration information included in the task processing request, with the configuration information included in the task processing request having the higher priority. That is, for a target workflow, if the task processing request includes configuration information for the target workflow, the workflow configurator performs the high-availability configuration on the target workflow only according to the configuration information in the task processing request; if the task processing request does not include such configuration information, the workflow configurator performs the high-availability configuration according to the configuration information reserved in the workflow configurator. This improves the flexibility of configuration, simplifies the developer's operations, and improves configuration efficiency.

When the configuration information is generated according to requirements, the target processing node can be specified directly by a user. For example, the node identifier of the target processing node is added to the task processing request, and when the high-availability configuration is performed on the target workflow, the target processing node requiring high-availability configuration can be obtained from the node identifier in the task processing request.

Of course, a processing node satisfying a preset condition may also be selected as the target processing node. The preset condition may concern a weight that is preset for each processing node and reflects its importance within the whole target workflow: the more important the node task of the processing node, the greater the weight, and a processing node whose weight is higher than a preset weight threshold is determined to be a target processing node. The preset condition may also concern the processing failure rate of the node task of a processing node during historical runs of the target workflow, that is, the ratio of the number of times the node task of the processing node was not successfully completed to the total number of times the target workflow was run. A processing node whose processing failure rate is greater than a preset failure rate threshold is determined to be a target processing node.
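The two preset conditions can be sketched as simple filters. The record type `NodeStats`, its field names, and the sample weights and run counts are hypothetical values chosen only for illustration; they are not data from the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeStats:
    """Hypothetical per-node record; field names are illustrative only."""
    name: str
    weight: float      # preset importance of the node within the workflow
    failed_runs: int   # runs in which the node task was not completed successfully
    total_runs: int    # total number of runs of the target workflow


def select_by_weight(nodes: List[NodeStats], weight_threshold: float) -> List[str]:
    """Nodes whose preset weight is higher than the weight threshold."""
    return [n.name for n in nodes if n.weight > weight_threshold]


def select_by_failure_rate(nodes: List[NodeStats], rate_threshold: float) -> List[str]:
    """Nodes whose historical failure rate is greater than the threshold."""
    selected = []
    for n in nodes:
        if n.total_runs == 0:
            continue  # no history, so no failure rate can be computed
        if n.failed_runs / n.total_runs > rate_threshold:
            selected.append(n.name)
    return selected


# Made-up sample data for the two nodes named in FIG. 1.
nodes = [NodeStats("Hive-1", 0.9, 3, 10), NodeStats("Hive-2", 0.4, 0, 10)]
heavy = select_by_weight(nodes, 0.5)
flaky = select_by_failure_rate(nodes, 0.2)
```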

The target cluster can be any cluster, a cluster selected by a user, or a relatively idle cluster obtained through a load balancing algorithm, where a relatively idle cluster is a cluster that currently has few or no processing tasks.

When relatively idle clusters are obtained using the load balancing algorithm, the selection may be based on the current number of task processes of each cluster: specifically, the current number of task processes of all clusters is obtained, and several clusters with the fewest task processes are selected as target clusters. The selection may also be based on the current remaining running memory space of each cluster: specifically, the current remaining running memory space of all clusters is obtained first, and several clusters with the largest remaining running memory space are selected as target clusters.

It should be noted that the number of target clusters selected here may be decided by the user. For example, the required number of clusters is encapsulated in the task processing request; after the workflow configurator obtains the task processing request, it parses out the number of clusters to be selected and then selects the corresponding number of target clusters through the load balancing algorithm.
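A minimal sketch of the two selection criteria follows, assuming the per-cluster metrics are already available as dictionaries. The function names, the cluster identifiers `c1` to `c3`, and the metric values are hypothetical; how the metrics are collected is outside the scope of the sketch.

```python
import heapq
from typing import Dict, List


def pick_by_fewest_tasks(task_counts: Dict[str, int], k: int) -> List[str]:
    """Select the k clusters with the fewest current task processes."""
    return heapq.nsmallest(k, task_counts, key=task_counts.get)


def pick_by_most_free_memory(free_memory_gb: Dict[str, float], k: int) -> List[str]:
    """Select the k clusters with the largest remaining running memory."""
    return heapq.nlargest(k, free_memory_gb, key=free_memory_gb.get)


# Made-up metrics; k would be parsed from the task processing request.
task_counts = {"c1": 12, "c2": 3, "c3": 7}
free_memory_gb = {"c1": 64.0, "c2": 8.0, "c3": 32.0}
idle_by_tasks = pick_by_fewest_tasks(task_counts, 2)
idle_by_memory = pick_by_most_free_memory(free_memory_gb, 2)
```

The two criteria can disagree, as they do for this sample data, so a configurator would need to fix one criterion or combine them.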

Step S202, performing high-availability configuration processing on the target processing node according to the task processing request to obtain a high-availability node group corresponding to the target processing node.

The high availability node group includes at least a first monitoring node, a second monitoring node, a first processing node, and a second processing node.

In this step, for any of the three ways of using developer-preset configuration information mentioned in step S201, the configuration information of the target workflow is obtained according to the task processing request, specifically as follows:

For the first way, the task processing request includes the workflow identifier of the target workflow, and the workflow configurator reserves configuration information for each workflow. The configuration information corresponding to the workflow identifier can be found among all the reserved configuration information.

For the second way, the task processing request includes the workflow identification of the target workflow and the configuration information for the target workflow. Configuration information for the target workflow may be parsed directly from the task processing request.

For the third way, priorities are set for the configuration information reserved in the workflow configurator and the configuration information included in the task processing request. The task processing request can first be parsed. If the parsed result contains configuration information for the target workflow, the parsed configuration information is used directly as the configuration information for performing the high-availability configuration on the target workflow. If the parsed result contains only the workflow identifier of the target workflow, the configuration information corresponding to the workflow identifier can be found among all the reserved configuration information and used as the configuration information for performing the high-availability configuration on the target workflow.
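The three ways above reduce to one lookup rule, which can be sketched as follows. The request keys `workflow_id` and `config`, the function name `resolve_config`, and the workflow identifier `wf-1` are assumptions for illustration; the application does not define a request format.

```python
from typing import Any, Dict, Optional


def resolve_config(
    request: Dict[str, Any],
    reserved: Dict[str, Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Return the configuration information to use for the target workflow.

    Configuration carried in the request takes priority; otherwise the
    configuration reserved under the workflow identifier is used.
    """
    inline = request.get("config")
    if inline is not None:
        return inline
    return reserved.get(request["workflow_id"])


# Hypothetical reserved configuration: node identifier -> target clusters.
reserved = {"wf-1": {"Hive-1": ["Bd", "bjzyx-g1"]}}

from_request = resolve_config(
    {"workflow_id": "wf-1", "config": {"Hive-2": ["Bd", "bjzyx-g1"]}}, reserved
)
from_reserved = resolve_config({"workflow_id": "wf-1"}, reserved)
```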

Similarly, when the configuration information is generated according to requirements, obtaining the configuration information of the target workflow according to the task processing request means obtaining the target processing nodes that require high-availability configuration in the target workflow and the target clusters. For details, refer to the ways of obtaining the target processing nodes and the target clusters described in step S201, which are not repeated here.

After the configuration information is obtained, a high availability configuration may be performed on the target workflow based on the configuration information. It should be noted that, the configuration information may include configuration parameters for at least one target processing node in the target workflow.

For any target processing node, the corresponding configuration parameters may include the plurality of target clusters to which the node task of the target processing node needs to be configured, a first storage location in each cluster for the final storage of the node task data, and a second storage location for temporarily storing the node task data of the target processing node.

When performing the high-availability configuration processing, the number of target clusters may be determined first, and then the same number of processing nodes as target clusters is generated, each processing the same node task as the target processing node. It should be noted that the target clusters include at least a first cluster and a second cluster; correspondingly, the generated processing nodes include at least the first processing node and the second processing node, where the first processing node runs in the first cluster and the second processing node runs in the second cluster.

In order to facilitate automatic monitoring of processing nodes running in each target cluster, monitoring nodes, namely a first monitoring node and a second monitoring node, are respectively configured for the first processing node and the second processing node when in high-availability configuration, wherein the first monitoring node runs in the first cluster and is used for monitoring the node task completion condition of the first processing node, and the second monitoring node runs in the second cluster and is used for monitoring the node task completion condition of the second processing node.

It should be noted that, after the high availability configuration processing is performed on the target processing node, a high availability node group corresponding to the target processing node is obtained, where the high availability node group at least includes the first monitoring node, the second monitoring node, the first processing node and the second processing node.
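The generation of a high-availability node group can be sketched as below: one processing node and one monitoring node per target cluster. The `Node` type, the naming scheme with `@`, and the function `build_ha_group` are hypothetical; the sketch omits the branch indication and branch merging nodes and assumes one monitoring node per processing node.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    """Hypothetical node record; fields are illustrative only."""
    name: str
    kind: str      # "processing" or "monitoring"
    cluster: str
    task: str


def build_ha_group(target: str, task: str, clusters: List[str]) -> List[Node]:
    """Create one processing node and one monitoring node per target cluster."""
    if len(clusters) < 2:
        raise ValueError("a high-availability group needs at least two clusters")
    group: List[Node] = []
    for cluster in clusters:
        processing = Node(f"{target}@{cluster}", "processing", cluster, task)
        monitoring = Node(
            f"HA-{target}@{cluster}", "monitoring", cluster,
            f"monitor completion of {processing.name}",
        )
        group.extend([processing, monitoring])
    return group


group = build_ha_group("Hive-1", "hive-1 node task", ["Bd", "bjzyx-g1"])
```

Every processing node in the group carries the same `task` string, reflecting that the replicas process the same node task as the target processing node.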

Correspondingly, in the case that the target clusters are the first cluster and the second cluster, at least the first monitoring node and the first processing node need to run in the first cluster, and at least the second monitoring node and the second processing node need to run in the second cluster. That is, each target cluster needs at least one processing node and one monitoring node that monitors the node task completion condition of that processing node. It should be noted that the first monitoring node is configured to monitor the node task completion condition of the first processing node, and the second monitoring node is configured to monitor the node task completion condition of the second processing node; the specific monitoring manner will be described in a subsequent embodiment and is not detailed here.

It should be noted that a monitoring node may monitor the node task completion condition of at least one processing node. The monitoring nodes and the processing nodes can be in one-to-one correspondence; alternatively, one monitoring node can monitor the node task completion condition of multiple processing nodes simultaneously. For example, in the foregoing example, both target processing node A and target processing node B are configured in target cluster C, so target cluster C contains a first processing node A with the same node task as target processing node A and a first processing node B with the same node task as target processing node B. In this case, only one monitoring node may be set in target cluster C to monitor the node task completion condition of both first processing node A and first processing node B.

In order for the nodes in the high availability node group to exhibit a logical workflow sequence, a branch indication node and/or a branch merge node may also be included in the high availability node group in this embodiment.

The branch indication node is placed before a plurality of processing nodes in the high-availability node group that need to be processed in parallel, and indicates that the node tasks of the processing nodes after the branch indication node are to be processed in parallel. The branch merging node is placed after a plurality of processing nodes in the high-availability node group that need to be processed in parallel, and indicates, after the node tasks of the processing nodes before the branch merging node have finished, that the node tasks of the processing nodes after the branch merging node are to be processed.
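The behaviour of a branch indication node followed by a branch merging node resembles a fork-join pattern. The sketch below uses Python threads to illustrate that pattern only; it is not the workflow platform's mechanism, and the function name `fork_join` and the placeholder branches are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, List


def fork_join(branches: List[Callable[[], str]]) -> List[str]:
    """Start all branches in parallel (fork) and wait for all of them (join)."""
    if not branches:
        return []
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(branch) for branch in branches]  # fork
        wait(futures)                                           # join
        return [f.result() for f in futures]


# Placeholder branches standing in for a processing node and its monitoring node.
results = fork_join([lambda: "processing done", lambda: "monitoring done"])
```

Results are returned in submission order, regardless of which branch finishes first.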

In a specific example, the configuration information is shown in table 1 for ease of illustration.

TABLE 1

The target processing nodes in Table 1 are the two processing nodes shown in FIG. 1. For the result after the high-availability configuration processing in this step, refer to FIG. 3, which is a schematic block diagram of task processing using a workflow according to an embodiment of the present application.

As shown in FIG. 3, the high availability node group corresponding to Hive-1 may include a branch indication node (Fork 1 node); run in zone a: hive-1 node, HA-1 node, branch indication node (Fork 2 node) in Bd cluster; run in zone B: hive-1 node, HA-1 node, branch indication node (Fork 3 node) in bjzyx-g1 cluster. The HA-1 node is a monitoring node.

The high availability node group corresponding to Hive-2 may include operating in zone a: hive-2 node, HA-2 node, branch indication node (Fork 4 node), branch merging node (Join 1 node) in Bd cluster; run in zone B: hive-2 node, HA-2 node, branch indication node (Fork 5 node), branch merging node (Join 2 node) in bjzyx-g1 cluster; branch merge node (Join 3 node). The HA-2 node is a monitoring node.

After the workflow runs to the branch instruction node (Fork 1 node), the Fork1 node instructs parallel processing in the area a: the branches of the Bd cluster indicate node tasks and B areas of the nodes (Fork 2 nodes): the branches of the bjzyx-g1 cluster indicate the node tasks of the node (the Fork3 node).

Zone a: when the Fork2 nodes in the Bd cluster process the node tasks, the parallel processing area A is indicated: node tasks of Hive-1 nodes and node tasks of HA-1 nodes in Bd clusters, and area A: the node task of the HA-1 node in the Bd cluster is to monitor the A area: and completing the node task of the Hive-1 node in the Bd cluster.

Then, zone a: when the node task of the Hive-1 node of the Bd cluster is completed or stopped, the area A is processed: the node task of a branch indication node (Fork 4 node) in the Bd cluster indicates parallel processing A area: node tasks of Hive-2 nodes and node tasks of HA-2 nodes in Bd cluster, A area: the node task of the HA-2 node in the Bd cluster is to monitor the A area: and completing the node task of the Hive-2 node in the Bd cluster.

Zone a: the branch merge node (Join 1 node) in the Bd cluster will wait for zone a: after the node tasks of the Hive-2 node and the node tasks of the HA-2 node in the Bd cluster are all completed (or stopped), the node tasks of the processing nodes after processing are indicated.

When the branch indication node (Fork 3 node) in the bjzyx-g1 cluster in zone B processes its node task, it instructs parallel processing of the node task of the Hive-1 node and the node task of the HA-1 node in the bjzyx-g1 cluster in zone B. The node task of the HA-1 node in the bjzyx-g1 cluster is to monitor whether the node task of the Hive-1 node in the bjzyx-g1 cluster is completed.

Then, when the node task of the Hive-1 node in the bjzyx-g1 cluster in zone B is completed or stopped, the node task of the branch indication node (Fork 5 node) in the bjzyx-g1 cluster is processed. This node task instructs parallel processing of the node task of the Hive-2 node and the node task of the HA-2 node in the bjzyx-g1 cluster in zone B. The node task of the HA-2 node in the bjzyx-g1 cluster is to monitor whether the node task of the Hive-2 node in the bjzyx-g1 cluster is completed.

The branch merge node (Join 2 node) in the bjzyx-g1 cluster in zone B waits until the node task of the Hive-2 node and the node task of the HA-2 node in the bjzyx-g1 cluster are both completed (or stopped), and then instructs processing of the node tasks of the subsequent processing nodes.

Finally, the branch merge node (Join 3 node) waits until both the branch merge node (Join 1 node) in the Bd cluster in zone A and the branch merge node (Join 2 node) in the bjzyx-g1 cluster in zone B have instructed processing of the node tasks of the subsequent processing nodes, and then itself instructs processing of the node tasks of the subsequent processing nodes.
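
The ordering constraints of the FIG. 3 example can be sketched as a dependency graph. The following is a minimal illustration only: the node naming scheme (`cluster/node`) and the `branch` helper are assumptions introduced here, and the edges encode only the waits stated in the text above (for instance, the text does not state that anything waits for HA-1, so HA-1 has no successor in this sketch).

```python
from graphlib import TopologicalSorter


def branch(cluster: str, fork_in: str, fork_next: str, join: str) -> dict[str, set[str]]:
    """Map each node in one cluster's branch to the set of nodes it waits for."""
    c = cluster
    return {
        f"{c}/{fork_in}": {"Fork1"},
        f"{c}/Hive-1": {f"{c}/{fork_in}"},
        f"{c}/HA-1": {f"{c}/{fork_in}"},
        # The next Fork is processed once Hive-1 is completed or stopped.
        f"{c}/{fork_next}": {f"{c}/Hive-1"},
        f"{c}/Hive-2": {f"{c}/{fork_next}"},
        f"{c}/HA-2": {f"{c}/{fork_next}"},
        # The per-cluster Join waits for both Hive-2 and HA-2.
        f"{c}/{join}": {f"{c}/Hive-2", f"{c}/HA-2"},
    }


graph = {
    **branch("Bd", "Fork2", "Fork4", "Join1"),
    **branch("bjzyx-g1", "Fork3", "Fork5", "Join2"),
    # Join 3 waits for the Join nodes of both clusters.
    "Join3": {"Bd/Join1", "bjzyx-g1/Join2"},
}

# One valid execution order; nodes that share predecessors may run in parallel.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

Any order produced this way starts at Fork 1 and places Join 3 after both per-cluster Join nodes, which matches the flow described for FIG. 3.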

It should be noted that the high-availability configuration processing may concern a single target processing node, that is, the processing nodes before and after the target processing node in the target workflow do not require high-availability configuration processing; alternatively, several consecutive target processing nodes may all require high-availability configuration processing, as in the example shown in FIG. 3.

For the case of a single target processing node, the high-availability node group needs to include both branch indication nodes pointing to different clusters and branch merge nodes adapted to different clusters.

For the case of several consecutive target processing nodes, the high-availability node group corresponding to the first target processing node in the sequence only needs to include branch indication nodes pointing to different clusters, and the high-availability node group corresponding to the last target processing node in the sequence only needs to include branch merge nodes adapted to different clusters.

Step S203, the target processing node in the target workflow is replaced with its corresponding high-availability node group.

In this step, replacing the target processing node in the target workflow with its corresponding high-availability node group yields the target workflow after high-availability configuration processing. The flow relationships among node tasks in the target workflow after high-availability configuration processing are substantially the same as those in the original target workflow.

In this embodiment, the high-availability configuration of the target workflow can be performed using preset configuration information or configuration information generated automatically on demand, which avoids the tedious process of manually configuring processing nodes into clusters. Introducing monitoring nodes also avoids the complex maintenance process and high maintenance cost that manual maintenance may incur.

Specifically, the process of performing task processing using the target workflow after high-availability configuration processing is shown in fig. 4, which is a flow chart of a task processing method according to another embodiment of the present application.
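
The replacement in step S203 can be illustrated for the single-target-node case with a short sketch. This is a simplified illustration, not the configurator described in the application: the workflow is assumed to be an edge list, the helper names `build_ha_group` and `replace_node` are hypothetical, and the group uses one Fork and one Join rather than per-cluster Fork and Join nodes.

```python
Edge = tuple[str, str]


def build_ha_group(target: str, clusters: list[str]) -> tuple[str, str, list[Edge]]:
    """Return (entry node, exit node, internal edges) of a hypothetical HA node group."""
    fork, join = f"Fork[{target}]", f"Join[{target}]"
    edges: list[Edge] = []
    for c in clusters:
        proc, monitor = f"{target}@{c}", f"HA[{target}]@{c}"
        # Each cluster runs a copy of the processing node plus its monitoring node.
        edges += [(fork, proc), (fork, monitor), (proc, join), (monitor, join)]
    return fork, join, edges


def replace_node(edges: list[Edge], target: str, clusters: list[str]) -> list[Edge]:
    """Rewire edges so that `target` is replaced by its HA node group."""
    entry, exit_, group_edges = build_ha_group(target, clusters)
    rewired = [
        (exit_ if src == target else src, entry if dst == target else dst)
        for src, dst in edges
    ]
    return rewired + group_edges


workflow = [("extract", "Hive-1"), ("Hive-1", "report")]
ha_workflow = replace_node(workflow, "Hive-1", ["Bd", "bjzyx-g1"])
```

The upstream and downstream nodes (`extract` and `report`, both made-up names) keep their relative order, which reflects the statement that the flow relationships stay substantially the same after replacement.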

For convenience of explanation, this embodiment is described taking as an example a target cluster that includes a first cluster and a second cluster, and one high availability node group (comprising a first monitoring node, a second monitoring node, a first processing node, and a second processing node).

As shown in fig. 4, the task processing method provided in this embodiment may be used for any monitoring node in the target workflow after the high availability configuration, and for convenience of explanation, this embodiment is described by taking application to the first monitoring node as an example, where the task processing method specifically may include:

step S401, monitoring a node task completion condition of a first processing node in the first cluster.

In this step, the first monitoring node may determine the node task completion condition of the first processing node by detecting whether target data exists in the second storage location. It should be noted that, when the first processing node completes the node task, the node task data is temporarily stored in the second storage location of the first cluster.

In a specific example, the node task data may be stored in a second storage location in the form of a file named by the node identifier of the first processing node, for example, workflow-hive1.Done_tmp, where workflow may be the workflow identifier of the target workflow, hive1 is the node identifier of the first processing node, and in this step, it may be detected whether a file named as "workflow-hive1.Done_tmp" exists in the second storage location.

Specifically, whether target data exists in the second storage position of the first cluster or not can be periodically detected, wherein the target data is data generated when the node task of the first processing node is completed; and if the target data exists in the second storage position, determining that the node task of the first processing node in the first cluster is completed.

The target data may be, for example, the file named "workflow-hive1.Done_tmp" in the foregoing example.
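
The periodic detection described above can be sketched as a polling loop. This is a minimal illustration under stated assumptions: the second storage location is modeled as a local directory, and the function names, the polling interval, and the `max_checks` limit are introduced here for illustration and do not come from the application.

```python
import time
from pathlib import Path
from typing import Optional


def done_marker(tmp_dir: Path, workflow_id: str, node_id: str) -> Path:
    """Path of the target data file, following the 'workflow-hive1.Done_tmp' pattern."""
    return tmp_dir / f"{workflow_id}-{node_id}.Done_tmp"


def wait_for_completion(
    tmp_dir: Path,
    workflow_id: str,
    node_id: str,
    interval_s: float = 1.0,
    max_checks: Optional[int] = None,
) -> bool:
    """Poll the second storage location; return True once the target data exists."""
    marker = done_marker(tmp_dir, workflow_id, node_id)
    checks = 0
    while max_checks is None or checks < max_checks:
        if marker.exists():
            return True  # node task of the processing node is considered completed
        checks += 1
        time.sleep(interval_s)
    return False  # gave up after max_checks polls
```

A monitoring node could call `wait_for_completion(Path("/path/to/tmp"), "workflow", "hive1")`, where the directory is a placeholder for whatever the second storage location is in a given deployment.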

Step S402, when the completion of the node task of the first processing node is monitored, and the task completion notification sent by the second monitoring node is not received, the task completion notification is sent to the second monitoring node, and the task completion notification is used for indicating the second monitoring node to stop the node task of the second processing node.

In this step, the second monitoring node is configured to monitor a node task completion condition of a second processing node in the second cluster, where the first processing node and the second processing node are configured to process a same node task.

When it is detected that the node task of the first processing node is completed and no task completion notification from the second monitoring node has been received, this indicates that the node task of the first processing node was completed first, and the second monitoring node can be notified to stop the node task of the second processing node. That is, among processing nodes in different clusters that have the same node task, whichever cluster's processing node completes the node task first, the monitoring node of that cluster sends a task completion notification to the monitoring nodes in the other clusters.

In this embodiment, processing nodes that process the same node task are configured in both the first cluster and the second cluster, namely the first processing node in the first cluster and the second processing node in the second cluster. If the second cluster cannot run or runs slowly, so that the node task of the second processing node cannot be completed quickly, the first cluster is not affected and the node task of the first processing node is still processed as usual. Because the first processing node and the second processing node process the same node task, the overall task processing that includes this node task does not stall because of a problem in the second cluster. Furthermore, when the node task of the first processing node is completed first, a task completion notification can be sent to the second monitoring node to instruct it to stop the node task of the second processing node. In other words, once the node task of one processing node is completed, the same node task in the other clusters no longer needs to be processed, which effectively avoids repeated processing of the same node task and saves computing power.
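
The decision a monitoring node makes on each check can be summarized as a small function. This is a sketch only: the `Action` names are hypothetical, and the tie case (local task done and peer notification already received) is not specified in the text, so giving the peer's notification precedence is an assumption made here.

```python
from enum import Enum


class Action(Enum):
    KEEP_WAITING = "keep_waiting"
    NOTIFY_PEER = "notify_peer"  # local node task finished first: tell the peer to stop
    STOP_LOCAL = "stop_local"    # peer finished first: stop the local node task


def decide(local_done: bool, peer_notification_received: bool) -> Action:
    """Choose the monitoring node's next action from the two observed conditions."""
    if peer_notification_received:
        # Assumption: the peer's notification takes precedence, including in a tie.
        return Action.STOP_LOCAL
    if local_done:
        return Action.NOTIFY_PEER
    return Action.KEEP_WAITING
```

With this rule, at most one side keeps its result as the winner for a given node task, and the other side stops instead of repeating the work.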

After the node task of the first processing node is completed, the node task data may be stored in a first storage location of the first cluster and synchronized to other clusters, such as the second cluster. There are two specific ways to synchronize.

In the first mode of synchronizing the node task data, when the node task of the first processing node is monitored to be completed and a task completion notification sent by the second monitoring node is not received, the mapping relationship between the node task data of the first processing node and the node identifier of the first processing node and the node task data of the first processing node are stored in a first storage position of the first cluster.

It should be noted that the mapping relationship between the node task data of the first processing node and the node identifier of the first processing node may take the form of a file and its file name. For example, the node task data is stored in a file in a preset format, the file is named with the node identifier of the first processing node, and the file named with the node identifier is then stored in the first storage location. Because the first storage location thus holds a mapping between node identifiers and node task data, the node task data of different processing nodes will not be confused.

In addition, under the condition that a data acquisition request carrying the node identification of the first processing node and sent by the second monitoring node is received, node task data corresponding to the node identification of the first processing node is called from the first storage position of the first cluster according to the node identification. And finally, the called node task data is sent to a second monitoring node.

Because the first storage location stores the file named with the node identifier, the file named with the node identifier of the first processing node can be found directly in the first storage location according to that node identifier and then sent to the second monitoring node.
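
The first synchronization mode, seen from the side that completed first, can be sketched as follows. The sketch assumes the first storage location is a directory and that the file name alone carries the node identifier; `store_result` and `handle_data_request` are hypothetical names, and the transport that delivers the data acquisition request is left out.

```python
from pathlib import Path


def store_result(final_dir: Path, node_id: str, data: bytes) -> Path:
    """Store node task data in the first storage location, named by node identifier."""
    final_dir.mkdir(parents=True, exist_ok=True)
    path = final_dir / node_id  # file name acts as the node-identifier mapping
    path.write_bytes(data)
    return path


def handle_data_request(final_dir: Path, node_id: str) -> bytes:
    """Serve a data acquisition request by looking the file up by node identifier."""
    return (final_dir / node_id).read_bytes()
```

Naming the file after the node identifier means no separate index is needed to answer a request that carries only that identifier.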

In a second mode of synchronizing the node task data, when the node task of the first processing node is monitored to be completed and a task completion notification sent by the second monitoring node is not received, the node task data of the first processing node is packaged into the task completion notification; and sending the task completion notification to the second monitoring node.

It should be noted that sending the node task data directly in the task completion notification simplifies the interaction between clusters and allows the node task data to be synchronized more quickly.
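
The second mode can be illustrated by a notification that carries the data inline. The message layout below (a JSON object with `type`, `node_id`, and `data_b64` fields) is an assumption chosen for illustration; the application does not specify a wire format.

```python
import base64
import json


def pack_notification(node_id: str, data: bytes) -> str:
    """Package node task data into a task completion notification (hypothetical format)."""
    return json.dumps(
        {
            "type": "task_completed",
            "node_id": node_id,
            # Base64 keeps arbitrary bytes safe inside a JSON string.
            "data_b64": base64.b64encode(data).decode("ascii"),
        }
    )
```

Embedding the data this way trades a larger notification for one fewer round trip, which is practical mainly when the node task data is small.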

Because the target processing nodes configured in each cluster may differ, two storage locations are set in every cluster to ensure the consistency of the final node task data across clusters. Taking the first cluster as an example, these are the first storage location and the second storage location. The second storage location is a temporary storage point for the node task data of the processing node, which lets the first monitoring node determine whether the node task of the first processing node is completed by checking whether the temporary storage point holds the node task data of the first processing node. After making this determination, the first monitoring node synchronizes the node task data in the temporary storage point to the final storage locations, that is, the first storage locations, of the other clusters, thereby ensuring the consistency of the node task data across clusters.

In addition, if the first processing node in the first cluster does not complete the node task first, that is, the second processing node in the second cluster completes it first and the first monitoring node receives the task completion notification sent by the second monitoring node before the node task of the first processing node is completed, the first monitoring node stops the node task of the first processing node after receiving that notification, so as to avoid repeated processing of the same node task.

The specific manner of stopping the node task of the first processing node may be to directly kill the process corresponding to the node task in the first cluster, or to suspend the process corresponding to the node task in the first cluster, or the like.
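
Stopping a node task by killing its process can be sketched with the standard `subprocess` module. This assumes the node task runs as a child process the monitoring node can signal, which the application does not state; `stop_node_task` and the timeout are hypothetical.

```python
import subprocess
import sys


def stop_node_task(proc: subprocess.Popen, timeout_s: float = 5.0) -> int:
    """Ask the process to terminate, then kill it if it does not exit in time."""
    proc.terminate()
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # forceful stop as a fallback
        return proc.wait()


if __name__ == "__main__":
    # Stand-in for a long-running node task.
    task = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", stop_node_task(task))
```

Suspending instead of killing could be done on POSIX systems by sending `signal.SIGSTOP` to the process, which keeps its state available in case it needs to be resumed.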

In addition, if the task completion notification sent by the second monitoring node is received before the node task of the first processing node is completed, the node task data of the second processing node can be synchronized to the first storage location of the first cluster. This keeps the node task data of the target workflow consistent between the first cluster and the second cluster, so that the node task data of a target processing node with high-availability configuration can be conveniently obtained from either cluster after the target workflow is completed.

Specifically, there may be various ways of synchronizing node task data. The first may be as shown in fig. 5, which is a schematic flow chart of synchronizing node task data according to another embodiment of the present application.

As shown in fig. 5, the process of synchronizing node task data provided in this embodiment may include:

step S501, a data acquisition request carrying a node identifier of a second processing node is sent to a second monitoring node, where the data acquisition request is used for requesting node task data of the second processing node.

When storing the node task data, the mapping relationship between the node task data and the node identifier of the processing node may be stored together with the node task data. Because the mapping relation between the node identification and the node task data exists in the first storage position, the node task data of each processing node cannot be disordered.

Step S502, receiving node task data of a second processing node sent by a second monitoring node.

Step S503, storing the node task data of the second processing node in the first storage location of the first cluster.

It should be noted that, since the first processing node and the second processing node process the same node task, the first processing node and the second processing node may share a node identifier, such as an identifier of a target processing node (the original processing node processed by the high-availability configuration). Therefore, the storage of the node task data in the first cluster and the second cluster can be kept highly consistent, and the later extraction is convenient.
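
Steps S501 to S503 can be sketched from the requesting side. The `request_data` callable stands in for whatever mechanism sends the data acquisition request to the second monitoring node and returns its reply; it and `sync_from_peer` are hypothetical names, and the first storage location is modeled as a directory.

```python
from pathlib import Path
from typing import Callable


def sync_from_peer(
    node_id: str,
    request_data: Callable[[str], bytes],
    final_dir: Path,
) -> Path:
    """Fetch the peer's node task data by node identifier and store it locally."""
    data = request_data(node_id)  # S501 (send request) and S502 (receive data)
    final_dir.mkdir(parents=True, exist_ok=True)
    path = final_dir / node_id    # S503: store under the shared node identifier
    path.write_bytes(data)
    return path
```

Because both processing nodes may share one node identifier, the file ends up under the same name in either cluster, which is what makes later retrieval uniform.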

A second way of synchronizing node task data may be as shown in fig. 6, which is a schematic flow chart of another way of synchronizing node task data according to another embodiment of the present application.

As shown in fig. 6, another process of synchronizing node task data provided in this embodiment may include:

Step S601, parsing the task completion notification sent by the second monitoring node to obtain the node task data of the second processing node.

The premise of this step is that the task completion notification sent by the second monitoring node includes the node task data of the second processing node.

Step S602, storing node task data of the second processing node in a first storage location of the first cluster.

The manner of storing in the first storage location in this step may refer to step S503, which is not described herein.
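
Steps S601 and S602 can be sketched as parsing the notification and writing the extracted data to the first storage location. The JSON layout with `node_id` and `data_b64` fields is an assumption made for this illustration, as are the function names.

```python
import base64
import json
from pathlib import Path


def parse_notification(message: str) -> tuple[str, bytes]:
    """S601: extract the node identifier and node task data from the notification."""
    obj = json.loads(message)
    return obj["node_id"], base64.b64decode(obj["data_b64"])


def store_from_notification(message: str, final_dir: Path) -> Path:
    """S602: store the extracted node task data in the first storage location."""
    node_id, data = parse_notification(message)
    final_dir.mkdir(parents=True, exist_ok=True)
    path = final_dir / node_id
    path.write_bytes(data)
    return path
```

No follow-up request is needed in this mode, since everything required for storage arrives with the notification itself.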

Similarly, corresponding to the two ways of synchronizing node task data described above, there are two ways for the second monitoring node to store node task data in the second cluster. To keep the executing entity consistent in this embodiment, the two ways of storing node task data are still described taking execution by the first monitoring node as an example; execution by the second monitoring node is similar and is not described in detail here.

In a specific embodiment, take the task processing procedure of the Hive-1 node in the workflow after the high availability configuration illustrated in fig. 3 as an example. When the HA-1 node of the Bd cluster in zone A is the first to detect that node task data of the Hive-1 node has appeared in the temporary storage point of the Bd cluster (data_tmp path in table 1), it sends a task completion notification to the HA-1 node of the bjzyx-g1 cluster in zone B. After receiving the task completion notification corresponding to the Hive-1 node, the HA-1 node of the bjzyx-g1 cluster kills the Hive-1 node in the bjzyx-g1 cluster (that is, stops the node task of the Hive-1 node in the bjzyx-g1 cluster). Then, the HA-1 node of the Bd cluster stores the node task data from the temporary storage point into the first storage location of the Bd cluster (data_final path in table 1) and synchronizes it to the first storage location of the bjzyx-g1 cluster (data_final path in table 1).

It should be noted that the method of the present application is suitable for, but not limited to, workflow scenarios in which report data must be produced on time and is of very high importance. Based on the method of the present application, the stability and timeliness of the tasks corresponding to the workflow can be ensured; simple configuration (automatic configuration by a workflow configurator) reduces the cost of manual configuration; introducing monitoring nodes reduces the cost of manual maintenance; and iterative updating, operation, and maintenance of the workflow are facilitated.
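
The final step of this example, moving data from the temporary storage point to the first storage location and synchronizing it to the other cluster, can be sketched with local directories. This is purely illustrative: real clusters would not normally share a file system, the directory names `data_tmp` and `data_final` simply echo the path names mentioned for table 1, and `promote_and_sync` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def promote_and_sync(file_name: str, local_root: Path, peer_roots: list[Path]) -> list[Path]:
    """Copy a file from local data_tmp to data_final locally and in each peer cluster."""
    src = local_root / "data_tmp" / file_name
    written = []
    for root in [local_root, *peer_roots]:
        final_dir = root / "data_final"
        final_dir.mkdir(parents=True, exist_ok=True)
        # copy2 preserves file metadata and returns the destination path.
        written.append(Path(shutil.copy2(src, final_dir / file_name)))
    return written
```

After the call, every cluster root holds the same file under `data_final`, which is the consistency property the two-location design aims for.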

Referring to fig. 7, fig. 7 is a schematic structural diagram of a task processing device according to another embodiment of the application.

As shown in fig. 7, the task processing device provided in this embodiment may include:

the monitoring module 701 is configured to monitor a node task completion condition of a first processing node in the first cluster;

a sending module 702, configured to send a task completion notification to the second monitoring node when it is monitored that the node task of the first processing node is completed and a task completion notification sent by the second monitoring node is not received, where the task completion notification is used to instruct the second monitoring node to stop the node task of the second processing node;

the second monitoring node is used for monitoring the node task completion condition of a second processing node in the second cluster, and the first processing node and the second processing node are used for processing the same node task.

In an alternative embodiment, the apparatus further comprises:

the stopping module is used for stopping the node task of the first processing node if the task completion notification sent by the second monitoring node is received before the node task of the first processing node is completed;

and the synchronization module is used for synchronizing the node task data of the second processing node to the first storage position of the first cluster.

In an alternative embodiment, the synchronization module includes:

the first sending unit is used for sending a data acquisition request carrying the node identifier of the second processing node to the second monitoring node, wherein the data acquisition request is used for requesting the node task data of the second processing node;

the receiving unit is used for receiving the node task data of the second processing node sent by the second monitoring node;

and the first storage unit is used for storing the node task data of the second processing node to a first storage position of the first cluster.

In an alternative embodiment, the task completion notification sent by the second monitoring node includes node task data of the second processing node;

the synchronization module includes:

the analyzing unit is used for analyzing the task completion notification sent by the second monitoring node to obtain node task data of the second processing node;

And the second storage unit is used for storing the node task data of the second processing node to the first storage position of the first cluster.

In an alternative embodiment, the apparatus further comprises:

the storage module is used for storing the mapping relation between the node task data of the first processing node and the node identification of the first processing node and the node task data of the first processing node to a first storage position of the first cluster when the completion of the node task of the first processing node is monitored and the task completion notification sent by the second monitoring node is not received;

the calling module is used for calling node task data corresponding to the node identification of the first processing node from the first storage position of the first cluster according to the node identification under the condition that a data acquisition request carrying the node identification of the first processing node and sent by the second monitoring node is received;

and the sending module is used for sending the called node task data to the second monitoring node.

In an alternative embodiment, the sending module includes:

the packaging unit is used for packaging the node task data of the first processing node into a task completion notification when the node task of the first processing node is monitored to be completed and the task completion notification sent by the second monitoring node is not received;

And the second sending unit is used for sending the task completion notification to the second monitoring node.

In an alternative embodiment, the monitoring module includes:

the detection unit is used for periodically detecting whether target data exists in the second storage position of the first cluster, wherein the target data is generated when the node task of the first processing node is completed;

and the determining unit is used for determining that the node task of the first processing node in the first cluster is completed if the target data exists in the second storage position.

Referring to fig. 8, fig. 8 is a schematic structural diagram of a task processing device according to another embodiment of the present application.

As shown in fig. 8, the task processing device provided in this embodiment may include:

a receiving module 801, configured to receive a task processing request, where the task processing request is used to request a high availability configuration for a target workflow, and the target workflow includes at least one target processing node;

the configuration module 802 is configured to perform high availability configuration processing on the target processing node according to the task processing request, so as to obtain a high availability node group corresponding to the target processing node, where the high availability node group at least includes a first monitoring node, a second monitoring node, a first processing node and a second processing node;

a replacing module 803, configured to replace a target processing node in the target workflow with a high available node group corresponding to the target workflow;

The target processing node is at least one processing node in the target workflow; the node tasks of the first processing node and the node tasks of the second processing node are the same as the node tasks of the corresponding target processing nodes, the first processing node and the first monitoring node operate in a first cluster, and the second processing node and the second monitoring node operate in a second cluster; the first monitoring node is used for monitoring the node task completion condition of the first processing node in the first cluster; the second monitoring node is used for monitoring the node task completion condition of the second processing node in the second cluster.

In an optional embodiment, the high-availability node group corresponding to any target processing node further includes a branch indication node and/or a branch merging node;

the branch indication node is arranged in front of a plurality of processing nodes with parallel processing requirements in the high-availability node group and is used for instructing parallel processing of the node tasks of the processing nodes behind the branch indication node;

the branch merging node is arranged behind a plurality of processing nodes with parallel processing requirements in the high-availability node group, and is used for instructing processing of the node tasks of the processing nodes behind the branch merging node after waiting for the node tasks of the processing nodes in front of the branch merging node to finish.

Referring to fig. 9, fig. 9 is a schematic structural diagram of an electronic device according to another embodiment of the application.

As shown in fig. 9, the electronic device provided in this embodiment includes: at least one processor 901, memory 902, at least one network interface 903, and other user interfaces 904. The various components in the electronic device 900 are coupled together by a bus system 905. It is appreciated that the bus system 905 is employed to enable connected communications between these components. The bus system 905 includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for clarity of illustration the various buses are labeled as bus system 905 in fig. 9.

The user interface 904 may include, among other things, a display, a keyboard, or a pointing device (e.g., a mouse, a trackball, a touch pad, or a touch screen, etc.).

It will be appreciated that the memory 902 in embodiments of the application can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM). The memory 902 described herein is intended to comprise, without being limited to, these and any other suitable types of memory.

In some implementations, the memory 902 stores the following elements, executable units or data structures, or a subset thereof, or an extended set thereof: an operating system 9021 and an application program 9022.

The operating system 9021 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application program 9022 includes various application programs, such as a Media Player and a Browser, for implementing various application services. A program for implementing the method according to the embodiment of the present invention may be included in the application program 9022.

In the embodiment of the present invention, the processor 901 is configured to execute the method steps provided in the foregoing method embodiments by calling a program or instruction stored in the memory 902, specifically a program or instruction stored in the application program 9022.

The method disclosed in the above embodiment of the present invention may be applied to the processor 901 or implemented by the processor 901. The processor 901 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 901 or by instructions in the form of software. The processor 901 may be a general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software units in a decoding processor. The software units may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 902, and the processor 901 reads information in the memory 902 and performs the steps of the above method in combination with its hardware.

It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processors (Digital Signal Processor, DSP), digital signal processing devices (DSP Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field programmable gate arrays (Field-Programmable Gate Array, FPGA), general purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions of the application, or a combination thereof.

For a software implementation, the techniques herein may be implemented by means of units that perform the functions herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.

The embodiment of the application also provides a storage medium (computer readable storage medium). The storage medium here stores one or more programs. Wherein the storage medium may comprise volatile memory, such as random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, hard disk, or solid state disk; the memory may also comprise a combination of the above types of memories.

When one or more programs in the storage medium are executable by one or more processors, the task processing method executed on the electronic device side is implemented.

The processor is configured to execute a task processing program stored in the memory, so as to implement the task processing method provided in the foregoing embodiments, as executed on the electronic device side.

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the application, and that changes, modifications, substitutions, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims

1. A method of task processing, applied to a first monitoring node, the method comprising:

if the task completion notification sent by the second monitoring node is received before the node task of the first processing node is completed, stopping the node task of the first processing node;

synchronizing node task data of a second processing node to a first storage location of the first cluster;

2. The method of claim 1, wherein synchronizing the node task data of the second processing node to the first storage location of the first cluster comprises:

transmitting a data acquisition request carrying a node identifier of the second processing node to the second monitoring node, wherein the data acquisition request is used for requesting the node task data of the second processing node;

receiving node task data of the second processing node, which is sent by the second monitoring node;

and storing the node task data of the second processing node in the first storage location of the first cluster.
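The request, receive, and store steps of claim 2 can be illustrated with a minimal sketch. It is an illustration only, not the implementation described in the application: the `MonitoringNode` class, its in-memory `storage` dictionary standing in for a cluster storage location, the request field names, and the direct method call used in place of a network transport are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoringNode:
    """Hypothetical stand-in for a monitoring node plus its cluster storage."""
    name: str
    # Assumed layout: node identifier -> node task data.
    storage: dict = field(default_factory=dict)

    def handle_data_request(self, request: dict) -> bytes:
        # Look up node task data by the node identifier carried in the request.
        return self.storage[request["node_id"]]

    def sync_from(self, peer: "MonitoringNode", node_id: str) -> None:
        # Step 1: build a data acquisition request carrying the node identifier.
        request = {"type": "data_acquisition", "node_id": node_id}
        # Step 2: receive the node task data returned by the peer monitoring node.
        data = peer.handle_data_request(request)
        # Step 3: store the data in this cluster's storage location.
        self.storage[node_id] = data


second = MonitoringNode("second", {"node-B": b"result rows"})
first = MonitoringNode("first")
first.sync_from(second, "node-B")
```

After `sync_from` returns, the first cluster holds a copy of the second processing node's task data, so downstream nodes in the first cluster can continue from it.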

3. The method of claim 1, wherein the task completion notification issued by the second monitoring node includes node task data of the second processing node;

the synchronizing node task data of the second processing node to a first storage location of the first cluster includes:

analyzing a task completion notification sent by the second monitoring node to obtain node task data of the second processing node;

4. The method according to claim 1, wherein the method further comprises:

when the node task of the first processing node is monitored to be completed and a task completion notification sent by the second monitoring node has not been received, storing the node task data of the first processing node, together with the mapping relation between the node task data of the first processing node and the node identifier of the first processing node, in the first storage location of the first cluster;

when a data acquisition request carrying the node identifier of the first processing node and sent by the second monitoring node is received, retrieving, according to the node identifier, the node task data corresponding to the node identifier of the first processing node from the first storage location of the first cluster;

and sending the retrieved node task data to the second monitoring node.

5. The method according to claim 1, wherein sending the task completion notification to the second monitoring node, when the node task of the first processing node is monitored to be completed and a task completion notification sent by the second monitoring node has not been received, includes:

when the node task of the first processing node is monitored to be completed and a task completion notification sent by a second monitoring node is not received, encapsulating the node task data of the first processing node into a task completion notification;

and sending the task completion notification to the second monitoring node.
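As a minimal sketch of the encapsulation described in claim 5, together with the matching parsing step of claim 3, the node task data can be carried inside the notification itself. The JSON message layout, the field names, the base64 encoding of the payload, and both function names are assumptions made for illustration; the application does not specify a wire format.

```python
import base64
import json
from typing import Tuple


def build_completion_notification(node_id: str, task_data: bytes) -> str:
    """Encapsulate node task data into a task completion notification."""
    payload = {
        "type": "task_completed",  # assumed message type tag
        "node_id": node_id,
        # Binary task data is base64-encoded so that it fits in JSON.
        "task_data": base64.b64encode(task_data).decode("ascii"),
    }
    return json.dumps(payload)


def parse_completion_notification(message: str) -> Tuple[str, bytes]:
    """Parse a notification back into the node identifier and task data."""
    payload = json.loads(message)
    return payload["node_id"], base64.b64decode(payload["task_data"])


message = build_completion_notification("node-A", b"partition=2021-04-30")
node_id, data = parse_completion_notification(message)
```

Carrying the data in the notification saves the separate data acquisition round trip of claim 2, at the cost of larger notifications when the node task data is large.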

6. The method of claim 1, wherein monitoring node task completion of a first processing node in a first cluster comprises:

periodically detecting whether target data exists in a second storage position of the first cluster, wherein the target data is generated when a node task of the first processing node is completed;

and if the target data exists in the second storage location, determining that the node task of the first processing node in the first cluster is completed.
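The periodic detection in claim 6 can be sketched as a polling loop. This is an illustration under stated assumptions: the second storage location is modeled as a local directory, the target data as a marker file with the hypothetical name `_SUCCESS`, and the polling interval and timeout are arbitrary values chosen so that the example finishes quickly.

```python
import os
import tempfile
import time


def wait_for_target_data(path: str, interval_s: float = 0.1,
                         timeout_s: float = 2.0) -> bool:
    """Poll until the target data exists at `path` or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True   # target data found: the node task is complete
        if time.monotonic() >= deadline:
            return False  # gave up waiting
        time.sleep(interval_s)


with tempfile.TemporaryDirectory() as second_storage_location:
    marker = os.path.join(second_storage_location, "_SUCCESS")
    # Before the processing node finishes, no target data exists.
    missing = wait_for_target_data(marker, interval_s=0.01, timeout_s=0.05)
    # Simulate the processing node writing its target data on completion.
    open(marker, "w").close()
    found = wait_for_target_data(marker, interval_s=0.01, timeout_s=0.05)
```

Polling a storage location keeps the monitoring node decoupled from the processing node: the monitoring node only needs read access to the location where the target data appears.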

7. A method of task processing for a workflow configurator, the method comprising:

performing high-availability configuration processing on a target processing node according to the task processing request to obtain a high-availability node group corresponding to the target processing node, wherein the high-availability node group at least comprises a first monitoring node, a second monitoring node, a first processing node and a second processing node, and for the high-availability node group corresponding to any target processing node, the high-availability node group further comprises a branch indication node and/or a branch merging node; the branch indication node is arranged in front of a plurality of processing nodes with parallel processing requirements in the high-availability node group and is used for indicating the node tasks of the processing nodes after the branch indication node are processed in parallel; the branch merging node is arranged behind a plurality of processing nodes with parallel processing requirements in the high-availability node group, and is used for indicating any node of the processing nodes behind the branch merging node after the node task of the processing node behind the branch merging node is completed;

8. A task processing device, the device comprising:

a synchronization module, configured to synchronize the node task data of the second processing node to a first storage location of the first cluster;

9. A task processing device, the device comprising:

the configuration module is used for carrying out high-availability configuration processing on the target processing nodes according to the task processing request to obtain high-availability node groups corresponding to the target processing nodes, wherein the high-availability node groups at least comprise a first monitoring node, a second monitoring node, a first processing node and a second processing node, and for the high-availability node groups corresponding to any target processing node, the high-availability node groups further comprise branch indication nodes and/or branch merging nodes; the branch indication node is arranged in front of a plurality of processing nodes with parallel processing requirements in the high-availability node group and is used for indicating the node tasks of the processing nodes after the branch indication node are processed in parallel; the branch merging node is arranged behind a plurality of processing nodes with parallel processing requirements in the high-availability node group, and is used for indicating any node of the processing nodes behind the branch merging node after the node task of the processing node behind the branch merging node is completed;

10. An electronic device, comprising: at least one processor and memory;

the processor is configured to execute a task processing program stored in the memory, so as to implement the task processing method according to any one of claims 1 to 7.

11. A storage medium storing one or more programs which, when executed, implement the task processing method of any one of claims 1-7.