WO2018000878A1

WO2018000878A1 - Distributed task processing method and apparatus

Info

Publication number: WO2018000878A1
Application number: PCT/CN2017/079230
Authority: WO
Inventors: 乔雷
Original assignee: 华为技术有限公司
Priority date: 2016-06-29
Filing date: 2017-04-01
Publication date: 2018-01-04
Also published as: CN107547608A

Abstract

Provided are a distributed task processing method and apparatus. The method comprises: a server acquiring an execution result of a current task on a node in a distributed system, wherein the distributed system comprises a plurality of nodes, and each node comprises at least one task of the same service flow, and the execution result carries an identifier of the current task; according to the identifier of the current task and the execution result, the server acquiring a first error correction policy corresponding to the current task from a pre-set error correction policy set, wherein the error correction policy set comprises a correlation between the identifier of the current task, the execution result and the first error correction policy, and the first error correction policy is used for indicating an action to be executed in the next step by the node; and the server sending the first error correction policy to the node. The distributed task processing method and apparatus provided in the present application can improve the reliability of distributed tasks.

Description

Distributed task processing method and device

Technical field

The present application relates to computer technology, and in particular, to a distributed task processing method and apparatus.

Background

A distributed system is a computer system that connects multiple computers at different locations or with different functions or with different data through a communication network to coordinate large-scale information processing tasks under the unified management and control of the control system. Currently, distributed systems can distribute the tasks included in any distributed task on different computers to improve the execution speed of distributed tasks.

In the prior art, when developers develop such distributed tasks, the error correction code (i.e., the code for error handling) of each task in the distributed task is coupled into the service code of that task, so that the distributed task can automatically handle errors of any task while running, which improves the running efficiency of the distributed task.

However, since the error correction code of each task in the above distributed task is strongly coupled with the service code, the reliability of the distributed task is low.

Summary of the invention

The present application provides a distributed task processing method and apparatus, to solve the technical problem in the prior art that the reliability of a distributed task is low because the error correction code of each task in the distributed task is coupled with the service code.

In a first aspect, the application provides a distributed task processing method, where the method includes:

a server obtains an execution result of a current task on a node in a distributed system, where the distributed system includes multiple nodes, each node includes at least one task of the same service flow, and the execution result carries an identifier of the current task;

the server obtains, according to the identifier of the current task and the execution result, a first error correction policy corresponding to the current task from a preset error correction policy set, where the error correction policy set includes a correspondence among the identifier of the current task, the execution result, and the first error correction policy, and the first error correction policy is used to indicate the action to be performed next by the node; and

the server sends the first error correction policy to the node.

According to the distributed task processing method provided in the first aspect, when the nodes of the distributed system perform different tasks of the same service flow, the server can obtain the execution result of the current task on each node and, according to that execution result, obtain the corresponding first error correction policy from the preset error correction policy set. By sending the first error correction policy to the node, the server enables the node to automatically handle errors that occur while the task is running. In this way, the error correction code no longer needs to be coupled into the service code of each task of the distributed task. Therefore, when a developer needs to add error handling for a task in the distributed task, only the error correction policy set on the server side needs to be updated, and the distributed task does not need to be recompiled, which improves the reliability of the distributed task.

Optionally, in a possible implementation of the first aspect, the obtaining, by the server according to the identifier of the current task and the execution result, of the first error correction policy corresponding to the current task from the preset error correction policy set specifically includes:

the server determines, according to the identifier of the current task, the first mapping relationship, and the second mapping relationship, the error correction policy subset corresponding to the current task; and

the server determines the first error correction policy according to the execution result and the third mapping relationship in the error correction policy subset.

Optionally, in a possible implementation of the first aspect, the obtaining, by the server, of the execution result of the current task on the node in the distributed system specifically includes:

The server acquires the execution result according to a result returned by the node executing a callback function.

Optionally, in a possible implementation manner of the first aspect, after the server sends the first error correction policy to the node, the method further includes:

If the node is the last node that processes the service flow, the server generates an execution report of the distributed task.

The distributed task processing method provided by this possible implementation enables maintenance personnel of the service flow to learn, by consulting the execution report, which errors occurred while the service flow was executing, thereby improving the user experience.
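By way of a non-limiting illustration, report generation after the last node could be sketched as follows. The application does not specify a report format; the `Record` fields, the `is_last_node` flag, and the one-line-per-task layout are assumptions made only for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One handled execution result (hypothetical structure)."""
    task_id: str
    result: int
    action: str


@dataclass
class ReportBuilder:
    """Collects records and emits a report after the last node (assumed design)."""
    records: list = field(default_factory=list)

    def on_policy_sent(self, record, is_last_node):
        # Remember what was decided for this task.
        self.records.append(record)
        # Only the last node of the service flow triggers the report.
        return self.build() if is_last_node else None

    def build(self):
        lines = [
            f"{r.task_id}: result={r.result}, action={r.action}"
            for r in self.records
        ]
        return "\n".join(lines)


builder = ReportBuilder()
first = builder.on_policy_sent(
    Record("task1.1", 0, "ignore_and_continue"), is_last_node=False
)
report = builder.on_policy_sent(Record("task1.2", 2, "retry"), is_last_node=True)
```

In this sketch the report is simply a text summary of each task's execution result and the action chosen for it, which is the kind of information maintenance personnel would consult.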

In a second aspect, the application provides a distributed task processing apparatus, where the apparatus includes:

a first acquiring module, configured to obtain an execution result of a current task on a node in a distributed system, where the distributed system includes multiple nodes, each node includes at least one task of the same service flow, and the execution result carries the identifier of the current task;

a second acquiring module, configured to obtain, according to the identifier of the current task and the execution result, a first error correction policy corresponding to the current task from a preset error correction policy set, where the error correction policy set includes a correspondence among the identifier of the current task, the execution result, and the first error correction policy, and the first error correction policy is used to indicate the action to be performed next by the node; and

a sending module, configured to send the first error correction policy obtained by the second acquiring module to the node.

Optionally, in a possible implementation manner of the second aspect, the second acquiring module includes:

a first determining unit, configured to determine, according to the identifier of the current task, the first mapping relationship, and the second mapping relationship, a subset of error correction policies corresponding to the current task;

a second determining unit, configured to determine the first error correction policy according to the execution result and a third mapping relationship in the subset of error correction policies determined by the first determining unit.

Optionally, in a possible implementation manner of the second aspect, the first acquiring module is configured to obtain the execution result according to a result returned by the node executing a callback function.

Optionally, in a possible implementation manner of the second aspect, the device further includes:

a generating module, configured to generate an execution report of the distributed task after the sending module sends the first error correction policy to the node, if the node is the last node that processes the service flow.

For the beneficial effects of the distributed task processing apparatus provided by the second aspect and its possible implementations, reference may be made to the beneficial effects of the first aspect and its possible implementations; details are not repeated here.

In a third aspect, the application provides a distributed task processing apparatus, where the apparatus includes:

a processor, configured to obtain an execution result of a current task on a node in a distributed system, and to obtain, according to the identifier of the current task and the execution result, a first error correction policy corresponding to the current task from a preset error correction policy set, where the distributed system includes multiple nodes, each node includes at least one task of the same service flow, the execution result carries the identifier of the current task, the error correction policy set includes a correspondence among the identifier of the current task, the execution result, and the first error correction policy, and the first error correction policy is used to indicate the action to be performed next by the node; and

a transmitter, configured to send the first error correction policy obtained by the processor to the node.

Optionally, in a possible implementation of the third aspect, that the processor is configured to obtain, according to the identifier of the current task and the execution result, the first error correction policy corresponding to the current task from the preset error correction policy set is specifically:

the processor is configured to determine, according to the identifier of the current task, the first mapping relationship, and the second mapping relationship, the error correction policy subset corresponding to the current task, and to determine the first error correction policy according to the execution result and the third mapping relationship in the error correction policy subset.

Optionally, in a possible implementation manner of the third aspect, the processor is configured to obtain an execution result of a current task on a node in a distributed system, specifically:

The processor is specifically configured to obtain the execution result according to a result returned by the node executing a callback function.

Optionally, in a possible implementation of the third aspect, the processor is further configured to generate an execution report of the distributed task after the transmitter sends the first error correction policy to the node, if the node is the last node that processes the service flow.

For the beneficial effects of the distributed task processing apparatus provided by the third aspect and its possible implementations, reference may be made to the beneficial effects of the first aspect and its possible implementations; details are not repeated here.

With reference to the first aspect and its possible implementations, the second aspect and its possible implementations, and the third aspect and its possible implementations, the error correction policy set specifically includes:

a first mapping relationship between the identifier of the current task and the identifier of the error correction policy subset to which the first error correction policy belongs, and a second mapping relationship between the identifier of the error correction policy subset and the error correction policy subset, where the error correction policy subset includes a third mapping relationship between the execution result and the first error correction policy.

With reference to the first aspect and its possible implementations, the second aspect and its possible implementations, and the third aspect and its possible implementations, the action to be performed next by the node includes: rolling back, ignoring the error and continuing, pausing, or retrying.

According to the distributed task processing method and apparatus provided by the present application, when the nodes of a distributed system perform different tasks of the same service flow, the server can obtain the execution result of the current task on each node and, according to that execution result, obtain the corresponding first error correction policy from the preset error correction policy set. By sending the first error correction policy to the node, the server enables the node to automatically handle errors that occur while the task is running. In this way, the error correction code no longer needs to be coupled into the service code of each task of the distributed task. Therefore, when a developer needs to add error handling for a task in the distributed task, only the error correction policy set on the server side needs to be updated, and the distributed task does not need to be recompiled, which improves the reliability of the distributed task.

Description of the drawings

FIG. 1 is a system architecture diagram of a distributed task processing method provided by the present application;

FIG. 2 is a schematic flowchart of a distributed task processing method provided by the present application;

FIG. 3 is a schematic diagram of an error processing framework provided by the present application;

FIG. 4 is a schematic flowchart of another distributed task processing method provided by the present application;

FIG. 5 is a schematic structural diagram of a distributed task processing apparatus provided by the present application;

FIG. 6 is a schematic structural diagram of another distributed task processing apparatus provided by the present application;

FIG. 7 is a schematic structural diagram of still another distributed task processing apparatus provided by the present application;

FIG. 8 is a schematic structural diagram of still another distributed task processing apparatus provided by the present application.

Detailed description

FIG. 1 is a system architecture diagram of a distributed task processing method provided by the present application. As shown in FIG. 1, the distributed system is a typical distributed system and may include a server for managing the nodes (for example, the management server shown in FIG. 1), multiple nodes for processing distributed tasks (for example, the service servers shown in FIG. 1), and a switch for communication between the server and each node. Optionally, the distributed system may further include at least one server provided with a database (for example, the database server shown in FIG. 1) and a server for storing related data in the distributed system (for example, the storage server shown in FIG. 1). The server can coordinate the nodes to execute the tasks of a distributed task in parallel or serially, where each node can execute one or more serial tasks of the distributed task, to improve the execution speed of the distributed task.

It should be noted that the distributed system shown in FIG. 1 is only an example, and the distributed task processing method provided by the present application can be applied to any distributed system, and is not limited to FIG. 1 .

In the prior art, when developers develop a distributed task, the error correction code (for example, error codes and exception handling) of each task is coupled into the service code of that task. Therefore, when a node of the distributed system runs the task, an error that occurs while the task is running can be handled automatically according to the error correction code segment corresponding to the execution result of the task, which improves the running efficiency of the distributed task.

However, the error correction code coupled with the service code in each task is written by the developer based on experience with errors handled in the past, so it may not cover every error that occurs when the task actually runs. When a task produces an execution result for which no corresponding error correction code segment exists, that is, when a new error occurs that the error correction code cannot handle, the developer must add an error correction code segment for the new error to the service code of the task and recompile the distributed task. Because a recompiled distributed task carries the risk of no longer running correctly, the reliability of the distributed task is low.

The distributed task processing method provided by the present application no longer couples the error correction code of each task into the service code of that task. Instead, the error correction code is preset on the server that controls the nodes in the distributed system. When a node in the distributed system executes a task, the server obtains the execution result of the task, uses the preset error correction code to determine the next action of the node, and instructs the node to perform that action to correct the error that occurred in the task, so that the node can automatically handle errors that occur while the task is running. In this way, the error correction code no longer needs to be coupled into the service code of each task of the distributed task. Even if a developer later needs to add an error correction code segment for a task in the distributed task, only the error correction code on the server side needs to be updated, and the distributed task does not need to be recompiled, which improves the reliability of the distributed task. The distributed task processing method and apparatus provided by the present application are therefore intended to solve the technical problem in the prior art that the reliability of a distributed task is low because the error correction code of each task is coupled with the service code.
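By way of a non-limiting illustration, the decoupling described above could be sketched as follows: the task returns only an execution result, and the server-side table decides what happens next. The table layout, the function names, and the result value 7 are assumptions made for this sketch and do not come from the application.

```python
# Hypothetical server-side policy table: (task id, execution result) -> action.
policy_set = {("task1.1", 1): "retry"}


def run_task():
    """Service code only: returns an execution result, contains no error handling."""
    return {"task_id": "task1.1", "result": 7}


def next_action(execution_result):
    """Server-side decision; None means no policy is configured for this result."""
    key = (execution_result["task_id"], execution_result["result"])
    return policy_set.get(key)


before = next_action(run_task())      # result 7 is not yet covered
policy_set[("task1.1", 7)] = "pause"  # a policy is added on the server side
after = next_action(run_task())       # same task code, new behaviour
```

The point of the sketch is that `run_task` is never edited or rebuilt; handling for the new error appears solely through the server-side update.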

FIG. 2 is a schematic flowchart of a distributed task processing method provided by the present application. This embodiment relates to the specific process by which the server determines an error correction policy for a node according to the execution result of the current task on the node. As shown in FIG. 2, the method may include:

S101: The server obtains an execution result of a current task on a node in the distributed system, where the distributed system includes multiple nodes, each node includes at least one task of the same service flow, and the execution result carries the identifier of the current task.

Specifically, the distributed system may include multiple nodes. A node may be, for example, the service server shown in FIG. 1, or a computer, a virtual machine, or another device that can execute tasks of the service flow. The service flow may be, for example, a service flow for opening a virtual machine, a service flow for opening a cloud desktop, a service flow for opening a cloud application, or an operation and maintenance management service flow. The service flow may include multiple tasks, and these tasks may be distributed to and executed on different nodes, so the service flow can be regarded as a distributed task.

In this embodiment, the server in the distributed system (for example, the management server shown in FIG. 1) not only coordinates the nodes of the distributed system to execute one or more tasks of the service flow (i.e., the distributed task) in parallel, but can also obtain the execution result of each task after a node has executed it. Optionally, the server may obtain the execution result of the current task on each node by sending each node a message requesting that execution result and receiving the result the node returns, or by receiving the execution result of the current task that each node sends on its own initiative. The execution result may carry the identifier of the current task.
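By way of a non-limiting illustration, the two ways of obtaining an execution result (the server requesting it, or the node sending it on its own initiative) could be sketched as follows. The `Node` and `Server` classes and their method names are hypothetical, and in-process calls stand in for network messages.

```python
class Node:
    """Stand-in for a service node (hypothetical interface)."""

    def __init__(self, name, task_id, result):
        self.name = name
        # The execution result carries the identifier of the current task.
        self._latest = {"task_id": task_id, "result": result}

    def query_result(self):
        # Pull mode: answer the server's request for the current task's result.
        return dict(self._latest)

    def report_to(self, server):
        # Push mode: send the result without being asked.
        server.receive(self.name, dict(self._latest))


class Server:
    """Stand-in for the management server (hypothetical interface)."""

    def __init__(self):
        self.results = {}

    def poll(self, nodes):
        for node in nodes:
            self.results[node.name] = node.query_result()

    def receive(self, node_name, execution_result):
        self.results[node_name] = execution_result


server = Server()
node_a = Node("node-a", "task1.1", 0)
node_b = Node("node-b", "task2.1", 3)
server.poll([node_a])      # the server asks
node_b.report_to(server)   # the node volunteers
```

Either path leaves the server holding the same kind of record, so the later policy lookup does not need to know how the result arrived.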

S102: The server obtains, according to the identifier of the current task and the execution result, a first error correction policy corresponding to the current task from a preset error correction policy set, where the error correction policy set includes a correspondence among the identifier of the current task, the execution result, and the first error correction policy, and the first error correction policy is used to indicate the action to be performed next by the node.

Specifically, in this embodiment, an error correction policy set for the service flow (i.e., the distributed task) is preset on the server, and the error correction policy set may include the error correction policies corresponding to the different execution results of each task of the service flow. In this way, after the server obtains the execution result of the current task on a node of the distributed system, the server may search the preset error correction policy set, according to the execution result and the identifier of the current task carried in it, for the error correction policy that corresponds to the current task under that execution result, that is, the first error correction policy. The first error correction policy may be used to indicate the action to be performed next by the node, for example, rolling back, ignoring the error and continuing, pausing, or retrying. It should be noted that the action indicated by the first error correction policy may be set according to the needs of the user or the type of the task, which is not limited in this application.

S103: The server sends the first error correction policy to the node.

Specifically, after the server obtains the first error correction policy corresponding to the current task from the preset error correction policy set according to the identifier of the current task and the execution result, the server may return the first error correction policy to the node, so that the node can perform the action corresponding to the first error correction policy. In this way, the node can automatically handle errors that occur while the task is running, according to the first error correction policy sent by the server, which improves the running efficiency of the distributed task.
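By way of a non-limiting illustration, a node carrying out the action named by a received policy could be sketched as follows. The four action names follow the text; the `Action` enum, the convention that a task returns 0 on success, and the `max_retries` parameter are assumptions made for this sketch.

```python
from enum import Enum


class Action(Enum):
    ROLLBACK = "rollback"
    IGNORE_AND_CONTINUE = "ignore_and_continue"
    PAUSE = "pause"
    RETRY = "retry"


def apply_policy(action, task, max_retries=1):
    """Carry out the action named by the policy and return a short status.

    `task` is a zero-argument callable returning 0 on success (assumed convention).
    """
    if action is Action.IGNORE_AND_CONTINUE:
        return "continued"
    if action is Action.PAUSE:
        return "paused"
    if action is Action.ROLLBACK:
        return "rolled back"
    if action is Action.RETRY:
        for _ in range(max_retries):
            if task() == 0:
                return "retry succeeded"
        return "retry exhausted"
    raise ValueError(f"unknown action: {action!r}")


attempts = []


def flaky_task():
    # Fails on the first call, succeeds on the second.
    attempts.append(1)
    return 0 if len(attempts) >= 2 else 1


status = apply_policy(Action.RETRY, flaky_task, max_retries=3)
```

The rollback and pause branches only return a status here; what those actions do in practice depends on the task and is not specified by this sketch.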

According to the distributed task processing method provided by the present application, when the nodes of the distributed system perform different tasks of the same service flow, the server can obtain the execution result of the current task on each node and, according to that execution result, obtain the corresponding first error correction policy from the preset error correction policy set. By sending the first error correction policy to the node, the server enables the node to automatically handle errors that occur while the task is running. In this way, the error correction code no longer needs to be coupled into the service code of each task of the distributed task. Therefore, when a developer needs to add error handling for a task in the distributed task, only the error correction policy set on the server side needs to be updated, and the distributed task does not need to be recompiled, which improves the reliability of the distributed task.

Further, on the basis of the foregoing embodiment, in this embodiment an error processing framework runs on the server. FIG. 3 is a schematic diagram of an error processing framework provided by the present application. As shown in FIG. 3, the error processing framework can receive and store the error correction policy set input by the user, obtain the execution result of the current task on a node, and determine the error correction policy corresponding to the execution result according to the execution result and the preset error correction policy set. The distributed task processing method provided by the present application is described below using an example in which the server executes the method through the error processing framework.

FIG. 4 is a schematic flowchart of another distributed task processing method provided by the present application. As shown in FIG. 4, this embodiment relates to the specific process by which the server obtains the execution result of the current task on a node through the error processing framework and determines the error correction policy for the node through the error processing framework.

Before this method is performed, the following preparation is required:

Step 1: The user presets a set of error correction strategies in the error handling framework.

Specifically, the user may preset the error correction policy set directly in the error processing framework of the server, or may preset it by means of model definition and orchestration. When the user presets the error correction policy set by means of model definition and orchestration, the set may be preset by defining one model or by defining two models. If the user presets the error correction policy set by defining two models, the error correction policy set may include a first mapping relationship between the identifier of each task in the service flow and the identifier of the error correction policy subset corresponding to that task, and a second mapping relationship between the identifier of each error correction policy subset and that error correction policy subset. The first mapping relationship may form one model (referred to in this embodiment as the service model), and the second mapping relationship may form another model (referred to in this embodiment as the policy model). That is, the service model of the error correction policy set may include the first mapping relationship between the identifier of the current task and the identifier of the error correction policy subset to which the first error correction policy belongs, and the policy model of the error correction policy set may include the second mapping relationship between the identifier of the error correction policy subset and the error correction policy subset, where the error correction policy subset includes a third mapping relationship between the execution result and the first error correction policy. FIG. 3 shows an error processing framework in which the error correction policy set is preset by defining a service model and a policy model.
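By way of a non-limiting illustration, resolving the first error correction policy through the three mapping relationships could be sketched as follows. Plain dictionaries stand in for the service model and the policy model; apart from the pair task1.1 and policy-1.1 and the rule that result 0 means ignore and continue, the entries are assumptions made for this sketch.

```python
# First mapping (service model): task identifier -> policy-subset identifier.
first_mapping = {"task1.1": "policy-1.1", "task1.2": "policy-1.2"}

# Second mapping (policy model): policy-subset identifier -> policy subset.
# Each subset holds the third mapping: execution result -> error correction policy.
second_mapping = {
    "policy-1.1": {0: "ignore_and_continue", 1: "retry"},
    "policy-1.2": {1: "rollback"},
}


def find_policy(task_id, result):
    """Resolve the first error correction policy, or None if nothing matches."""
    subset_id = first_mapping.get(task_id)
    subset = second_mapping.get(subset_id, {})
    return subset.get(result)


policy = find_policy("task1.1", 1)
```

Splitting the lookup this way lets several tasks share one policy subset by pointing at the same subset identifier, without duplicating the rules.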

Continuing to refer to FIG. 3, the following uses an example in which the user presets the error correction policy set by defining a service model and a policy model in the error processing framework. In a specific implementation, the user may first define and orchestrate the service model and the policy model, based on the tasks included in the service flow, in a language supported by the error processing framework, such as YAML, XML, or JSON. The service model may include not only the first mapping relationship between the identifier of each task in the service flow and the identifier of the error correction policy subset corresponding to that task, but may also represent the execution order of the tasks included in the service flow.

Exemplarily, taking the YAML language as an example, the definition and orchestration file of the service model shown in FIG. 3 may be as follows:

In the above file, work represents a service flow, and Job represents a "serial task sequence"; the serial task sequences in the service flow are executed in parallel with each other. One Job may include at least one task (the number of tasks in a Job depends on the architecture of the specific service flow). The name under each task is the identifier of the task, and the policy under each task is the identifier of the error correction policy subset corresponding to that task. That is, the first mapping relationship between the identifier of each task and the identifier of its error correction policy subset is established by the name and policy under each task. For example, task1.1 is the identifier of the first task in the serial task sequence Job1, and policy-1.1 under task1.1 is the identifier of the error correction policy subset corresponding to task1.1.
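By way of a non-limiting illustration, the structure just described could be held in memory as follows once the file has been parsed, and the first mapping relationship derived from it. Only the keys work, Job, task, name, and policy and the pair task1.1 and policy-1.1 come from the text; the nesting under `jobs` and `tasks` and the remaining entries are assumptions made for this sketch.

```python
# Hypothetical in-memory form of the service model described above.
service_model = {
    "work": {
        "jobs": [
            {
                "name": "Job1",
                "tasks": [
                    {"name": "task1.1", "policy": "policy-1.1"},
                    {"name": "task1.2", "policy": "policy-1.2"},
                ],
            },
            {
                "name": "Job2",
                "tasks": [
                    {"name": "task2.1", "policy": "policy-2.1"},
                ],
            },
        ]
    }
}


def build_first_mapping(model):
    """Collect task identifier -> policy-subset identifier from every job."""
    return {
        task["name"]: task["policy"]
        for job in model["work"]["jobs"]
        for task in job["tasks"]
    }


first_mapping = build_first_mapping(service_model)
```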

Optionally, the execution order of the tasks in the service model shown in the above example may also be as shown in the service model on the left side of the “business and policy model” in FIG. 3. As shown in FIG. 3, in the service model, work represents a service flow, a job represents a "serial task sequence" (the serial task sequences are executed in parallel with each other in the service flow), a task represents a task in the service flow, and the direction of the arrows between tasks indicates the execution order of the tasks in the service flow.

The policy model may include the second mapping relationship between the identifier of the error correction policy subset corresponding to each task and that error correction policy subset. In a specific implementation, the user may define each error correction policy subset according to how errors of the corresponding task were handled in past executions. The error correction policy subset corresponding to each task may include the error correction policies corresponding to the execution results that the task produces under different error conditions.

Exemplarily, taking the YAML language as an example, the definition and orchestration file of the second mapping relationship between the identifier of one error correction policy subset in the policy model and that error correction policy subset may be as follows:
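By way of a non-limiting illustration, the execution order described above (serial within a job, parallel across jobs) could be sketched with a thread pool as follows. The job contents and the placeholder task body are assumptions made for this sketch; in the described system the tasks would run on different nodes rather than in local threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical service flow: each job is a serial sequence of task names.
jobs = {"Job1": ["task1.1", "task1.2"], "Job2": ["task2.1"]}


def run_job(task_names):
    """Run one job's tasks strictly in order and return the order observed."""
    executed = []
    for name in task_names:
        executed.append(name)  # placeholder for real task execution
    return executed


# Jobs are submitted together so they may run in parallel with each other.
with ThreadPoolExecutor() as pool:
    futures = {job: pool.submit(run_job, tasks) for job, tasks in jobs.items()}
    order = {job: future.result() for job, future in futures.items()}
```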

The above document shows that the identifier of the error correction policy subset in the foregoing service model is a subset of the error correction strategy corresponding to policy-1.1, wherein each rule in the rules represents each execution result and correction in the subset of the error correction strategy. The correspondence between the wrong strategies, wherein the condition in each rule represents the execution result, and the action represents the error correction strategy corresponding to the execution result. For example, in the above code, policy-1.1 includes three rules (ie, rule1, rule2, and rule3), that is, the correspondence between the three execution results and the error correction strategy is included, taking rule1 as an example, and the rule1 indicates the execution result ( That is, when the condition is 0, the error correction policy is to ignore and continue the action, that is, the action to be performed by the node in the next step is the action of ignoring and continuing. Optionally, if the action corresponding to the error correction policy (ie, the action) is a retry, the error correction policy may further include the number of retry attempts, and if the action corresponding to the error correction policy (ie, the action) is a rollback, Then, the above error correction strategy may further include an operation code for performing rollback, and the like.
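By way of a non-limiting illustration, the policy-1.1 subset and the matching of an execution result against its rules could be sketched as follows. Only rule1 (condition 0, ignore and continue) is stated in the text; the contents of rule2 and rule3 and the parameter names `retries` and `rollback_op` are assumptions made for this sketch.

```python
# Hypothetical in-memory form of the policy-1.1 subset.
policy_1_1 = {
    "name": "policy-1.1",
    "rules": [
        {"rule": "rule1", "condition": 0, "action": "ignore_and_continue"},
        # A retry policy may additionally carry the number of retries.
        {"rule": "rule2", "condition": 1, "action": "retry", "retries": 3},
        # A rollback policy may additionally carry the rollback operation code.
        {"rule": "rule3", "condition": 2, "action": "rollback",
         "rollback_op": "undo_step"},
    ],
}


def match_rule(policy, result):
    """Return the first rule whose condition equals the execution result."""
    for rule in policy["rules"]:
        if rule["condition"] == result:
            return rule
    return None


matched = match_rule(policy_1_1, 1)
```

A rule that should cover several execution results could hold a list of conditions instead of a single value; that variation is not shown here.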

It should be noted that each of the foregoing error correction policies may correspond to only one execution result or to multiple execution results, which may be determined according to the specific content and type of the task.

Further, the error correction policy subset corresponding to one task in the policy model of the above example may also be represented as in the policy model on the right side of the service and policy model in FIG. 3. In that figure, a task in the service flow is shown together with its Policy, where Policy represents the error correction policy subset corresponding to the task, and the correspondence between condition and action represents the correspondence between an execution result and an error correction policy in that subset.

It should be noted that the specific content of each error correction policy shown in the above example is only illustrative. Those skilled in the art can understand that the error correction policies are not limited thereto and can be modified or set according to the type of task or the needs of the user.

When the user inputs the service model file and the policy model file, defined and arranged as above, into the error processing framework of the server, the setting of the service model and the policy model is complete, and the error correction policy set is thereby preset. Further, after receiving the service model and the policy model input by the user, the error processing framework may first parse them into a machine-readable form that the error processing framework can recognize, and then store the converted service model and policy model in a storage database of the distributed system, for example, a relational database or a file system database, to prevent the service model and the policy model from being lost if the server loses power. The storage database may be a database integrated on the server or a database server independent of the server, which is not limited in this application.
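The parse-then-persist step could look roughly like the following sketch. It makes several assumptions that are not in the application: the model files are written in JSON rather than YAML (so that only the Python standard library is needed), the storage database is an in-memory SQLite database, and the table name `models` and the function names are hypothetical.

```python
import json
import sqlite3


def store_models(conn, service_text, policy_text):
    """Parse both model files and persist them in the storage database."""
    service_model = json.loads(service_text)  # task identifier -> policy subset identifier
    policy_model = json.loads(policy_text)    # policy subset identifier -> rules
    conn.execute("CREATE TABLE IF NOT EXISTS models (name TEXT PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT OR REPLACE INTO models (name, body) VALUES (?, ?)",
        [("service", json.dumps(service_model)), ("policy", json.dumps(policy_model))],
    )
    conn.commit()


def load_models(conn):
    """Read both models back, as the coordinator would when importing them."""
    rows = dict(conn.execute("SELECT name, body FROM models"))
    return json.loads(rows["service"]), json.loads(rows["policy"])


# In-memory database for demonstration; a real deployment would use durable storage.
conn = sqlite3.connect(":memory:")
store_models(
    conn,
    '{"task1.1": "policy-1.1"}',
    '{"policy-1.1": [{"condition": 0, "action": "ignore_and_continue"}]}',
)
service_model, policy_model = load_models(conn)
print(service_model["task1.1"])
```

Parsing before storing means a malformed model file is rejected at input time instead of when a task later fails.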

Step 2: The server imports the preset error correction policy set into the error processing framework.

Specifically, in this embodiment, the error processing framework is provided with an execution result coordinator. The execution result coordinator may import the preset error correction policy set, so that it can determine, according to the obtained execution result of any task of the service flow executed by a node and the preset error correction policy set, the error correction policy corresponding to that execution result. In this way, when the nodes of the distributed system execute the service flow, the server can, by running the execution result coordinator in the error processing framework, import the service model and the policy model stored in the storage database (that is, the preset error correction policy set) into the error processing framework (that is, the model input shown in FIG. 3).

As shown in FIG. 4, the method may include:

S201. The server obtains an execution result according to a result returned by the node executing a callback function.

Specifically, in this embodiment, a callback function is provided at the end of the service code of each task in the foregoing service flow, and the callback function may carry a communication address of the execution result coordinator in the error processing framework (for example, an Internet Protocol (IP) address), so that the node performing the task can send the execution result of the task to the execution result coordinator in the error processing framework on the server through the callback function. Thus, when a node in the distributed system performs a task in the service flow, the server can monitor the node through the execution result coordinator in the error processing framework running on the server, to obtain the execution result of the task currently executed on the node (that is, the process monitoring function shown in FIG. 3). The above callback function may be, for example, the PostEnd function among the hook functions (HOOK) shown in FIG. 3. It should be noted that, for how to set the callback function in each task of the service flow, reference may be made to the prior art, and details are not described herein again.
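The callback mechanism can be sketched as follows. This is a simplified, single-process illustration: a direct method call stands in for sending the result to the coordinator's communication address, and the names `ExecutionResultCoordinator`, `make_post_end`, and `run_task`, as well as the convention that any exception maps to execution result 1, are assumptions rather than details from the application.

```python
class ExecutionResultCoordinator:
    """Stand-in for the coordinator in the server's error processing framework."""

    def __init__(self):
        self.received = []

    def report(self, task_id, result):
        # Record the execution result together with the task identifier it carries.
        self.received.append((task_id, result))
        return "ack"  # a fuller sketch would reply with an error correction policy


def make_post_end(coordinator, task_id):
    """Build the callback placed at the end of a task's service code."""
    def post_end(result):
        return coordinator.report(task_id, result)
    return post_end


def run_task(service_code, post_end):
    """Run the service code, then report the execution result through the callback."""
    try:
        service_code()
        result = 0  # 0 denotes a correct execution result, as in the example
    except Exception:
        result = 1  # hypothetical error code
    return post_end(result)


def failing_service_code():
    raise RuntimeError("simulated failure")


coordinator = ExecutionResultCoordinator()
post_end = make_post_end(coordinator, "task1.1")
run_task(lambda: None, post_end)
run_task(failing_service_code, post_end)
print(coordinator.received)
```

Because the callback is the only link between the service code and the coordinator, the service code itself needs no error correction logic.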

S202. The server determines, according to the identifier of the current task, the first mapping relationship, and the second mapping relationship, a subset of the error correction policies corresponding to the current task.

Specifically, after the execution result coordinator in the error processing framework running on the server obtains the execution result of the current task on the node in the distributed system, the execution result coordinator can determine the error correction policy subset corresponding to the current task according to the identifier of the current task, the first mapping relationship (that is, the service model), and the second mapping relationship (that is, the policy model) (that is, the policy matching shown in FIG. 3). In a specific implementation, the execution result coordinator may first obtain, from the service model, the identifier of the error correction policy subset corresponding to the identifier of the current task carried in the execution result, and then obtain, from the policy model, the error correction policy subset corresponding to that subset identifier, so that the error correction policy corresponding to the execution result can subsequently be searched for in the error correction policy subset according to the execution result.

Continuing with the above example, assume that the execution result received by the execution result coordinator is 1 and that the task identifier carried by the execution result is task1.1. The execution result coordinator can first search the service model, according to the task identifier, for the identifier of the error correction policy subset corresponding to the task identifier, that is, policy-1.1, and then, according to that subset identifier, find the corresponding error correction policy subset in the policy model.
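The two-step lookup in S202 can be sketched as two dictionary lookups. The identifiers task1.1 and policy-1.1 come from the example above; the dictionary layout, the retry count, and the function name `find_policy_subset` are assumptions for illustration.

```python
# First mapping relationship (service model): task identifier -> policy subset identifier.
SERVICE_MODEL = {"task1.1": "policy-1.1"}

# Second mapping relationship (policy model): policy subset identifier -> policy subset.
POLICY_MODEL = {
    "policy-1.1": [
        {"condition": 0, "action": "ignore_and_continue"},
        {"condition": 1, "action": "retry", "retries": 3},  # retry count is hypothetical
    ]
}


def find_policy_subset(task_id):
    """Resolve a task identifier to its policy subset identifier and policy subset."""
    policy_id = SERVICE_MODEL[task_id]   # step 1: look up the subset identifier
    return policy_id, POLICY_MODEL[policy_id]  # step 2: look up the subset itself


policy_id, subset = find_policy_subset("task1.1")
print(policy_id, len(subset))
```

The indirection through a subset identifier lets several tasks share one policy subset without duplicating its rules.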

S203. The server determines the first error correction policy according to the execution result and the third mapping relationship in the subset of error correction policies.

Specifically, after the execution result coordinator in the error processing framework running on the server determines the error correction policy subset corresponding to the current task, it may search that subset, according to the execution result, for the first error correction policy corresponding to the execution result (that is, the policy matching shown in FIG. 3).

Continuing with the example of S202, according to the execution result 1 described above, the first error correction policy found in policy-1.1 for that execution result is a retry. Alternatively, if the execution result is 0, the first error correction policy corresponding to the execution result is ignore-and-continue, which indicates that the execution result is a correct execution result.
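A minimal sketch of the S203 lookup is shown below. It stores a set of execution results per rule, since one policy may correspond to multiple execution results. The key `conditions`, the rollback rule, the retry count, and the choice to return `None` when no rule matches are assumptions; the application does not say what happens for an unmatched result.

```python
# Hypothetical policy subset for policy-1.1; each rule lists the results it covers.
POLICY_SUBSET = [
    {"conditions": {0}, "action": "ignore_and_continue"},
    {"conditions": {1}, "action": "retry", "retries": 3},
    {"conditions": {2, 3}, "action": "rollback"},  # invented for illustration
]


def select_policy(subset, result):
    """Third mapping relationship: execution result -> first error correction policy."""
    for rule in subset:
        if result in rule["conditions"]:
            return rule
    return None  # no matching rule; handling of this case is left open here


print(select_policy(POLICY_SUBSET, 1)["action"])
```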

S204. The server sends the first error correction policy to the node.

Specifically, after obtaining the first error correction policy, the execution result coordinator in the error processing framework running on the server may send the first error correction policy to the node through the callback function in the task, so that the node can perform the action corresponding to the first error correction policy (that is, the action execution shown in FIG. 3).
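On the node side, acting on a received policy might look like the sketch below. The action names, the `retries` and `rollback_op` fields, the returned status strings, and the convention that a task returns 0 on success are all assumptions for illustration.

```python
def apply_policy(policy, task, rollback_ops):
    """Perform the next-step action indicated by an error correction policy."""
    action = policy["action"]
    if action == "ignore_and_continue":
        return "continued"
    if action == "retry":
        # Re-run the task up to the configured number of retries.
        for _ in range(policy.get("retries", 1)):
            if task() == 0:
                return "recovered"
        return "retry_exhausted"
    if action == "rollback":
        # Invoke the rollback operation named by the policy.
        rollback_ops[policy["rollback_op"]]()
        return "rolled_back"
    if action == "pause":
        return "paused"
    raise ValueError(f"unknown action: {action}")


attempts = {"n": 0}


def flaky_task():
    """Fails on the first call and succeeds on the second."""
    attempts["n"] += 1
    return 0 if attempts["n"] >= 2 else 1


print(apply_policy({"action": "retry", "retries": 3}, flaky_task, {}))
```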

Optionally, if the node is the node that processes the last task of the service flow, after the foregoing S204, the method may further include: the server generates an execution report of the distributed task (that is, the execution report shown in FIG. 3). By checking the execution report, the maintenance personnel of the service flow can learn which errors occurred during execution of the service flow, which improves the user experience.
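The application does not specify what the execution report contains. The sketch below assumes one plausible form: the server records a (task identifier, execution result, action) tuple each time it sends a policy, and summarizes these records once the last task has been processed. The function name and report fields are hypothetical.

```python
from collections import Counter


def build_execution_report(events):
    """Summarize recorded (task_id, execution_result, action) tuples."""
    errors = [event for event in events if event[1] != 0]  # non-zero result = error
    return {
        "tasks_executed": len({task_id for task_id, _, _ in events}),
        "error_count": len(errors),
        "actions": dict(Counter(action for _, _, action in events)),
        "errors": errors,
    }


events = [
    ("task1.1", 1, "retry"),
    ("task1.1", 0, "ignore_and_continue"),
    ("task1.2", 0, "ignore_and_continue"),
]
report = build_execution_report(events)
print(report["error_count"], report["actions"])
```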

In the distributed task processing method provided by the application, the error processing framework running on the server can obtain the execution result of the current task on each node when the nodes of the distributed system perform different tasks of the same service flow, and can obtain, according to the execution result, the first error correction policy corresponding to the execution result from the preset error correction policy set. By sending the first error correction policy to the node, the server enables the node to automatically handle errors that the task encounters while running. In this way, error correction code no longer needs to be coupled to the service code of each task of the distributed task. Therefore, when a developer needs to add error correction handling to a task in the distributed task, it is only necessary to update the error correction policy set in the error processing framework of the server, without recompiling the distributed task, thereby improving the reliability of the distributed task.

FIG. 5 is a schematic structural diagram of a distributed task processing apparatus provided by the present application. The distributed task processing apparatus may be implemented as part or all of a server by software, hardware, or a combination of the two. As shown in FIG. 5, the distributed task processing apparatus may include: a first obtaining module 11, a second obtaining module 12, and a sending module 13;

The first obtaining module 11 is configured to obtain an execution result of a current task on a node in the distributed system; the distributed system includes multiple nodes, each node includes at least one task of the same service flow; and the execution result carries the identifier of the current task;

The second obtaining module 12 is configured to obtain, according to the identifier of the current task and the execution result, the first error correction policy corresponding to the current task from the preset error correction policy set; the error correction policy set includes a correspondence between the identifier of the current task, the execution result, and the first error correction policy; the first error correction policy is used to indicate the action to be performed by the node in the next step;

The sending module 13 is configured to send the first error correction policy acquired by the second obtaining module 12 to the node.

The distributed task processing apparatus provided by the present application may perform the foregoing method embodiments, and the implementation principles and technical effects thereof are similar, and details are not described herein again.

Optionally, in an implementation manner of the application, the action performed by the node in the next step may include: rolling back, ignoring and continuing, pausing, and retrying.

Optionally, in an implementation manner of the application, the foregoing error correction policy set may include: a first mapping relationship between the identifier of the current task and the identifier of the error correction policy subset to which the first error correction policy belongs, and a second mapping relationship between the identifier of the error correction policy subset and the error correction policy subset; the error correction policy subset includes a third mapping relationship between the execution result and the first error correction policy.

In this implementation manner, FIG. 6 is a schematic structural diagram of another distributed task processing apparatus provided by the present application. As shown in FIG. 6, on the basis of the foregoing embodiment shown in FIG. 5, the second obtaining module 12 may include: a first determining unit 121 and a second determining unit 122; wherein

The first determining unit 121 is configured to determine, according to the identifier of the current task, the first mapping relationship, and the second mapping relationship, a subset of the error correction policies corresponding to the current task;

The second determining unit 122 is configured to determine the first error correction policy according to the execution result and the third mapping relationship in the subset of error correction policies determined by the first determining unit 121.

In this implementation manner, the foregoing first obtaining module 11 may be specifically configured to obtain the execution result according to a result returned by the node executing a callback function.

FIG. 7 is a schematic structural diagram of still another distributed task processing apparatus according to the present application. As shown in FIG. 7, the distributed task processing apparatus may further include:

The generating module 14 is configured to: after the sending module 13 sends the first error correction policy to the node, if the node is the last node that processes the service flow, generate an execution report of the distributed task.

FIG. 8 is a schematic structural diagram of still another distributed task processing apparatus provided by the present application. As shown in FIG. 8, the distributed task processing apparatus may include: a processor 21 and a transmitter 22;

The processor 21 is configured to obtain an execution result of the current task on a node in the distributed system, and to obtain, according to the identifier of the current task and the execution result, a first error correction policy corresponding to the current task from the preset error correction policy set; the distributed system includes multiple nodes, each node including at least one task of the same service flow; the execution result carries the identifier of the current task; the error correction policy set includes a correspondence between the identifier of the current task, the execution result, and the first error correction policy; the first error correction policy is used to indicate the action to be performed by the node in the next step;

The transmitter 22 is configured to send the first error correction policy acquired by the processor 21 to the node.

In this implementation, the processor 21 being configured to obtain an execution result of the current task on the node in the distributed system may specifically be: the processor 21 obtains the execution result according to a result returned by the node executing a callback function.

In this implementation, the processor 21 being configured to obtain, according to the identifier of the current task and the execution result, the first error correction policy corresponding to the current task from the preset error correction policy set may specifically be: the processor 21 determines, according to the identifier of the current task, the first mapping relationship, and the second mapping relationship, the error correction policy subset corresponding to the current task, and determines the first error correction policy according to the execution result and the third mapping relationship in the error correction policy subset.

Further, on the basis of the foregoing embodiment, the processor 21 may be further configured to: after the transmitter 22 sends the first error correction policy to the node, if the node is the last node that processes the service flow, generate the execution report of the distributed task.

Claims

A distributed task processing method, characterized in that the method comprises:

The server obtains an execution result of a current task on a node in a distributed system; the distributed system includes multiple nodes, each node includes at least one task of the same service flow; and the execution result carries the identifier of the current task;

The server obtains, according to the identifier of the current task and the execution result, a first error correction policy corresponding to the current task from a preset error correction policy set; the error correction policy set includes a correspondence between the identifier of the current task, the execution result, and the first error correction policy; the first error correction policy is used to indicate an action to be performed by the node in the next step;

The server sends the first error correction policy to the node.
The method according to claim 1, wherein the error correction policy set specifically comprises:

a first mapping relationship between the identifier of the current task and an identifier of the error correction policy subset to which the first error correction policy belongs, and a second mapping relationship between the identifier of the error correction policy subset and the error correction policy subset; wherein the error correction policy subset includes a third mapping relationship between the execution result and the first error correction policy.
The method according to claim 2, wherein the server obtaining, according to the identifier of the current task and the execution result, the first error correction policy corresponding to the current task from the preset error correction policy set specifically includes:

Determining, by the server, a subset of error correction strategies corresponding to the current task according to the identifier of the current task, the first mapping relationship, and the second mapping relationship;

The server determines the first error correction policy according to the execution result and a third mapping relationship in the subset of error correction policies.
The method according to any one of claims 1-3, wherein the action performed by the node in the next step comprises: rolling back, ignoring and continuing, pausing, and retrying.
The method according to claim 4, wherein the server obtains an execution result of the current task on the node in the distributed system, and specifically includes:

The server acquires the execution result according to a result returned by the node executing a callback function.
The method according to any one of claims 1-5, wherein after the server sends the first error correction policy to the node, the method further includes:

If the node is the last node that processes the service flow, the server generates an execution report of the distributed task.
A distributed task processing apparatus, characterized in that the apparatus comprises:

a first obtaining module, configured to obtain an execution result of a current task on a node in the distributed system; the distributed system includes multiple nodes, each node including at least one task of the same service flow; and the execution result carries The identifier of the current task;

a second obtaining module, configured to obtain, according to the identifier of the current task and the execution result, a first error correction policy corresponding to the current task from a preset error correction policy set; the error correction policy set includes a correspondence between the identifier of the current task, the execution result, and the first error correction policy; the first error correction policy is used to indicate an action to be performed by the node in the next step;

a sending module, configured to send the first error correction policy obtained by the second obtaining module to the node.
The apparatus according to claim 7, wherein the error correction policy set specifically includes:

a first mapping relationship between the identifier of the current task and an identifier of the error correction policy subset to which the first error correction policy belongs, and a second mapping relationship between the identifier of the error correction policy subset and the error correction policy subset; wherein the error correction policy subset includes a third mapping relationship between the execution result and the first error correction policy.
The device according to claim 8, wherein the second obtaining module comprises:

a first determining unit, configured to determine, according to the identifier of the current task, the first mapping relationship, and the second mapping relationship, a subset of error correction policies corresponding to the current task;

a second determining unit, configured to determine the first error correction policy according to the execution result and a third mapping relationship in the subset of error correction policies determined by the first determining unit.
The apparatus according to any one of claims 7-9, wherein the action performed by the node in the next step comprises: rolling back, ignoring and continuing, pausing, and retrying.
The apparatus according to claim 10, wherein the first obtaining module is specifically configured to acquire the execution result according to a result returned by the node executing a callback function.
The device according to any one of claims 7 to 11, wherein the device further comprises:

And a generating module, configured to: after the sending module sends the first error correction policy to the node, if the node is the last node that processes the service flow, generate an execution report of the distributed task.
A distributed task processing apparatus, characterized in that the apparatus comprises:

a processor, configured to obtain an execution result of a current task on a node in a distributed system, and to obtain, according to the identifier of the current task and the execution result, a first error correction policy corresponding to the current task from a preset error correction policy set; the distributed system includes a plurality of nodes, each node including at least one task of the same service flow; the execution result carries the identifier of the current task; the error correction policy set includes a correspondence between the identifier of the current task, the execution result, and the first error correction policy; the first error correction policy is used to indicate an action to be performed by the node in the next step;

a transmitter, configured to send the first error correction policy obtained by the processor to the node.
The device according to claim 13, wherein the error correction policy set specifically includes:

a first mapping relationship between the identifier of the current task and an identifier of the error correction policy subset to which the first error correction policy belongs, and a second mapping relationship between the identifier of the error correction policy subset and the error correction policy subset; wherein the error correction policy subset includes a third mapping relationship between the execution result and the first error correction policy.
The device according to claim 14, wherein the processor being configured to obtain, according to the identifier of the current task and the execution result, the first error correction policy corresponding to the current task from the preset error correction policy set is specifically:

the processor is configured to determine, according to the identifier of the current task, the first mapping relationship, and the second mapping relationship, the error correction policy subset corresponding to the current task, and to determine the first error correction policy according to the execution result and the third mapping relationship in the error correction policy subset.
The apparatus according to any one of claims 13-15, wherein the action performed by the node in the next step comprises: rolling back, ignoring and continuing, pausing, and retrying.
The device according to claim 16, wherein the processor is configured to obtain an execution result of a current task on a node in a distributed system, specifically:

The processor is specifically configured to obtain the execution result according to a result returned by the node executing a callback function.
The device according to any one of claims 13-17, wherein the processor is further configured to: after the transmitter sends the first error correction policy to the node, if the node is the last node that processes the service flow, generate an execution report of the distributed task.