WO2018000878A1 - Distributed task processing method and apparatus - Google Patents

Distributed task processing method and apparatus

Info

Publication number
WO2018000878A1
Authority
WO
WIPO (PCT)
Prior art keywords
error correction
node
task
execution result
correction policy
Application number
PCT/CN2017/079230
Other languages
French (fr)
Chinese (zh)
Inventor
乔雷
Original Assignee
华为技术有限公司

Definitions

  • the present application relates to computer technology, and in particular, to a distributed task processing method and apparatus.
  • a distributed system is a computer system that connects multiple computers at different locations or with different functions or with different data through a communication network to coordinate large-scale information processing tasks under the unified management and control of the control system.
  • distributed systems can distribute the tasks included in any distributed task on different computers to improve the execution speed of distributed tasks.
  • In the prior art, the error correction code (i.e., the code for error processing) of each task is coupled into the service code of that task, so that a distributed task can automatically handle errors of any task while it runs, improving the running efficiency of distributed tasks.
  • the present application provides a distributed task processing method and apparatus for solving the problem that the error correction code of each task in the distributed task is coupled with the service code in the prior art, so that the reliability of the distributed task is low.
  • the application provides a distributed task processing method, where the method includes:
  • the server obtains an execution result of the current task on the node in the distributed system;
  • the distributed system includes multiple nodes, and each node includes at least one task of the same service flow; the execution result carries the identifier of the current task;
  • the server sends the first error correction policy to the node.
  • The server may obtain the execution result of the current task on each node and, according to the execution result, obtain the first error correction policy corresponding to that result from the preset error correction policy set. By sending the first error correction policy to the node, the server enables the node to automatically handle errors that occur while the task is running.
  • In this way, error correction code no longer needs to be coupled into the service code of each task of the distributed task. Therefore, when a developer needs to add an error correction code segment for a task in the distributed task, only the error correction policy set on the server side needs to be updated, and the distributed task does not need to be recompiled, which improves the reliability of the distributed task.
  • The server obtaining, according to the identifier of the current task and the execution result, the first error correction policy corresponding to the current task from the preset error correction policy set specifically includes:
  • the server determines the first error correction policy according to the execution result and a third mapping relationship in the subset of error correction policies.
  • the server obtains an execution result of a current task on a node in the distributed system, and specifically includes:
  • the server acquires the execution result according to a result returned by the node executing a callback function.
  • the method further includes:
  • If the node is the last node that processes the service flow, the server generates an execution report of the distributed task.
  • the distributed task processing method provided by the possible implementation manner enables the maintenance personnel of the service flow to learn the error of the service flow during the execution process by consulting the execution report, thereby improving the user experience.
  • the application provides a distributed task processing apparatus, where the apparatus includes:
  • a first obtaining module configured to obtain an execution result of a current task on a node in the distributed system;
  • the distributed system includes multiple nodes, each node including at least one task of the same service flow; the execution result carries the identifier of the current task;
  • a second acquiring module configured to acquire, according to the identifier of the current task and the execution result, a first error correction policy corresponding to the current task from a preset error correction policy set;
  • the error correction policy set includes a correspondence among the identifier of the current task, the execution result, and the first error correction policy; the first error correction policy is used to indicate the action to be performed by the node in the next step;
  • a sending module configured to send the first error correction policy acquired by the second acquiring module to the node.
  • the second acquiring module includes:
  • a first determining unit configured to determine, according to the identifier of the current task, the first mapping relationship, and the second mapping relationship, a subset of error correction policies corresponding to the current task
  • a second determining unit configured to determine the first error correction policy according to the execution result and a third mapping relationship in the subset of error correction policies determined by the first determining unit.
  • the first acquiring module is configured to obtain the execution result according to a result returned by the node executing a callback function.
  • the device further includes:
  • a generating module configured to: after the sending module sends the first error correction policy to the node, if the node is the last node that processes the service flow, generate an execution report of the distributed task.
  • the application provides a distributed task processing apparatus, where the apparatus includes:
  • a processor configured to obtain an execution result of a current task on a node in the distributed system, and to obtain, according to the identifier of the current task and the execution result, a first error correction policy corresponding to the current task from a preset error correction policy set; the distributed system includes a plurality of nodes, each node including at least one task of the same service flow; the execution result carries the identifier of the current task; the error correction policy set includes a correspondence among the identifier of the current task, the execution result, and the first error correction policy; the first error correction policy is used to indicate the action to be performed by the node in the next step;
  • a transmitter configured to send the first error correction policy acquired by the processor to the node.
  • The processor being configured to obtain, according to the identifier of the current task and the execution result, the first error correction policy corresponding to the current task from the preset error correction policy set is specifically:
  • the processor is configured to determine, according to the identifier of the current task, the first mapping relationship, and the second mapping relationship, a subset of error correction policies corresponding to the current task, and to determine the first error correction policy according to the execution result and the third mapping relationship in the subset of error correction policies.
  • the processor is configured to obtain an execution result of a current task on a node in a distributed system, specifically:
  • the processor is specifically configured to obtain the execution result according to a result returned by the node executing a callback function.
  • The processor is further configured to: after the transmitter sends the first error correction policy to the node, if the node is the last node that processes the service flow, generate an execution report of the distributed task.
  • With reference to the first aspect and its possible implementation manners, the second aspect and its possible implementation manners, and the third aspect and its possible implementation manners, the error correction policy set specifically includes:
  • The actions performed by the node in the next step include: rolling back, ignoring and continuing, pausing, and retrying.
  • With the distributed task processing method and apparatus provided by the present application, the server may obtain the execution result of the current task on each node and, according to the execution result, obtain the first error correction policy corresponding to that result from the preset error correction policy set. By sending the first error correction policy to the node, the server enables the node to automatically handle errors that occur while the task is running.
  • In this way, error correction code no longer needs to be coupled into the service code of each task of the distributed task. Therefore, when a developer needs to add an error correction code segment for a task in the distributed task, only the error correction policy set on the server side needs to be updated, and the distributed task does not need to be recompiled, which improves the reliability of the distributed task.
  • FIG. 1 is a system architecture diagram of a distributed task processing method provided by the present application.
  • FIG. 2 is a schematic flowchart of a distributed task processing method provided by the present application.
  • FIG. 3 is a schematic diagram of an error processing framework provided by the present application.
  • FIG. 5 is a schematic structural diagram of a distributed task processing apparatus according to the present application.
  • FIG. 6 is a schematic structural diagram of another distributed task processing apparatus provided by the present application.
  • FIG. 7 is a schematic structural diagram of still another distributed task processing apparatus provided by the present application.
  • FIG. 8 is a schematic structural diagram of still another distributed task processing apparatus provided by the present application.
  • FIG. 1 is a system architecture diagram of a distributed task processing method provided by the present application.
  • The distributed system shown in FIG. 1 is a typical distributed system. It may include a server for managing each node (for example, the management server shown in FIG. 1), multiple nodes for processing distributed tasks (for example, the service servers shown in FIG. 1), and a switch for communication between the server and each node.
  • The distributed system may further include at least one server provided with a database (for example, the database server shown in FIG. 1) and a server for storing related data in the distributed system (for example, the storage server shown in FIG. 1).
  • the above-described server can coordinate the nodes to perform tasks in distributed tasks in parallel or serially, wherein each node can perform one or more serial tasks in the distributed tasks to improve the execution speed of the distributed tasks.
  • FIG. 1 is only an example, and the distributed task processing method provided by the present application can be applied to any distributed system, and is not limited to FIG. 1 .
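  • The coordination described above, in which serial task sequences run in parallel with each other while the tasks inside one sequence run in order, can be illustrated with a minimal Python sketch. It is not taken from the application: threads stand in for nodes, and the names run_job, run_work, Job2, task1.2, and task2.1 are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(job_name, tasks):
    """Run the tasks of one serial task sequence strictly in order."""
    results = []
    for task_name, task_fn in tasks:
        results.append((task_name, task_fn()))
    return job_name, results


def run_work(jobs):
    """Run every job in parallel; tasks inside a job stay serial.

    Threads are only a stand-in for the nodes of a distributed system.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        futures = [pool.submit(run_job, name, tasks) for name, tasks in jobs.items()]
        return dict(future.result() for future in futures)


# Hypothetical service flow: two jobs, each task returns a result code.
jobs = {
    "Job1": [("task1.1", lambda: 0), ("task1.2", lambda: 0)],
    "Job2": [("task2.1", lambda: 0)],
}
work_results = run_work(jobs)
```

  • In this sketch the per-job result list preserves the serial order of the tasks, while the order in which the jobs themselves finish is not constrained.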
  • In the prior art, the error correction code (for example, error codes and exception handling) of each task is coupled into the service code of the task, so that an error during the running of the task can be automatically processed according to the error correction code segment corresponding to the execution result of the task, improving the running efficiency of the distributed task.
  • However, because the error correction code coupled with the service code in each task is written by the developer based on experience with historically handled errors, it may not cover all errors that occur while the task actually runs. Therefore, when an execution result obtained during execution has no corresponding error correction code segment, that is, when a new error occurs that the error correction code cannot process, the developer needs to add an error correction code segment for the new error to the service code of the task and recompile the distributed task. Since a distributed task risks becoming inoperable after recompilation, the reliability of the distributed task is low.
  • The distributed task processing method provided by the present application no longer couples the error correction code of each task into the service code of each task. Instead, the error correction code is preset on the server that controls each node in the distributed system.
  • By obtaining the execution result of a task, the server can use the preset error correction code to determine the next action of the node and instruct the node to perform that action to correct the error that occurred in the task, so that the node automatically handles errors of the task during running. In this way, error correction code no longer needs to be coupled into the service code of each task of the distributed task.
  • The distributed task processing method and apparatus provided by the present application therefore aim to solve the technical problem in the prior art that the error correction code of each task in a distributed task is coupled with the service code, which makes the reliability of the distributed task low.
  • FIG. 2 is a schematic flowchart of a distributed task processing method provided by the present application. This embodiment relates to a specific process in which a server determines an error correction policy for a node according to the execution result of the current task on the node. As shown in FIG. 2, the method may include:
  • The server obtains an execution result of a current task on a node in the distributed system, where the distributed system includes multiple nodes, each node includes at least one task of the same service flow, and the execution result carries the identifier of the current task.
  • the distributed system may include multiple nodes.
  • the node may be, for example, the service server shown in FIG. 1 above, or a computer, a virtual machine, or the like that can perform tasks in the service flow.
  • The foregoing service flow may be, for example, a service flow for opening a virtual machine, a service flow for opening a cloud desktop, a service flow for opening a cloud application, or an operation and maintenance management service flow. A service flow may include multiple tasks, and these tasks may be distributed to and executed on different nodes, so the service flow can be regarded as a distributed task.
  • The server in the distributed system (for example, the management server shown in FIG. 1) not only coordinates the nodes of the distributed system to execute, in parallel, one or more tasks of the service flow (i.e., the distributed task), but can also obtain the execution result of each task after a node executes it.
  • Specifically, the server may obtain the execution result of the current task on each node either by sending each node a message requesting the execution result and receiving the result that the node returns, or by receiving the execution result that each node actively sends.
  • the foregoing execution result may carry the identifier of the current task.
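  • The two ways of obtaining an execution result described above (the server requests it, or the node actively sends it) can be sketched as follows. This is only an illustration under assumptions: the classes ExecutionResult, Node, and Server, the integer result code, and the node names are hypothetical, and in-process method calls replace real network messages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExecutionResult:
    task_id: str  # identifier of the current task, carried in the result
    code: int     # result code; the integer encoding is an assumption


class Node:
    def __init__(self, name):
        self.name = name
        self.last_result: Optional[ExecutionResult] = None

    def finish_task(self, task_id, code, server=None):
        """Record the result; if a server is given, send it actively (push)."""
        self.last_result = ExecutionResult(task_id, code)
        if server is not None:
            server.receive(self.name, self.last_result)

    def report(self):
        """Answer a request from the server (pull)."""
        return self.last_result


class Server:
    def __init__(self):
        self.results = {}

    def receive(self, node_name, result):
        self.results[node_name] = result

    def poll(self, node):
        """Request the execution result of the current task from a node."""
        result = node.report()
        if result is not None:
            self.receive(node.name, result)
        return result


server = Server()
node_a, node_b = Node("node-a"), Node("node-b")
node_a.finish_task("task1.1", 0, server=server)  # node sends the result itself
node_b.finish_task("task2.1", 1)                 # node waits to be asked
server.poll(node_b)                              # server requests the result
```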
  • The server obtains, according to the identifier of the current task and the execution result, a first error correction policy corresponding to the current task from the preset error correction policy set, where the error correction policy set includes a correspondence among the identifier of the current task, the execution result, and the first error correction policy; the first error correction policy is used to indicate the action to be performed by the node in the next step.
  • The server presets an error correction policy set for the service flow (i.e., the distributed task). The error correction policy set may include, for each task of the service flow, the error correction policies corresponding to its different execution results.
  • After obtaining the execution result, the server may search the preset error correction policy set, according to the execution result and the identifier of the current task carried in it, for the error correction policy corresponding to the current task under that execution result, that is, the first error correction policy.
  • the first error correction policy may be used to indicate an action performed by the node in the next step.
  • The action described herein may be, for example, rolling back, ignoring and continuing, pausing, or retrying. It should be noted that the action indicated by the first error correction policy may be set according to the needs of the user or the type of the task, which is not limited in this application.
  • the server sends the first error correction policy to the node.
  • After obtaining the first error correction policy, the server may return it to the node, so that the node can perform the action corresponding to the first error correction policy. In this way, the node automatically handles errors of the task during running, which improves the running efficiency of the distributed task.
  • In the distributed task processing method provided by the present application, when the nodes of the distributed system execute different tasks of the same service flow, the server can obtain the execution result of the current task on each node and, according to the execution result, obtain the first error correction policy corresponding to that result from the preset error correction policy set. By sending the first error correction policy to the node, the server enables the node to automatically handle errors that occur while the task is running.
  • In this way, error correction code no longer needs to be coupled into the service code of each task of the distributed task. Therefore, when a developer needs to add an error correction code segment for a task in the distributed task, only the error correction policy set on the server side needs to be updated, and the distributed task does not need to be recompiled, which improves the reliability of the distributed task.
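  • The flow of this embodiment (obtain a result, look up the policy by task identifier and execution result, send the policy to the node) can be sketched in Python as below. This is a simplified illustration, not the application's implementation: the policy set is flattened into one dictionary, and the names Action, POLICY_SET, first_policy, handle_result, and task1.2 are hypothetical. Returning None when no entry matches is also an assumption.

```python
from enum import Enum


class Action(Enum):
    ROLLBACK = "rollback"
    IGNORE_AND_CONTINUE = "ignore_and_continue"
    PAUSE = "pause"
    RETRY = "retry"


# Preset error correction policy set kept on the server side:
# (task identifier, execution result) -> action for the node's next step.
POLICY_SET = {
    ("task1.1", 0): Action.IGNORE_AND_CONTINUE,
    ("task1.1", 1): Action.RETRY,
    ("task1.2", 2): Action.ROLLBACK,  # hypothetical entry
}


def first_policy(task_id, result):
    """Look up the policy for this task under this execution result."""
    return POLICY_SET.get((task_id, result))


def handle_result(task_id, result, send):
    """Server-side step: look up the policy and send it to the node."""
    policy = first_policy(task_id, result)
    if policy is not None:
        send(policy)
    return policy


sent = []
handle_result("task1.1", 1, sent.append)

# A newly observed error is covered by updating the server-side set only;
# the service code of the task is left untouched.
POLICY_SET[("task1.2", 3)] = Action.PAUSE
```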
  • FIG. 3 is a schematic diagram of an error processing framework provided by the present application.
  • The error processing framework can receive and store the error correction policy set input by the user, obtain the execution result of the current task on a node, and determine the error correction policy corresponding to the execution result according to the execution result and the preset error correction policy set.
  • The distributed task processing method provided by the present application is described below by using an example in which the server executes the method through the error processing framework.
  • FIG. 4 is a schematic flowchart of another distributed task processing method provided by the present application. As shown in FIG. 4, this embodiment relates to a specific process in which the server obtains the execution result of the current task on a node through the error processing framework and determines the error correction policy for the node through the same framework.
  • Step 1: The user presets a set of error correction policies in the error handling framework.
  • The user may preset the error correction policy set directly in the error processing framework of the server, or may preset it by means of model definition.
  • When model definition is used, the error correction policy set may be preset by defining one model or by defining two models. If the user defines two models, the error correction policy set may include a first mapping relationship between the identifier of each task in the service flow and the identifier of the error correction policy subset corresponding to that task, and a second mapping relationship between the identifier of each error correction policy subset and the subset itself. The first mapping relationship may form one model (in this embodiment, referred to as the service model, or business model), and the second mapping relationship may form another model (in this embodiment, referred to as the policy model).
  • That is, the service model of the error correction policy set may include the first mapping relationship between the identifier of the current task and the identifier of the error correction policy subset to which the first error correction policy belongs, and the policy model may include the second mapping relationship between the identifier of the error correction policy subset and the error correction policy subset, where the error correction policy subset includes a third mapping relationship between the execution result and the first error correction policy.
  • FIG. 3 shows an error processing framework in which the above-described error correction policy set is preset by defining a business model and a policy model.
  • The following uses an example in which the user presets the error correction policy set by defining a business model and a policy model in the error processing framework.
  • The user can define the models using a computer language supported by the error processing framework.
  • The business model may not only include the first mapping relationship between the identifier of each task in the service flow and the identifier of the error correction policy subset corresponding to the task, but may also represent the execution order of the tasks included in the service flow.
  • The definition and layout file of the business model shown in FIG. 3 may use the following elements:
  • work represents a service flow;
  • job represents a "serial task sequence" in the service flow, and different jobs are executed in parallel with each other;
  • one job may include at least one task (the number of tasks in a job depends on the service flow);
  • the name under each task is the identifier of the task;
  • the policy under each task is the identifier of the error correction policy subset corresponding to the task. That is, the first mapping relationship between the identifier of each task and the identifier of its error correction policy subset is established by the name and policy under each task;
  • for example, task1.1 is the identifier of the first task under the serial task sequence Job1, and policy-1.1 under task1.1 is the identifier of the error correction policy subset corresponding to task1.1.
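  • As an illustration of the elements just described, the following Python sketch writes a business model as nested data and derives the first mapping relationship and the execution order from it. The layout is a hypothetical stand-in, not the application's own definition file: only work, job, task, name, policy, Job1, task1.1, and policy-1.1 come from the text, and every other identifier is an assumption.

```python
# Hypothetical in-memory form of a business model.
BUSINESS_MODEL = {
    "work": {
        "jobs": [
            {
                "name": "Job1",
                "tasks": [
                    {"name": "task1.1", "policy": "policy-1.1"},
                    {"name": "task1.2", "policy": "policy-1.2"},  # assumed
                ],
            },
            {
                "name": "Job2",  # assumed second serial task sequence
                "tasks": [{"name": "task2.1", "policy": "policy-2.1"}],
            },
        ]
    }
}


def first_mapping(model):
    """Task identifier -> identifier of its error correction policy subset."""
    return {
        task["name"]: task["policy"]
        for job in model["work"]["jobs"]
        for task in job["tasks"]
    }


def execution_order(model):
    """Serial order of tasks inside each job; jobs run in parallel."""
    return {
        job["name"]: [task["name"] for task in job["tasks"]]
        for job in model["work"]["jobs"]
    }


mapping = first_mapping(BUSINESS_MODEL)
order = execution_order(BUSINESS_MODEL)
```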
  • The execution order of the tasks in the business model described above can also be seen in the business model on the left side of the "business and policy model" in FIG. 3, where work represents a service flow, a job represents a "serial task sequence" executed in parallel with the other jobs in the service flow, a task represents a task in the service flow, and the direction of the arrow between tasks indicates the execution order of the tasks in the service flow.
  • the above policy model may include a second mapping relationship between the identifier of the error correction strategy subset corresponding to the identifier of each task and the subset of error correction strategies.
  • The user may define each error correction policy subset according to how errors of the corresponding task were handled in historical executions.
  • the subset of error correction strategies corresponding to each task may include: an error correction strategy corresponding to the execution result obtained by the task under different error conditions.
  • The following describes the definition and layout file of the second mapping relationship between the identifier of an error correction policy subset in the policy model and the subset itself, taking the subset whose identifier in the foregoing service model is policy-1.1 as an example. Each rule under rules represents a correspondence between one execution result and one error correction policy in the subset.
  • For example, policy-1.1 includes three rules (rule1, rule2, and rule3), that is, three correspondences between execution results and error correction policies. Taking rule1 as an example, rule1 indicates that when the execution result (i.e., the condition) is 0, the error correction policy is to ignore and continue, that is, the action to be performed by the node in the next step is to ignore and continue.
  • If the action corresponding to an error correction policy is a retry, the error correction policy may further include the number of retry attempts; if the action is a rollback, the error correction policy may further include an operation code for performing the rollback, and the like.
  • each of the foregoing error correction policies may correspond to only one execution result, or may correspond to multiple execution results, and may be determined according to the specific content and type of the task.
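  • A policy model of this kind can be sketched in Python as a set of rules per subset identifier. The sketch is illustrative only: rule1 (condition 0, ignore and continue) and the retry for result 1 follow the text, while the Rule class, the retry count of 3, the third rule, and the rollback operation name are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    condition: int                     # execution result the rule matches
    action: str                        # action for the node's next step
    retries: Optional[int] = None      # only meaningful for "retry"
    rollback_op: Optional[str] = None  # only meaningful for "rollback"


# Second mapping: subset identifier -> subset (a list of rules).
POLICY_MODEL = {
    "policy-1.1": [
        Rule(condition=0, action="ignore_and_continue"),  # rule1 in the text
        Rule(condition=1, action="retry", retries=3),     # count is assumed
        Rule(condition=2, action="rollback", rollback_op="undo_task1_1"),  # assumed
    ]
}


def match_rule(policy_id, result):
    """Third mapping: find the rule whose condition equals the result."""
    for rule in POLICY_MODEL.get(policy_id, []):
        if rule.condition == result:
            return rule
    return None


matched = match_rule("policy-1.1", 1)
```

  • A rule that should cover several execution results could be expressed here by listing one Rule per result with the same action.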
  • The error correction policy subset corresponding to one task in the policy model described above can also be seen in the policy model on the right side of the "business and policy model" in FIG. 3, where Task represents a task in the service flow, Policy represents the error correction policy subset corresponding to the task, and the correspondence between condition and action represents the correspondence between an execution result and an error correction policy in the subset.
  • The error processing framework may first parse the business model and the policy model into a computer language that the framework can recognize, and then store the converted business model and policy model in the storage database of the distributed system, for example, a relational database or a file system database, to prevent the models from being lost if the server loses power.
  • the foregoing storage database may be a database integrated on the server, or may be a database server independent of the server, which is not limited in this application.
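  • Storing the parsed models in a relational database can be sketched with Python's built-in sqlite3 module. This is an assumption-laden illustration: the table name, the JSON serialization, and the helper names open_store, save_model, and load_model are hypothetical, and the in-memory database used here would be replaced by a file path or a database server to survive a power failure.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open the model store; pass a file path for durable storage."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS models (name TEXT PRIMARY KEY, body TEXT)"
    )
    return conn


def save_model(conn, name, model):
    """Serialize a parsed model and store it under its name."""
    conn.execute(
        "INSERT OR REPLACE INTO models (name, body) VALUES (?, ?)",
        (name, json.dumps(model)),
    )
    conn.commit()


def load_model(conn, name):
    """Load a stored model, or return None if it was never saved."""
    row = conn.execute(
        "SELECT body FROM models WHERE name = ?", (name,)
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = open_store()
save_model(conn, "business", {"task1.1": "policy-1.1"})
save_model(conn, "policy", {"policy-1.1": {"0": "ignore_and_continue", "1": "retry"}})
restored = load_model(conn, "business")
```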
  • Step 2: The server imports the preset error correction policy set into the error handling framework.
  • The error processing framework is provided with an execution result coordinator. The execution result coordinator may import the preset error correction policy set, so that it can determine the error correction policy corresponding to an execution result according to the obtained execution result of any task of the service flow executed by a node and the preset error correction policy set.
  • Specifically, by running the execution result coordinator in the error processing framework, the server can import the business model and the policy model stored in the storage database (i.e., the preset error correction policy set) into the error processing framework (i.e., the model input shown in FIG. 3).
  • the method may include:
  • The server obtains the execution result according to a result returned by the node executing a callback function.
  • A callback function is provided at the end of the service code of each task in the service flow. The callback function may carry the communication address of the execution result coordinator on the error processing framework (for example, an IP address), so that the node executing the task can send the execution result of the task to the execution result coordinator in the error processing framework on the server through the callback function.
  • In this way, the server can monitor the node through the execution result coordinator of the error processing framework running on it, to obtain the execution result of the task currently executed on the node (i.e., the process monitoring function shown in FIG. 3).
  • the above callback function may be, for example, a PostEnd function in the hook function HOOK shown in FIG. 3. It should be noted that, in the foregoing, how to set the callback function in each task of the service flow can be referred to the prior art, and the details are not described herein again.
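  • One way to picture a callback placed at the end of a task's service code is a Python decorator that reports the result once the task body returns. This is a sketch under assumptions, not the PostEnd hook itself: with_post_end, fake_send, the message layout, and the address 192.0.2.10:9000 (from a range reserved for documentation) are hypothetical, and a plain function call stands in for the network transfer to the execution result coordinator.

```python
import functools


def with_post_end(task_id, coordinator_address, send):
    """Wrap a task so that a post-end callback reports its execution result."""
    def decorator(task_fn):
        @functools.wraps(task_fn)
        def wrapper(*args, **kwargs):
            result = task_fn(*args, **kwargs)
            # Callback at the end of the service code: report the result,
            # tagged with the task identifier, to the coordinator address.
            return send(coordinator_address, {"task": task_id, "result": result})
        return wrapper
    return decorator


received = []


def fake_send(address, message):
    """Stand-in for the coordinator; replies with a policy name."""
    received.append((address, message))
    return "ignore_and_continue" if message["result"] == 0 else "retry"


@with_post_end("task1.1", "192.0.2.10:9000", fake_send)
def task_1_1():
    return 1  # service code of the task; 1 models a failed run


policy = task_1_1()
```

  • Because the reporting lives in the wrapper, the service code inside task_1_1 contains no error handling of its own, which mirrors the decoupling the application describes.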
  • the server determines, according to the identifier of the current task, the first mapping relationship, and the second mapping relationship, a subset of the error correction policies corresponding to the current task.
  • After obtaining the execution result, the execution result coordinator can determine the error correction policy subset corresponding to the current task according to the identifier of the current task, the first mapping relationship (i.e., the business model), and the second mapping relationship (i.e., the policy model) (i.e., the policy match shown in FIG. 3).
  • Specifically, the execution result coordinator may first obtain, according to the identifier of the current task carried in the execution result, the identifier of the corresponding error correction policy subset from the business model, and then obtain, according to that subset identifier, the corresponding error correction policy subset from the policy model, so that the error correction policy corresponding to the execution result can be searched for in the subset.
  • For example, if the execution result received by the execution result coordinator is 1 and the task identifier carried in it is task1.1, the coordinator first looks up the business model by the task identifier to find the identifier of the corresponding error correction policy subset, that is, policy-1.1, and then uses that identifier to find the corresponding error correction policy subset in the policy model.
  • the server determines the first error correction policy according to the execution result and the third mapping relationship in the subset of error correction policies.
  • After the execution result coordinator in the error processing framework running on the server determines the error correction policy subset corresponding to the current task, it may search that subset, according to the execution result, for the first error correction policy corresponding to the execution result (i.e., the policy match shown in FIG. 3).
  • For example, for the execution result 1 described above, the first error correction policy found in policy-1.1 is a retry.
  • If the first error correction policy corresponding to the execution result is to ignore and continue, the execution result is a correct execution result.
  • the server sends the first error correction policy to the node.
  • the execution result coordinator in the error processing framework running on the server may send the first error correction policy to the node through the callback function in the task, so that the node can perform the action corresponding to the first error correction policy (that is, the action execution shown in FIG. 3).
  • the method may further include: the server generating an execution report of the distributed task (that is, the execution report shown in FIG. 3). By consulting the execution report, maintenance personnel of the service flow can learn which errors occurred while the service flow was executed, which improves the user experience.
  • with the distributed task processing method provided by the present application, when the nodes of the distributed system perform different tasks of the same service flow, the error processing framework running on the server can obtain the execution result of the current task on each node and, according to that execution result, obtain the corresponding first error correction policy from a preset error correction policy set. By sending the first error correction policy to the node, the server enables the node to automatically handle errors that occur while the task is running. In this way, error correction code no longer needs to be coupled into the service code of each task of the distributed task. When a developer later needs to add error handling for a task in the distributed task, it is enough to update the error correction policy set in the error processing framework on the server; the distributed task does not need to be recompiled, which improves the reliability of the distributed task.
  • FIG. 5 is a schematic structural diagram of a distributed task processing apparatus provided by the present application.
  • the distributed task processing apparatus may be implemented as part or all of a server by software, hardware, or a combination of the two.
  • the distributed task processing apparatus may include: a first obtaining module 11, a second obtaining module 12, and a sending module 13;
  • the first obtaining module 11 is configured to obtain an execution result of a current task on a node in the distributed system; the distributed system includes multiple nodes, each node includes at least one task of the same service flow; and the execution result carries the identifier of the current task;
  • the second obtaining module 12 is configured to obtain, according to the identifier of the current task and the execution result, the first error correction policy corresponding to the current task from the preset error correction policy set; the error correction policy set includes a correspondence between the identifier of the current task, the execution result, and the first error correction policy; the first error correction policy is used to indicate the action to be performed next by the node;
  • the sending module 13 is configured to send the first error correction policy acquired by the second obtaining module 12 to the node.
  • the distributed task processing apparatus may perform the foregoing method embodiments, and the implementation principles and technical effects thereof are similar, and details are not described herein again.
  • the action to be performed next by the node may include: rolling back, ignoring and continuing, pausing, and retrying.
  • the foregoing error correction policy set may include: a first mapping relationship between the identifier of the current task and the identifier of the error correction policy subset to which the first error correction policy belongs, and a second mapping relationship between the identifier of the error correction policy subset and the error correction policy subset; the error correction policy subset includes a third mapping relationship between the execution result and the first error correction policy.
  • FIG. 6 is a schematic structural diagram of another distributed task processing apparatus provided by the present application.
  • the second obtaining module 12 may include: a first determining unit 121 and a second determining unit 122; wherein
  • the first determining unit 121 is configured to determine, according to the identifier of the current task, the first mapping relationship, and the second mapping relationship, a subset of the error correction policies corresponding to the current task;
  • the second determining unit 122 is configured to determine the first error correction policy according to the execution result and the third mapping relationship in the subset of error correction policies determined by the first determining unit 121.
  • the foregoing first obtaining module 11 may be specifically configured to obtain the execution result according to the result returned when the node executes the callback function.
  • the distributed task processing apparatus may perform the foregoing method embodiments, and the implementation principles and technical effects thereof are similar, and details are not described herein again.
  • FIG. 7 is a schematic structural diagram of still another distributed task processing apparatus according to the present application. As shown in FIG. 7, the distributed task processing apparatus may further include:
  • the generating module 14 is configured to: after the sending module 13 sends the first error correction policy to the node, if the node is the last node that processes the service flow, generate an execution report of the distributed task.
  • the distributed task processing apparatus may perform the foregoing method embodiments, and the implementation principles and technical effects thereof are similar, and details are not described herein again.
  • FIG. 8 is a schematic structural diagram of still another distributed task processing apparatus provided by the present application.
  • the distributed task processing apparatus may include: a processor 21 and a transmitter 22;
  • the processor 21 is configured to obtain an execution result of the current task on the node in the distributed system, and obtain a first error correction policy corresponding to the current task from the preset error correction policy set according to the identifier and the execution result of the current task.
  • the distributed system includes multiple nodes, each node including at least one task of the same service flow; the execution result carries the identifier of the current task; the error correction policy set includes a correspondence between the identifier of the current task, the execution result, and the first error correction policy; the first error correction policy is used to indicate the action to be performed next by the node;
  • the transmitter 22 is configured to send the first error correction policy acquired by the processor 21 to the node.
  • the distributed task processing apparatus may perform the foregoing method embodiments, and the implementation principles and technical effects thereof are similar, and details are not described herein again.
  • the action to be performed next by the node may include: rolling back, ignoring and continuing, pausing, and retrying.
  • the foregoing error correction policy set may include: a first mapping relationship between the identifier of the current task and the identifier of the error correction policy subset to which the first error correction policy belongs, and a second mapping relationship between the identifier of the error correction policy subset and the error correction policy subset; the error correction policy subset includes a third mapping relationship between the execution result and the first error correction policy.
  • that the processor 21 is configured to obtain an execution result of the current task on the node in the distributed system may specifically be: the processor 21 obtains the execution result according to the result returned when the node executes the callback function.
  • that the processor 21 is configured to obtain, according to the identifier of the current task and the execution result, the first error correction policy corresponding to the current task from the preset error correction policy set may specifically be: the processor 21 determines the error correction policy subset corresponding to the current task according to the identifier of the current task, the first mapping relationship, and the second mapping relationship, and determines the first error correction policy according to the execution result and the third mapping relationship in the error correction policy subset.
  • the processor 21 may be further configured to: after the transmitter 22 sends the first error correction policy to the node, if the node is the last node that processes the service flow, generate an execution report of the distributed task.
  • the distributed task processing apparatus may perform the foregoing method embodiments, and the implementation principles and technical effects thereof are similar, and details are not described herein again.

Abstract

Provided are a distributed task processing method and apparatus. The method comprises: a server acquiring an execution result of a current task on a node in a distributed system, wherein the distributed system comprises a plurality of nodes, and each node comprises at least one task of the same service flow, and the execution result carries an identifier of the current task; according to the identifier of the current task and the execution result, the server acquiring a first error correction policy corresponding to the current task from a pre-set error correction policy set, wherein the error correction policy set comprises a correlation between the identifier of the current task, the execution result and the first error correction policy, and the first error correction policy is used for indicating an action to be executed in the next step by the node; and the server sending the first error correction policy to the node. The distributed task processing method and apparatus provided in the present application can improve the reliability of distributed tasks.

Description

Distributed task processing method and apparatus
Technical field
The present application relates to computer technology, and in particular, to a distributed task processing method and apparatus.
Background
A distributed system is a computer system in which multiple computers that are at different locations, have different functions, or hold different data are connected through a communication network and, under the unified management and control of a control system, cooperate to complete large-scale information processing tasks. Currently, a distributed system can distribute the tasks that make up any distributed task across different computers for execution, to increase the execution speed of the distributed task.
In the prior art, when developing such a distributed task, developers couple the error correction code (that is, the code used for error handling) of each task into the service code of that task, so that errors in any task can be handled automatically while the distributed task is running, which improves the running efficiency of the distributed task.
However, because the error correction code of each task in the distributed task is strongly coupled with the service code, the reliability of the distributed task is low.
Summary
The present application provides a distributed task processing method and apparatus, to solve the technical problem in the prior art that the reliability of a distributed task is low because the error correction code of each task in the distributed task is strongly coupled with the service code.
According to a first aspect, the present application provides a distributed task processing method, the method including:
a server obtains an execution result of a current task on a node in a distributed system; the distributed system includes multiple nodes, and each node includes at least one task of the same service flow; the execution result carries an identifier of the current task;
the server obtains, according to the identifier of the current task and the execution result, a first error correction policy corresponding to the current task from a preset error correction policy set; the error correction policy set includes a correspondence between the identifier of the current task, the execution result, and the first error correction policy; the first error correction policy is used to indicate the action to be performed next by the node;
the server sends the first error correction policy to the node.
With the distributed task processing method provided by the first aspect, when the nodes of the distributed system perform different tasks of the same service flow, the server can obtain the execution result of the current task on each node and, according to that execution result, obtain the corresponding first error correction policy from the preset error correction policy set. By sending the first error correction policy to the node, the server enables the node to automatically handle errors that occur while the task is running. In this way, error correction code no longer needs to be coupled into the service code of each task of the distributed task. When a developer later needs to add an error correction code segment for a task in the distributed task, it is enough to update the error correction policy set on the server side; the distributed task does not need to be recompiled, which improves the reliability of the distributed task.
Optionally, in a possible implementation of the first aspect, that the server obtains, according to the identifier of the current task and the execution result, the first error correction policy corresponding to the current task from the preset error correction policy set specifically includes:
the server determines the error correction policy subset corresponding to the current task according to the identifier of the current task, the first mapping relationship, and the second mapping relationship;
the server determines the first error correction policy according to the execution result and the third mapping relationship in the error correction policy subset.
Optionally, in a possible implementation of the first aspect, that the server obtains the execution result of the current task on the node in the distributed system specifically includes:
the server obtains the execution result according to the result returned when the node executes a callback function.
Optionally, in a possible implementation of the first aspect, after the server sends the first error correction policy to the node, the method further includes:
if the node is the last node that processes the service flow, the server generates an execution report of the distributed task.
With the distributed task processing method provided by this possible implementation, maintenance personnel of the service flow can consult the execution report to learn which errors occurred while the service flow was executed, which improves the user experience.
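The report step above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class name `ExecutionReportBuilder`, the node names, the report format, and the idea that the server records one entry per execution result. The source only states that a report is generated once the last node of the service flow has been handled.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionReportBuilder:
    """Hypothetical server-side helper that collects results for one service flow."""
    last_node: str                      # assumed: the server knows which node is last
    entries: list = field(default_factory=list)

    def record(self, node, task_id, result, policy):
        """Record one handled execution result; return a report after the last node."""
        self.entries.append((node, task_id, result, policy))
        if node == self.last_node:
            return self.build()
        return None

    def build(self):
        # One line per handled result, in the order the results arrived.
        return "\n".join(
            f"{node} {task_id} result={result} policy={policy}"
            for node, task_id, result, policy in self.entries
        )


builder = ExecutionReportBuilder(last_node="node2")
first = builder.record("node1", "task1.1", 1, "retry")
report = builder.record("node2", "task2.1", 0, "ignore_and_continue")
```

Here `first` is `None` because node1 is not the last node, while `report` holds one line per recorded result for maintenance personnel to inspect.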
According to a second aspect, the present application provides a distributed task processing apparatus, the apparatus including:
a first obtaining module, configured to obtain an execution result of a current task on a node in a distributed system; the distributed system includes multiple nodes, and each node includes at least one task of the same service flow; the execution result carries an identifier of the current task;
a second obtaining module, configured to obtain, according to the identifier of the current task and the execution result, a first error correction policy corresponding to the current task from a preset error correction policy set; the error correction policy set includes a correspondence between the identifier of the current task, the execution result, and the first error correction policy; the first error correction policy is used to indicate the action to be performed next by the node;
a sending module, configured to send the first error correction policy obtained by the second obtaining module to the node.
Optionally, in a possible implementation of the second aspect, the second obtaining module includes:
a first determining unit, configured to determine the error correction policy subset corresponding to the current task according to the identifier of the current task, the first mapping relationship, and the second mapping relationship;
a second determining unit, configured to determine the first error correction policy according to the execution result and the third mapping relationship in the error correction policy subset determined by the first determining unit.
Optionally, in a possible implementation of the second aspect, the first obtaining module is specifically configured to obtain the execution result according to the result returned when the node executes a callback function.
Optionally, in a possible implementation of the second aspect, the apparatus further includes:
a generating module, configured to: after the sending module sends the first error correction policy to the node, if the node is the last node that processes the service flow, generate an execution report of the distributed task.
For the beneficial effects of the distributed task processing apparatus provided by the second aspect and each possible implementation of the second aspect, refer to the beneficial effects of the first aspect and each possible implementation of the first aspect; details are not repeated here.
According to a third aspect, the present application provides a distributed task processing apparatus, the apparatus including:
a processor, configured to obtain an execution result of a current task on a node in a distributed system, and to obtain, according to the identifier of the current task and the execution result, a first error correction policy corresponding to the current task from a preset error correction policy set; the distributed system includes multiple nodes, and each node includes at least one task of the same service flow; the execution result carries the identifier of the current task; the error correction policy set includes a correspondence between the identifier of the current task, the execution result, and the first error correction policy; the first error correction policy is used to indicate the action to be performed next by the node;
a transmitter, configured to send the first error correction policy obtained by the processor to the node.
Optionally, in a possible implementation of the third aspect, that the processor is configured to obtain, according to the identifier of the current task and the execution result, the first error correction policy corresponding to the current task from the preset error correction policy set is specifically:
the processor is specifically configured to determine the error correction policy subset corresponding to the current task according to the identifier of the current task, the first mapping relationship, and the second mapping relationship, and to determine the first error correction policy according to the execution result and the third mapping relationship in the error correction policy subset.
Optionally, in a possible implementation of the third aspect, that the processor is configured to obtain the execution result of the current task on the node in the distributed system is specifically:
the processor is specifically configured to obtain the execution result according to the result returned when the node executes a callback function.
Optionally, in a possible implementation of the third aspect, the processor is further configured to: after the transmitter sends the first error correction policy to the node, if the node is the last node that processes the service flow, generate an execution report of the distributed task.
For the beneficial effects of the distributed task processing apparatus provided by the third aspect and each possible implementation of the third aspect, refer to the beneficial effects of the first aspect and each possible implementation of the first aspect; details are not repeated here.
With reference to the first aspect and each possible implementation of the first aspect, the second aspect and each possible implementation of the second aspect, and the third aspect and each possible implementation of the third aspect, the error correction policy set specifically includes:
a first mapping relationship between the identifier of the current task and the identifier of the error correction policy subset to which the first error correction policy belongs, and a second mapping relationship between the identifier of the error correction policy subset and the error correction policy subset; the error correction policy subset includes a third mapping relationship between the execution result and the first error correction policy.
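The three mapping relationships can be pictured as nested lookup tables. The following is a minimal sketch, not the applicant's implementation. The identifiers `task1.1` and `policy-1.1` and the pairing of execution result 1 with retry follow the example given in this description; the dictionary layout, the function name `match_policy`, the entry for result 0, and the choice to return `None` when nothing matches are assumptions.

```python
# First mapping (business model): task identifier -> policy subset identifier.
BUSINESS_MODEL = {"task1.1": "policy-1.1"}

# Second mapping (policy model): policy subset identifier -> policy subset.
# Each subset holds the third mapping: execution result -> error correction policy.
POLICY_MODEL = {
    "policy-1.1": {
        1: "retry",                  # from the example in the description
        0: "ignore_and_continue",    # assumed entry for a correct result
    },
}


def match_policy(task_id, result,
                 business_model=BUSINESS_MODEL, policy_model=POLICY_MODEL):
    """Return the first error correction policy, or None if nothing matches."""
    subset_id = business_model.get(task_id)        # first mapping
    if subset_id is None:
        return None
    subset = policy_model.get(subset_id, {})       # second mapping
    return subset.get(result)                      # third mapping


policy = match_policy("task1.1", 1)
```

Keeping the subset identifier as an intermediate key lets several tasks share one policy subset, which may be one reason the description separates the business model from the policy model.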
With reference to the first aspect and each possible implementation of the first aspect, the second aspect and each possible implementation of the second aspect, and the third aspect and each possible implementation of the third aspect, the action to be performed next by the node includes any one of: rolling back, ignoring and continuing, pausing, and retrying.
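A node-side handler for these four actions might look like the sketch below. The enum `Action`, the function `apply_policy`, the `run_task` and `undo_task` callables, the convention that result 0 means success, and the bounded retry count are all assumptions; the source names only the four actions and does not say how a node carries them out.

```python
from enum import Enum


class Action(Enum):
    ROLLBACK = "rollback"
    IGNORE_AND_CONTINUE = "ignore_and_continue"
    PAUSE = "pause"
    RETRY = "retry"


def apply_policy(action, run_task, undo_task, max_retries=3):
    """Perform the action indicated by the policy and return a status string."""
    if action is Action.IGNORE_AND_CONTINUE:
        return "continued"            # move on to the next task
    if action is Action.PAUSE:
        return "paused"               # stop and wait for intervention
    if action is Action.ROLLBACK:
        undo_task()                   # undo the effects of the failed task
        return "rolled_back"
    if action is Action.RETRY:
        for _ in range(max_retries):  # assumed bound on retries
            if run_task() == 0:       # assumed: 0 means the task succeeded
                return "retried_ok"
        return "retry_exhausted"
    raise ValueError(f"unknown action: {action!r}")


attempts = iter([1, 0])               # first retry fails, second succeeds
status = apply_policy(Action.RETRY, lambda: next(attempts), lambda: None)
```

In a fuller design each retried result would presumably be sent back to the server for a fresh policy match rather than being judged locally; the loop here only keeps the sketch self-contained.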
With the distributed task processing method and apparatus provided by the present application, when the nodes of the distributed system perform different tasks of the same service flow, the server can obtain the execution result of the current task on each node and, according to that execution result, obtain the corresponding first error correction policy from the preset error correction policy set. By sending the first error correction policy to the node, the server enables the node to automatically handle errors that occur while the task is running. In this way, error correction code no longer needs to be coupled into the service code of each task of the distributed task. When a developer later needs to add an error correction code segment for a task in the distributed task, it is enough to update the error correction policy set on the server side; the distributed task does not need to be recompiled, which improves the reliability of the distributed task.
Brief description of the drawings
FIG. 1 is a system architecture diagram for the distributed task processing method provided by the present application;
FIG. 2 is a schematic flowchart of a distributed task processing method provided by the present application;
FIG. 3 is a schematic diagram of the error processing framework provided by the present application;
FIG. 4 is a schematic flowchart of another distributed task processing method provided by the present application;
FIG. 5 is a schematic structural diagram of a distributed task processing apparatus provided by the present application;
FIG. 6 is a schematic structural diagram of another distributed task processing apparatus provided by the present application;
FIG. 7 is a schematic structural diagram of still another distributed task processing apparatus provided by the present application;
FIG. 8 is a schematic structural diagram of still another distributed task processing apparatus provided by the present application.
Detailed description
FIG. 1 is a system architecture diagram for the distributed task processing method provided by the present application. As shown in FIG. 1, the distributed system is a typical distributed system. It may include a server for managing the nodes (for example, the management server shown in FIG. 1), multiple nodes that process distributed tasks (for example, the service servers shown in FIG. 1), and a switch used for communication between the server and the nodes. Optionally, the distributed system may further include at least one server provided with a database (for example, the database server shown in FIG. 1) and a server for storing related data of the distributed system (for example, the storage server shown in FIG. 1). The server can coordinate the nodes to execute the tasks of a distributed task in parallel or serially, where each node can execute one or more serial tasks of the distributed task, to increase the execution speed of the distributed task.
It should be noted that the distributed system shown in FIG. 1 is only an example. The distributed task processing method provided by the present application is applicable to any distributed system and is not limited to FIG. 1.
In the prior art, when developing a distributed task, developers couple the error correction code of each task (for example, error codes and exception handling) into the service code of that task. When a node of the distributed system runs the task, it can then automatically handle errors that occur during the run by using the error correction code segment that corresponds to the execution result of the task, which improves the running efficiency of the distributed task.
However, the error correction code coupled with the service code in each task is written by developers based on their experience of handling past errors, so it may not cover every error that occurs when the task actually runs. When a task produces an execution result for which there is no corresponding error correction code segment, that is, when a new error occurs that the error correction code cannot handle, the developer needs to add an error correction code segment for the new error to the service code of the task and recompile the distributed task. Because a recompiled distributed task carries the risk of failing to run, the reliability of the distributed task is low.
The distributed task processing method provided by the present application no longer couples the error correction code of each task into the service code of that task. Instead, the error correction code is preset on the server that controls the nodes of the distributed system. When a node in the distributed system executes a task, the server obtains the execution result of the task, uses its own preset error correction code to determine the next action of the node, and instructs the node to perform that action to correct the error that occurred in the task. The node can thus automatically handle errors that occur while the task is running. In this way, error correction code no longer needs to be coupled into the service code of each task of the distributed task. Even when a developer later needs to add an error correction code segment for a task in the distributed task, it is enough to update the error correction code on the server side; the distributed task does not need to be recompiled, which improves the reliability of the distributed task. The distributed task processing method and apparatus provided by the present application therefore aim to solve the technical problem in the prior art that the reliability of a distributed task is low because the error correction code of each task in the distributed task is strongly coupled with the service code.
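The decoupling described above can be illustrated by treating the server-side error correction code as plain data that is reloaded while the task code stays untouched. This is only a sketch under assumptions: the JSON layout, the class name `PolicyStore`, the stand-in `run_task` function, and execution result 2 mapping to rollback are invented for illustration and are not taken from the source.

```python
import json


def run_task():
    """Stands in for compiled service code on a node; it is never edited below."""
    return {"task_id": "task1.1", "result": 2}


class PolicyStore:
    """Hypothetical server-side holder of the error correction policy set."""

    def __init__(self, text):
        self.load(text)

    def load(self, text):
        # Replacing the policy set is a data update, not a code change.
        self._data = json.loads(text)

    def lookup(self, task_id, result):
        subset_id = self._data["business_model"].get(task_id)
        subset = self._data["policy_model"].get(subset_id, {})
        return subset.get(str(result))   # JSON object keys are strings


V1 = ('{"business_model": {"task1.1": "policy-1.1"},'
      ' "policy_model": {"policy-1.1": {"1": "retry"}}}')
V2 = ('{"business_model": {"task1.1": "policy-1.1"},'
      ' "policy_model": {"policy-1.1": {"1": "retry", "2": "rollback"}}}')

store = PolicyStore(V1)
outcome = run_task()
before = store.lookup(outcome["task_id"], outcome["result"])   # no policy for result 2 yet
store.load(V2)                                                  # server-side update only
after = store.lookup(outcome["task_id"], outcome["result"])
```

The same `run_task` output is unmatched before the update and matched after it, which is the property the description relies on when it says the distributed task does not need to be recompiled.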
FIG. 2 is a schematic flowchart of a distributed task processing method provided by this application. This embodiment concerns the specific process by which a server determines an error correction policy for a node according to the execution result of the current task on that node. As shown in FIG. 2, the method may include:
S101. The server obtains the execution result of a current task on a node in the distributed system, where the distributed system includes multiple nodes, each node hosts at least one task of the same service flow, and the execution result carries the identifier of the current task.
Specifically, the distributed system may include multiple nodes. A node may be, for example, the service server shown in FIG. 1, or a computer, a virtual machine, or the like that can execute tasks of the service flow. The service flow may be, for example, a service flow for provisioning a virtual machine, provisioning a cloud desktop, provisioning a cloud application, or performing operation and maintenance management. A service flow may include multiple tasks, and these tasks may be distributed across different nodes for execution, so the service flow can be regarded as a distributed task.
In this embodiment, the server in the distributed system (for example, the management server shown in FIG. 1) not only coordinates the nodes of the distributed system to execute one or more tasks of the service flow (that is, the distributed task) in parallel, but can also obtain the execution result of each task after a node finishes executing it. Optionally, the server may obtain the execution result of the current task by sending each node a message requesting that result and receiving the result the node returns, or by receiving the execution result that each node sends on its own initiative. The execution result may carry the identifier of the current task.
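The two ways of obtaining an execution result described above (the server requests it, or the node reports it on its own initiative) could be sketched as follows. This is only an illustrative sketch: the `Node` and `Server` classes, their method names, and the dictionary layout of a result are assumptions and do not come from the application.

```python
class Node:
    """Hypothetical node holding the result of its current task."""

    def __init__(self, task_id, result):
        # The execution result carries the identifier of the current task.
        self._latest = {"task_id": task_id, "result": result}

    def query_result(self):
        # Pull mode: the node answers a request sent by the server.
        return self._latest

    def push_result(self, server):
        # Push mode: the node reports the result on its own initiative.
        server.on_result(self._latest)


class Server:
    """Hypothetical server that collects execution results."""

    def __init__(self):
        self.results = []

    def poll(self, nodes):
        # Pull mode: ask each node for the result of its current task.
        for node in nodes:
            self.results.append(node.query_result())

    def on_result(self, result):
        # Push mode: accept a result that a node sent unprompted.
        self.results.append(result)


server = Server()
nodes = [Node("task1.1", 0), Node("task2.1", 1)]
server.poll(nodes[:1])         # the first node is polled
nodes[1].push_result(server)   # the second node pushes its result
print(len(server.results))     # 2
```

Either mode leaves the server with the same information, namely the task identifier and the execution result, which is all that the subsequent policy lookup needs.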
S102. The server obtains, from a preset error correction policy set, a first error correction policy corresponding to the current task according to the identifier of the current task and the execution result, where the error correction policy set includes the correspondence among the identifier of the current task, the execution result, and the first error correction policy, and the first error correction policy indicates the action that the node is to perform next.
Specifically, in this embodiment, the server is preset with an error correction policy set for the service flow (that is, the distributed task). The set may include the error correction policies corresponding to the different execution results of each task of the service flow. After the server obtains the execution result of the current task on a node of the distributed system, it can use the execution result and the identifier of the current task carried in it to look up, in its preset error correction policy set, the error correction policy that corresponds to the current task under that execution result, namely the first error correction policy. The first error correction policy indicates the action that the node is to perform next, for example rollback, ignore and continue, pause, or retry. It should be noted that the action indicated by the first error correction policy may be set according to the user's requirements or the type of the task, which is not limited in this application.
S103. The server sends the first error correction policy to the node.
Specifically, after the server obtains the first error correction policy corresponding to the current task from the preset error correction policy set according to the identifier of the current task and the execution result, it may return the first error correction policy to the node, so that the node performs the action corresponding to that policy. In this way, the node automatically handles errors that occur while the task is running, according to the first error correction policy sent by the server, which improves the running efficiency of the distributed task.
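Steps S101 to S103 can be sketched end to end as shown below. This is a minimal sketch under stated assumptions: the class names, the flat `(task identifier, result)` key, and the action strings are hypothetical, and the application does not prescribe this data layout.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionResult:
    task_id: str  # identifier of the current task, carried in the result
    code: int     # execution result reported by the node


@dataclass
class Node:
    name: str
    received: list = field(default_factory=list)

    def receive_policy(self, policy):
        # A real node would now perform the action; here it is only recorded.
        self.received.append(policy)


class Server:
    def __init__(self, policy_set):
        # The policy set is preset on the server, not in the task's service code.
        self.policy_set = policy_set

    def handle(self, node, result):
        # S101: the execution result has been obtained from the node.
        # S102: look up the first error correction policy for this task and result.
        policy = self.policy_set.get((result.task_id, result.code))
        # S103: send the policy to the node (skipped when no rule matches).
        if policy is not None:
            node.receive_policy(policy)
        return policy


server = Server({("task1.1", 1): "retry"})
node = Node("node-a")
server.handle(node, ExecutionResult("task1.1", 1))
print(node.received)  # ['retry']
```

In this sketch, supporting a new error only means adding an entry to `server.policy_set`. Nothing on the node side changes, which is the decoupling the method aims for.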
In the distributed task processing method provided by this application, when the nodes of the distributed system execute different tasks of the same service flow, the server can obtain the execution result of the current task on each node and, according to that execution result, obtain the corresponding first error correction policy from the preset error correction policy set. By sending the first error correction policy to the node, the server enables the node to automatically handle errors that occur while the task is running. Because the service code of each task no longer contains error correction code, a developer who later needs to add an error correction code segment for a task only needs to update the error correction policy set on the server side. The distributed task does not need to be recompiled, which improves its reliability.
Further, on the basis of the foregoing embodiment, an error handling framework runs on the server in this embodiment. FIG. 3 is a schematic diagram of the error handling framework provided by this application. As shown in FIG. 3, the error handling framework can receive and store the error correction policy set input by a user, obtain the execution result of the current task on a node, and determine the error correction policy corresponding to that execution result according to the execution result and the preset error correction policy set. The distributed task processing method provided by this application is described below using the example of a server that performs the method through the error handling framework.
FIG. 4 is a schematic flowchart of another distributed task processing method provided by this application. As shown in FIG. 4, this embodiment concerns the specific process by which the server obtains, through the error handling framework, the execution result of the current task on a node and determines an error correction policy for the node through that framework.
Before the method is carried out, the following preparations are required:
Step 1: The user presets the error correction policy set in the error handling framework.
Specifically, the user may preset the error correction policy set directly in the error handling framework of the server, or may preset it by defining models. When the user presets the error correction policy set by defining and orchestrating models, either one model or two models may be defined. If two models are defined, the error correction policy set may include a first mapping relationship between the identifier of each task in the service flow and the identifier of the error correction policy subset corresponding to that task, and a second mapping relationship between the identifier of each error correction policy subset and the subset itself. The first mapping relationship may form one model (referred to in this embodiment as the service model), and the second mapping relationship may form a separate model (referred to in this embodiment as the policy model). In other words, the service model of the error correction policy set may include the first mapping relationship between the identifier of the current task and the identifier of the error correction policy subset to which the first error correction policy belongs, and the policy model may include the second mapping relationship between the identifier of that error correction policy subset and the subset itself, where the subset includes a third mapping relationship between the execution result and the first error correction policy. FIG. 3 shows an error handling framework in which the error correction policy set is preset by defining a service model and a policy model.
Still referring to FIG. 3, take as an example a user who presets the error correction policy set in the error handling framework by defining a service model and a policy model. In a specific implementation, the user may define and orchestrate the service model and the policy model according to the tasks included in the service flow, using a computer language supported by the error handling framework, for example YAML, XML, or JSON. The service model may include not only the first mapping relationship between the identifier of each task in the service flow and the identifier of the error correction policy subset corresponding to that task, but may also represent the execution order of the tasks included in the service flow.
For example, using YAML, the definition and orchestration file of the service model shown in FIG. 3 may be as follows:
[Figures PCTCN2017079230-appb-000001 and PCTCN2017079230-appb-000002: YAML definition and orchestration file of the service model]
In the above file, work denotes the service flow, and each Job denotes a "serial task sequence". The Jobs in the service flow execute in parallel with one another. A Job may include at least one task (the number of tasks in a Job depends on the architecture of the service flow). The name under each task is the identifier of that task, and the policy under each task is the identifier of the error correction policy subset corresponding to that task. In other words, the name and policy under each task establish the first mapping relationship between the identifier of the task and the identifier of its error correction policy subset. For example, task1.1 is the identifier of the first task under the serial task sequence Job1, and policy-1.1 under task1.1 is the identifier of the error correction policy subset corresponding to task1.1.
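Since the definition file itself is only available as a figure, the structure described above can be illustrated with a hypothetical equivalent written as a Python dictionary. Only work, Job1, task1.1, the name and policy fields, and policy-1.1 are taken from the description. The remaining entries, the `jobs` and `tasks` keys, and the `first_mapping` helper are assumptions made for illustration.

```python
# Hypothetical service model mirroring the fields described in the text.
SERVICE_MODEL = {
    "work": {
        "jobs": [
            {
                "name": "Job1",  # one serial task sequence
                "tasks": [
                    {"name": "task1.1", "policy": "policy-1.1"},
                    {"name": "task1.2", "policy": "policy-1.2"},  # assumed entry
                ],
            },
            {
                "name": "Job2",  # assumed second sequence, parallel to Job1
                "tasks": [
                    {"name": "task2.1", "policy": "policy-2.1"},  # assumed entry
                ],
            },
        ]
    }
}


def first_mapping(model):
    """Flatten the model into: task identifier -> policy subset identifier."""
    return {
        task["name"]: task["policy"]
        for job in model["work"]["jobs"]
        for task in job["tasks"]
    }


print(first_mapping(SERVICE_MODEL)["task1.1"])  # policy-1.1
```

The order of the tasks inside each Job list also records the serial execution order within that sequence, so a single structure can carry both the first mapping relationship and the ordering information.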
Optionally, the execution order of the tasks in the service model of the above example may also be represented as in the service model on the left of the "service and policy model" in FIG. 3. As shown in FIG. 3, in that service model, work denotes the service flow, Job denotes a serial task sequence (the Jobs execute in parallel with one another), task denotes a task in the service flow, and the arrows between tasks indicate the execution order of the tasks in the service flow.
The policy model may include the second mapping relationship between the identifier of the error correction policy subset corresponding to each task identifier and the subset itself. In a specific implementation, the user may define each error correction policy subset according to how errors were handled in past executions of the corresponding task. The error correction policy subset corresponding to a task may include the error correction policies corresponding to the execution results that the task produces under different error conditions.
For example, using YAML, the definition and orchestration file of the second mapping relationship between the identifier of one error correction policy subset in the policy model and that subset may be as follows:
[Figures PCTCN2017079230-appb-000003 and PCTCN2017079230-appb-000004: YAML definition and orchestration file of the error correction policy subset policy-1.1 in the policy model]
The above file shows the error correction policy subset whose identifier in the service model is policy-1.1. Each rule under rules represents one correspondence between an execution result and an error correction policy in the subset: the condition of a rule is the execution result, and the action is the error correction policy for that result. For example, policy-1.1 includes three rules (rule1, rule2, and rule3), that is, three correspondences between execution results and error correction policies. Taking rule1 as an example, when the execution result (condition) is 0, the error correction policy (action) is ignore and continue, meaning that the next action the node performs is to ignore the result and continue. Optionally, if the action corresponding to an error correction policy is retry, the policy may also include the number of retries. If the action is rollback, the policy may also include the operation code for performing the rollback, and so on.
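A hypothetical rendering of such a policy subset, again as a Python dictionary because the original file is only available as a figure, might look like the sketch below. The rule names, the condition 0 with ignore and continue, and the idea of attaching a retry count or a rollback operation come from the description. The condition values of rule2 and rule3, the field names `max_retries` and `rollback_op`, and the `match_rule` helper are assumptions.

```python
# Hypothetical policy model containing one error correction policy subset.
POLICY_MODEL = {
    "policy-1.1": {
        "rules": [
            {"name": "rule1", "condition": 0, "action": "ignore_and_continue"},
            # A retry policy may also state how many times to retry (assumed field).
            {"name": "rule2", "condition": 1, "action": "retry", "max_retries": 3},
            # A rollback policy may also name the rollback operation (assumed field).
            {"name": "rule3", "condition": 2, "action": "rollback",
             "rollback_op": "release_resources"},
        ]
    }
}


def match_rule(policy_id, result):
    """Return the first rule whose condition equals the execution result."""
    for rule in POLICY_MODEL[policy_id]["rules"]:
        if rule["condition"] == result:
            return rule
    return None  # no rule covers this execution result


print(match_rule("policy-1.1", 0)["action"])  # ignore_and_continue
```

Returning `None` for an unmatched result makes the uncovered case explicit. In the scheme described here, covering it later only requires adding a rule to the subset.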
It should be noted that each error correction policy may correspond to a single execution result or to multiple execution results, depending on the specific content and type of the task.
Further, the error correction policy subset corresponding to a task in the policy model of the above example may also be represented as in the policy model on the right of the "service and policy model" in FIG. 3. As shown in FIG. 3, task denotes a task in the service flow, Policy denotes the error correction policy subset corresponding to that task, and the correspondence between condition and action represents the correspondence between execution results and error correction policies in the subset.
It should be noted that the specific content of each error correction policy (action) in the above example is merely illustrative. Persons skilled in the art will understand that the error correction policies are not limited thereto and may be modified or set according to the type of the task or the user's requirements.
When the user inputs the defined and orchestrated service model file and policy model file into the error handling framework of the server, the service model and policy model are set, and the error correction policy set is thereby preset. Further, after receiving the service model and policy model input by the user, the error handling framework may first parse them into a computer language that the framework can recognize, and then store the converted models in a storage database of the distributed system that supports persistent storage, for example a relational database or a file system database, so that the models are not lost if the server loses power. The storage database may be integrated in the server or may be a database server independent of the server, which is not limited in this application.
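The parse-then-persist step described above could be sketched with Python's standard `json` and `sqlite3` modules. This is an assumption-laden illustration: the application names JSON and relational databases only as possible choices, and the table layout and the function names `open_store`, `save_model`, and `load_model` are hypothetical.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open the model store; pass a file path for real persistence."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS models "
        "(name TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def save_model(conn, name, text):
    """Parse the user's model text, then store the parsed form."""
    parsed = json.loads(text)  # raises ValueError if the text is malformed
    conn.execute(
        "INSERT OR REPLACE INTO models (name, body) VALUES (?, ?)",
        (name, json.dumps(parsed)),
    )
    conn.commit()


def load_model(conn, name):
    """Return the stored model as Python data, or None if it is absent."""
    row = conn.execute(
        "SELECT body FROM models WHERE name = ?", (name,)
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = open_store()
save_model(conn, "policy", '{"policy-1.1": {"0": "ignore_and_continue"}}')
print(load_model(conn, "policy"))
```

The in-memory database is used here only so the sketch runs anywhere. Surviving a power loss, as the text requires, would need a file-backed or external database.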
Step 2: The server imports the preset error correction policy set into the error handling framework.
Specifically, in this embodiment, the error handling framework includes an execution result coordinator. The execution result coordinator can import the preset error correction policy set, so that it can determine the error correction policy corresponding to an execution result according to the obtained execution result of any task of the service flow executed by a node and the preset error correction policy set. Thus, when nodes of the distributed system execute the service flow, the server runs the execution result coordinator in the error handling framework to import the service model and policy model (that is, the preset error correction policy set) stored in the storage database into the error handling framework (the model input shown in FIG. 3).
As shown in FIG. 4, the method may include:
S201. The server obtains the execution result according to the result returned when the node executes a callback function.
Specifically, in this embodiment, a callback function is placed at the end of the service code of each task in the service flow. The callback function may carry the communication address (for example, an IP address) of the execution result coordinator in the error handling framework, so that the node executing the task can send the task's execution result through the callback function to the execution result coordinator in the error handling framework on the server. Thus, when a node in the distributed system executes a task of the service flow, the server can monitor the node through the execution result coordinator of the error handling framework running on it and obtain the execution result of the task currently executed on the node (the process monitoring function shown in FIG. 3). The callback function may be, for example, the PostEnd function among the hook functions (HOOK) shown in FIG. 3. For how to set a callback function in each task of the service flow, reference may be made to the prior art, and details are not described here.
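The role of the callback at the end of a task's service code can be sketched as follows. This is not the PostEnd hook of the application, whose signature is not given. The names `make_post_end` and `run_task`, the injected `send` transport, and the address value are all assumptions made for illustration.

```python
# Hypothetical coordinator address (a documentation-range IP, not a real host).
COORDINATOR_ADDR = "192.0.2.10:9000"


def make_post_end(send, addr=COORDINATOR_ADDR):
    """Build a callback that carries the coordinator's address."""
    def post_end(task_id, result):
        # Report the execution result, tagged with the task identifier.
        send(addr, {"task_id": task_id, "result": result})
    return post_end


def run_task(task_id, service_code, post_end):
    """Run the task's service code, then invoke the callback at its end."""
    result = service_code()
    post_end(task_id, result)
    return result


# A list stands in for the network transport so the sketch runs anywhere.
outbox = []
post_end = make_post_end(lambda addr, msg: outbox.append((addr, msg)))
run_task("task1.1", lambda: 1, post_end)
print(outbox)
```

Here the service code itself contains no error handling logic. It only produces a result, and the callback forwards that result to the coordinator.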
S202. The server determines the error correction policy subset corresponding to the current task according to the identifier of the current task, the first mapping relationship, and the second mapping relationship.
Specifically, after the execution result coordinator in the error handling framework running on the server obtains the execution result of the current task on a node in the distributed system, it can determine the error correction policy subset corresponding to the current task according to the identifier of the current task, the first mapping relationship (the service model), and the second mapping relationship (the policy model) (the policy matching shown in FIG. 3). In a specific implementation, the execution result coordinator first uses the identifier of the current task carried in the execution result to obtain, from the service model, the identifier of the corresponding error correction policy subset, and then uses that subset identifier to obtain the subset itself from the policy model, so that the error correction policy corresponding to the execution result can subsequently be looked up in the subset.
Continuing the above example, assume that the execution result received by the execution result coordinator is 1 and the task identifier carried in it is task1.1. The execution result coordinator first looks up in the service model the identifier of the error correction policy subset corresponding to that task identifier, namely policy-1.1, and then looks up in the policy model the error correction policy subset corresponding to that identifier.
S203. The server determines the first error correction policy according to the execution result and the third mapping relationship in the error correction policy subset.
Specifically, after determining the error correction policy subset corresponding to the current task, the execution result coordinator in the error handling framework running on the server can look up, in that subset, the first error correction policy corresponding to the execution result (the policy matching shown in FIG. 3).
Continuing the example of S202, for the execution result 1, the first error correction policy found in policy-1.1 is retry. Alternatively, if the execution result is 0, the corresponding first error correction policy is ignore and continue, which indicates that the execution result is a correct one.
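The chained lookup of S202 and S203 in this example can be sketched as three dictionary accesses. The values task1.1, policy-1.1, 0 with ignore and continue, and 1 with retry follow the example in the text. The variable and function names are assumptions.

```python
# First mapping (service model): task identifier -> policy subset identifier.
FIRST_MAPPING = {"task1.1": "policy-1.1"}

# Second mapping (policy model): subset identifier -> policy subset.
# Each subset holds the third mapping: execution result -> policy.
SECOND_MAPPING = {
    "policy-1.1": {0: "ignore_and_continue", 1: "retry"},
}


def resolve_policy(task_id, result):
    """Resolve the first error correction policy for a task and its result."""
    policy_id = FIRST_MAPPING[task_id]   # S202: look up in the service model
    subset = SECOND_MAPPING[policy_id]   # S202: look up in the policy model
    return subset.get(result)            # S203: third mapping, None if unmatched


print(resolve_policy("task1.1", 1))  # retry
print(resolve_policy("task1.1", 0))  # ignore_and_continue
```

Keeping the first and second mappings separate means several tasks can share one policy subset by pointing at the same subset identifier.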
S204. The server sends the first error correction policy to the node.
Specifically, after obtaining the first error correction policy, the execution result coordinator in the error handling framework running on the server can send it to the node through the callback function in the task, so that the node performs the action corresponding to the first error correction policy (the action execution shown in FIG. 3).
Optionally, if the node is the node that processes the last task of the service flow, then after S204 the method may further include: the server generates an execution report of the distributed task (the execution report shown in FIG. 3), so that maintenance personnel of the service flow can consult the report to learn which errors occurred during execution of the service flow, which improves the user experience.
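The application does not specify the content of the execution report. One possible shape, offered purely as an assumption, is a summary built from the (task, result, action) records that the server has seen during the service flow. The `build_report` function and its field names are hypothetical, and the sketch treats any action other than ignore and continue as indicating an error.

```python
from collections import Counter


def build_report(records):
    """Summarise a run; records are (task_id, result, action) tuples in order."""
    # Assumption: "ignore_and_continue" marks a correct result, anything else an error.
    errors = [r for r in records if r[2] != "ignore_and_continue"]
    return {
        "tasks_run": len(records),
        "errors": [
            {"task": task, "result": result, "action": action}
            for task, result, action in errors
        ],
        "actions": dict(Counter(action for _, _, action in records)),
    }


report = build_report([
    ("task1.1", 0, "ignore_and_continue"),
    ("task1.2", 1, "retry"),
])
print(report["errors"])  # [{'task': 'task1.2', 'result': 1, 'action': 'retry'}]
```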
In the distributed task processing method provided by this application, the server, through the error handling framework running on it, can obtain the execution result of the current task on each node when the nodes of the distributed system execute different tasks of the same service flow, and can obtain, according to that execution result, the corresponding first error correction policy from the preset error correction policy set. By sending the first error correction policy to the node, the server enables the node to automatically handle errors that occur while the task is running. Because the service code of each task no longer contains error correction code, a developer who later needs to add an error correction code segment for a task only needs to update the error correction policy set in the error handling framework of the server. The distributed task does not need to be recompiled, which improves its reliability.
FIG. 5 is a schematic structural diagram of a distributed task processing apparatus provided by this application. The apparatus may be implemented as part or all of a server by software, hardware, or a combination of both. As shown in FIG. 5, the distributed task processing apparatus may include a first obtaining module 11, a second obtaining module 12, and a sending module 13, where:
the first obtaining module 11 is configured to obtain the execution result of a current task on a node in the distributed system, where the distributed system includes multiple nodes, each node hosts at least one task of the same service flow, and the execution result carries the identifier of the current task;
the second obtaining module 12 is configured to obtain, from a preset error correction policy set, a first error correction policy corresponding to the current task according to the identifier of the current task and the execution result, where the error correction policy set includes the correspondence among the identifier of the current task, the execution result, and the first error correction policy, and the first error correction policy indicates the action that the node is to perform next; and
the sending module 13 is configured to send the first error correction policy obtained by the second obtaining module 12 to the node.
The distributed task processing apparatus provided by this application can perform the foregoing method embodiments. Its implementation principles and technical effects are similar and are not described again here.
Optionally, in an implementation of this application, the action that the node is to perform next may include any one of rollback, ignore and continue, pause, and retry.
Optionally, in an implementation of the present application, the error correction policy set may specifically include: a first mapping between the identifier of the current task and an identifier of the error correction policy subset to which the first error correction policy belongs, and a second mapping between the identifier of the error correction policy subset and the error correction policy subset. The error correction policy subset includes a third mapping between the execution result and the first error correction policy.
In this implementation, FIG. 6 is a schematic structural diagram of another distributed task processing apparatus provided by the present application. As shown in FIG. 6, on the basis of the embodiment shown in FIG. 5, the second obtaining module 12 may include a first determining unit 121 and a second determining unit 122, where:
The first determining unit 121 is configured to determine the error correction policy subset corresponding to the current task according to the identifier of the current task, the first mapping, and the second mapping.
The second determining unit 122 is configured to determine the first error correction policy according to the execution result and the third mapping in the error correction policy subset determined by the first determining unit 121.
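The two-step lookup performed by the determining units can be sketched with nested dictionaries. All identifiers and values below (`TASK_TO_SUBSET_ID`, `SUBSET_ID_TO_SUBSET`, the task names, result names, and policy strings) are hypothetical and serve only to show the shape of the three mappings.

```python
from typing import Dict

# First mapping: task identifier -> identifier of a policy subset.
TASK_TO_SUBSET_ID: Dict[str, str] = {
    "task_a": "subset_1",
    "task_b": "subset_1",   # several tasks may share one subset
    "task_c": "subset_2",
}

# Second mapping: subset identifier -> policy subset.
# Each subset is itself the third mapping: execution result -> policy.
SUBSET_ID_TO_SUBSET: Dict[str, Dict[str, str]] = {
    "subset_1": {"timeout": "retry", "data_error": "rollback"},
    "subset_2": {"timeout": "pause"},
}


def determine_subset(task_id: str) -> Dict[str, str]:
    """Role of the first determining unit: task identifier -> subset."""
    return SUBSET_ID_TO_SUBSET[TASK_TO_SUBSET_ID[task_id]]


def determine_policy(task_id: str, result: str) -> str:
    """Role of the second determining unit: result -> policy in the subset."""
    return determine_subset(task_id)[result]


print(determine_policy("task_b", "data_error"))
```

One plausible benefit of the indirection through a subset identifier is that tasks with the same error handling needs can share a single subset, so a policy change is made in one place.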
In this implementation, the first obtaining module 11 may be specifically configured to obtain the execution result according to a result returned by a callback function executed by the node.
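A callback-based way of reporting results might look like the following sketch. The function names `run_task` and `on_task_done`, the result format, and the use of the exception class name as the execution result are assumptions for illustration; the application does not prescribe them.

```python
from typing import Callable, Dict, List

# Results collected on the server side (in-memory stand-in).
received: List[Dict[str, str]] = []


def on_task_done(execution_result: Dict[str, str]) -> None:
    """Callback supplied by the server; the node invokes it with the result."""
    received.append(execution_result)


def run_task(task_id: str, task: Callable[[], None],
             callback: Callable[[Dict[str, str]], None]) -> None:
    """Node side: run the task, then report the outcome through the callback.
    The result carries the task identifier, as the text above requires."""
    try:
        task()
        outcome = "success"
    except Exception as exc:
        outcome = type(exc).__name__
    callback({"task_id": task_id, "result": outcome})


def failing_task() -> None:
    raise TimeoutError("simulated timeout")


run_task("task_a", lambda: None, on_task_done)
run_task("task_b", failing_task, on_task_done)
print(received)
```

Note that the task body contains no error handling of its own: the wrapper reports the failure and the decision about what to do next is left to the server.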
The distributed task processing apparatus provided by the present application can perform the foregoing method embodiments. Its implementation principles and technical effects are similar and are not described here again.
FIG. 7 is a schematic structural diagram of still another distributed task processing apparatus provided by the present application. As shown in FIG. 7, on the basis of the embodiment shown in FIG. 5, the distributed task processing apparatus may further include:
A generating module 14, configured to generate an execution report of the distributed task after the sending module 13 sends the first error correction policy to the node, if the node is the last node that processes the service flow.
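The condition checked by the generating module can be sketched as follows. The report contents, the `history` record format, and the function name `maybe_generate_report` are hypothetical; the application only states that a report is generated when the node is the last one processing the service flow.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical record of (task identifier, execution result, policy sent).
History = List[Tuple[str, str, str]]


def maybe_generate_report(node: str, flow_nodes: List[str],
                          history: History) -> Optional[Dict[str, object]]:
    """Return a report only if `node` is the last node of the service flow."""
    if node != flow_nodes[-1]:
        return None
    return {
        "tasks_processed": len(history),
        "policies_sent": dict(Counter(policy for _, _, policy in history)),
    }


history: History = [
    ("task_a", "timeout", "retry"),
    ("task_b", "data_error", "rollback"),
    ("task_c", "timeout", "retry"),
]
flow = ["node-1", "node-2"]
print(maybe_generate_report("node-1", flow, history))
print(maybe_generate_report("node-2", flow, history))
```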
The distributed task processing apparatus provided by the present application can perform the foregoing method embodiments. Its implementation principles and technical effects are similar and are not described here again.
FIG. 8 is a schematic structural diagram of still another distributed task processing apparatus provided by the present application. As shown in FIG. 8, the distributed task processing apparatus may include a processor 21 and a transmitter 22, where:
The processor 21 is configured to obtain an execution result of a current task on a node in a distributed system and to obtain, according to the identifier of the current task and the execution result, a first error correction policy corresponding to the current task from a preset error correction policy set. The distributed system includes multiple nodes, each node includes at least one task of the same service flow, and the execution result carries the identifier of the current task. The error correction policy set includes a correspondence among the identifier of the current task, the execution result, and the first error correction policy. The first error correction policy indicates the action that the node performs next.
The transmitter 22 is configured to send the first error correction policy obtained by the processor 21 to the node.
The distributed task processing apparatus provided by the present application can perform the foregoing method embodiments. Its implementation principles and technical effects are similar and are not described here again.
Optionally, in an implementation of the present application, the action that the node performs next may include any one of the following: a rollback action, an ignore-and-continue action, a pause, or a retry.
Optionally, in an implementation of the present application, the error correction policy set may specifically include: a first mapping between the identifier of the current task and an identifier of the error correction policy subset to which the first error correction policy belongs, and a second mapping between the identifier of the error correction policy subset and the error correction policy subset. The error correction policy subset includes a third mapping between the execution result and the first error correction policy.
In this implementation, the processor 21 being configured to obtain the execution result of the current task on the node in the distributed system may specifically be: the processor 21 obtains the execution result according to a result returned by a callback function executed by the node.
In this implementation, the processor 21 being configured to obtain, according to the identifier of the current task and the execution result, the first error correction policy corresponding to the current task from the preset error correction policy set may specifically be: the processor 21 determines the error correction policy subset corresponding to the current task according to the identifier of the current task, the first mapping, and the second mapping, and determines the first error correction policy according to the execution result and the third mapping in the error correction policy subset.
Further, on the basis of the foregoing embodiment, the processor 21 may also be configured to generate an execution report of the distributed task after the transmitter 22 sends the first error correction policy to the node, if the node is the last node that processes the service flow.
The distributed task processing apparatus provided by the present application can perform the foregoing method embodiments. Its implementation principles and technical effects are similar and are not described here again.

Claims (18)

  1. A distributed task processing method, wherein the method comprises:
    obtaining, by a server, an execution result of a current task on a node in a distributed system, wherein the distributed system comprises multiple nodes, each node comprises at least one task of a same service flow, and the execution result carries an identifier of the current task;
    obtaining, by the server according to the identifier of the current task and the execution result, a first error correction policy corresponding to the current task from a preset error correction policy set, wherein the error correction policy set comprises a correspondence among the identifier of the current task, the execution result, and the first error correction policy, and the first error correction policy is used to indicate an action to be performed next by the node; and
    sending, by the server, the first error correction policy to the node.
  2. The method according to claim 1, wherein the error correction policy set specifically comprises:
    a first mapping between the identifier of the current task and an identifier of an error correction policy subset to which the first error correction policy belongs, and a second mapping between the identifier of the error correction policy subset and the error correction policy subset, wherein the error correction policy subset comprises a third mapping between the execution result and the first error correction policy.
  3. The method according to claim 2, wherein the obtaining, by the server according to the identifier of the current task and the execution result, a first error correction policy corresponding to the current task from a preset error correction policy set specifically comprises:
    determining, by the server, the error correction policy subset corresponding to the current task according to the identifier of the current task, the first mapping, and the second mapping; and
    determining, by the server, the first error correction policy according to the execution result and the third mapping in the error correction policy subset.
  4. The method according to any one of claims 1 to 3, wherein the action to be performed next by the node comprises any one of: a rollback action, an ignore-and-continue action, a pause, and a retry.
  5. The method according to claim 4, wherein the obtaining, by a server, an execution result of a current task on a node in a distributed system specifically comprises:
    obtaining, by the server, the execution result according to a result returned by a callback function executed by the node.
  6. The method according to any one of claims 1 to 5, wherein after the server sends the first error correction policy to the node, the method further comprises:
    if the node is the last node that processes the service flow, generating, by the server, an execution report of the distributed task.
  7. A distributed task processing apparatus, wherein the apparatus comprises:
    a first obtaining module, configured to obtain an execution result of a current task on a node in a distributed system, wherein the distributed system comprises multiple nodes, each node comprises at least one task of a same service flow, and the execution result carries an identifier of the current task;
    a second obtaining module, configured to obtain, according to the identifier of the current task and the execution result, a first error correction policy corresponding to the current task from a preset error correction policy set, wherein the error correction policy set comprises a correspondence among the identifier of the current task, the execution result, and the first error correction policy, and the first error correction policy is used to indicate an action to be performed next by the node; and
    a sending module, configured to send the first error correction policy obtained by the second obtaining module to the node.
  8. The apparatus according to claim 7, wherein the error correction policy set specifically comprises:
    a first mapping between the identifier of the current task and an identifier of an error correction policy subset to which the first error correction policy belongs, and a second mapping between the identifier of the error correction policy subset and the error correction policy subset, wherein the error correction policy subset comprises a third mapping between the execution result and the first error correction policy.
  9. The apparatus according to claim 8, wherein the second obtaining module comprises:
    a first determining unit, configured to determine the error correction policy subset corresponding to the current task according to the identifier of the current task, the first mapping, and the second mapping; and
    a second determining unit, configured to determine the first error correction policy according to the execution result and the third mapping in the error correction policy subset determined by the first determining unit.
  10. The apparatus according to any one of claims 7 to 9, wherein the action to be performed next by the node comprises any one of: a rollback action, an ignore-and-continue action, a pause, and a retry.
  11. The apparatus according to claim 10, wherein the first obtaining module is specifically configured to obtain the execution result according to a result returned by a callback function executed by the node.
  12. The apparatus according to any one of claims 7 to 11, wherein the apparatus further comprises:
    a generating module, configured to generate an execution report of the distributed task after the sending module sends the first error correction policy to the node, if the node is the last node that processes the service flow.
  13. A distributed task processing apparatus, wherein the apparatus comprises:
    a processor, configured to obtain an execution result of a current task on a node in a distributed system, and to obtain, according to an identifier of the current task and the execution result, a first error correction policy corresponding to the current task from a preset error correction policy set, wherein the distributed system comprises multiple nodes, each node comprises at least one task of a same service flow, the execution result carries the identifier of the current task, the error correction policy set comprises a correspondence among the identifier of the current task, the execution result, and the first error correction policy, and the first error correction policy is used to indicate an action to be performed next by the node; and
    a transmitter, configured to send the first error correction policy obtained by the processor to the node.
  14. The apparatus according to claim 13, wherein the error correction policy set specifically comprises:
    a first mapping between the identifier of the current task and an identifier of an error correction policy subset to which the first error correction policy belongs, and a second mapping between the identifier of the error correction policy subset and the error correction policy subset, wherein the error correction policy subset comprises a third mapping between the execution result and the first error correction policy.
  15. The apparatus according to claim 14, wherein the processor being configured to obtain, according to the identifier of the current task and the execution result, the first error correction policy corresponding to the current task from the preset error correction policy set is specifically:
    the processor is specifically configured to determine the error correction policy subset corresponding to the current task according to the identifier of the current task, the first mapping, and the second mapping, and to determine the first error correction policy according to the execution result and the third mapping in the error correction policy subset.
  16. The apparatus according to any one of claims 13 to 15, wherein the action to be performed next by the node comprises any one of: a rollback action, an ignore-and-continue action, a pause, and a retry.
  17. The apparatus according to claim 16, wherein the processor being configured to obtain the execution result of the current task on the node in the distributed system is specifically:
    the processor is specifically configured to obtain the execution result according to a result returned by a callback function executed by the node.
  18. The apparatus according to any one of claims 13 to 17, wherein the processor is further configured to generate an execution report of the distributed task after the transmitter sends the first error correction policy to the node, if the node is the last node that processes the service flow.
PCT/CN2017/079230 2016-06-29 2017-04-01 Distributed task processing method and apparatus WO2018000878A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610495587.7A CN107547608A (en) 2016-06-29 2016-06-29 Distributed task scheduling treating method and apparatus
CN201610495587.7 2016-06-29

Publications (1)

Publication Number Publication Date
WO2018000878A1 true WO2018000878A1 (en) 2018-01-04

Family

ID=60786463

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/079230 WO2018000878A1 (en) 2016-06-29 2017-04-01 Distributed task processing method and apparatus

Country Status (2)

Country Link
CN (1) CN107547608A (en)
WO (1) WO2018000878A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347492A (en) * 2019-07-15 2019-10-18 深圳前海乘势科技有限公司 Distributed task dispatching method and apparatus based on time parameter method
CN113467782A (en) * 2021-07-02 2021-10-01 建信金融科技有限责任公司 Method, device and equipment for determining business process

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109388405A (en) * 2018-10-25 2019-02-26 北京大米未来科技有限公司 A kind of task processing method, device, electronic equipment and medium
CN111737751B (en) * 2020-07-17 2020-11-17 支付宝(杭州)信息技术有限公司 Method and device for realizing distributed data processing of privacy protection
CN112667623A (en) * 2021-01-13 2021-04-16 张立旭 Random algorithm-based distributed storage data error correction method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101511098A (en) * 2009-02-10 2009-08-19 中兴通讯股份有限公司 Distributed net element task management system and method
US20100146516A1 (en) * 2007-01-30 2010-06-10 Alibaba Group Holding Limited Distributed Task System and Distributed Task Management Method
CN103401712A (en) * 2013-07-31 2013-11-20 北京华易互动科技有限公司 Content distribution based intelligent high-availability task processing method and system
CN105337765A (en) * 2015-10-10 2016-02-17 上海新炬网络信息技术有限公司 Distributed hadoop cluster fault automatic diagnosis and restoration system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347492A (en) * 2019-07-15 2019-10-18 深圳前海乘势科技有限公司 Distributed task dispatching method and apparatus based on time parameter method
CN113467782A (en) * 2021-07-02 2021-10-01 建信金融科技有限责任公司 Method, device and equipment for determining business process
CN113467782B (en) * 2021-07-02 2022-12-09 中国建设银行股份有限公司 Method, device and equipment for determining business process

Also Published As

Publication number Publication date
CN107547608A (en) 2018-01-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17818877

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17818877

Country of ref document: EP

Kind code of ref document: A1