WO2024109003A1 - Task processing method and apparatus, and node - Google Patents

Task processing method and apparatus, and node

Info

Publication number
WO2024109003A1
WO2024109003A1 PCT/CN2023/101285 CN2023101285W WO2024109003A1 WO 2024109003 A1 WO2024109003 A1 WO 2024109003A1 CN 2023101285 W CN2023101285 W CN 2023101285W WO 2024109003 A1 WO2024109003 A1 WO 2024109003A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
node
database
target
target task
Prior art date
Application number
PCT/CN2023/101285
Other languages
English (en)
Chinese (zh)
Inventor
朱娜
罗光
姚博
田应军
Original Assignee
华为云计算技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为云计算技术有限公司
Publication of WO2024109003A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485 Task life-cycle, e.g. stopping, restarting, resuming execution

Definitions

  • The present application relates to the field of information technology (IT), and in particular to a task processing method, device and node.
  • Clusters can be divided into active-standby clusters and active-active clusters.
  • In an active-standby cluster, only the primary node works, and the standby node does not work.
  • In an active-active cluster, every node works, and a load balancer needs to be deployed at the front end to share the load across the cluster.
  • Active-active clusters have higher performance and are more commonly used.
  • the present application provides a task processing method, device, node, computer storage medium and computer product, which can reprocess the task requested by the user when the task processing fails, thereby ensuring that the result returned by the node is consistent with the actual result, thereby improving the user experience.
  • the present application provides a task processing method that can be applied to a first node.
  • the method may include: obtaining an application programming interface API request issued by a user; creating a target task according to the API request, and storing the task information of the target task in a database; executing the target task and obtaining the execution status of the target task; when the execution status of the target task is failed, re-executing the target task according to the task information of the target task stored in the database.
  • In this way, after receiving an API request, the node creates a task for the request and persists it in the database.
  • When the task processing corresponding to the request fails, the task corresponding to the request can be retrieved from the database and re-executed. Therefore, by adding retry self-healing data (i.e., the task created for the API request) to the database, the node can retry and self-heal when the task execution fails, ensuring the correctness of the result returned by the node.
  • the method further includes: storing the execution status of the target task in a database, thereby refreshing the status of the task in the database.
  • obtaining the execution status of the target task may include: obtaining the execution status of the target task from the database.
  • the method may further include: obtaining heartbeat information of the second node from a database, the heartbeat information being used to indicate whether the second node fails; when the second node fails, obtaining task information of at least one task created by the second node and not executed or failed to execute from the database; executing at least one task according to the task information of the at least one task, and storing the execution status of each task in the at least one task in the database. In this way, the situation of task processing failure caused by node failure is avoided, and the reliability of task processing is ensured.
  • the method may further include: when the number of times the target task is executed reaches a preset number and the last execution status is failure, outputting an alarm message, so that the user can be informed of the resource processing failure and perform manual repair.
  • The API request is used to request the creation of a target resource, and executing the target task includes: creating the target resource.
  • the method further includes: returning the execution status of the target task to the user, so that the user can know the execution status of the task.
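  • The method described above (persist the task, execute it, re-execute it from the stored task information on failure, and raise an alarm after a preset number of executions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the table layout, the function names, the use of SQLite as the database, and the value of MAX_ATTEMPTS are all hypothetical.

```python
import json
import sqlite3

MAX_ATTEMPTS = 3  # assumed value for the "preset number" of executions


def open_db():
    """Create an in-memory database with a hypothetical task table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, info TEXT, "
        "status TEXT, attempts INTEGER)"
    )
    return db


def create_task(db, api_request):
    """Create a target task for the API request and persist its task information."""
    cur = db.execute(
        "INSERT INTO tasks (info, status, attempts) VALUES (?, 'pending', 0)",
        (json.dumps(api_request),),
    )
    db.commit()
    return cur.lastrowid


def run_task(db, task_id, handler, alarms):
    """Execute the task; on failure, re-execute it from the stored task information."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        # Reload the task information from the database before every execution.
        (info,) = db.execute(
            "SELECT info FROM tasks WHERE id = ?", (task_id,)
        ).fetchone()
        try:
            handler(json.loads(info))
            status = "success"
        except Exception:
            status = "failed"
        # Refresh the execution status of the task in the database.
        db.execute(
            "UPDATE tasks SET status = ?, attempts = ? WHERE id = ?",
            (status, attempt, task_id),
        )
        db.commit()
        # Obtain the execution status from the database, as the text describes.
        (stored,) = db.execute(
            "SELECT status FROM tasks WHERE id = ?", (task_id,)
        ).fetchone()
        if stored == "success":
            return stored
    # The preset number of executions was reached and the last status is failed.
    alarms.append(f"task {task_id} failed after {MAX_ATTEMPTS} attempts")
    return "failed"


calls = []


def flaky_create(info):
    """Hypothetical handler that fails once, then succeeds."""
    calls.append(info["resource"])
    if len(calls) < 2:
        raise RuntimeError("downstream service unavailable")


db = open_db()
alarms = []
task_id = create_task(db, {"action": "create", "resource": "vm-1"})
final_status = run_task(db, task_id, flaky_create, alarms)
```

  • In this sketch the first execution fails and the second succeeds, so no alarm is recorded; a handler that always fails would leave one entry in alarms and a final status of "failed".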
  • the present application provides a task processing device.
  • the device can be deployed on a first node.
  • the device includes: a request acquisition module, used to acquire an application programming interface API request issued by a user; a task creation module, used to create a target task according to the API request and store the task information of the target task in a database; a task processing module, used to execute the target task and acquire the execution status of the target task; and, when the execution status of the target task is failed, re-execute the target task according to the task information of the target task stored in the database.
  • After executing the target task, the task processing module is further used to: store the execution status of the target task in a database; wherein, when obtaining the execution status of the target task, the task processing module is specifically used to: obtain the execution status of the target task from the database.
  • The task processing module is further used to: obtain heartbeat information of the second node from a database, where the heartbeat information is used to indicate whether the second node fails; when the second node fails, obtain, from the database, task information of at least one task that was created by the second node and has not been executed or has failed to execute; execute the at least one task according to the task information of the at least one task; and store the execution status of each task in the at least one task in the database.
  • the task processing module is further used to: output an alarm message when the number of times the target task is executed reaches a preset number and the last execution status is failure.
  • The API request is used to request the creation of a target resource; when executing the target task, the task processing module is specifically used to: create the target resource.
  • the task processing module is further configured to return the execution status of the target task to the user.
  • the present application provides a node, comprising: at least one memory for storing programs; and at least one processor for executing the programs stored in the memory; wherein, when the program stored in the memory is executed, the processor is used to execute the method described in the first aspect or any possible implementation of the first aspect.
  • the present application provides a computing device, comprising: at least one memory for storing a program; at least one processor for executing the program stored in the memory; wherein, when the program stored in the memory is executed, the processor is used to execute the method described in the first aspect or any possible implementation of the first aspect.
  • the computing device may be a node in a device cluster, such as node 1 shown in FIG. 2.
  • the present application provides a computing device cluster, comprising at least one computing device, each computing device comprising a processor and a memory; the processor of at least one computing device is used to execute instructions stored in the memory of at least one computing device, so that the computing device cluster executes the method described in the first aspect or any possible implementation of the first aspect.
  • the present application provides a computer-readable storage medium, which stores a computer program.
  • When the computer program runs on a processor, the processor executes the method described in the first aspect or any possible implementation of the first aspect.
  • the present application provides a computer program product.
  • When the computer program product runs on a processor, the processor executes the method described in the first aspect or any possible implementation of the first aspect.
  • FIG. 1 is a schematic diagram of a working process of a node in a master-master cluster provided in an embodiment of the present application.
  • FIG. 2 is a schematic diagram of the system architecture of a task processing system of a master-master cluster provided in an embodiment of the present application.
  • FIG. 3 is a schematic diagram of resources included in an API request provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of task processing when a node fails according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a change in resource state when a node processes resources provided by an embodiment of the present application.
  • FIG. 6 is a flowchart of a task processing method provided in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of the structure of a task processing device provided in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of the structure of a computing device provided in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of the structure of a computing device cluster provided in an embodiment of the present application.
  • The term "A and/or B" herein describes an association between associated objects and indicates that three relationships are possible.
  • For example, A and/or B can represent: A exists alone, A and B exist at the same time, or B exists alone.
  • The symbol "/" herein indicates that the associated objects are in an "or" relationship; for example, A/B means A or B.
  • The terms "first" and "second" in the specification and claims herein are used to distinguish different objects rather than to describe a specific order of the objects.
  • For example, a first response message and a second response message are used to distinguish different response messages rather than to describe a specific order of the response messages.
  • "Multiple" means two or more.
  • For example, multiple processing units refer to two or more processing units, and multiple elements refer to two or more elements.
  • FIG1 shows a schematic diagram of the working process of a node in a master-master cluster.
  • an asynchronous processing method is often used inside node 1.
  • A part of the threads in the thread pool of node 1 can obtain one or more application programming interface (API) requests, and store the resources of each API request in the database layer (DB layer).
  • Node 1 can read the storage result from the database through the thread that processes the API request.
  • The thread returns the request result representing successful processing to the user.
  • Another part of the threads in the thread pool can be used to process resources internally, such as calling APIs of other services.
  • the resource handler in node 1 can call the services provided by other nodes through a thread.
  • node 1 can send the results returned by other nodes to the user.
  • the services provided by other nodes can also be called "downstream services", that is, services located downstream of the services provided by node 1, or services that node 1 needs to call.
  • For example, node 1 provides service 1 and node 2 provides service 2. If service 1 needs to call service 2 during operation, service 2 can be called the downstream service of service 1.
  • However, when the resource handler in node 1 fails, or the node corresponding to the downstream service fails, node 1 will find it difficult to obtain the results returned by other nodes. As a result, the result that node 1 returns to the user when it stores the corresponding resource in its database may be inconsistent with the result it actually needs to return to the user (i.e., the result returned by other nodes).
  • an embodiment of the present application provides a task processing method, which can create a task for each resource in the API request when storing the resources in the API request in the database, and store the created tasks in the database. Then, each task can be executed. When a task fails to execute, the task can be retrieved from the database and reprocessed, thereby ensuring that the resources of the API request can be processed, and then ensuring the correctness of the results returned by the node to the user. Therefore, by adding retry self-healing data to the database (i.e., tasks created for each resource in the API request), the node can retry self-healing when the task fails to execute, thereby ensuring the correctness of the results returned by the node.
  • FIG. 2 shows an architecture of a task processing system of a master-master cluster.
  • the task processing system 200 of the master-master cluster may include: a master-master cluster 210 and a database 220.
  • the master-master cluster 210 includes n nodes, where n ≥ 2.
  • each node in the master-master cluster 210 includes: a request acquisition module 211, a task creation module 212, a task scheduling module 213 and a task processing module 214.
  • the request acquisition module 211 in each node is mainly used to obtain the API request issued by the user.
  • the user can issue an API request through the user interface provided by the node.
  • the API request may include one or more resources to be processed, such as products purchased or updated by the user or tenant.
  • the resources contained in the API request can be stored in the database 220 to complete the persistence of the resources.
  • the request acquisition module 211 can also transmit the API request it obtains to the task creation module 212.
  • the task creation module 212 is mainly used to create tasks for the resources contained in the API request through a preconfigured function after obtaining the API request sent by the request acquisition module 211, and store the created tasks in the database 220 to complete the persistence of the created tasks.
  • each task can be associated with a user or tenant, and/or associated with a resource.
  • different tasks can be associated with the same user or tenant.
  • different tasks can be associated with different resources.
  • the execution order of the tasks corresponding to each resource can be determined based on the dependency relationship or superior-subordinate relationship between each resource.
  • For example, suppose the resources included in the API request are a virtual machine, a network card and a volume purchased by a tenant. The network card and the volume need to be created before the virtual machine is created, and there is no dependency between creating the network card and creating the volume. Therefore, one task can be created for each of the three resources (creating the virtual machine, creating the network card and creating the volume). The tasks for creating the network card and creating the volume can run in parallel, and the task for creating the virtual machine is executed only after both of them are completed.
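  • The ordering in the virtual machine example can be sketched with a topological sort. This is an illustrative sketch only: the task names are hypothetical, and Python's standard graphlib module stands in for whatever ordering mechanism a node actually uses.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
dependencies = {
    "create_vm": {"create_nic", "create_volume"},
    "create_nic": set(),
    "create_volume": set(),
}


def execution_batches(deps):
    """Group tasks into batches; tasks in the same batch can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # get_ready() returns every task whose dependencies are all done.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = execution_batches(dependencies)
# batches == [["create_nic", "create_volume"], ["create_vm"]]
```

  • The first batch contains the two independent tasks, and the task for creating the virtual machine only appears once both have been marked as done.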
  • the task scheduling module 213 is mainly used to obtain tasks that need to be executed by the node to which it belongs (such as unprocessed tasks or failed tasks, etc.) from the database 220, and to allocate task queues according to the creation time of the tasks, the dependency between tasks, or the identification of the tasks (such as the identification of the tenant associated with the task, or the identification of the resource associated with the task, etc.).
  • the processing order of the tasks in a task queue is related to their creation time: the earlier a task was created, the earlier it is executed.
  • tasks with the same identification can be allocated to the same task queue.
  • the task scheduling module 213 can allocate tasks to be processed according to the identification of the task.
  • a task can be assigned to one task queue or to multiple task queues, depending on the actual situation; this is not limited here.
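  • One possible way to allocate task queues by identification and creation time is sketched below. The Task fields, the use of the tenant identifier as the task identification, and the CRC32-based queue selection are assumptions made for illustration; the text does not prescribe a particular mapping.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: int
    tenant_id: str      # identification used to choose the queue (assumed)
    created_at: float   # creation time, e.g. a Unix timestamp


def allocate_queues(tasks, num_queues):
    """Place tasks with the same identification in the same queue, oldest first."""
    queues = defaultdict(list)
    for task in sorted(tasks, key=lambda t: t.created_at):
        # A stable hash keeps every task of one tenant in one queue.
        index = zlib.crc32(task.tenant_id.encode("utf-8")) % num_queues
        queues[index].append(task)
    return dict(queues)


tasks = [
    Task(3, "tenant-a", 30.0),
    Task(1, "tenant-b", 10.0),
    Task(2, "tenant-a", 20.0),
]
queues = allocate_queues(tasks, num_queues=4)
```

  • Because tasks are sorted before they are distributed, each queue is already in creation-time order, so a worker that drains a queue front to back executes earlier tasks first.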
  • the task processing module 214 is mainly used to execute the tasks in each task queue assigned by the task scheduling module 213.
  • the task processing module 214 can assign different threads to different task queues, that is, execute the tasks in different task queues through different threads.
  • the task processing module 214 can call the API of the service provided by other nodes, and wait for the execution result of the service called by it, and store the corresponding execution status in the database 220.
  • the execution status stored in the database 220 by the task processing module 214 can be understood as the execution status of each task, so that the task scheduling module 213 can know which tasks in the database 220 are pending tasks and which are successfully processed tasks.
  • the pending tasks may include unprocessed tasks and/or tasks that have failed to be processed. For example, when a task is successfully processed, the execution status of the service required to be called by the task obtained by the task processing module 214 is successful; when a task is failed to be processed, the execution status of the service required to be called by the task obtained by the task processing module 214 is failed.
  • the task processing module 214 can also feed back the execution status corresponding to each resource to the user.
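  • The idea of executing different task queues through different threads while recording an execution status per task can be sketched as follows. The function names are hypothetical, and an in-memory dictionary stands in for the execution statuses that the text stores in database 220.

```python
import threading


def run_queues(queues, execute):
    """Run each task queue on its own thread, keeping order within a queue."""
    results = {}
    lock = threading.Lock()

    def worker(name, tasks):
        for task in tasks:
            outcome = execute(task)  # e.g. call a downstream service
            with lock:
                # Record the execution status (stand-in for a database write).
                results.setdefault(name, []).append((task, outcome))

    threads = [
        threading.Thread(target=worker, args=(name, tasks))
        for name, tasks in queues.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


queues = {"tenant-a": [1, 2, 3], "tenant-b": [4, 5]}
results = run_queues(queues, lambda task: "failed" if task == 5 else "success")
```

  • Tasks in one queue run sequentially on one thread, so their relative order is preserved, while separate queues make progress independently.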
  • each node in the master-master cluster 210 may have a thread that periodically scans failed tasks from the database 220 so that the task scheduling module 213 in the corresponding node adds the failed tasks to the task queue for retry self-healing.
  • the database 220 is mainly used to store the resources, the tasks, and the execution status of each task stored by each node in the master-master cluster 210.
  • the database 220 may also store the operation status of each node in the master-master cluster 210, so that when a node fails, the tasks it was required to execute are transferred to other nodes, thereby avoiding resource processing failures caused by node failure and ensuring the reliability of resource processing.
  • For example, the master-master cluster includes node 1, node 2 and node 3, and tasks 1 to 9 are stored in the database 220.
  • The tasks required to be executed by node 1 are tasks 1 to 3, the tasks required to be executed by node 2 are tasks 4 to 6, and the tasks required to be executed by node 3 are tasks 7 to 9.
  • When node 2 fails, node 1 can take over the tasks required to be executed by node 2, so that the tasks required to be executed by node 1 become tasks 1 to 6. As a result, when node 2 fails, the tasks required to be executed by node 2 can be transferred to other nodes for execution, ensuring the reliability of resource processing.
  • each node in the master-master cluster 210 can register its identity in the database 220 when it starts and write heartbeats to the database 220 while it runs, so that whether a node is faulty can be determined from its heartbeat, and the faulty node can be identified by its registered identity.
  • each node can have a thread for regularly writing heartbeats to the database 220, so as to refresh the state of the node to the database 220.
  • each node can also have a thread that regularly queries the health status of other nodes from the database 220 to regularly query whether other nodes are faulty.
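  • Heartbeat writing and fault detection through the database can be sketched as follows. This is an illustration under assumptions: the nodes table, the HEARTBEAT_TIMEOUT value and the rule "a node is faulty if its last heartbeat is older than the timeout" are hypothetical, and timestamps are passed in explicitly so the example is deterministic.

```python
import sqlite3

HEARTBEAT_TIMEOUT = 30.0  # seconds without a heartbeat before a node counts as faulty (assumed)


def open_db():
    """Create a hypothetical table holding one heartbeat row per registered node."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE nodes (node_id TEXT PRIMARY KEY, last_heartbeat REAL)")
    return db


def write_heartbeat(db, node_id, now):
    """Register the node on first call and refresh its heartbeat afterwards."""
    db.execute(
        "INSERT OR REPLACE INTO nodes (node_id, last_heartbeat) VALUES (?, ?)",
        (node_id, now),
    )
    db.commit()


def faulty_nodes(db, self_id, now):
    """Return the other nodes whose heartbeat is older than the timeout."""
    rows = db.execute(
        "SELECT node_id FROM nodes "
        "WHERE node_id != ? AND ? - last_heartbeat > ? ORDER BY node_id",
        (self_id, now, HEARTBEAT_TIMEOUT),
    ).fetchall()
    return [row[0] for row in rows]


db = open_db()
write_heartbeat(db, "node-1", now=100.0)
write_heartbeat(db, "node-2", now=60.0)   # node-2 stopped writing heartbeats
write_heartbeat(db, "node-3", now=95.0)
detected = faulty_nodes(db, "node-1", now=100.0)
```

  • In a running node, one periodic thread would call write_heartbeat with the current time and another would call faulty_nodes; here node-2 is reported because its last heartbeat is 40 seconds old.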
  • a master node may be set in the master-master cluster 210, and the master node periodically scans the status of other nodes from the database 220, and when a node fails, takes over the tasks that have not been successfully processed by the failed node.
  • each node in the master-master cluster 210 may periodically try to lock a row in the database 220, and the node that locks the row first may be used as the master node.
  • the node may give up (i.e., release) locking the row in the database 220 and not participate in the selection of the master node for a period of time.
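  • Selecting the master node by competing for a database row can be sketched as follows. Note the substitution: instead of a database row lock, this sketch uses a lease stored in a single row and claimed with a conditional UPDATE, which is a different but related technique. The leader table, LEASE_SECONDS and the function names are hypothetical.

```python
import sqlite3

LEASE_SECONDS = 15.0  # assumed duration for which a claim on the row stays valid


def open_db():
    """Create a single-row table that nodes compete for."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE leader (id INTEGER PRIMARY KEY CHECK (id = 1), "
        "holder TEXT, expires REAL)"
    )
    db.execute("INSERT INTO leader VALUES (1, NULL, 0)")
    db.commit()
    return db


def try_acquire(db, node_id, now):
    """Claim the row if it is free, expired, or already held by this node."""
    cur = db.execute(
        "UPDATE leader SET holder = ?, expires = ? "
        "WHERE id = 1 AND (holder IS NULL OR holder = ? OR expires < ?)",
        (node_id, now + LEASE_SECONDS, node_id, now),
    )
    db.commit()
    return cur.rowcount == 1  # exactly one row changed means the claim succeeded


def release(db, node_id):
    """Give up the row so that another node can become the master node."""
    db.execute(
        "UPDATE leader SET holder = NULL, expires = 0 WHERE id = 1 AND holder = ?",
        (node_id,),
    )
    db.commit()


db = open_db()
first = try_acquire(db, "node-1", now=0.0)    # node-1 claims the row first
second = try_acquire(db, "node-2", now=1.0)   # node-2 is too late
release(db, "node-1")
third = try_acquire(db, "node-2", now=2.0)    # row is free again
```

  • The node whose conditional UPDATE changes the row acts as the master node; a node that releases the row could additionally sit out a few rounds, as the text describes, which this sketch does not model.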
  • each node in the master-master cluster 210 can take over the tasks that were not successfully executed by the failed node. In this case, each node can periodically query the health status of other nodes from the database 220. If node 1 detects that node 2 is faulty, node 1 can take over some of the tasks that were not successfully executed on node 2, based on its own execution capability. The tasks that node 1 does not take over can be taken over by other nodes that detect that node 2 is faulty.
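  • Taking over the unfinished tasks of a faulty node according to a node's own execution capability can be sketched as follows, reusing the example of nodes 1 to 3 and tasks 1 to 9. The tasks table, the owner column and the capacity parameter are assumptions introduced for illustration.

```python
import sqlite3


def open_db():
    """Store tasks 1-9: node-1 owns 1-3, node-2 owns 4-6, node-3 owns 7-9."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE tasks (task_id INTEGER PRIMARY KEY, owner TEXT, status TEXT)"
    )
    rows = [(i, f"node-{(i - 1) // 3 + 1}", "pending") for i in range(1, 10)]
    db.executemany("INSERT INTO tasks VALUES (?, ?, ?)", rows)
    db.commit()
    return db


def take_over(db, failed_node, new_owner, capacity):
    """Reassign up to `capacity` unfinished tasks of the failed node."""
    rows = db.execute(
        "SELECT task_id FROM tasks WHERE owner = ? AND status != 'success' "
        "ORDER BY task_id LIMIT ?",
        (failed_node, capacity),
    ).fetchall()
    taken = [row[0] for row in rows]
    for task_id in taken:
        # The owner check avoids stealing a task another node already claimed.
        db.execute(
            "UPDATE tasks SET owner = ? WHERE task_id = ? AND owner = ?",
            (new_owner, task_id, failed_node),
        )
    db.commit()
    return taken


def tasks_of(db, node):
    """List the task ids currently owned by a node."""
    rows = db.execute(
        "SELECT task_id FROM tasks WHERE owner = ? ORDER BY task_id", (node,)
    )
    return [row[0] for row in rows]


db = open_db()
taken_by_node1 = take_over(db, "node-2", "node-1", capacity=2)
taken_by_node3 = take_over(db, "node-2", "node-3", capacity=5)
```

  • With a capacity of 2, node-1 takes tasks 4 and 5 and node-3 picks up the remaining task 6; with a capacity of 3 or more, node-1 would end up with tasks 1 to 6 as in the example in the text.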
  • After a failed node recovers, it can give priority to the tasks that belong to it but have not been successfully executed, so that its own tasks are still handled in time even though they were taken over by other nodes.
  • The node that has returned to normal can first check, from the execution status of each task stored in the database 220, whether the task it currently needs to execute has already been executed. If the task has been executed and the result is success, the node skips the task and processes the next one.
  • The node that takes over the tasks of the failed node can likewise first check, from the execution status of each task stored in the database 220, whether the task it currently needs to execute has already been executed. If it has, the node skips the task and processes the next one.
  • The node that takes over the tasks of the failed node can also check whether the execution status is successful. If the execution failed, it can continue to execute the task; otherwise it processes the next task.
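  • The check described above, skipping tasks that already succeeded and executing the rest, can be sketched in a few lines. The function name is hypothetical and a dictionary stands in for the execution statuses stored in database 220.

```python
def tasks_to_run(task_ids, status_in_db):
    """Return the tasks that still need executing, skipping successful ones."""
    return [
        task_id
        for task_id in task_ids
        # A missing entry means the task has not been executed yet.
        if status_in_db.get(task_id) != "success"
    ]


# Task 4 was already completed by another node; task 5 failed; task 6 never ran.
status_in_db = {4: "success", 5: "failed"}
remaining = tasks_to_run([4, 5, 6], status_in_db)
# remaining == [5, 6]
```

  • Checking the stored status before executing keeps a recovered node and the node that took over its tasks from repeating work that has already succeeded.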
  • the state constraints of resources can be used to ensure the order of internal resource processing to prevent disorder.
  • closed loop implementation can be used to make the state constraints of resources more complex, thereby ensuring the order of internal resource processing.
  • the state of a resource can include whether the resource is created successfully, whether the resource is deleted successfully, whether the resource is updated successfully, etc. Exemplarily, if the status after processing any resource is failed, that resource can be reprocessed until its status after processing is successful, thereby constraining the status of the resources and ensuring the processing order of the resources.
  • When the number of retries reaches a preset number and the resource still fails to be processed, an alarm message can be output, thereby letting the user know of the resource processing failure and perform manual repair.
  • When a node in the master-master cluster is creating a resource, it can write the state of the resource being created into the aforementioned database 220. At this time, the state of the resource in the database 220 can be refreshed to being created, i.e., "creating" as shown in FIG. 5.
  • When the node successfully creates the resource, it can write the state of successful creation into the aforementioned database 220. At this time, the state of the resource in the database 220 can be refreshed to an available state, i.e., "available" as shown in FIG. 5.
  • When the node fails to create the resource, it can retry creating the resource at intervals of a preset duration (such as 10 s or 20 s). When the node retries the creation, it can also write the state of retrying the creation into the aforementioned database 220. At this time, the state of the resource in the database 220 can be "retrying" as shown in FIG. 5. After the retry succeeds, that is, "close-loop success" shown in FIG. 5, the node can write the successful creation status into the aforementioned database 220. At this time, the status of the resource in the database 220 can be refreshed to an available status, that is, "available" shown in FIG. 5.
  • If the node still fails to create the resource after retrying m times (m ≥ 1), that is, "close-loop retry failed" as shown in FIG. 5, the node can write the creation failure status into the aforementioned database 220.
  • At this time, the database 220 can refresh the resource status to an unavailable state, that is, "failed" as shown in FIG. 5.
  • After the node fails to create the resource, it can return the creation failure information to the user so that the user can repair it.
  • When the user completes the repair, the node can recreate the resource, that is, "re-enter the initial state after the task is successfully repaired" as shown in FIG. 5, and can write the resource creation status into the aforementioned database 220.
  • At this time, the resource status in the database 220 can be refreshed to being created, that is, "creating" as shown in FIG. 5.
  • When a node deletes a resource, it can write the state of deleting the resource into the aforementioned database 220.
  • At this time, the state of the resource in the database 220 can be refreshed to being deleted, that is, "deleting" shown in FIG. 5.
  • When the node successfully deletes the resource, it can write the state of successful deletion into the aforementioned database 220.
  • At this time, the state of the resource in the database 220 can be refreshed to deleted, that is, "deleted" shown in FIG. 5.
  • When the node fails to delete the resource, it can retry deleting the resource at intervals of a preset duration (such as 10 s or 20 s).
  • When the node deletes the resource again, it can also write the state of retrying the deletion into the aforementioned database 220.
  • At this time, the state of the resource in the database 220 can be "retrying" as shown in FIG. 5. The state of deleting the resource can also be written into the aforementioned database 220, in which case the state of the resource in the database 220 is refreshed to being deleted, that is, "deleting" shown in FIG. 5.
  • After the retried deletion succeeds, the status of successful deletion can be written into the aforementioned database 220.
  • At this time, the status of the resource in the database 220 can be refreshed to deleted, that is, "deleted" shown in FIG. 5.
  • If the node still fails to delete the resource after retrying m times (m ≥ 1), that is, "close-loop retry failed" as shown in FIG. 5, it can write the deletion failure status into the aforementioned database 220. At this time, the status of the resource in the database 220 can be refreshed to an unavailable state, that is, "failed" as shown in FIG. 5. After the node fails to delete the resource, it can return the deletion failure information to the user so that the user can repair it. When the user completes the repair, the node can delete the resource again, that is, "re-enter the initial state after the task is successfully repaired" as shown in FIG. 5, and can write the status of deleting the resource into the aforementioned database 220. At this time, the status of the resource in the database 220 can be refreshed to being deleted, that is, "deleting" as shown in FIG. 5.
  • When a node updates a resource, it can write the status of the resource being updated into the aforementioned database 220.
  • At this time, the status of the resource in the database 220 can be refreshed to being updated, that is, "modifying" shown in FIG. 5.
  • When the node successfully updates the resource, it can write the status of the successful update into the aforementioned database 220.
  • At this time, the status of the resource in the database 220 can be refreshed to updated, that is, "available" shown in FIG. 5.
  • When the node fails to update the resource, it can retry updating the resource at intervals of a preset duration (such as 10 s or 20 s).
  • When the node re-updates the resource, it can also write the status of retrying the update into the aforementioned database 220. At this time, the status of the resource in the database 220 can be "retrying" as shown in FIG. 5. After the retried update succeeds, that is, "close-loop success" shown in FIG. 5, the node can write the successful update status into the aforementioned database 220. At this time, the status of the resource in the database 220 can be refreshed to updated, that is, "available" shown in FIG. 5.
  • If the node still fails to update the resource after retrying m times (m ≥ 1), that is, "close-loop retry failed" as shown in FIG. 5, it can write the update failure status to the aforementioned database 220. At this time, the database 220 can refresh the status of the resource to an unavailable state, that is, "failed" as shown in FIG. 5. After the node fails to update the resource, it can return the update failure information to the user so that the user can repair it. When the user completes the repair, the node can re-update the resource, that is, "re-update after successful task repair" as shown in FIG. 5, and can write the state of the resource being updated into the aforementioned database 220. At this time, the resource state in the database 220 can be refreshed to being updated, i.e., "modifying" as shown in FIG. 5.
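  • The state changes for creating, deleting and updating a resource can be summarized as a small state machine. The sketch below is a simplification under assumptions: the function and table names are hypothetical, the list of states stands in for the writes to database 220, and the wait of a preset duration between retries is omitted.

```python
# State written when an operation starts, and state written when it succeeds.
IN_PROGRESS = {"create": "creating", "delete": "deleting", "update": "modifying"}
ON_SUCCESS = {"create": "available", "delete": "deleted", "update": "available"}


def process_resource(operation, attempt, max_retries):
    """Return the sequence of states written while processing one resource.

    `attempt` is a callable that performs the operation once and returns
    True on success; `max_retries` corresponds to m (m >= 1) in the text.
    """
    history = [IN_PROGRESS[operation]]
    if attempt():
        history.append(ON_SUCCESS[operation])
        return history
    for _ in range(max_retries):
        # The retry repeats the same operation that failed before it.
        history.append("retrying")
        if attempt():
            history.append(ON_SUCCESS[operation])  # "close-loop success"
            return history
    history.append("failed")  # "close-loop retry failed"
    return history


outcomes = iter([False, True])
created = process_resource("create", lambda: next(outcomes), max_retries=3)
# created == ["creating", "retrying", "available"]

deleted = process_resource("delete", lambda: False, max_retries=2)
# deleted == ["deleting", "retrying", "retrying", "failed"]
```

  • After a "failed" state, the text has the user repair the problem and the node start the same operation again, which in this sketch corresponds to calling process_resource once more.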
  • The status of the two resources is updated to the available state (i.e., available) only when both resources are available (i.e., available).
  • When the status of either of the two resources is unavailable (i.e., failed), the status of the two resources can be updated to the unavailable state (i.e., failed). In this way, the state constraint is enforced.
  • the operation to be performed in the retry is related to the operation that failed to be performed before the retry. For example, if the operation that failed to be performed before the retry was to create a resource, then the operation to be performed in the retry is to create a resource; if the operation that failed to be performed before the retry was to update a resource, then the operation to be performed in the retry is to update a resource.
  • the state of a resource can be associated with the result of task processing corresponding to the resource.
  • When the state of a resource is unavailable, there is a high probability that the task corresponding to the resource cannot be processed successfully.
  • When the state of a resource is available, there is a high probability that the task corresponding to the resource can be processed successfully. Therefore, when the task corresponding to a resource is processed successfully, the state of the resource can be refreshed to an available state.
  • When the task corresponding to a resource fails to be processed, the state of the resource can be refreshed to an unavailable state, and at the same time, an attempt can be made to re-execute the operations required before this operation, such as creating resources, updating resources, etc. In this way, it is ensured that when the task corresponding to the resource is re-executed, it can be executed successfully, thereby improving reliability.
  • FIG6 shows a process of a task processing method.
  • the method may be executed by any node in the aforementioned master-master cluster 210.
  • the task processing method may include the following steps:
  • S601 Obtain an application programming interface (API) request issued by a user.
  • the user can issue the required request through the user interface provided by the node.
  • the user can issue a request to purchase the product in the user interface provided by the node.
  • the node obtains the request.
  • the request issued by the user can be, but is not limited to, the API request described above.
  • the API request can be used to create a resource, such as creating a virtual machine, etc., or it can be a request to purchase a product or service, such as purchasing clothes and other goods, etc., or it can be a request to pay money, such as initiating a payment transaction through a financial application (application, APP).
  • S602 Create a target task according to the obtained API request, and store the task information of the target task in a database.
  • after obtaining the request sent by the user, the node can create a task for the request through a preconfigured function, thereby obtaining a target task.
  • the target task is associated with the API request.
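  • when the API request is used to create a resource, the target task is the task of creating the resource;
  • when the API request is a request to purchase a product or service, the target task is the task of purchasing the product or service;
  • when the API request is a request to pay money, the target task is the task of paying the money.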
  • the node can store the task information of the target task in the database to complete the persistence of the target task, so that when it is known that the target task processing fails, the task can be obtained from the database and re-executed, or, when a node fails, the task to be executed is transferred to other nodes through the database to ensure the reliability of task processing.
  • the task information of the target task may include the task type and/or task content, etc.
  • when the target task is a task to create a virtual machine, the task information of the target task may include the size of the hard disk in the virtual machine, the type of operating system, the type and parameters of the central processing unit, the network bandwidth, etc.; when the target task is a task to pay money, the task information of the target task may include the amount of money to be paid, etc.
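  • Step S602 can be sketched as follows, using Python's built-in `sqlite3` module as a stand-in for the database. The table layout, the column names, the status value `not_executed`, and the helper names are all assumptions made for illustration; the application does not specify a schema.

```python
import json
import sqlite3
import uuid
from typing import Any, Dict


def open_task_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the task database and create the (hypothetical) tasks table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tasks (
               task_id   TEXT PRIMARY KEY,
               node_id   TEXT NOT NULL,
               task_type TEXT NOT NULL,
               content   TEXT NOT NULL,
               status    TEXT NOT NULL
           )"""
    )
    return conn


def create_task(conn: sqlite3.Connection, node_id: str,
                task_type: str, content: Dict[str, Any]) -> str:
    """Persist the task information before the task is executed."""
    task_id = uuid.uuid4().hex
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO tasks (task_id, node_id, task_type, content, status) "
            "VALUES (?, ?, ?, ?, 'not_executed')",
            (task_id, node_id, task_type, json.dumps(content)),
        )
    return task_id


def load_task(conn: sqlite3.Connection, task_id: str) -> Dict[str, Any]:
    """Read the persisted task information back, e.g. before a re-execution."""
    row = conn.execute(
        "SELECT task_type, content, status FROM tasks WHERE task_id = ?",
        (task_id,),
    ).fetchone()
    if row is None:
        raise KeyError(task_id)
    return {"task_type": row[0], "content": json.loads(row[1]), "status": row[2]}
```

  • Recording the creating node alongside the task type and content is what later allows another node to find the tasks left behind by a failed node.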
  • S603 Execute the target task and obtain the execution status of the target task.
  • after persisting the target task in the database, the node can execute the target task and store the execution status of the target task in the database.
  • the node can obtain the execution status of the target task from the database, such as whether the execution succeeded or failed.
  • alternatively, after the node executes the target task, it can learn the execution status of the target task directly, without obtaining it from the database.
  • when the API request is used to request the creation of a target resource, the node can create the target resource when executing the target task. For example, when the API request is used to request the creation of a virtual machine, the node can create the virtual machine when executing the target task.
  • when the node learns that the execution status of the target task is failed, it can re-acquire the task information of the target task from the database and re-execute the target task. In some embodiments, when the number of times the target task is executed reaches a preset number of times and the last execution status is failed, an alarm message can be output. This allows the user to learn about the failure of resource processing and perform manual repairs.
  • after receiving the API request, the node creates a task for the request and persists it in the database.
  • when the task fails to execute, the task information of the task can be retrieved from the database and the task can be re-executed until it succeeds. Therefore, by adding retry self-healing data (i.e., the task created for the API request) in the database, the node can retry self-healing when the task fails to execute, ensuring the correctness of the result returned by the node.
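  • The retry self-healing of step S603 can be sketched as below. A plain dictionary stands in for the database, and the function name, the status strings, the default limit of three executions, and the use of `logging.error` as the alarm channel are assumptions made for illustration.

```python
import logging
from typing import Any, Callable, Dict


def process_task(db: Dict[str, Dict[str, Any]],
                 task_id: str,
                 executor: Callable[[Dict[str, Any]], bool],
                 max_attempts: int = 3,
                 alarm: Callable[..., None] = logging.error) -> str:
    """Execute a persisted task, re-executing it from stored data on failure."""
    for attempt in range(1, max_attempts + 1):
        # Re-acquire the task information from the database on every attempt.
        info = db[task_id]["info"]
        try:
            succeeded = executor(info)
        except Exception:
            succeeded = False
        # Store the execution status so that other readers can see it.
        db[task_id]["status"] = "succeeded" if succeeded else "failed"
        db[task_id]["attempts"] = attempt
        if succeeded:
            return "succeeded"
    # Preset number of executions reached and the last status is failed.
    alarm("task %s still failed after %d attempts", task_id, max_attempts)
    return "failed"
```

  • Because each attempt reads the task information from the store rather than from local variables, the same function could be called by a node other than the one that created the task.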
  • the node may return the execution status of the target task obtained by executing it to the user, so that the user can be informed of the task execution status.
  • when there are multiple nodes, each node can write its own heartbeat information into the database.
  • the heartbeat information can be used to characterize whether a node fails.
  • Each node can obtain the heartbeat information of other nodes from the database.
  • for example, a node (hereinafter referred to as "node 1") obtains the heartbeat information of another node (hereinafter referred to as "node 2") from the database.
  • when the heartbeat information indicates that node 2 has failed, node 1 can obtain from the database the task information of M (M ≥ 1) tasks that were created by node 2 and were not executed or failed to execute, execute these M tasks according to their task information, and store the execution result of each of the M tasks in the database.
  • node 1 may first determine whether the execution status of any task to be executed stored in the database is failed or not executed.
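  • The heartbeat-based takeover can be sketched as follows. The application only states that heartbeat information characterizes whether a node has failed; representing a heartbeat as a timestamp and declaring a node failed after a 30-second silence are assumptions, as are the dictionary layout and the function names.

```python
import time
from typing import Any, Callable, Dict, List, Optional

HEARTBEAT_TIMEOUT_S = 30.0  # assumption: silence longer than this means failure


def write_heartbeat(db: Dict[str, Any], node_id: str,
                    now: Optional[float] = None) -> None:
    """Each node periodically records its own heartbeat in the database."""
    db["heartbeats"][node_id] = time.time() if now is None else now


def node_has_failed(db: Dict[str, Any], node_id: str,
                    now: Optional[float] = None,
                    timeout: float = HEARTBEAT_TIMEOUT_S) -> bool:
    """Judge another node's health from the heartbeat it last wrote."""
    now = time.time() if now is None else now
    last = db["heartbeats"].get(node_id)
    return last is None or now - last > timeout


def take_over_tasks(db: Dict[str, Any], failed_node: str,
                    executor: Callable[[Dict[str, Any]], bool]) -> List[str]:
    """Execute the failed node's unexecuted or failed tasks and store results."""
    taken = []
    for task_id, task in db["tasks"].items():
        if task["created_by"] != failed_node:
            continue
        # Only tasks that were not executed or failed to execute are taken over.
        if task["status"] not in ("not_executed", "failed"):
            continue
        try:
            succeeded = executor(task["info"])
        except Exception:
            succeeded = False
        task["status"] = "succeeded" if succeeded else "failed"
        taken.append(task_id)
    return taken
```

  • In a real cluster with more than two nodes, the surviving nodes would presumably need to claim each task atomically so that it is not executed twice; the sketch leaves that out because the text does not describe it.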
  • the embodiment of the present application also provides a task processing device.
  • FIG7 shows a task processing device.
  • the task processing device can be, but is not limited to being, deployed in any node in the aforementioned master-master cluster.
  • the task processing device 700 includes: a request acquisition module 701, a task creation module 702, and a task processing module 703.
  • the request acquisition module 701 is used to obtain an application programming interface API request issued by a user.
  • the task creation module 702 is used to create a target task according to the API request and store the task information of the target task in a database.
  • the task processing module 703 is used to execute the target task and obtain the execution status of the target task; and when the execution status of the target task is failed, the target task is re-executed according to the task information of the target task stored in the database.
  • the request acquisition module 701 can be but is not limited to the request acquisition module 211 in FIG2
  • the task creation module 702 can be but is not limited to the task creation module 212 in FIG2
  • the task processing module 703 can be but is not limited to the task processing module 214 in FIG2.
  • the task processing module 703 is further used to: store the execution status of the target task in a database.
  • the task processing module is specifically used to: acquire the execution status of the target task from the database.
  • the task processing module 703 is also used to: obtain heartbeat information of the second node from the database, the heartbeat information is used to characterize whether the second node fails; when the second node fails, obtain task information of at least one task created by the second node and not executed or failed to execute from the database; execute at least one task according to the task information of at least one task, and store the execution status of each task in the at least one task in the database.
  • the task processing module 703 is further used to: output an alarm message when the number of times the target task is executed reaches a preset number and the last execution status is failure.
  • when the API request is used to request the creation of a target resource, the task processing module 703 is specifically used to create the target resource when executing the target task.
  • the task processing module is further used to: return the execution status of the target task to the user.
  • the request acquisition module 701, the task creation module 702 and the task processing module 703 can all be implemented by software, or can be implemented by hardware. By way of example, the implementation is described below for the request acquisition module 701; the task creation module 702 and the task processing module 703 can be implemented in the same way.
  • the request acquisition module 701 may include code running on a computing instance.
  • the computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. Furthermore, the computing instance may be one or more.
  • the request acquisition module 701 may include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers for running the code can be distributed in the same region or in different regions. Furthermore, the multiple hosts/virtual machines/containers for running the code can be distributed in the same availability zone (AZ) or in different AZs, each AZ including one data center or multiple geographically close data centers. Generally, one region can include multiple AZs.
  • multiple hosts/virtual machines/containers used to run the code can be distributed in the same virtual private cloud (VPC) or in multiple VPCs.
  • a VPC is set up in a region.
  • a communication gateway needs to be set up in each VPC to achieve interconnection between VPCs through the communication gateway.
  • the request acquisition module 701 may include at least one computing device, such as a server, etc.
  • the request acquisition module 701 may also be a device implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • the PLD may be a complex programmable logical device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL) or any combination thereof.
  • the multiple computing devices included in the request acquisition module 701 can be distributed in the same region or in different regions.
  • the multiple computing devices included in the request acquisition module 701 can be distributed in the same AZ or in different AZs.
  • the multiple computing devices included in the request acquisition module 701 can be distributed in the same VPC or in multiple VPCs.
  • the multiple computing devices can be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
  • the request acquisition module 701 can be used to execute any step in the aforementioned task processing method
  • the task creation module 702 can be used to execute any step in the method provided by the above embodiments
  • the task processing module 703 can be used to execute any step in the method provided by the above embodiments; the steps that the request acquisition module 701, the task creation module 702, and the task processing module 703 are responsible for implementing can be specified as needed, and all functions of the task processing device 700 are realized by respectively implementing different steps in the method provided by the above embodiments through the request acquisition module 701, the task creation module 702, and the task processing module 703.
  • the present application also provides a computing device 800.
  • the computing device 800 includes: a bus 802, a processor 804, a memory 806, and a communication interface 808.
  • the processor 804, the memory 806, and the communication interface 808 communicate with each other through the bus 802.
  • the computing device 800 can be a server or a terminal device. It should be understood that the present application does not limit the number of processors and memories in the computing device 800.
  • the bus 802 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus may be divided into an address bus, a data bus, a control bus, etc.
  • for ease of representation, the bus is shown as only one line in FIG. 8, but this does not mean that there is only one bus or one type of bus.
  • the bus 802 may include a path for transmitting information between various components of the computing device 800 (e.g., the memory 806, the processor 804, and the communication interface 808).
  • Processor 804 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
  • the memory 806 may include a volatile memory, such as a random access memory (RAM).
  • the memory 806 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • the memory 806 stores executable program codes, and the processor 804 executes the executable program codes to respectively implement the functions of the request acquisition module 701, the task creation module 702, and the task processing module 703, thereby implementing all or part of the steps of the method in the above embodiment. That is, the memory 806 stores instructions for executing all or part of the steps in the method in the above embodiment.
  • the memory 806 stores executable codes
  • the processor 804 executes the executable codes to respectively implement the functions of the task processing device 700, thereby implementing all or part of the steps in the above-mentioned embodiment method. That is, the memory 806 stores instructions for executing all or part of the steps in the above-mentioned embodiment method.
  • the communication interface 808 uses a transceiver module such as, but not limited to, a network interface card or a transceiver to implement communication between the computing device 800 and other devices or communication networks.
  • the present application also provides a computing device cluster.
  • the computing device cluster includes at least one computing device.
  • the computing device may be a server, such as a central server, an edge server, or a local server in a local data center.
  • the computing device can also be a terminal device such as a desktop computer, a laptop computer or a smart phone.
  • the computing device cluster includes at least one computing device 800.
  • the memory 806 in one or more computing devices 800 in the computing device cluster may store the same instructions for executing all or part of the steps in the above embodiment method.
  • the memory 806 of one or more computing devices 800 in the computing device cluster may also store partial instructions for executing all or part of the steps in the above-mentioned embodiment method.
  • the combination of one or more computing devices 800 may jointly execute instructions for executing all or part of the steps in the above-mentioned embodiment method.
  • the memory 806 in different computing devices 800 in the computing device cluster can store different instructions, which are respectively used to execute part of the functions of the task processing apparatus 700. That is, the instructions stored in the memory 806 in different computing devices 800 can implement the functions of one or more of the aforementioned request acquisition module 701, task creation module 702, and task processing module 703.
  • one or more computing devices in the computing device cluster may be connected via a network, which may be a wide area network or a local area network.
  • the embodiment of the present application provides a node.
  • the node may include: at least one memory for storing a program; and at least one processor for executing the program stored in the memory.
  • the processor is used to execute the method in the above embodiment.
  • an embodiment of the present application provides a computer-readable storage medium, which stores a computer program.
  • when the computer program runs on a processor, the processor executes the method in the above embodiment.
  • an embodiment of the present application provides a computer program product.
  • when the computer program product runs on a processor, the processor executes the method in the above embodiment.
  • processors in the embodiments of the present application may be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSP), application specific integrated circuits (ASIC), field programmable gate arrays (FPGA) or other programmable logic devices, transistor logic devices, hardware components or any combination thereof.
  • the general-purpose processor may be a microprocessor or any conventional processor.
  • the method steps in the embodiments of the present application can be implemented by hardware or by a processor executing software instructions.
  • the software instructions can be composed of corresponding software modules, which can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disks, mobile hard disks, CD-ROMs, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to a processor so that the processor can read information from the storage medium and write information to the storage medium.
  • the storage medium can also be a component of the processor.
  • the processor and the storage medium can be located in an ASIC.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.).
  • the computer-readable storage medium may be any available medium that a computer can access or a data storage device such as a server or data center that includes one or more available media integrated.
  • the available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state drive (SSD)), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

Disclosed is a task processing method, comprising the following steps: a node acquires an application programming interface (API) request sent by a user; creates a target task according to the API request, and stores task information of the target task in a database; executes the target task, and acquires the execution status of the target task; and when the execution status of the target task is "failed", re-executes the target task according to the task information of the target task stored in the database. In this way, data for retry self-healing (i.e., a task created for an API request) is added to a database, so that a node can perform retry self-healing when the execution of a task fails, thereby ensuring the correctness of the result returned by the node.
PCT/CN2023/101285 2022-11-22 2023-06-20 Procédé et appareil de traitement de tâche, et nœud WO2024109003A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211463984.8A CN118093104A (zh) 2022-11-22 2022-11-22 一种任务处理方法、装置及节点
CN202211463984.8 2022-11-22

Publications (1)

Publication Number Publication Date
WO2024109003A1 true WO2024109003A1 (fr) 2024-05-30

Family

ID=91158931

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/101285 WO2024109003A1 (fr) 2022-11-22 2023-06-20 Procédé et appareil de traitement de tâche, et nœud

Country Status (2)

Country Link
CN (1) CN118093104A (fr)
WO (1) WO2024109003A1 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156198A (zh) * 2015-04-22 2016-11-23 阿里巴巴集团控股有限公司 基于分布式数据库的任务执行方法及装置
CN108304255A (zh) * 2017-12-29 2018-07-20 北京城市网邻信息技术有限公司 分布式任务调度方法及装置、电子设备及可读存储介质
CN108334545A (zh) * 2017-12-27 2018-07-27 微梦创科网络科技(中国)有限公司 一种实现异步服务的方法及装置
US10691558B1 (en) * 2016-09-22 2020-06-23 Amazon Technologies, Inc. Fault tolerant data export using snapshots
US11138033B1 (en) * 2018-08-24 2021-10-05 Amazon Technologies, Inc. Providing an application programming interface (API) including a bulk computing task operation
CN113515357A (zh) * 2021-04-20 2021-10-19 建信金融科技有限责任公司 集群环境下批量任务的执行方法及装置

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156198A (zh) * 2015-04-22 2016-11-23 阿里巴巴集团控股有限公司 基于分布式数据库的任务执行方法及装置
US10691558B1 (en) * 2016-09-22 2020-06-23 Amazon Technologies, Inc. Fault tolerant data export using snapshots
CN108334545A (zh) * 2017-12-27 2018-07-27 微梦创科网络科技(中国)有限公司 一种实现异步服务的方法及装置
CN108304255A (zh) * 2017-12-29 2018-07-20 北京城市网邻信息技术有限公司 分布式任务调度方法及装置、电子设备及可读存储介质
US11138033B1 (en) * 2018-08-24 2021-10-05 Amazon Technologies, Inc. Providing an application programming interface (API) including a bulk computing task operation
CN113515357A (zh) * 2021-04-20 2021-10-19 建信金融科技有限责任公司 集群环境下批量任务的执行方法及装置

Also Published As

Publication number Publication date
CN118093104A (zh) 2024-05-28

Similar Documents

Publication Publication Date Title
KR102510195B1 (ko) 트랜잭션 처리 방법, 장치 및 기기, 그리고 컴퓨터 저장 매체
US20190340166A1 (en) Conflict resolution for multi-master distributed databases
CN107766080B (zh) 事务消息处理方法、装置、设备及系统
CN110188110B (zh) 一种构建分布式锁的方法及装置
KR102121157B1 (ko) 동시 블록체인 트랜잭션 실패를 해결하기 위한 넌스 테이블의 이용
US10997158B2 (en) Techniques for updating big data tables using snapshot isolation
US20210165810A1 (en) Transaction processing method, apparatus, and device and computer storage medium
CN110795447A (zh) 数据处理方法、数据处理系统、电子设备和介质
US11741081B2 (en) Method and system for data handling
CN112559496B (zh) 一种分布式数据库事务原子性实现方法及装置
US20210218827A1 (en) Methods, devices and systems for non-disruptive upgrades to a replicated state machine in a distributed computing environment
WO2024109003A1 (fr) Procédé et appareil de traitement de tâche, et nœud
US11138231B2 (en) Method and system for data handling
CN114546705B (zh) 操作响应方法、操作响应装置、电子设备以及存储介质
CN113077241B (zh) 审批处理方法、装置、设备及存储介质
CN112783954B (zh) 数据访问方法、装置及服务器
US11500857B2 (en) Asynchronous remote calls with undo data structures
CN113835780A (zh) 一种事件响应方法及装置
US11442668B2 (en) Prioritizing volume accesses in multi-volume storage device based on execution path of a service
CN114116732B (zh) 处理事务的方法、装置、存储装置以及服务器
US20230061088A1 (en) Systems and methods for zero downtime distributed search system updates
CN117785900A (zh) 数据更新方法、装置、计算机设备和存储介质
CN116301634A (zh) 资源交互状态的检测方法、装置、设备及介质
CN116467050A (zh) 事务处理方法、装置、设备、存储介质及系统
CN115422188A (zh) 表结构在线变更方法及装置、电子设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23893103

Country of ref document: EP

Kind code of ref document: A1