CN118093104A - Task processing method, device and node

Info

Publication number: CN118093104A
Application number: CN202211463984.8A
Authority: CN (China)
Prior art keywords: task, node, database, target, target task
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 朱娜, 罗光, 姚博, 田应军
Current/Original Assignee: Huawei Cloud Computing Technologies Co Ltd
Application filed by Huawei Cloud Computing Technologies Co Ltd
Priority application: CN202211463984.8A
Related PCT application: PCT/CN2023/101285 (WO2024109003A1)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/48 - Program initiating; Program switching, e.g. by interrupt

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

A task processing method, comprising: a node obtains an application programming interface (API) request sent by a user; creates a target task according to the API request and stores task information of the target task in a database; executes the target task and obtains the execution state of the target task; and, when the execution state of the target task is failure, re-executes the target task according to the task information of the target task stored in the database. In this way, data for retry-based self-healing (i.e. the task created for the API request) is added to the database, so that the node can retry and self-heal when task execution fails, ensuring the correctness of the result returned by the node.

Description

Task processing method, device and node
Technical Field
The present application relates to the field of information technology (IT), and in particular, to a task processing method, device, and node.
Background
With the continuous development of technology, service architectures have gradually evolved from monolithic architectures to distributed architectures, and the deployment mode of a service has evolved from a single-process mode to a cluster mode, so as to provide elastic capacity expansion and high reliability and thus cope with changes in user demand. In general, clusters can be divided into active-standby clusters and active-active clusters (the latter referred to below as the main, or primary, cluster). In an active-standby cluster, only the active node works and the standby node does not. In the main cluster, every node works, and a load balancer needs to be deployed at the front end to share the load across the cluster. Compared with the active-standby cluster, the main cluster has higher performance and is more commonly used. However, in the main cluster, when there is a call relationship between two nodes, if communication between the two nodes fails or the called node fails in its processing, the result returned to the user by one of the nodes will be inconsistent with the actual result, affecting user experience.
Disclosure of Invention
The application provides a task processing method, a device, a node, a computer storage medium and a computer program product, which can reprocess a task requested by a user when processing of the task fails, thereby ensuring that the result returned by the node is consistent with the actual result and improving user experience.
In a first aspect, the present application provides a task processing method, which may be applied to a first node. The method may include: acquiring an Application Programming Interface (API) request sent by a user; creating a target task according to the API request, and storing task information of the target task into a database; executing the target task and acquiring the execution state of the target task; and when the execution state of the target task is failure, re-executing the target task according to the task information of the target task stored in the database.
In this way, after acquiring the API request, the node creates a task for the request and persists the task to the database, so that when processing of the task corresponding to the request fails, the task can be acquired again from the database and re-executed. In other words, data for retry-based self-healing (namely the task created for the API request) is added to the database, so that the node can retry and self-heal when task execution fails, ensuring the correctness of the result returned by the node.
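By way of example, the following is a minimal sketch of this create-persist-execute-retry flow, written in Python against an in-memory SQLite table; the table name ("tasks"), column names, and function names are illustrative assumptions and are not taken from this application.

```python
import json
import sqlite3
import uuid

# Assumed schema: one row of task information per target task.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tasks (
    task_id TEXT PRIMARY KEY,
    task_type TEXT,
    task_content TEXT,   -- task information serialized as JSON
    state TEXT)""")      # 'pending' / 'success' / 'failure'

def handle_api_request(api_request, execute_task, max_retries=3):
    """Create a target task for the API request, persist its task information,
    execute it, and re-execute it from the persisted copy when it fails."""
    task_id = str(uuid.uuid4())
    # 1. Create the target task and store its task information in the database.
    conn.execute("INSERT INTO tasks VALUES (?, ?, ?, 'pending')",
                 (task_id, api_request["type"], json.dumps(api_request["body"])))
    conn.commit()
    # 2. Execute the target task and record its execution state.
    state = "success" if execute_task(api_request["body"]) else "failure"
    conn.execute("UPDATE tasks SET state = ? WHERE task_id = ?", (state, task_id))
    conn.commit()
    # 3. On failure, reload the task information from the database and re-execute.
    retries = 0
    while state == "failure" and retries < max_retries:
        row = conn.execute("SELECT task_content FROM tasks WHERE task_id = ?",
                           (task_id,)).fetchone()
        state = "success" if execute_task(json.loads(row[0])) else "failure"
        conn.execute("UPDATE tasks SET state = ? WHERE task_id = ?", (state, task_id))
        conn.commit()
        retries += 1
    return state

# Toy executor: pretend the downstream call fails once, then succeeds.
attempts = []
def create_resource(body):
    attempts.append(body)
    return len(attempts) > 1

print(handle_api_request({"type": "create_vm", "body": {"cpu": 2}}, create_resource))
```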
In one possible implementation, after performing the target task, the method further includes: the execution state of the target task is stored in a database. Thereby refreshing the status of the task into the database. At this time, acquiring the execution state of the target task may include: the execution state of the target task is obtained from the database.
In one possible implementation, the method may further include: acquiring heartbeat information of the second node from the database, wherein the heartbeat information is used for representing whether the second node fails or not; when the second node fails, task information of at least one task which is created by the second node and is not executed or fails to be executed is obtained from a database; and executing at least one task according to the task information of the at least one task, and storing the execution state of each task in the at least one task into a database. Therefore, the situation that the task processing fails due to the node failure is avoided, and the reliability of the task processing is ensured.
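By way of example, the following sketch illustrates how a first node might check a second node's heartbeat in the database and take over its unexecuted or failed tasks; the tables ("node_heartbeats", "tasks"), the heartbeat-timeout threshold, and the function names are illustrative assumptions.

```python
import json
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node_heartbeats (node_id TEXT PRIMARY KEY, last_beat REAL)")
conn.execute("CREATE TABLE tasks (task_id TEXT PRIMARY KEY, creator_node TEXT, "
             "task_content TEXT, state TEXT)")

HEARTBEAT_TIMEOUT = 30.0   # seconds without a heartbeat before a node is treated as failed

def take_over_if_failed(second_node_id, execute_task):
    """Check the second node's heartbeat and, if it has failed, execute the tasks it
    created that were not executed or failed, storing each execution state back."""
    row = conn.execute("SELECT last_beat FROM node_heartbeats WHERE node_id = ?",
                       (second_node_id,)).fetchone()
    if row is None or time.time() - row[0] < HEARTBEAT_TIMEOUT:
        return   # no record or heartbeat still fresh: nothing to take over
    rows = conn.execute("SELECT task_id, task_content FROM tasks "
                        "WHERE creator_node = ? AND state IN ('pending', 'failure')",
                        (second_node_id,)).fetchall()
    for task_id, content in rows:
        state = "success" if execute_task(json.loads(content)) else "failure"
        conn.execute("UPDATE tasks SET state = ? WHERE task_id = ?", (state, task_id))
    conn.commit()

# Example: node-2 last beat 2 minutes ago and left one failed task behind.
conn.execute("INSERT INTO node_heartbeats VALUES ('node-2', ?)", (time.time() - 120,))
conn.execute("INSERT INTO tasks VALUES ('t1', 'node-2', '{\"resource\": \"vm\"}', 'failure')")
conn.commit()
take_over_if_failed("node-2", lambda info: True)
print(conn.execute("SELECT task_id, state FROM tasks").fetchall())   # [('t1', 'success')]
```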
In one possible implementation, the method may further include: when the number of times the target task has been executed reaches a preset number and the last execution state is still failure, outputting alarm information. In this way, the user can learn of the resource processing failure and perform a manual repair.
In one possible implementation, the API request is used to request creation of a target resource, and executing the target task includes: creating the target resource.
In one possible implementation, the method further includes: and returning the execution state of the target task to the user. Thereby allowing the user to learn the execution state of the task.
In a second aspect, the present application provides a task processing device. The apparatus may be deployed at a first node. The device comprises: the request acquisition module is used for acquiring an Application Programming Interface (API) request sent by a user; the task creation module is used for creating a target task according to the API request and storing task information of the target task into the database; the task processing module is used for executing the target task and acquiring the execution state of the target task; and re-executing the target task according to the task information of the target task stored in the database when the execution state of the target task is failure.
In one possible implementation, the task processing module is further configured to, after executing the target task: storing the execution state of the target task into a database; the task processing module is specifically configured to, when acquiring an execution state of a target task: the execution state of the target task is obtained from the database.
In one possible implementation, the task processing module is further configured to: acquiring heartbeat information of the second node from the database, wherein the heartbeat information is used for representing whether the second node fails or not; when the second node fails, task information of at least one task which is created by the second node and is not executed or fails to be executed is obtained from a database;
and executing at least one task according to the task information of the at least one task, and storing the execution state of each task in the at least one task into a database.
In one possible implementation, the task processing module is further configured to: and when the number of times of executing the target task reaches the preset number of times, and the last execution state is failure, outputting alarm information.
In one possible implementation, the API request is used to request creation of a target resource, and the task processing module is specifically configured to, when executing the target task: create the target resource.
In one possible implementation, the task processing module is further configured to: and returning the execution state of the target task to the user.
In a third aspect, the present application provides a node comprising: at least one memory for storing a program; at least one processor for executing programs stored in the memory; wherein the processor is adapted to perform the method as described in the first aspect or any one of the possible implementations of the first aspect, when the memory-stored program is executed.
In a fourth aspect, the present application provides a computing device comprising: at least one memory for storing a program; at least one processor for executing programs stored in the memory; wherein the processor is adapted to perform the method described in the first aspect or any one of the possible implementations of the first aspect, when the memory-stored program is executed. By way of example, the computing device may be a node in a cluster of devices, such as node 1 shown in FIG. 2, and so on.
In a fifth aspect, the present application provides a cluster of computing devices, comprising at least one computing device, each computing device comprising a processor and a memory; the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
In a sixth aspect, the present application provides a computer readable storage medium storing a computer program which, when run on a processor, causes the processor to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
In a seventh aspect, the application provides a computer program product which, when run on a processor, causes the processor to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
It will be appreciated that the advantages of the second to seventh aspects may be found in the relevant description of the first aspect, and are not described here again.
Drawings
FIG. 1 is a schematic diagram of a working process of a node in a primary cluster according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a system architecture of a task processing system of a primary host cluster according to an embodiment of the present application;
FIG. 3 is a schematic diagram of resources included in an API request according to an embodiment of the present application;
FIG. 4 is a schematic diagram of task processing when a node fails according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating a change in a resource status when a node processes a resource according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of a task processing method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a task processing device according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a computing device according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a computing device cluster according to an embodiment of the present application.
Detailed Description
The term "and/or" herein is an association relationship describing an associated object, and means that there may be three relationships, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone. The symbol "/" herein indicates that the associated object is or is a relationship, e.g., A/B indicates A or B.
The terms "first" and "second" and the like in the description and in the claims are used for distinguishing between different objects and not for describing a particular sequential order of objects. For example, the first response message and the second response message, etc. are used to distinguish between different response messages, and are not used to describe a particular order of response messages.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "such as" should not be construed as being preferred or more advantageous than other embodiments or designs. Rather, the use of such words is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present application, unless otherwise specified, "plurality" means two or more; for example, a plurality of processing units means two or more processing units, and a plurality of elements means two or more elements.
By way of example, fig. 1 shows a schematic diagram of the operation of a node in a main cluster. As shown in fig. 1, to improve resource processing performance, an asynchronous processing manner is often adopted inside node 1. In fig. 1, a portion of the threads in the thread pool of node 1 may obtain one or more application programming interface (API) requests and store the resources of each API request in a database layer (DB layer). After storing the resources requested by a given API request in the database, node 1 may have the thread processing that API request read the storage result from the database. When the storage result is success, the thread returns a request result indicating success to the user. Another portion of the threads in the thread pool may be used for internal processing of resources, such as calling the APIs of other services. In addition, after the resources requested by an API request are stored in the database, the resource handler in node 1 may call, through a certain thread, the services provided by other nodes. Finally, node 1 may send the results returned by the other nodes to the user. The services provided by other nodes may also be referred to as "downstream services", i.e. services located downstream of the service provided by node 1, or services that node 1 needs to invoke. For example, if node 1 provides service 1 and node 2 provides service 2, and service 1 needs to invoke service 2 while running, service 2 may be referred to as a downstream service of service 1.
However, in fig. 1, when the resource handler in node 1 fails in its processing, or the node corresponding to the downstream service fails, node 1 will have difficulty obtaining the results returned by the other nodes, so that the result node 1 returned when the corresponding resource was stored in its database is inconsistent with the result that actually needs to be returned to the user (i.e. the result returned by the other nodes).
In view of this, an embodiment of the present application provides a task processing method that, when storing the resources of an API request in a database, creates a task for each resource in the request and stores the created tasks in the database. Each task can then be executed, and when the execution of a task fails, the task can be re-acquired from the database and re-processed, so that the resources requested by the API are eventually processed and the correctness of the result returned to the user by the node is ensured. In other words, data for retry-based self-healing (namely the task created for each resource in the API request) is added to the database, so that the node can retry and self-heal when task execution fails, ensuring the correctness of the node's returned result.
By way of example, FIG. 2 illustrates an architecture of a task processing system of a primary host cluster. As shown in fig. 2, the task processing system 200 of the master cluster may include: a primary master cluster 210 and a database 220. The master cluster 210 includes n nodes, where n ≥ 2.
In this embodiment, each node in the master cluster 210 includes: a request acquisition module 211, a task creation module 212, a task scheduling module 213 and a task processing module 214.
The request obtaining module 211 in each node is mainly configured to obtain an API request issued by a user. Illustratively, a user may issue an API request through a user interface provided by the node. For example, one or more resources to be processed, such as products purchased or updated by a user or tenant, may be included in the API request. In some embodiments, after the request obtaining module 211 obtains the API request, the resources included in the API request may be stored in the database 220 to complete the persistence of the resources. In addition, the request acquisition module 211 may also transmit the API request acquired by the request acquisition module to the task creation module 212.
The task creation module 212 is mainly configured to, after acquiring the API request sent by the request acquisition module 211, create tasks for the resources included in the API request through a preconfigured function, and store the created tasks in the database 220 to complete persistence of the created tasks. In some embodiments, each task may be associated with one user or tenant, and/or with one resource. In some embodiments, different tasks may be associated with the same user or tenant. In addition, different tasks may be associated with different resources. In some embodiments, when a plurality of resources are included in one API request, the execution order of the tasks corresponding to the respective resources may be determined based on the dependency relationships or upper-lower (hierarchical) relationships between the resources. For example, as shown in fig. 3, if the resources included in the API request are a virtual machine, a network card and a volume purchased by a tenant, the network card and the volume must be created before the virtual machine, and there is no dependency relationship between the network card and the volume. Tasks can therefore be created for the three resources respectively, with the tasks corresponding to the network card and the volume set to run in parallel, and the task corresponding to the virtual machine set to execute after the tasks corresponding to the network card and the volume have completed.
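By way of example, the following sketch shows one way the dependency-ordered task creation described above could be represented, using a "depends_on" field; the field names and function names are illustrative assumptions and are not taken from this application.

```python
import uuid

def create_tasks_for_request(resources):
    """Create one task per resource in the API request; 'depends_on' encodes the
    execution order (tasks with no unmet dependencies may run in parallel)."""
    tasks = {}
    for res in resources:
        tasks[res["name"]] = {
            "task_id": str(uuid.uuid4()),
            "resource": res["name"],
            "tenant": res["tenant"],
            "depends_on": res.get("depends_on", []),
            "state": "pending",
        }
    return tasks

# The example from the description: the network card and the volume have no
# dependency on each other, and the virtual machine is created only after both.
tasks = create_tasks_for_request([
    {"name": "network_card", "tenant": "tenant-1"},
    {"name": "volume", "tenant": "tenant-1"},
    {"name": "virtual_machine", "tenant": "tenant-1",
     "depends_on": ["network_card", "volume"]},
])
runnable_now = [t["resource"] for t in tasks.values() if not t["depends_on"]]
print(runnable_now)   # ['network_card', 'volume']: these two can run in parallel
```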
The task scheduling module 213 is mainly configured to obtain, from the database 220, the tasks that need to be executed by the node to which it belongs (such as unprocessed tasks or tasks whose processing failed), and to assign them to task queues according to the creation time of the tasks, the dependency relationships between tasks, the identifier of each task (such as the identifier of the tenant associated with the task, or the identifier of the resource associated with the task), and the like. In some embodiments, when task queues are assigned according to the creation time of the tasks, the processing order of the tasks in a queue is related to their creation times: a task with an earlier creation time is executed earlier. In some embodiments, when task queues are assigned according to the identifiers of the tasks, tasks with the same identifier may be assigned to the same task queue. In some embodiments, the task scheduling module 213 may assign one task queue for the tasks to be processed, or may assign a plurality of task queues, depending on the actual situation, which is not limited herein.
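By way of example, the following sketch shows one way task queues could be assigned by task identifier and ordered by creation time; the hashing scheme, field names, and queue count are illustrative assumptions.

```python
import zlib
from collections import defaultdict

def assign_task_queues(pending_tasks, num_queues=4):
    """Group tasks with the same identifier (e.g. tenant or resource id) into the
    same queue, and order each queue by creation time (earlier first)."""
    queues = defaultdict(list)
    for task in pending_tasks:
        index = zlib.crc32(task["identifier"].encode()) % num_queues
        queues[index].append(task)
    for q in queues.values():
        q.sort(key=lambda t: t["created_at"])   # earlier creation time runs first
    return queues

pending = [
    {"task_id": "t1", "identifier": "tenant-a", "created_at": 10.0},
    {"task_id": "t2", "identifier": "tenant-b", "created_at": 5.0},
    {"task_id": "t3", "identifier": "tenant-a", "created_at": 7.0},
]
for index, q in assign_task_queues(pending).items():
    print(index, [t["task_id"] for t in q])   # t3 comes before t1 in tenant-a's queue
```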
The task processing module 214 is mainly configured to execute tasks in the respective task queues allocated by the task scheduling module 213. In some embodiments, the task processing module 214 can allocate different threads for different task queues, i.e., execute tasks in different task queues by different threads. In some embodiments, the task processing module 214 may call the API of the service provided by the other node and wait for the execution result of the service called by the other node when executing the task, and store the corresponding execution state in the database 220. Illustratively, the execution states of the tasks stored in the database 220 by the task processing module 214 can be understood as the execution states of the respective tasks, so that the task scheduling module 213 can learn which tasks in the database 220 are tasks to be processed and which are tasks that have been successfully processed. The tasks to be processed may include unprocessed tasks and/or tasks that fail to process. For example, when a task is successfully processed, the task processing module 214 obtains that the execution state of the service that the task needs to call is successful; when a task fails to process, the task processing module 214 obtains the execution status of the service that the task needs to call as failure. In some embodiments, the task processing module 214 can also feed back the execution status corresponding to each resource to the user.
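By way of example, the following sketch runs one worker thread per task queue, executes each task by calling a stand-in for a downstream service, and records the resulting execution state; the in-memory state dictionary stands in for the database 220, and all names are illustrative assumptions.

```python
import queue
import threading

execution_states = {}          # stands in for the execution states stored in database 220
state_lock = threading.Lock()

def call_downstream_service(task):
    """Stand-in for invoking the API of a service provided by another node."""
    return task.get("should_succeed", True)

def worker(task_queue):
    """Execute the tasks of one task queue on a dedicated thread."""
    while True:
        task = task_queue.get()
        if task is None:       # sentinel: this queue has been drained
            break
        ok = call_downstream_service(task)
        with state_lock:
            execution_states[task["task_id"]] = "success" if ok else "failure"

q1, q2 = queue.Queue(), queue.Queue()
for t in ({"task_id": "t1"}, {"task_id": "t2", "should_succeed": False}):
    q1.put(t)
q2.put({"task_id": "t3"})

threads = [threading.Thread(target=worker, args=(q,)) for q in (q1, q2)]
for th in threads:
    th.start()
for q in (q1, q2):
    q.put(None)                # stop each worker once its queue is empty
for th in threads:
    th.join()
print(execution_states)        # e.g. {'t1': 'success', 't2': 'failure', 't3': 'success'}
```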
In some embodiments, each node in the master cluster 210 may have a thread that periodically scans the database 220 for tasks whose processing failed, so that the task scheduling module 213 in the corresponding node adds those tasks back to a task queue for retry-based self-healing.
In this embodiment, the database 220 is mainly used to store resources, tasks and execution states of the tasks stored in each node in the main cluster 210.
In some embodiments, the database 220 may further store the operating condition of each node in the main cluster 210, so that when a node fails, the tasks that need to be executed by that node are transferred to other nodes, thereby avoiding resource processing failures caused by the node failure and ensuring the reliability of resource processing. For example, as shown in fig. 4 (a), the main cluster includes node 1, node 2 and node 3, and tasks 1 to 9 are stored in the database 220. The tasks to be executed by node 1 are tasks 1 to 3, the tasks to be executed by node 2 are tasks 4 to 6, and the tasks to be executed by node 3 are tasks 7 to 9. As shown in fig. 4 (b), when node 2 fails, node 1 may take over the tasks that node 2 needs to perform, so that the tasks node 1 needs to perform become tasks 1 to 6. Therefore, when node 2 fails, the tasks that node 2 needs to execute can be transferred to other nodes for execution, ensuring the reliability of resource processing.
Each node in the main cluster 210 may register an identity in the database 220 when it is started, and write a heartbeat to the database 220 in a subsequent operation process, so as to determine whether the node fails through the heartbeat of the node, and determine the failed node through the registered identity of the node. At this point, each node may have a thread thereon for periodically writing a heartbeat to database 220, thereby flushing the state of the node to database 220. In addition, each node may also have a thread that periodically queries the database 220 for the health status of other nodes to periodically query whether other nodes are malfunctioning.
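By way of example, the following sketch shows the two periodic threads described above: one registers the node and refreshes its heartbeat in the database, the other queries the heartbeats of the other nodes and reports stale ones; the table name, interval, and timeout values are illustrative assumptions.

```python
import sqlite3
import threading
import time

DB_PATH = "cluster.db"            # assumed shared database file
NODE_ID = "node-1"
HEARTBEAT_INTERVAL = 5.0          # seconds between heartbeat writes (assumed)
HEARTBEAT_TIMEOUT = 3 * HEARTBEAT_INTERVAL

sqlite3.connect(DB_PATH).execute("CREATE TABLE IF NOT EXISTS node_heartbeats "
                                 "(node_id TEXT PRIMARY KEY, last_beat REAL)")

def heartbeat_writer(stop):
    """Register the node's identity on startup, then periodically refresh its heartbeat."""
    conn = sqlite3.connect(DB_PATH)
    while not stop.is_set():
        conn.execute("INSERT OR REPLACE INTO node_heartbeats VALUES (?, ?)",
                     (NODE_ID, time.time()))
        conn.commit()
        stop.wait(HEARTBEAT_INTERVAL)

def health_checker(stop, on_node_failed):
    """Periodically query the heartbeats of the other nodes and report stale ones."""
    conn = sqlite3.connect(DB_PATH)
    while not stop.is_set():
        rows = conn.execute("SELECT node_id, last_beat FROM node_heartbeats "
                            "WHERE node_id != ?", (NODE_ID,)).fetchall()
        for node_id, last_beat in rows:
            if time.time() - last_beat > HEARTBEAT_TIMEOUT:
                on_node_failed(node_id)   # e.g. trigger takeover of that node's tasks
        stop.wait(HEARTBEAT_INTERVAL)

stop = threading.Event()
threading.Thread(target=heartbeat_writer, args=(stop,), daemon=True).start()
threading.Thread(target=health_checker, args=(stop, print), daemon=True).start()
```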
As a possible implementation, a master node may be designated in the master cluster 210; the master node periodically scans the states of the other nodes from the database 220 and, when a node fails, takes over the tasks that the failed node did not process successfully. Illustratively, each node in the primary master cluster 210 may periodically attempt to lock a row in the database 220, and the node that acquires the lock on that row first may serve as the master node. For example, to prevent the master node from degrading its own processing performance by taking over the tasks of other nodes for a long time, when the duration for which a node has served as the master node reaches a preset duration, that node may release the locked row in the database 220 and refrain from participating in master-node selection for a period of time.
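By way of example, the following sketch approximates "locking a row" with a lease-style conditional update on a single-row table: at most one node's update succeeds while the lease is free or expired, and that node serves as the master until it releases the row. The table name, lease duration, and SQL are illustrative assumptions rather than the application's actual mechanism.

```python
import sqlite3
import time

LEASE_SECONDS = 60.0   # preset duration a node may hold the master role (assumed)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE master_lease (id INTEGER PRIMARY KEY CHECK (id = 1), "
             "holder TEXT, acquired_at REAL)")
conn.execute("INSERT INTO master_lease (id, holder, acquired_at) VALUES (1, NULL, 0)")
conn.commit()

def try_become_master(node_id):
    """Attempt to 'lock the row': the conditional UPDATE succeeds for at most one
    node while the lease is free or expired, and that node becomes the master."""
    now = time.time()
    cur = conn.execute(
        "UPDATE master_lease SET holder = ?, acquired_at = ? "
        "WHERE id = 1 AND (holder IS NULL OR holder = ? OR acquired_at < ?)",
        (node_id, now, node_id, now - LEASE_SECONDS))
    conn.commit()
    return cur.rowcount == 1

def release_master(node_id):
    """Give up the master role once the preset duration is reached, so the node
    does not keep taking over other nodes' tasks for too long."""
    conn.execute("UPDATE master_lease SET holder = NULL WHERE id = 1 AND holder = ?",
                 (node_id,))
    conn.commit()

print(try_become_master("node-1"))   # True: node-1 wins the row and is the master
print(try_become_master("node-2"))   # False while node-1's lease is still valid
```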
As another possible implementation, each node in the primary master cluster 210 may take over the tasks that a failed node did not successfully execute. In this case, each node may periodically query the database 220 for the health status of the other nodes. If node 1 detects that node 2 is malfunctioning, node 1 may take over, according to its own execution capability, a portion of the tasks on node 2 that were not successfully executed, while the tasks not taken over by node 1 may be taken over by other nodes that detect the failure of node 2.
Further, when a failed node returns to normal, that node may prioritize executing the tasks that belong to it but were not successfully executed, to prevent the tasks of its own that were taken over by other nodes from being processed in an untimely manner. In addition, when executing a task, the recovered node may first determine, from the execution status of each task stored in the database 220, whether the task it currently needs to execute has already been executed; if the task has been executed and the execution result is success, it skips that task and processes the next one. Likewise, when executing a task it has taken over, the node that takes over the tasks of the failed node may also determine, from the execution status of each task stored in the database 220, whether the task it currently needs to execute has already been executed, and if so, skip it and process the next task. Of course, when the node that takes over the tasks of the failed node determines that the task it currently needs to execute has already been executed, it may further check whether the execution state is success: if execution failed, it may continue to execute the task; otherwise it processes the next task.
In some embodiments, for complex business scenarios there are dependencies between the processing of resources, and order preservation is required to simplify the internal implementation; the order of internal processing of resources can therefore be ensured by state constraints on the resources, preventing out-of-order problems. In this embodiment, this may be implemented as a closed loop, with the state constraints on the resources ensuring the order of their internal processing. The state of a resource may include whether creation of the resource succeeded, whether deletion of the resource succeeded, whether update of the resource succeeded, and the like. For example, if the state after a resource is processed is failure, that resource may be reprocessed until its state after processing is success, thereby constraining the state of the resource and ensuring the processing order of resources. In addition, when the number of times a resource has been processed reaches a preset number and its state after processing is still failure, alarm information can be output, so that the user can learn of the resource processing failure and perform a manual repair.
Specifically, as shown in fig. 5, when a node in the main cluster is creating a resource, it may write the creating state of the resource into the aforementioned database 220, and the state of the resource is refreshed in the database 220 to creating, i.e. "creating" shown in fig. 5. When the node creates the resource successfully, it may write the creation-success state into the database 220, and the state of the resource is refreshed in the database 220 to the available state, i.e. "available" shown in fig. 5. When the node fails to create the resource, it knows that the creation has failed and may retry creating the resource at intervals of a preset duration (e.g. 10 s, 20 s, etc.). While the node is re-creating the resource, its state in the database 220 may be the "retrying" state shown in fig. 5, and the state of the resource being re-created may also be written into the database 220. After the node re-creates the resource and the creation succeeds, i.e. "close-loop succeeded" shown in fig. 5, it may write the creation-success state into the database 220, and the state of the resource is refreshed in the database 220 to the available state, i.e. "available" shown in fig. 5.
If the node has still not created the resource successfully after m retries (m ≥ 1), i.e. "close-loop retry failed" shown in fig. 5, it may write the creation-failure state into the database 220, and the state of the resource is refreshed in the database 220 to the unavailable state, i.e. "failed" shown in fig. 5. After failing to create the resource, the node may return creation-failure information to the user so that the user can repair it. When the user completes the repair, the node may re-create the resource, i.e. "after task repair succeeded, re-enter the initial state" shown in fig. 5, and may write the creating state of the resource into the database 220, where the state of the resource is refreshed to creating, i.e. "creating" shown in fig. 5.
After the resource has been created, when a node in the main cluster needs to delete the resource, it may write the deleting state of the resource into the aforementioned database 220, and the state of the resource is refreshed in the database 220 to deleting, i.e. "deleting" shown in fig. 5. When the node deletes the resource successfully, it may write the deletion-success state into the database 220, and the state of the resource is refreshed in the database 220 to deleted, i.e. "deleted" shown in fig. 5. When the node fails to delete the resource, it knows that the deletion has failed and may retry deleting the resource at intervals of a preset duration (e.g. 10 s, 20 s, etc.). While the node is re-deleting the resource, its state in the database 220 may be the "retrying" state shown in fig. 5, and the state of the resource being deleted may be written into the database 220, where the state of the resource is refreshed to deleting, i.e. "deleting" shown in fig. 5. After the node re-deletes the resource and the deletion succeeds, i.e. "close-loop succeeded" shown in fig. 5, it may write the deletion-success state into the database 220, and the state of the resource is refreshed in the database 220 to deleted, i.e. "deleted" shown in fig. 5.
If the node has still not deleted the resource successfully after m retries (m ≥ 1), i.e. "close-loop retry failed" shown in fig. 5, it may write the deletion-failure state into the database 220, and the state of the resource is refreshed in the database 220 to the unavailable state, i.e. "failed" shown in fig. 5. After failing to delete the resource, the node may return deletion-failure information to the user so that the user can repair it. When the user completes the repair, the node may re-delete the resource, i.e. "after task repair succeeded, re-enter the initial state" shown in fig. 5, and may write the deleting state of the resource into the database 220, where the state of the resource is refreshed to deleting, i.e. "deleting" shown in fig. 5.
After the resource has been created, when a node in the main cluster needs to update the resource, it may write the updating state of the resource into the aforementioned database 220, and the state of the resource is refreshed in the database 220 to updating, i.e. "modifying" shown in fig. 5. When the node updates the resource successfully, it may write the update-success state into the database 220, and the state of the resource is refreshed in the database 220 to the available state, i.e. "available" shown in fig. 5. When the node fails to update the resource, it knows that the update has failed and may retry updating the resource at intervals of a preset duration (e.g. 10 s, 20 s, etc.). While the node is re-updating the resource, its state in the database 220 may be the "retrying" state shown in fig. 5, and the updating state of the resource may be written into the database 220. After the node re-updates the resource and the update succeeds, i.e. "close-loop succeeded" shown in fig. 5, it may write the update-success state into the database 220, and the state of the resource is refreshed in the database 220 to the available state, i.e. "available" shown in fig. 5.
If the node has still not updated the resource successfully after m retries (m ≥ 1), i.e. "close-loop retry failed" shown in fig. 5, it may write the update-failure state into the database 220, and the state of the resource is refreshed in the database 220 to the unavailable state, i.e. "failed" shown in fig. 5. After failing to update the resource, the node may return update-failure information to the user so that the user can repair it. When the user completes the repair, the node may re-update the resource, i.e. "after task repair succeeded, re-enter the initial state" shown in fig. 5, and may write the updating state of the resource into the database 220, where the state of the resource is refreshed to updating, i.e. "modifying" shown in fig. 5.
In fig. 5, an update operation or a delete operation is allowed on a resource only when the state of that resource is the available state (i.e. "available"). State constraints are thereby enforced.
In addition, if there is a dependency relationship between two resources located on different nodes, the states of the two resources are updated to available (i.e. "available") only when both of them are in the available state; when the state of either of the two resources is unavailable (i.e. "failed"), the states of both resources may be updated to unavailable (i.e. "failed"). State constraints are thereby enforced.
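By way of example, the following sketch models the state constraints described around fig. 5 (creating/available/retrying/failed/deleting/deleted/modifying, with update and delete allowed only from the available state); the class and the transition details are a simplified illustrative assumption, not an exact reproduction of the figure.

```python
TERMINAL_SUCCESS = {"create": "available", "modify": "available", "delete": "deleted"}

class Resource:
    """Minimal sketch of the resource state constraints described around fig. 5."""

    def __init__(self):
        self.state = "creating"      # initial state while the create task runs
        self.operation = "create"    # the operation that a retry would re-execute
        self.retries = 0

    def start(self, operation):
        # Only a resource in the available state may be updated or deleted.
        if self.state != "available":
            raise ValueError(f"cannot {operation} a resource in state {self.state!r}")
        self.operation = operation
        self.state = "modifying" if operation == "modify" else "deleting"
        self.retries = 0

    def on_result(self, success, max_retries=3):
        if success:                   # close-loop succeeded
            self.state = TERMINAL_SUCCESS[self.operation]
        elif self.retries < max_retries:
            self.state = "retrying"   # retry the same operation after a preset delay
            self.retries += 1
        else:
            self.state = "failed"     # close-loop retry failed: wait for manual repair

    def on_repaired(self):
        # After task repair succeeds, re-enter the initial state of the operation.
        self.retries = 0
        self.state = {"create": "creating", "modify": "modifying",
                      "delete": "deleting"}[self.operation]

r = Resource()
r.on_result(False)       # create failed -> retrying
r.on_result(True)        # retry succeeded -> available
r.start("delete")        # allowed only because the state is available
r.on_result(True)        # -> deleted
print(r.state)
```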
In addition, when a retry takes a long time, the user may perform other operations on the node during that time. In some embodiments, the operation to be performed during a retry is the operation that failed before the retry. For example, if the operation that failed before the retry was creating a resource, the operation to be performed during the retry is creating that resource; if the operation that failed before the retry was updating a resource, the operation to be performed during the retry is updating that resource.
Further, the state of a resource may be associated with the processing result of the task corresponding to that resource. When a resource is in the unavailable state, it is highly likely that the task corresponding to the resource cannot be processed successfully; when a resource is in the available state, it is highly likely that the task corresponding to the resource can be processed successfully. Therefore, when the task corresponding to a resource is processed successfully, the state of the resource can be refreshed to the available state. When the processing of the task corresponding to a resource fails, the state of the resource can be refreshed to the unavailable state, and at the same time the operation that needed to be executed before (such as creating the resource or updating the resource) is attempted again. In this way, when the task corresponding to the resource is re-executed, it can be executed successfully, improving reliability.
The above is an introduction to the resource processing system of the primary main cluster provided by the embodiment of the present application, and based on the above, a task processing method provided by the embodiment of the present application is described below.
By way of example, fig. 6 shows a flow of a task processing method. Wherein the method may be performed by any one of the nodes in the foregoing primary cluster 210. As shown in fig. 6, the task processing method may include the steps of:
s601, acquiring an application programming interface API request sent by a user.
In this embodiment, the user may issue a request through a user interface provided by the node; for example, when the user needs to purchase a product, the user may issue a request for purchasing the product on the user interface provided by the node. After the user issues the request, the node acquires it. By way of example, the request issued by the user may be, but is not limited to, the API request described previously. By way of example, the API request may be for creating a resource, such as creating a virtual machine; it may be a request to purchase a product or service, such as a commodity like clothing; or it may be a request to pay money, such as initiating a payment transaction through a financial application (APP).
S602, creating a target task according to the acquired API request, and storing task information of the target task into a database.
In this embodiment, after obtaining the request issued by the user, the node may create a task for the request through a preconfigured function, so as to obtain the target task. Illustratively, the target task is associated with the API request: for example, when the API request is for creating a resource, the target task is the task of creating that resource; when the API request is for purchasing a product or service, the target task is the task of purchasing that product or service; and when the API request is for paying money, the target task is the payment task.
Further, after obtaining the target task, the node may store the task information of the target task in the database to complete persistence of the target task, so that when it learns that processing of the target task failed, it can acquire the task from the database and re-execute it, or, if the node itself fails, the tasks it needs to execute can be transferred to other nodes through the database, ensuring the reliability of task processing. By way of example, the task information of the target task may include the task type and/or the task content. For example, when the target task is a task of creating a virtual machine, the task information may include the size of the hard disk of the virtual machine, the type of operating system, the type and parameters of the central processing unit, the network bandwidth, and the like; when the target task is a payment task, the task information may include the amount of money to be paid, and the like.
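By way of example, the following sketch shows what persisting such task information might look like, with the task content serialized as JSON; the schema and the example fields for a create-virtual-machine task are illustrative assumptions.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (task_id TEXT PRIMARY KEY, task_type TEXT, "
             "task_content TEXT, state TEXT)")

def persist_task(task_type, task_content):
    """Store the task information (type and content) so the task can be re-read
    and re-executed after a processing failure or a node failure."""
    task_id = str(uuid.uuid4())
    conn.execute("INSERT INTO tasks VALUES (?, ?, ?, 'pending')",
                 (task_id, task_type, json.dumps(task_content)))
    conn.commit()
    return task_id

# Illustrative task information for a create-virtual-machine task.
task_id = persist_task("create_virtual_machine", {
    "disk_gb": 100,
    "os_type": "linux",
    "cpu": {"type": "x86_64", "cores": 4},
    "network_bandwidth_mbps": 1000,
})
print(conn.execute("SELECT task_type, task_content FROM tasks WHERE task_id = ?",
                   (task_id,)).fetchone())
```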
S603, executing the target task and acquiring the execution state of the target task.
In this embodiment, after the target task is persisted into the database, the node may execute the target task and store the execution state of the target task into the database. For example, the node may obtain the execution status of the target task from the database, such as whether the execution was successful or failed. Of course, after the node finishes executing the target task, the node can also know the execution state of the target task, and the node does not need to acquire the execution state of the target task from the database.
In some embodiments, when an API request is used to request creation of a target resource, a node may create the target resource while executing the target task. For example, when an API request is used to request creation of a virtual machine, a node may create the virtual machine when executing a target task.
S604, when the execution state of the target task is failure, re-executing the target task according to the task information of the target task stored in the database.
In this embodiment, when the node learns that the execution state of the target task is failure, it may re-acquire the task information of the target task from the database and re-execute the target task. In some embodiments, when the number of times the target task has been executed reaches a preset number and the last execution state is still failure, alarm information may be output. In this way, the user can learn of the resource processing failure and perform a manual repair.
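By way of example, the following sketch bounds the re-execution at a preset number of attempts and outputs alarm information when the last execution state is still failure; the attempt limit, retry delay, and logging call are illustrative assumptions.

```python
import logging
import time

logging.basicConfig(level=logging.WARNING)

MAX_ATTEMPTS = 3      # preset number of execution attempts (assumed value)
RETRY_DELAY = 1.0     # assumed delay between attempts, in seconds

def execute_with_retry(task_info, execute_task):
    """Re-execute the target task on failure; once the preset number of attempts is
    reached and the last execution state is still failure, output alarm information."""
    state = "failure"
    for attempt in range(1, MAX_ATTEMPTS + 1):
        state = "success" if execute_task(task_info) else "failure"
        if state == "success":
            break
        if attempt < MAX_ATTEMPTS:
            time.sleep(RETRY_DELAY)
    if state == "failure":
        # Alarm information so the user can learn of the failure and repair manually.
        logging.warning("task execution failed after %d attempts: %r",
                        MAX_ATTEMPTS, task_info)
    return state

print(execute_with_retry({"resource": "virtual_machine"}, lambda info: False))
```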
In this way, after acquiring the API request, the node creates a task for the request and persists it to the database, so that when processing of the task fails, the task information can be acquired again from the database and the task re-executed until execution succeeds. In other words, data for retry-based self-healing (namely the task created for the API request) is added to the database, so that the node can retry and self-heal when task execution fails, ensuring the correctness of the result returned by the node.
In some embodiments, after executing the target task, the node may return the execution state of the target task obtained by executing the target task to the user, so that the user knows the execution condition of the task.
In some embodiments, when there are multiple nodes, each node may write its own heartbeat information to the database. The heartbeat information may be used to characterize whether a node has failed. Each node may obtain the heartbeat information of other nodes from the database. When one node (hereinafter referred to as "node 1") acquires the heartbeat information of another node (hereinafter referred to as "node 2") and determines that node 2 has failed, node 1 may acquire from the database the task information of M (M ≥ 1) tasks that were created by node 2 and were not executed or failed to execute, execute the M tasks according to their task information, and store the execution result of each of the M tasks in the database. This avoids task processing failures caused by a node failure and ensures the reliability of task processing.
Further, to avoid a task being processed repeatedly, for any one of the M tasks to be executed, node 1 may, before executing it, confirm that the execution state of that task stored in the database is failure or not-yet-executed.
It should be understood that, the sequence number of each step in the foregoing embodiment does not mean the execution sequence, and the execution sequence of each process should be determined by the function and the internal logic, and should not limit the implementation process of the embodiment of the present application.
Based on the method in the above embodiment, the embodiment of the present application further provides a task processing device.
By way of example, fig. 7 shows a task processing device. The task processing device may be, but is not limited to being, deployed in any one of the nodes in the foregoing primary cluster. As shown in fig. 7, the task processing device 700 includes: a request acquisition module 701, a task creation module 702, and a task processing module 703. The request acquisition module 701 is configured to acquire an application programming interface API request sent by a user. The task creation module 702 is configured to create a target task according to the API request, and store task information of the target task in the database. The task processing module 703 is configured to execute the target task and obtain the execution state of the target task, and to re-execute the target task according to the task information of the target task stored in the database when the execution state of the target task is failure. Illustratively, the request acquisition module 701 may be, but is not limited to being, the request acquisition module 211 of fig. 2 described above, the task creation module 702 may be, but is not limited to being, the task creation module 212 of fig. 2 described above, and the task processing module 703 may be, but is not limited to being, the task processing module 214 of fig. 2 described above.
In some embodiments, the task processing module 703 is further configured to, after executing the target task: the execution state of the target task is stored in a database. The task processing module is specifically configured to, when acquiring an execution state of a target task: the execution state of the target task is obtained from the database.
In some embodiments, the task processing module 703 is further configured to: acquiring heartbeat information of the second node from the database, wherein the heartbeat information is used for representing whether the second node fails or not; when the second node fails, task information of at least one task which is created by the second node and is not executed or fails to be executed is obtained from a database; and executing at least one task according to the task information of the at least one task, and storing the execution state of each task in the at least one task into a database.
In some embodiments, the task processing module 703 is further configured to: and when the number of times of executing the target task reaches the preset number of times, and the last execution state is failure, outputting alarm information.
In some embodiments, when an API request is used to request creation of a target resource, the task processing module 703, when executing the target task, is specifically configured to: a target resource is created.
In some embodiments, the task processing module is further configured to: and returning the execution state of the target task to the user.
In some embodiments, the request acquisition module 701, the task creation module 702, and the task processing module 703 may all be implemented by software, or may be implemented by hardware. Illustratively, the following describes an implementation by taking the request acquisition module 701 as an example. Similarly, the task creation module 702 and the task processing module 703 may refer to the implementation of the request acquisition module 701.
As an example of a software functional unit, the request acquisition module 701 may include code running on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. Further, there may be one or more such computing instances. For example, the request acquisition module 701 may include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers used to run the code may be distributed in the same region or in different regions. Further, the multiple hosts/virtual machines/containers used to run the code may be distributed in the same availability zone (AZ) or in different AZs, each AZ comprising one data center or multiple geographically close data centers. Typically, a region may comprise a plurality of AZs.
Likewise, the multiple hosts/virtual machines/containers used to run the code may be distributed in the same virtual private cloud (VPC) or in multiple VPCs. In general, one VPC is deployed in one region, and a communication gateway is deployed in each VPC to enable interconnection between VPCs in the same region and between VPCs in different regions.
As an example of a hardware functional unit, the request acquisition module 701 may include at least one computing device, such as a server. Alternatively, the request acquisition module 701 may be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), or the like. The PLD may be implemented as a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The multiple computing devices included in the request acquisition module 701 may be distributed in the same region or in different regions, and may be distributed in the same AZ or in different AZs. Likewise, the multiple computing devices included in the request acquisition module 701 may be distributed in the same VPC or in multiple VPCs. The multiple computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
It should be noted that, in other embodiments, the request acquisition module 701 may be configured to perform any step in the foregoing task processing method, the task creation module 702 may be configured to perform any step in the method provided in the foregoing embodiments, and the task processing module 703 may be configured to perform any step in the method provided in the foregoing embodiments; the steps that the request acquisition module 701, the task creation module 702 and the task processing module 703 are respectively responsible for implementing may be specified as needed, with the three modules respectively implementing different steps of the method provided in the above embodiments so as to realize all the functions of the task processing device 700.
The present application also provides a computing device 800. As shown in fig. 8, a computing device 800 includes: bus 802, processor 804, memory 806, and communication interface 808. Communication between processor 804, memory 806, and communication interface 808 is via bus 802. Computing device 800 may be a server or a terminal device. It should be understood that the present application is not limited to the number of processors, memories in computing device 800.
Bus 802 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses, etc. For ease of illustration, only one line is shown in fig. 8, but this does not mean that there is only one bus or only one type of bus. Bus 802 may include a pathway for transferring information among the various components of computing device 800 (e.g. memory 806, processor 804, communication interface 808).
The processor 804 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
The memory 806 may include volatile memory, such as random access memory (RAM). The memory 806 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD) or a solid state drive (SSD).
The memory 806 stores executable program codes, and the processor 804 executes the executable program codes to implement the functions of the request acquisition module 701, the task creation module 702, and the task processing module 703, respectively, so as to implement all or part of the steps of the method in the above embodiments. That is, the memory 806 has instructions stored thereon for performing all or part of the steps of the methods of the embodiments described above.
Or the memory 806 has stored therein executable code that is executed by the processor 804 to perform the functions of the task processing device 700 described above, respectively, to implement all or part of the steps in the methods of the embodiments described above. That is, the memory 806 has instructions stored thereon for performing all or part of the steps of the methods of the embodiments described above.
Communication interface 808 enables communication between computing device 800 and other devices or communication networks using a transceiver module such as, but not limited to, a network interface card or a transceiver.
The embodiment of the application also provides a computing device cluster. The cluster of computing devices includes at least one computing device. The computing device may be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop, notebook, or smart phone.
As shown in fig. 9, the cluster of computing devices includes at least one computing device 800. The same instructions for performing all or part of the steps in the method of the embodiment described above may be stored in memory 806 in one or more computing devices 800 in a cluster of computing devices.
In some possible implementations, some instructions for performing all or some of the steps of the methods of the embodiments described above may also be stored in the memory 806 of one or more computing devices 800, respectively, in the computing device cluster. In other words, a combination of one or more computing devices 800 may collectively execute instructions for performing all or part of the steps in the methods of the embodiments described above.
It should be noted that, the memories 806 in different computing devices 800 in the computing device cluster may store different instructions for performing part of the functions of the task processing device 700. That is, the instructions stored in the memory 806 of the different computing devices 800 may implement the functionality of one or more of the aforementioned request acquisition module 701, task creation module 702, task processing module 703.
In some possible implementations, one or more computing devices in a cluster of computing devices may be connected through a network. Wherein the network may be a wide area network or a local area network, etc.
Based on the methods in the above embodiments, an embodiment of the present application provides a node. The node may include: at least one memory for storing a program; and at least one processor for executing the program stored in the memory. The processor is adapted to perform the methods of the above embodiments when the program stored in the memory is executed.
Based on the method in the above embodiment, the embodiment of the present application provides a computer-readable storage medium storing a computer program, which when executed on a processor, causes the processor to perform the method in the above embodiment.
Based on the method in the above embodiments, an embodiment of the present application provides a computer program product, which when run on a processor causes the processor to perform the method in the above embodiments.
It is to be appreciated that the processor in embodiments of the application may be a central processing unit (CPU), or may be another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The general purpose processor may be a microprocessor or, in the alternative, any conventional processor.
The method steps in the embodiments of the present application may be implemented by hardware, or may be implemented by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
It will be appreciated that the various numerical numbers referred to in the embodiments of the present application are merely for ease of description and are not intended to limit the scope of the embodiments of the present application.

Claims (16)

1. A method of task processing, applied to a first node, the method comprising:
acquiring an Application Programming Interface (API) request sent by a user;
creating a target task according to the API request, and storing task information of the target task into a database;
executing the target task, and acquiring the execution state of the target task; and
when the execution state of the target task is failure, re-executing the target task according to the task information of the target task stored in the database.
2. The method of claim 1, wherein after the performing the target task, the method further comprises:
storing the execution state of the target task into the database;
the obtaining the execution state of the target task includes:
and acquiring the execution state of the target task from the database.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
acquiring heartbeat information of a second node from the database, wherein the heartbeat information is used for indicating whether the second node is faulty;
when the second node fails, acquiring, from the database, task information of at least one task that is created by the second node and is not executed or fails to be executed; and
executing the at least one task according to the task information of the at least one task, and storing the execution state of each of the at least one task into the database.
4. A method according to any one of claims 1 to 3, wherein the method further comprises:
when the number of times the target task has been executed reaches a preset number of times and the last execution state is failure, outputting alarm information.
5. The method of any of claims 1 to 4, wherein the API request is for requesting creation of a target resource, the performing the target task comprising:
creating the target resource.
6. The method according to any one of claims 1 to 5, further comprising:
returning the execution state of the target task to the user.
7. A task processing device, deployed at a first node, the device comprising:
a request acquisition module, configured to acquire an Application Programming Interface (API) request sent by a user;
a task creation module, configured to create a target task according to the API request, and store task information of the target task into a database; and
a task processing module, configured to execute the target task and acquire the execution state of the target task, and to re-execute the target task according to the task information of the target task stored in the database when the execution state of the target task is failure.
8. The apparatus of claim 7, wherein the task processing module, after executing the target task, is further configured to: store the execution state of the target task into the database;
wherein, when acquiring the execution state of the target task, the task processing module is specifically configured to: acquire the execution state of the target task from the database.
9. The apparatus of claim 7 or 8, wherein the task processing module is further configured to:
acquire heartbeat information of a second node from the database, wherein the heartbeat information is used for indicating whether the second node is faulty;
when the second node fails, acquire, from the database, task information of at least one task that is created by the second node and is not executed or fails to be executed; and
execute the at least one task according to the task information of the at least one task, and store the execution state of each of the at least one task into the database.
10. The apparatus according to any one of claims 7 to 9, wherein the task processing module is further configured to:
output alarm information when the number of times the target task has been executed reaches a preset number of times and the last execution state is failure.
11. The apparatus according to any one of claims 7 to 10, wherein the API request is for requesting creation of a target resource, and the task processing module, when executing the target task, is specifically configured to:
create the target resource.
12. The apparatus according to any one of claims 7 to 11, wherein the task processing module is further configured to:
return the execution state of the target task to the user.
13. A node, comprising:
at least one memory for storing a program;
at least one processor for executing the program stored in the memory;
wherein the processor is adapted to perform the method of any of claims 1-6 when the program stored in the memory is executed.
14. A cluster of computing devices, comprising at least one computing device, each computing device comprising a processor and a memory;
the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform the method of any of claims 1-6.
15. A computer readable storage medium storing a computer program which, when run on a processor, causes the processor to perform the method of any one of claims 1-6.
16. A computer program product, characterized in that the computer program product, when run on a processor, causes the processor to perform the method according to any of claims 1-6.
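To make the claimed flow easier to follow, the non-normative Python sketch below walks through the behaviour recited in claims 1 to 4: creating and storing a task, executing it with re-execution on failure, outputting alarm information once a preset number of attempts has failed, and taking over a failed second node's unfinished tasks based on its heartbeat. The in-memory database, the heartbeat representation, and every name in the sketch are assumptions for illustration only and do not describe the actual implementation.

```python
# Non-normative sketch of the behaviour recited in claims 1 to 4.
# The in-memory "database", heartbeat format, and names are assumptions.
import time
import uuid

MAX_ATTEMPTS = 3          # the "preset number of times" of claim 4
HEARTBEAT_TIMEOUT_S = 30  # assumed threshold for declaring a node failed

db = {"tasks": {}, "heartbeats": {}}


def create_task(api_request, node_id):
    """Claim 1: create a target task and store its task information."""
    task_id = str(uuid.uuid4())
    db["tasks"][task_id] = {"request": api_request, "node": node_id,
                            "state": "pending", "attempts": 0}
    return task_id


def execute_with_retry(task_id, run):
    """Claims 1 and 2: execute, store the execution state, and re-execute from
    the stored task information on failure. Claim 4: output alarm information
    once the preset number of attempts has failed."""
    task = db["tasks"][task_id]
    while task["attempts"] < MAX_ATTEMPTS:
        task["attempts"] += 1
        try:
            run(task["request"])
            task["state"] = "success"
            return task["state"]
        except Exception:
            task["state"] = "failure"  # execution state kept in the database
    print(f"ALARM: task {task_id} still failing after {MAX_ATTEMPTS} attempts")
    return task["state"]


def take_over_failed_node(second_node_id, run):
    """Claim 3: if the second node's heartbeat is stale, pick up its pending
    or failed tasks from the database and execute them on this node."""
    last_beat = db["heartbeats"].get(second_node_id, 0)
    if time.time() - last_beat <= HEARTBEAT_TIMEOUT_S:
        return  # second node is considered healthy
    for task_id, task in db["tasks"].items():
        if task["node"] == second_node_id and task["state"] in ("pending", "failure"):
            execute_with_retry(task_id, run)
```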
CN202211463984.8A 2022-11-22 2022-11-22 Task processing method, device and node Pending CN118093104A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211463984.8A CN118093104A (en) 2022-11-22 2022-11-22 Task processing method, device and node
PCT/CN2023/101285 WO2024109003A1 (en) 2022-11-22 2023-06-20 Task processing method and apparatus, and node

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211463984.8A CN118093104A (en) 2022-11-22 2022-11-22 Task processing method, device and node

Publications (1)

Publication Number Publication Date
CN118093104A true CN118093104A (en) 2024-05-28

Family

ID=91158931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211463984.8A Pending CN118093104A (en) 2022-11-22 2022-11-22 Task processing method, device and node

Country Status (2)

Country Link
CN (1) CN118093104A (en)
WO (1) WO2024109003A1 (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156198B (en) * 2015-04-22 2019-12-27 阿里巴巴集团控股有限公司 Task execution method and device based on distributed database
US10691558B1 (en) * 2016-09-22 2020-06-23 Amazon Technologies, Inc. Fault tolerant data export using snapshots
CN108334545B (en) * 2017-12-27 2021-09-03 微梦创科网络科技(中国)有限公司 Method and device for realizing asynchronous service
CN108304255A (en) * 2017-12-29 2018-07-20 北京城市网邻信息技术有限公司 Distributed task dispatching method and device, electronic equipment and readable storage medium storing program for executing
US11138033B1 (en) * 2018-08-24 2021-10-05 Amazon Technologies, Inc. Providing an application programming interface (API) including a bulk computing task operation
CN113515357B (en) * 2021-04-20 2024-03-08 建信金融科技有限责任公司 Method and device for executing batch tasks in cluster environment

Also Published As

Publication number Publication date
WO2024109003A1 (en) 2024-05-30

Similar Documents

Publication Publication Date Title
CN107771321B (en) Recovery in a data center
US7490179B2 (en) Device for, method of, and program for dynamically switching modes for writing transaction data into disk
US7536582B1 (en) Fault-tolerant match-and-set locking mechanism for multiprocessor systems
US20120011100A1 (en) Snapshot acquisition processing technique
KR102121157B1 (en) Use of nonce table to solve concurrent blockchain transaction failure
CN111125040A (en) Method, apparatus and storage medium for managing redo log
CN115408411A (en) Data writing method and device, electronic equipment and storage medium
CN113672350A (en) Application processing method and device and related equipment
CN109684048B (en) Method and device for processing transaction in transaction submitting system
US10726047B2 (en) Early thread return with secondary event writes
CN107391539B (en) Transaction processing method, server and storage medium
CN112559496A (en) Distributed database transaction atomicity realization method and device
CN111880908A (en) Distributed transaction processing method and device and storage medium
EP3389222B1 (en) A method and a host for managing events in a network that adapts event-driven programming framework
CN118093104A (en) Task processing method, device and node
US11422715B1 (en) Direct read in clustered file systems
US11669516B2 (en) Fault tolerance for transaction mirroring
US11500857B2 (en) Asynchronous remote calls with undo data structures
WO2018188959A1 (en) Method and apparatus for managing events in a network that adopts event-driven programming framework
CN108804214B (en) Asynchronous task scheduling method and device and electronic equipment
CN114722261A (en) Resource processing method and device, electronic equipment and storage medium
CN112162988A (en) Distributed transaction processing method and device and electronic equipment
CN114328374A (en) Snapshot method, device, related equipment and database system
US11442668B2 (en) Prioritizing volume accesses in multi-volume storage device based on execution path of a service
WO2024108348A1 (en) Method and system for eventual consistency of data types in geo-distributed active-active database systems

Legal Events

Date Code Title Description
PB01 Publication