CN113204413A - Task processing method, device and equipment
- Publication number
- CN113204413A (application CN202010079161.XA)
- Authority
- CN
- China
- Legal status
- Pending
Classifications
- G06F9/4881 — Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/546 — Message passing systems or structures, e.g. queues
- G06F2209/548 — Indexing scheme relating to interprogram communication: queue
Abstract
The application provides a task processing method, device, and equipment, where the method includes: receiving an acceleration processing request sent by a client; processing the acceleration processing request to obtain acceleration processing parameters; generating a task to be processed according to the acceleration processing parameters; and sending the task to be processed to an accelerator so that the accelerator processes it. This technical scheme improves the utilization of each accelerator.
Description
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method, an apparatus, and a device for task processing.
Background
With the rapid development of AI (Artificial Intelligence), AI accelerators are becoming more numerous, their technology is updated ever faster, and the number of AI accelerators used in data centers keeps growing. An AI accelerator, also called an AI chip or computing card, is a module dedicated to handling the large number of computing tasks in artificial intelligence applications; other, non-computing tasks are still handled by the Central Processing Unit (CPU).
In the related art, an independent AI accelerator must be configured for each user, and that accelerator only processes the computing tasks of that user's artificial intelligence application, so problems such as low AI accelerator utilization arise.
Disclosure of Invention
The application provides a task processing method, which comprises the following steps:
receiving an acceleration processing request sent by a client;
processing the accelerated processing request to obtain accelerated processing parameters;
generating a task to be processed according to the accelerated processing parameter;
and sending the task to be processed to an accelerator so that the accelerator processes the task to be processed.
The application provides a task processing method, which comprises the following steps:
acquiring an acceleration processing parameter of an application program;
packaging the accelerated processing parameters to obtain an accelerated processing request;
and sending the acceleration processing request to a server side so that the server side generates a task to be processed according to the acceleration processing parameters, and sending the task to be processed to an accelerator for processing.
The application provides a task processing method, which comprises the following steps:
receiving an acceleration processing request sent by a client;
determining whether the acceleration processing request is a target acceleration processing request of a queue to be processed, where the target acceleration processing request is the request immediately following the last acceleration processing request in the queue to be processed;
if so, adding the accelerated processing request to the queue to be processed;
processing a first accelerated processing request in the queue to be processed based on the sequence of the accelerated processing requests in the queue to be processed to obtain accelerated processing parameters;
generating a task to be processed according to the accelerated processing parameter;
and sending the task to be processed to an accelerator so that the accelerator processes the task to be processed.
The application provides a task processing method, which comprises the following steps:
acquiring an acceleration processing parameter of an application program;
returning a predicted task processing result to the application program before the actual task processing result corresponding to the acceleration processing parameters is obtained, so that the application program executes a next task according to the predicted task processing result;
packaging the accelerated processing parameters to obtain an accelerated processing request;
adding the accelerated processing request to a queue to be processed;
and sequentially sending the acceleration processing requests in the queue to be processed to a server side based on the sequence of the acceleration processing requests in the queue to be processed, so that the server side generates tasks to be processed according to the acceleration processing requests, and sends the tasks to be processed to an accelerator for processing.
The application provides a task processing method, which comprises the following steps:
receiving an Artificial Intelligence (AI) processing request sent by a client;
processing the AI processing request to obtain an AI processing parameter;
generating an AI task to be processed according to the AI processing parameters;
and sending the AI task to be processed to an AI accelerator so that the AI accelerator processes the AI task to be processed to obtain a task processing result corresponding to the AI task to be processed.
The present application provides a task processing device, the device comprising:
the receiving module is used for receiving an acceleration processing request sent by a client;
the processing module is used for processing the accelerated processing request to obtain accelerated processing parameters;
the generating module is used for generating a task to be processed according to the accelerated processing parameter;
and the sending module is used for sending the task to be processed to an accelerator so that the accelerator processes the task to be processed.
The present application provides a task processing device, the device comprising:
the acquisition module is used for acquiring the accelerated processing parameters of the application program;
the processing module is used for packaging the accelerated processing parameters to obtain an accelerated processing request;
and the sending module is used for sending the acceleration processing request to a server so that the server generates a task to be processed according to the acceleration processing parameter and sends the task to be processed to an accelerator for processing.
The application provides a server-side device, including:
a processor and a machine-readable storage medium storing computer instructions which, when executed by the processor, cause the processor to perform:
receiving an acceleration processing request sent by a client;
processing the accelerated processing request to obtain accelerated processing parameters;
generating a task to be processed according to the accelerated processing parameter;
and sending the task to be processed to an accelerator so that the accelerator processes the task to be processed.
The present application provides a client device comprising:
a processor and a machine-readable storage medium storing computer instructions which, when executed by the processor, cause the processor to perform:
acquiring an acceleration processing parameter of an application program;
packaging the accelerated processing parameters to obtain an accelerated processing request;
and sending the acceleration processing request to a server side so that the server side generates a task to be processed according to the acceleration processing parameters, and sending the task to be processed to an accelerator for processing.
Based on the above technical scheme, in the embodiments of the present application the accelerator is deployed at the server, and the server provides services for a plurality of clients, so that a plurality of users share the same accelerator: each user accesses the server through a client and thereby uses the accelerator's resources. This improves the utilization of each accelerator and is well suited to application scenarios where accelerator utilization would otherwise be low.
Drawings
In order to describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for that description are briefly introduced below. Obviously, the drawings described below cover only some embodiments of the present application, and those skilled in the art can derive other drawings from them.
FIG. 1 is a schematic flow chart diagram of a task processing method in one embodiment of the present application;
FIG. 2 is a schematic flow chart diagram of a task processing method in one embodiment of the present application;
FIG. 3 is a schematic diagram of an application scenario in an embodiment of the present application;
FIG. 4 is a flowchart illustrating a task processing method according to an embodiment of the present application;
FIGS. 5A-5C are timing diagrams in embodiments of the present application;
FIGS. 6A and 6B are schematic structural diagrams of a task processing device according to an embodiment of the present application;
fig. 7A is a schematic structural diagram of a server device in an embodiment of the present application;
fig. 7B is a schematic structural diagram of a client device in an embodiment of the present application.
Detailed Description
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein is meant to encompass any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
An embodiment of the present application provides a task processing method, which may be applied to a server. The server may deploy an accelerator (which may also be referred to as a heterogeneous accelerator) that provides services for a plurality of clients. Referring to fig. 1, which is a flowchart of the task processing method, the method may include:
Step 101, receiving an acceleration processing request sent by a client.
Step 102, processing the acceleration processing request to obtain acceleration processing parameters.
For example, the acceleration processing request may include, but is not limited to, an API (Application Programming Interface) request, and the acceleration processing parameters may include, but are not limited to, API parameters. Based on this, API decapsulation processing may be performed on the acceleration processing request to obtain the acceleration processing parameters.
Step 103, generating a task to be processed according to the acceleration processing parameters.
For example, a first library file of the application program may be obtained from a real accelerator library, and the to-be-processed task may be generated according to the first library file and the acceleration processing parameter. For example, the accelerated processing parameter is substituted into the API function in the first library file, and the task to be processed is obtained based on the API function and the accelerated processing parameter.
Step 104, sending the task to be processed to the accelerator so that the accelerator processes the task to be processed.
The accelerator is illustratively a module that can process the task to be processed, and thus, after sending the task to be processed to the accelerator, the accelerator can process the task to be processed.
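Putting steps 102-104 together, a minimal Python sketch follows. The request layout, the library structure, and the accelerator's submit interface are assumptions for illustration, not the patent's actual implementation.

```python
from typing import Any, Callable, Dict

def handle_acceleration_request(request: Dict[str, Any],
                                real_accelerator_lib: Dict[str, Callable],
                                accelerator) -> Any:
    # Step 102: decapsulate the request into acceleration processing parameters.
    api_name = request["api"]
    params = request["args"]
    # Step 103: bind the parameters to the API function taken from the first
    # library file, yielding a task the accelerator can process.
    api_fn = real_accelerator_lib[api_name]
    task = lambda: api_fn(**params)
    # Step 104: hand the task to the accelerator for processing (the submit
    # interface is a hypothetical stand-in for the real dispatch path).
    return accelerator.submit(task)
```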
In a possible implementation manner, after step 104, a task processing result corresponding to the task to be processed may also be obtained, and API encapsulation processing is performed on the task processing result to obtain an accelerated processing response; the accelerated processing response may then be sent to the client.
In a possible implementation, after receiving the acceleration processing request sent by the client, it may also be determined whether the acceleration processing request is the target acceleration processing request of the pending queue, where the target acceleration processing request is the request immediately following the last acceleration processing request in the pending queue. If so, the acceleration processing request is added to the pending queue. If not, the acceleration processing request is added to the pending queue only once it has become the target acceleration processing request of the queue, i.e., after all requests ahead of it have been enqueued.
For example, based on the order of the acceleration processing requests in the pending queue, the first acceleration processing request in the queue is processed to obtain acceleration processing parameters. After it has been processed, it is deleted from the queue, so the next request becomes the first one; that request is processed in turn to obtain its acceleration processing parameters, and so on.
For example, determining whether the acceleration processing request is the target acceleration processing request of the pending queue may include, but is not limited to: determining this according to the sequence number value of the acceleration processing request and the count value of the global counter of the pending queue, where the count value tracks the sequence number expected for the next in-order request, i.e., the sequence number of the last acceleration processing request in the pending queue plus one.
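As a sketch, and assuming the count value tracks the next expected sequence number (consistent with the numeric walkthrough in the application-scenario section below), the check reduces to:

```python
def is_target_request(request_seq: int, queue_count: int) -> bool:
    # The request is the target (next-in-order) request exactly when its
    # sequence number equals the queue's current count value, i.e. the
    # sequence number that should come right after the last enqueued request.
    return request_seq == queue_count
```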
In the examples, the execution sequence is given only for convenience of description; in practical applications, the execution order of the steps may be changed and is not limited. Moreover, in other embodiments, the steps of the respective methods need not be performed in the order shown and described herein, and the methods may include more or fewer steps than described. A single step described in this specification may be broken down into multiple steps in other embodiments, and multiple steps described in this specification may be combined into a single step in other embodiments.
Based on the above technical scheme, in the embodiments of the present application the accelerator is deployed at the server, and the server provides services for a plurality of clients, so that a plurality of users share the same accelerator: each user accesses the server through a client and thereby uses the accelerator's resources. This improves the utilization of each accelerator and is well suited to application scenarios where accelerator utilization would otherwise be low.
An embodiment of the present application provides a task processing method, which may be applied to a client. As shown in fig. 2, which is a flowchart of the task processing method, the method may include:
Step 201, acquiring acceleration processing parameters of an application program.
Step 202, encapsulating the acceleration processing parameters to obtain an acceleration processing request.
Step 203, sending the acceleration processing request to a server, so that the server generates a task to be processed according to the acceleration processing parameters and sends the task to be processed to an accelerator for processing.
Illustratively, the acceleration processing parameters may include, but are not limited to, API parameters, and the acceleration processing request may include, but is not limited to, an API request. Based on this, encapsulating the acceleration processing parameters to obtain the acceleration processing request may include, but is not limited to: acquiring a second library file of the application program from the function proxy library, and performing API encapsulation on the acceleration processing parameters according to the second library file to obtain the acceleration processing request.
For example, before the second library file of the application program is acquired from the function proxy library, API function information may be parsed from the first library file in the real accelerator library of the server, the second library file of the application program may be generated according to the API function information, and the second library file may be stored in the function proxy library.
For example, after the acceleration processing request is sent to the server, an acceleration processing response returned by the server for that request may also be received; API decapsulation processing is then performed on the acceleration processing response to obtain the task processing result, which is returned to the application program.
In a possible implementation, after the acceleration processing parameters of the application program are obtained, a predicted task processing result may be returned to the application program before the actual task processing result corresponding to the acceleration processing parameters is obtained. The application program executes the next task according to the predicted task processing result, i.e., it can proceed without waiting for the actual result, which reduces waiting time.
For example, sending the acceleration processing request to the server may include, but is not limited to: determining the count value of the global counter of the pending queue, where the count value tracks the sequence number to be assigned to the next acceleration processing request (i.e., the sequence number of the last request in the queue plus one); determining the sequence number value of the currently obtained acceleration processing request according to the count value; adding the sequence number value to the acceleration processing request and adding the request to the pending queue; and sending the acceleration processing requests to the server one by one, based on their order in the pending queue. For example, the first acceleration processing request in the queue is sent first and then deleted from the queue, so that the next request becomes the first one, which is sent in turn, and so on.
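A minimal sketch of this client-side flow follows; the transport object with a send method is a hypothetical stand-in for the network connection, not part of the patent.

```python
import itertools
import queue

class RequestSequencer:
    """Stamp each acceleration processing request with a sequence number
    from the global counter, enqueue it, and send in queue order."""

    def __init__(self, transport):
        self._counter = itertools.count(0)   # global counter, initial value 0
        self._pending = queue.Queue()        # queue to be processed
        self._transport = transport          # hypothetical network connection

    def submit(self, request: dict) -> None:
        # Sequence number = current count value; the counter then advances.
        request["seq"] = next(self._counter)
        self._pending.put(request)

    def drain(self) -> None:
        # Send the first request in the queue, delete it, repeat.
        while not self._pending.empty():
            self._transport.send(self._pending.get())
```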
In the examples, the execution sequence is given only for convenience of description; in practical applications, the execution order of the steps may be changed and is not limited. Moreover, in other embodiments, the steps of the respective methods need not be performed in the order shown and described herein, and the methods may include more or fewer steps than described. A single step described in this specification may be broken down into multiple steps in other embodiments, and multiple steps described in this specification may be combined into a single step in other embodiments.
Based on the above technical scheme, in the embodiments of the present application the accelerator is deployed at the server, and the server provides services for a plurality of clients, so that a plurality of users share the same accelerator: each user accesses the server through a client and thereby uses the accelerator's resources. This improves the utilization of each accelerator and is well suited to application scenarios where accelerator utilization would otherwise be low.
Based on the same application concept as the above method, an embodiment of the present application provides another task processing method, applicable to a server, which may include the following steps:
Step a1, receiving an acceleration processing request sent by the client.
Step a2, determining whether the acceleration processing request is the target acceleration processing request of the pending queue. Illustratively, the target acceleration processing request is the request immediately following the last acceleration processing request in the pending queue. If so, step a3 may be performed.
Step a3, adding the acceleration processing request to the pending queue.
Step a4, processing the first acceleration processing request in the pending queue, based on the order of the acceleration processing requests in the queue, to obtain acceleration processing parameters.
Step a5, generating a task to be processed according to the acceleration processing parameters.
Step a6, sending the task to be processed to the accelerator so that the accelerator processes it.
Based on the same application concept as the above method, an embodiment of the present application provides another task processing method, applicable to a client, which may include the following steps:
Step b1, acquiring acceleration processing parameters of the application program.
Step b2, before the task processing result corresponding to the acceleration processing parameters is obtained, returning a predicted task processing result to the application program, which executes the next task according to the predicted result.
Step b3, encapsulating the acceleration processing parameters to obtain an acceleration processing request.
Step b4, adding the acceleration processing request to the pending queue.
Step b5, based on the order of the acceleration processing requests in the pending queue, sending each acceleration processing request in the queue to the server in turn, so that the server generates a task to be processed according to the acceleration processing request and sends the task to an accelerator for processing.
Based on the same application concept as the above method, an embodiment of the present application provides another task processing method, applicable to a server, which may include the following steps:
Step c1, receiving an AI processing request sent by the client.
Step c2, processing the AI processing request to obtain AI processing parameters.
Step c3, generating an AI task to be processed according to the AI processing parameters.
Step c4, sending the AI task to be processed to the AI accelerator, so that the AI accelerator processes the task and obtains the corresponding task processing result.
The above technical solution of the embodiment of the present application is described below with reference to specific application scenarios.
Referring to fig. 3, a schematic view of an application scenario of the embodiments of the present application, a server 31 (Server) is connected to a plurality of clients 32 (Clients), and the accelerator 311 deployed on the server 31 provides services for the plurality of clients 32. Illustratively, the client 32 may be referred to as the front end and the server 31 as the back end.
In the AI application scenario, accelerator 311 may be an AI accelerator (AI accelerator is also referred to as AI chip or AI computing card), which is a module dedicated to handling a large number of computing tasks in artificial intelligence applications. Other application scenarios may also implement the computing function through the accelerator 311, which is not limited in this regard.
Illustratively, the accelerator 311 may include, but is not limited to, a GPU (Graphics Processing Unit), or an FPGA (Field Programmable Gate Array), or a CPLD (Complex Programmable Logic Device), or an ASIC (Application Specific Integrated Circuit). Of course, the above are just a few examples of the accelerator 311, and the type of the accelerator 311 is not limited.
In the embodiment of the present application, the accelerator 311 is deployed at the server 31, and the server 31 provides services for the multiple clients 32, so that multiple users share the same accelerator 311, the utilization rate of the accelerator 311 is improved, and the method is very suitable for application scenarios with low accelerator utilization rates. For example, user 1 uses the computing resources of accelerator 311 in time interval 1, user 2 uses the computing resources of accelerator 311 in time interval 2, user 3 uses the computing resources of accelerator 311 in time interval 3, and so on.
For example, in a development scenario, a user generally needs to debug and modify a code in a running process, that is, an accelerator is not used all the time, so that multiple users can share the same accelerator 311 by deploying the accelerator 311 in the server 31, and thus the utilization rate of the accelerator 311 can be improved.
Referring to fig. 3, the client 32 may include an application 321, a function proxy library 322, and a network communication layer 323. The server 31 may include an accelerator 311, a network communication layer 312, an API service module 313, and a real accelerator library 314. As can be seen from fig. 3, the accelerator 311 and the real accelerator library 314 are deployed on the server 31, the application 321 is deployed on the client 32, and the client 32 and the server 31 are deployed on two different devices, where the client 32 and the server 31 are connected through a network, for example, the client 32 and the server 31 communicate through TCP (Transmission Control Protocol) or RDMA (Remote Direct Memory Access). Therefore, the accelerator 311, the real accelerator library 314 and the application 321 are not deployed on the same device, i.e., the accelerator 311 and the application 321 are separated, so that the accelerator 311 can provide services for a plurality of applications 321.
The application 321 is a program (e.g., Applications) for implementing related functions by using the accelerator 311, that is, the application 321 distributes tasks to the accelerator 311 for processing, so as to utilize the computing resources of the accelerator 311, for example, the application 321 may be a program for implementing an artificial intelligence application, without limitation. Obviously, since the application 321 and the accelerator 311 are not deployed on the same device, the accelerator 311 can provide services for a plurality of applications 321, and the utilization rate of the accelerator 311 is improved.
The real accelerator library 314 provides a library file containing a large number of API functions (for ease of distinction, the library file in the real accelerator library 314 is called the first library file). The API functions in the first library file are all functions related to the accelerator 311; that is, the tasks that the accelerator 311 can process are generated based on these API functions, which is how the resources of the accelerator 311 are used. The content of the real accelerator library 314 is not limited.
An API function is a predefined function. Besides its kernel functions responsible for coordinating application execution, memory allocation, system resource management, and the like, an operating system provides a large number of function libraries, which act like a large service center: an application calls the various services of this center (each service being a function) to open windows, draw graphics, use peripheral devices, and so on. For example, a set of APIs in a graphics library defines how a pointer is drawn and displayed on a graphics output device. When an application requires pointer functionality, it links against this set of APIs at compile time, and at run time the implementation (library) of the APIs is called to display the pointer.
Library files can be divided into static libraries and DLL (Dynamic Link Library) files. A static library is copied into the program at link time; a DLL file is not copied into the program at link time but is dynamically loaded into memory by the system at run time for the program to call.
Illustratively, the first library file in the real accelerator library 314 may be a DLL file.
In summary, API functions associated with the accelerator 311 may be predefined, stored in a DLL file, and stored in the real accelerator library 314.
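For illustration, dynamically loading a DLL at run time can look like the following Python sketch; the library name libaccel.so and the exported function accel_add are hypothetical names used only for this example.

```python
import ctypes

# Load a (hypothetical) accelerator DLL into memory at run time.
lib = ctypes.CDLL("./libaccel.so")

# Declare the signature of a (hypothetical) exported API function
# int accel_add(int, int) before calling it.
lib.accel_add.argtypes = (ctypes.c_int, ctypes.c_int)
lib.accel_add.restype = ctypes.c_int

print(lib.accel_add(2, 3))  # the call is dispatched into the loaded library
```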
The function proxy library 322 is a function-library proxy layer (e.g., Accelerator Library Proxy) associated with the accelerator 311. It provides a library file containing a large number of API functions (for ease of distinction, the library file in the function proxy library 322 is called the second library file). The API functions in the second library file are functions associated with the accelerator 311; that is, the tasks that the accelerator 311 can process are generated based on these API functions. The content of the function proxy library 322 is not limited.
Illustratively, the second library file is defined identically to the first library file, but the internal implementation of the second library file is different from that of the first library file, and therefore, from the perspective of the application 321, the first library file and the second library file cannot be distinguished, that is, the second library file can replace the first library file.
For example, the client 32 may obtain the first library file from the real accelerator library 314 of the server 31, parse the API function information from the first library file, generate the second library file of the application 321 according to the API function information, and store the second library file in the function proxy library 322. For example, the client 32 may create a code generator, which parses the API function information from the first library file, generates the second library file from that information, and stores it in the function proxy library 322.
For example, rather than performing the generation of the function proxy library 322 at runtime, the code generator may parse the real accelerator library 314 offline to generate the function proxy library 322. The function proxy library 322 is already added to the operating system in place of the real accelerator library 314 before the client's application starts.
Since the first library file includes the API functions related to the accelerator 311, the code generator may parse out API function information (e.g., definitions of APIs, etc.) related to the accelerator 311 from the first library file. The code generator generates a second library file according to the API function information, where the second library file also includes the API function related to the accelerator 311, and the generation process of the second library file is not limited. The API function in the second library file may be the same as the API function in the first library file, that is, the second library file may include all APIs in the first library file, and the API definitions may be the same, which is not limited thereto.
In summary, the generation of the function proxy library 322 is automated by the code generator, i.e., the second library file is automatically generated by the code generator and stored in the function proxy library 322.
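The following toy sketch illustrates the code-generator idea: given parsed API definitions (name plus parameter names), it emits proxy stubs whose definitions match the real library but whose bodies forward the call over the network. The API list and the send_request helper are assumptions made for the example.

```python
# Parsed API function information (name, parameter names) - illustrative only.
API_DEFS = [("accel_matmul", ["a", "b"]),
            ("accel_conv2d", ["x", "kernel"])]

STUB_TEMPLATE = '''def {name}({params}):
    # Proxy stub: same definition as the real API, different implementation -
    # the parameters are forwarded to the server instead of being executed.
    return send_request("{name}", {{{kwargs}}})
'''

def generate_proxy_source(api_defs) -> str:
    stubs = []
    for name, params in api_defs:
        kwargs = ", ".join('"%s": %s' % (p, p) for p in params)
        stubs.append(STUB_TEMPLATE.format(
            name=name, params=", ".join(params), kwargs=kwargs))
    return "\n".join(stubs)

print(generate_proxy_source(API_DEFS))  # source of the second library file
```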
The network communication layer 323 is located at the client 32 and is a network communication layer independent of the accelerator 311. The network communication layer 312 is located at the server 31 and is a network communication layer independent of the accelerator 311. The network communication layer 323 and the network communication layer 312 are used to implement the transmission of messages. For example, the network communication layer 323 can send messages for the client 32 to the network communication layer 312, which in turn sends messages to the server 31. The network communication layer 312 can send the message of the server 31 to the network communication layer 323 and then send the message to the client 32.
The API service module 313 is used to encapsulate or decapsulate API messages. For example, the API service module 313, upon receiving the API message sent by the client 32, may decapsulate the API message, obtain parameters related to the API function, and provide the parameters related to the API function to the real accelerator library 314. When receiving a task processing result to be sent to the client 32, the API service module 313 encapsulates the task processing result into an API message, and sends the API message to the client 32 through the network communication layer 312.
Under the above application scenario, an embodiment of the present application proposes another task processing method. As shown in fig. 4, which is a flowchart of the task processing method, the method may include:
in step 401, the client 32 obtains the acceleration processing parameters (e.g., API parameters) of the application 321.
For example, when the application 321 needs to utilize the computing resources of the accelerator 311, it provides API parameters (not limited here) to the client 32; the API parameters direct the accelerator 311 to carry out the relevant task. In this way the client 32 obtains the acceleration processing parameters of the application 321.
Illustratively, the aforementioned accelerated processing parameters (e.g., API parameters) may include, but are not limited to, one or any combination of the following: the acceleration function library name, API (i.e., function) name, API attribute value, client ID, client process ID, client thread ID, etc., and the contents of the acceleration processing parameter are not limited.
In step 402, the client 32 encapsulates the API parameters to obtain an acceleration processing request (e.g., an API request). For example, the client 32 may perform API encapsulation on the API parameters to obtain the API request.
Referring to the above embodiment, the function proxy library 322 may include the second library file, which may include a large number of API functions. The client 32 may therefore obtain the second library file from the function proxy library 322 and perform API encapsulation on the API parameters according to the second library file to obtain the API request. For example, based on an API function in the second library file, the client 32 may use that function to encapsulate the API parameters into an API request; the encapsulation format is not limited and depends on the API function.
For example, in steps 401 and 402, the application 321 may provide the API parameters to the function proxy library 322, and the function proxy library 322 encapsulates the API parameters to obtain the acceleration processing request. For example, the corresponding real acceleration function library may be loaded according to the acceleration function library name, the corresponding API pointer obtained, the API called through that pointer, and the API request finally obtained from the API.
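A minimal sketch of the encapsulation follows, using the parameter fields listed above; JSON is only an illustrative wire format, since the patent does not specify one.

```python
import json
import os
import threading

def encapsulate_api_request(client_id: str, lib_name: str,
                            api_name: str, api_args: dict) -> bytes:
    """Pack the acceleration processing parameters into an API request message."""
    request = {
        "client": client_id,             # client ID
        "lib": lib_name,                 # acceleration function library name
        "api": api_name,                 # API (function) name
        "args": api_args,                # API attribute values
        "pid": os.getpid(),              # client process ID
        "tid": threading.get_ident(),    # client thread ID
    }
    return json.dumps(request).encode("utf-8")
```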
In step 403, the client 32 sends the API request to the server 31.
Referring to the above embodiments, the client 32 may include the network communication layer 323, and the API request may be sent to the network communication layer 312 of the server 31 by the network communication layer 323 of the client 32.
In step 404, the server 31 receives the API request sent by the client 32.
For example, the network communication layer 312 of the server 31 receives the API request sent by the client 32.
In step 405, the server 31 performs API decapsulation processing on the API request to obtain the API parameters.
Referring to the foregoing embodiment, the server 31 may include the API service module 313 and may hand the API request to it; upon receiving the API message, the API service module 313 decapsulates it to obtain the API parameters, i.e., the API parameters of the application 321 from step 401.
In step 406, the server 31 generates a task to be processed according to the API parameter.
For example, the server 31 may obtain a first library file of the application program from the real accelerator library 314, and generate the to-be-processed task according to the first library file and the API parameter. For example, API parameters may be provided to the real accelerator library 314, and the real accelerator library 314 substitutes the API parameters into the API functions in the first library file, thereby obtaining the to-be-processed tasks based on the API functions and the API parameters.
Illustratively, since the real accelerator library 314 includes a first library file, and the first library file includes a plurality of API functions, which are all functions related to the accelerator 311, and a task that can be processed by the accelerator 311 is generated based on the API functions, after the API parameters are substituted into the API functions in the first library file, a task to be processed that can be processed by the accelerator 311 can be generated.
In step 407, the server 31 sends the task to be processed to the accelerator 311.
In step 408, the accelerator 311 processes the task to be processed.
Illustratively, the accelerator 311 is a module capable of processing the task to be processed, and therefore, after sending the task to be processed to the accelerator 311, the accelerator 311 can process the task to be processed.
Optionally, after step 408, the following steps (not shown in the figure) are further included:
In step 409, the server 31 obtains the task processing result corresponding to the task to be processed.
In step 410, the server 31 performs API encapsulation processing on the task processing result to obtain an API response.
Referring to the foregoing embodiment, the server 31 may include the API service module 313, and may send the task processing result to the API service module 313, and when receiving the task processing result, the API service module 313 may perform encapsulation processing on the task processing result to obtain an API response (i.e., an accelerated processing response).
In step 411, the server 31 sends the API response to the client 32.
Referring to the above embodiment, the server 31 may include the network communication layer 312, and the API response may be sent to the network communication layer 323 of the client 32 by the network communication layer 312 of the server 31.
In step 412, the client 32 receives an API response returned by the server 31 for the API request.
For example, the network communication layer 323 of the client 32 receives the API response returned by the server 31.
In step 413, the client 32 performs API decapsulation processing on the API response to obtain a task processing result.
Referring to the above embodiment, the function proxy library 322 may include the second library file, which may include a large number of API functions. The client 32 may therefore obtain the second library file from the function proxy library 322 and perform API decapsulation processing on the API response according to the second library file to obtain the task processing result. For example, based on an API function in the second library file, the client 32 may use that function to decapsulate the API response into the task processing result; the decapsulation method is not limited.
In step 414, the client 32 returns the task processing result to the application 321.
In the examples, the execution sequence is given only for convenience of description; in practical applications, the execution order of the steps may be changed and is not limited. Moreover, in other embodiments, the steps of the respective methods need not be performed in the order shown and described herein, and the methods may include more or fewer steps than described. A single step described in this specification may be broken down into multiple steps in other embodiments, and multiple steps described in this specification may be combined into a single step in other embodiments.
Based on the above technical scheme, in the embodiments of the present application the accelerator is deployed at the server, and the server provides services for a plurality of clients, so that a plurality of users share the same accelerator: each user accesses the server through a client and thereby uses the accelerator's resources. This improves the utilization of each accelerator and is well suited to application scenarios where accelerator utilization would otherwise be low.
In one possible implementation, referring to fig. 5A, the execution order of the API requests may be serial execution (i.e., synchronous execution), and the serial execution process of the API requests is described below with reference to fig. 5A.
Suppose that the client starts the CPU thread 10 and the CPU thread 11, the server starts the CPU thread 20 and the CPU thread 21, the CPU thread 20 corresponds to the CPU thread 10, and the CPU thread 21 corresponds to the CPU thread 11. The CPU thread 10 generates an API request 0 and sends the API request 0 to the server, and the CPU thread 20 on the server corresponding to the CPU thread 10 can process the API request 0, that is, obtain the task to be processed K0 according to the API request 0, and then process the task to be processed K0 by the accelerator 311.
After the K0 processing is completed (when the CPU thread 20 returns the task processing result to the client and the task processing result is transmitted to the application, indicating that the K0 processing is completed), the CPU thread 11 may generate an API request 1 and send the API request 1 to the server, and the CPU thread 21 obtains the task to be processed K1 according to the API request 1, and the accelerator 311 processes the K1. After the K1 processing is completed, the CPU thread 10 may generate an API request 2 and send the API request 2 to the server, and the CPU thread 20 obtains the task to be processed K2 according to the API request 2, and the accelerator 311 processes K2. After the K2 processing is completed, the CPU thread 11 may generate an API request 3 and send the API request 3 to the server, the CPU thread 21 obtains a task to be processed K3 according to the API request 3, the accelerator 311 processes K3, and so on.
In this manner, for each API request, after the client sends the API request to the server, the client sits idle: it does not return from the API call until the task to be processed has been completed and the task processing result returned. Only then is execution handed back to the user program, which then issues the next API request.
In another possible implementation, referring to fig. 5B, the execution sequence of the API request may be parallel execution (i.e., asynchronous execution), and the parallel execution process of the API request is described below with reference to fig. 5B.
Suppose that the client starts the CPU thread 10 and the CPU thread 11, the server starts the CPU thread 20 and the CPU thread 21, the CPU thread 20 corresponds to the CPU thread 10, and the CPU thread 21 corresponds to the CPU thread 11. The CPU thread 10 generates an API request 0 and sends the API request 0 to the server, and the CPU thread 20 on the server corresponding to the CPU thread 10 can process the API request 0, that is, obtain the task to be processed K0 according to the API request 0, and then process the task to be processed K0 by the accelerator 311.
Unlike in fig. 5A, the CPU thread 10 can return a result to the application immediately after sending API request 0 to the server. To distinguish it from the task processing result in fig. 5A, this result is called a predicted task processing result: it is generated by the CPU thread 10 according to its current state, rather than being the task processing result returned by the server.
When the CPU thread 10 returns the predicted task processing result to the application, the accelerator 311 may not yet have processed K0. At this point, the CPU thread 10 can already continue to process API request 2, which reduces the latency of the API requests. The CPU thread 21 on the server obtains the task to be processed K1 according to API request 1, and the accelerator 311 processes K1.
The CPU thread 11 may return the predicted task processing result to the application immediately after sending the API request 1 to the server, and the CPU thread 10 may generate the API request 2 and send the API request 2 to the server after returning the predicted task processing result to the application. The CPU thread 20 on the server can obtain the task to be processed K2 according to the API request 2, and the accelerator 311 processes the task K2.
The CPU thread 10 may return the predicted task processing result to the application immediately after sending the API request 2 to the server, and the CPU thread 11 may generate the API request 3 and send the API request 3 to the server after returning the predicted task processing result to the application. The CPU thread 21 on the server can obtain the task to be processed K3 according to the API request 3, and the accelerator 311 processes the task K3, and so on.
If the result produced by the accelerator 311 is inconsistent with the predicted result, the client can be notified through the return value of a subsequent API call, through periodic polling by the client, or through an active server-to-client message push indicating that an execution error occurred somewhere on the client; the notification mechanism is not limited.
In this manner, API requests become asynchronous. After the client sends an API request to the server, it does not wait for the task to be processed to finish and the task processing result to come back before returning from the API call; instead, it sends the request and returns from the API call immediately, handing execution back to the user program, which then issues the next API request, so long idle periods are avoided. Client and server thus execute asynchronously, part of the network delay is hidden, and the overall execution time is reduced. In short, when the API returns, the accelerator 311 may not yet have processed the task to be processed, and the CPU can already proceed to the next API request. Turning API requests into asynchronous execution therefore reduces overall execution time.
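A sketch of the asynchronous pattern follows: the call returns a predicted placeholder immediately, and the real result is reconciled when the server's reply arrives. The transport object and the message layout are assumptions for illustration.

```python
from concurrent.futures import Future

class AsyncApiProxy:
    def __init__(self, transport):
        self._transport = transport   # hypothetical connection to the server
        self._inflight = {}           # request id -> Future for the real result
        self._next_id = 0

    def call(self, api_name: str, args: dict, predicted):
        # Send the request, then return control to the application at once
        # with a predicted result instead of waiting for the server.
        req_id, self._next_id = self._next_id, self._next_id + 1
        self._inflight[req_id] = Future()
        self._transport.send({"id": req_id, "api": api_name, "args": args})
        return predicted

    def on_reply(self, req_id: int, result) -> None:
        # Reconcile the real result later; a mismatch with the prediction can
        # be surfaced via a later API return, polling, or a server push.
        self._inflight.pop(req_id).set_result(result)
```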
However, as shown in fig. 5C, when the network is unstable or congested, the order in which the server-side CPU threads receive the API requests may change, so that the accelerator's execution order changes and the result may be functionally incorrect. For example, the CPU thread 10 sends API request 0 to the server, the CPU thread 11 sends API request 1, the CPU thread 10 sends API request 2, and the CPU thread 11 sends API request 3. The server receives API request 0, then API request 1, then API request 3, and finally API request 2. Thus, the accelerator 311 processes K0 corresponding to API request 0, then K1 corresponding to API request 1, then K3 corresponding to API request 3, and finally K2 corresponding to API request 2.
Obviously, since the receiving order of API request 2 and API request 3 is swapped, i.e., API request 3 is received first and API request 2 afterwards, the accelerator 311 processes K3 before K2; the execution order of K2 and K3 is changed, which may cause a service processing error.
For example, in order to solve the above problem, the following manner may be adopted in the embodiment of the present application:
mode 1: the client side sends the data in parallel, and the server side ensures the correct serialization of the queue a to be processed (the number of the queues a to be processed can be multiple, and then one queue a to be processed is taken as an example). That is, the client CPU thread 10 sends an API request to the server, via the server CPU thread 20 and then to the queue a to be processed, and the client CPU thread 11 sends an API request to the server, via the server CPU thread 21 to the queue a to be processed.
For example, the client maintains a global counter for pending queue a. The initial count value may be 0 (other values are possible; 0 is used as the example), and each time one API request is added to pending queue a, the count value of the global counter is increased by 1. Each request sent to the server carries the count value at the time it was sent.
After obtaining the API request 0, the CPU thread 10 determines the count value of the global counter of the queue a to be processed, where the count value of the global counter is 0. Then, the count value 0 is used as a sequence number value corresponding to the API request 0, and the sequence number value 0 is added to the API request 0 and sent to the server. Then, the count value of the counter is increased by 1, i.e., the count value becomes 1.
After obtaining the API request 1, the CPU thread 11 determines the count value of the global counter of the queue a to be processed, where the count value of the global counter is 1. Then, the count value 1 is used as a sequence number value corresponding to the API request 1, and the sequence number value 1 is added to the API request 1 and sent to the server. Then, the count value of the counter is increased by 1, i.e., the count value becomes 2.
After obtaining the API request 2, the CPU thread 10 determines the count value of the global counter of the queue a to be processed, where the count value of the global counter is 2. Then, the count value 2 is used as a sequence number value corresponding to the API request 2, and the sequence number value 2 is added to the API request 2 and sent to the server. Then, the count value of the counter is increased by 1, i.e., the count value becomes 3.
After obtaining the API request 3, the CPU thread 11 determines the count value of the global counter of the queue a to be processed, where the count value of the global counter is 3. Then, the count value 3 is used as a sequence number value corresponding to the API request 3, and the sequence number value 3 is added to the API request 3 and transmitted to the server. Then the counter value of the counter is increased by 1, i.e. the counter value becomes 4, and so on.
In summary, based on the sequence of each API request in the queue a to be processed, the API request 0 is first sent to the server, then the API request 1 is sent to the server, then the API request 2 is sent to the server, and finally the API request 3 is sent to the server. Further, assume that the server receives API request 0, then receives API request 1, then receives API request 3, and finally receives API request 2.
Illustratively, the server also maintains a global counter for pending queue a, whose initial count value may be 0 (other values are also possible; 0 is taken as an example). In the subsequent process, each time one API request is added to pending queue a, the count value of the global counter is increased by 1.
After receiving API request 0, the server first determines whether API request 0 is the target API request of pending queue a, for example, by comparing the sequence number value of API request 0 with the count value of the global counter of pending queue a. Since the count value of the server's global counter is 0 and the sequence number value carried in API request 0 is also 0, API request 0 is determined to be the target API request of pending queue a and may then be added to pending queue a.
After receiving API request 1, the server first determines whether API request 1 is the target API request of pending queue a. Since the count value of the global counter is 1 (at this point the count value equals the sequence number value that the next request added to pending queue a should carry) and the sequence number value of API request 1 is 1, API request 1 is determined to be the target API request and is added to pending queue a.
After receiving API request 3, the server first determines whether API request 3 is the target API request of pending queue a. Since the count value of the global counter is 2 and the sequence number value of API request 3 is 3 (the sequence number value exceeds the count value by 1), API request 3 is determined not to be the target API request and is not yet added to pending queue a.
After receiving API request 2, the server first determines whether API request 2 is the target API request of pending queue a. Since the count value of the global counter is 2 and the sequence number value of API request 2 is also 2, API request 2 is the target API request and may be added to pending queue a.
Further, after API request 2 is added to pending queue a, the count value of the global counter becomes 3, which equals the sequence number value 3 of the held API request 3. API request 3 is therefore determined to be the target API request and is added to pending queue a.
In summary, the order of the API requests in pending queue a is: API request 0, API request 1, API request 2, API request 3. Although API request 3 is received before API request 2, API request 2 is ordered before API request 3 based on their sequence number values, which guarantees the correct ordering of the API requests.
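One way to realize the target-request check described above is to park early arrivals in an ordered holding area until the gap in sequence numbers is filled; the application only requires that a non-target request not enter the queue before its predecessors, so the std::map holding area, the ApiRequest type, and the enqueue() stand-in in the sketch below are illustrative assumptions:

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

struct ApiRequest {
    std::uint64_t seq;
    std::string payload;
};

// Server-side holder for pending queue a. Early arrivals are parked in an
// ordered map until the request whose sequence number value equals the
// current count value (the target API request) shows up.
class PendingQueueA {
public:
    void on_receive(const ApiRequest& req) {
        held_.emplace(req.seq, req);
        // Drain every request that has become the target request.
        while (!held_.empty() && held_.begin()->first == expected_) {
            enqueue(held_.begin()->second);
            held_.erase(held_.begin());
            ++expected_;  // count value of the server's global counter
        }
    }

private:
    // Assumed stand-in for adding the request to pending queue a.
    void enqueue(const ApiRequest& req) {
        std::cout << "queued seq=" << req.seq << '\n';
    }

    std::map<std::uint64_t, ApiRequest> held_;  // out-of-order arrivals
    std::uint64_t expected_ = 0;
};

int main() {
    PendingQueueA q;
    for (std::uint64_t seq : {0ULL, 1ULL, 3ULL, 2ULL})  // arrival order
        q.on_receive({seq, "API request " + std::to_string(seq)});
}
```

Running the example with arrival order 0, 1, 3, 2 enqueues the requests in the order 0, 1, 2, 3, matching the walkthrough above.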
Based on the order of the API requests in pending queue a, task K0 to be processed is obtained from API request 0, task K1 from API request 1, task K2 from API request 2, and task K3 from API request 3. The accelerator 311 thus processes K0 first, then K1, then K2, and finally K3.
In summary, the client may maintain one global counter for each pending queue, shared by multiple CPU threads, so that lock-free synchronization among the CPU threads is realized with atomic operations and overhead is reduced. The server may likewise maintain one global counter for each pending queue, shared by multiple CPU threads, with lock-free synchronization among the CPU threads also realized with atomic operations.
Mode 2: the client maintains a pending queue b, which is protected by a global lock to guarantee read-write consistency of the queue. In addition, a dedicated sending thread, CPU thread tx, is created for pending queue b; unlike CPU threads 10/11, which are created by the application, this thread is created in the function proxy layer. Correspondingly, the server creates only a dedicated receiving thread, CPU thread tr, for pending queue b. Neither the client nor the server creates multiple CPU threads for pending queue b.
The application's CPU threads 10/11 add API request 0, API request 1, API request 2, and API request 3 to pending queue b in sequence: CPU thread 10 acquires the global lock, adds API request 0 to pending queue b, and releases the lock. CPU thread 11 acquires the global lock, adds API request 1 to pending queue b, and releases the lock. CPU thread 10 acquires the global lock, adds API request 2 to pending queue b, and releases the lock. CPU thread 11 acquires the global lock, adds API request 3 to pending queue b, and releases the lock, and so on.
The dedicated sending thread tx acquires the global lock, takes API request 0 out of pending queue b, releases the global lock, and sends API request 0 to the server. The dedicated sending thread tx then acquires the global lock, takes API request 1 out of pending queue b, releases the global lock, and sends API request 1 to the server, and so on.
Based on this sending order, the network protocol guarantees order consistency on the single network connection, so the server's receiving thread tr receives the requests in the same order and then delivers them to the real function library.
In summary, the client may create a dedicated CPU thread for the pending queue, and this thread manages the queue. When other CPU threads need to send an API request to the server, they do not send it directly to the server; instead, they add it to the pending queue, and the dedicated CPU thread sends it to the server. In this mode, the merging of requests from multiple CPU threads is completed at the client, so no timing race arises at the server.
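A minimal C++ sketch of mode 2 is given below: the application threads append to pending queue b under the global lock, and the dedicated sending thread tx drains the queue over a single connection. The ApiRequest type, the shutdown flag, and the printed stand-in for the actual send are assumptions made for illustration, not this application's implementation:

```cpp
#include <condition_variable>
#include <deque>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

// Pending queue b and its global lock; ApiRequest is an assumed name.
struct ApiRequest { std::string payload; };

std::mutex              g_lock;     // the global lock of pending queue b
std::condition_variable g_cv;
std::deque<ApiRequest>  g_queue_b;
bool                    g_done = false;  // shutdown flag for this sketch

// Called from the application's CPU threads 10/11: acquire the global
// lock, append the request to pending queue b, then release the lock.
void add_request(ApiRequest req) {
    {
        std::lock_guard<std::mutex> guard(g_lock);
        g_queue_b.push_back(std::move(req));
    }
    g_cv.notify_one();
}

// Body of the dedicated sending thread tx created in the function proxy
// layer: take requests out one by one and send them over a single
// connection, so the network preserves the queue order end to end.
void sender_thread_tx() {
    for (;;) {
        std::unique_lock<std::mutex> guard(g_lock);
        g_cv.wait(guard, [] { return !g_queue_b.empty() || g_done; });
        if (g_queue_b.empty()) return;  // shut down once drained
        ApiRequest req = std::move(g_queue_b.front());
        g_queue_b.pop_front();
        guard.unlock();  // release the global lock before the (slow) send
        std::cout << "send " << req.payload << '\n';  // stand-in for the send
    }
}

int main() {
    std::thread tx(sender_thread_tx);
    add_request({"API request 0"});
    add_request({"API request 1"});
    {
        std::lock_guard<std::mutex> guard(g_lock);
        g_done = true;
    }
    g_cv.notify_one();
    tx.join();
}
```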
Based on the same application concept as the method, an embodiment of the present application further provides a task processing device, applied to a server, as shown in fig. 6A, which is a structural diagram of the device, where the device includes:
a receiving module 611, configured to receive an acceleration processing request sent by a client;
a processing module 612, configured to process the accelerated processing request to obtain an accelerated processing parameter;
a generating module 613, configured to generate a to-be-processed task according to the accelerated processing parameter; a sending module 614, configured to send the to-be-processed task to an accelerator, so that the accelerator processes the to-be-processed task.
Illustratively, the accelerated processing request comprises an application programming interface (API) request, and the accelerated processing parameters comprise API parameters. The processing module 612 is specifically configured to perform API decapsulation processing on the accelerated processing request to obtain the accelerated processing parameters.
The generating module 613 is specifically configured to: acquire a first library file of the application program from a real accelerator library, and generate the task to be processed according to the first library file and the accelerated processing parameters.
The processing module 612 is further configured to: acquire a task processing result corresponding to the task to be processed, and perform API encapsulation processing on the task processing result to obtain an accelerated processing response.
The sending module 614 is further configured to send the accelerated processing response to the client.
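For illustration only, the four modules of fig. 6A can be pictured as the following pipeline. Every type and helper in this sketch (AccelRequest, ApiParams, Task, and the stubbed decapsulation and task-building functions) is an assumed stand-in, since the application does not prescribe concrete data structures:

```cpp
#include <iostream>
#include <string>

// Assumed placeholder types for the server-side pipeline of fig. 6A.
struct AccelRequest { std::string body; };   // encapsulated API request
struct ApiParams    { std::string args; };   // accelerated processing parameters
struct Task         { std::string kernel; }; // task to be processed
struct Accelerator  {
    void process(const Task& t) { std::cout << "run " << t.kernel << '\n'; }
};

// Processing module 612: API decapsulation (stubbed).
ApiParams api_decapsulate(const AccelRequest& req) { return {req.body}; }

// Generating module 613: combine the first library file from the real
// accelerator library with the parameters to build the task (stubbed).
Task build_task(const ApiParams& p) { return {"kernel(" + p.args + ")"}; }

// Receiving module 611 hands each request to this pipeline; sending
// module 614 then forwards the generated task to the accelerator.
void handle_request(const AccelRequest& req, Accelerator& acc) {
    ApiParams params = api_decapsulate(req);
    Task task = build_task(params);
    acc.process(task);
}

int main() {
    Accelerator acc;
    handle_request({"matmul args"}, acc);
}
```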
Based on the same application concept as the method, an embodiment of the present application further provides a task processing device applied to a client, as shown in fig. 6B, which is a structural diagram of the device, where the device includes:
An obtaining module 621, configured to obtain acceleration processing parameters of an application program; a processing module 622, configured to encapsulate the accelerated processing parameters to obtain an accelerated processing request; and a sending module 623, configured to send the acceleration processing request to a server, so that the server generates a task to be processed according to the accelerated processing parameters and sends the task to be processed to an accelerator for processing.
The accelerated processing parameters comprise API parameters, and the accelerated processing request comprises an API request. The processing module 622 is specifically configured to: acquire a second library file of the application program from the function agent library, and perform API encapsulation on the accelerated processing parameters according to the second library file to obtain the accelerated processing request.
The processing module 622 is further configured to: before acquiring the second library file from the function agent library, parse API function information from a first library file of the real accelerator library of the server, generate the second library file of the application program according to the API function information, and store the second library file in the function agent library.
Illustratively, the processing module 622 is further configured to: receive an accelerated processing response returned by the server for the accelerated processing request, perform API decapsulation processing on the accelerated processing response to obtain a task processing result, and return the task processing result to the application program.
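Correspondingly, the client-side modules of fig. 6B can be pictured as one round trip. Again, every name in this sketch is an assumed stand-in rather than an API defined by this application:

```cpp
#include <iostream>
#include <string>

// Assumed placeholder types mirroring fig. 6B.
struct ApiParams     { std::string args; };  // accelerated processing parameters
struct AccelRequest  { std::string body; };  // encapsulated API request
struct AccelResponse { std::string body; };  // encapsulated processing result
struct Result        { std::string value; }; // task processing result

// Processing module 622: API encapsulation according to the second
// library file from the function agent library (stubbed).
AccelRequest api_encapsulate(const ApiParams& p) { return {p.args}; }

// Sending module 623 plus the matching receive of the response (stubbed;
// a real client would cross the network to the server here).
AccelResponse send_and_wait(const AccelRequest& req) { return {req.body}; }

// Processing module 622 again: API decapsulation of the response.
Result api_decapsulate(const AccelResponse& resp) { return {resp.body}; }

// End-to-end path for one call obtained by the obtaining module 621.
Result run_accelerated_call(const ApiParams& params) {
    AccelRequest req = api_encapsulate(params);
    AccelResponse resp = send_and_wait(req);
    return api_decapsulate(resp);  // returned to the application program
}

int main() {
    Result r = run_accelerated_call({"conv2d args"});
    std::cout << r.value << '\n';
}
```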
Based on the same application concept as the method, an embodiment of the present application further provides a server device, including a processor and a machine-readable storage medium having stored thereon a plurality of computer instructions, where the processor, when executing the computer instructions, performs the following:
receiving an acceleration processing request sent by a client;
processing the accelerated processing request to obtain accelerated processing parameters;
generating a task to be processed according to the accelerated processing parameter;
and sending the task to be processed to an accelerator so that the accelerator processes the task to be processed.
Further, an embodiment of the present application also provides a machine-readable storage medium storing a plurality of computer instructions; when executed, the computer instructions perform the following:
receiving an acceleration processing request sent by a client;
processing the accelerated processing request to obtain accelerated processing parameters;
generating a task to be processed according to the accelerated processing parameter;
and sending the task to be processed to an accelerator so that the accelerator processes the task to be processed.
Based on the same application concept as the method, an embodiment of the present application further provides a client device, including a processor and a machine-readable storage medium having stored thereon a plurality of computer instructions, where the processor, when executing the computer instructions, performs the following:
acquiring an acceleration processing parameter of an application program;
packaging the accelerated processing parameters to obtain an accelerated processing request;
and sending the acceleration processing request to a server side so that the server side generates a task to be processed according to the acceleration processing parameters, and sending the task to be processed to an accelerator for processing.
Further, an embodiment of the present application also provides a machine-readable storage medium storing a plurality of computer instructions; when executed, the computer instructions perform the following:
acquiring an acceleration processing parameter of an application program;
packaging the accelerated processing parameters to obtain an accelerated processing request;
and sending the acceleration processing request to a server side so that the server side generates a task to be processed according to the acceleration processing parameters, and sending the task to be processed to an accelerator for processing.
Referring to fig. 7A, which is a structural diagram of the server device provided in the embodiment of the present application, the server device may include: a processor 711, a network interface 712, a bus 713, and a memory 714. The memory 714 may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the memory 714 may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard disk drive), a solid state drive, or any type of storage disc (e.g., a compact disc or a DVD).
Referring to fig. 7B, which is a structural diagram of the client device provided in the embodiment of the present application, the client device may include: a processor 721, a network interface 722, a bus 723, and a memory 724. The memory 724 may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the memory 724 may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard disk drive), a solid state drive, or any type of storage disc (e.g., a compact disc or a DVD).
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into units by function. Of course, when the present application is implemented, the functionality of the units may be implemented in one or more pieces of software and/or hardware.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Furthermore, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall be included in the scope of the claims of the present application.
Claims (19)
1. A method for processing a task, the method comprising:
receiving an acceleration processing request sent by a client;
processing the accelerated processing request to obtain accelerated processing parameters;
generating a task to be processed according to the accelerated processing parameter;
and sending the task to be processed to an accelerator so that the accelerator processes the task to be processed.
2. The method of claim 1, wherein the accelerated processing request comprises an Application Programming Interface (API) request, and wherein the accelerated processing parameters comprise API parameters;
the processing the accelerated processing request to obtain accelerated processing parameters includes:
performing API decapsulation processing on the accelerated processing request to obtain the accelerated processing parameters.
3. The method of claim 1,
the generating of the task to be processed according to the accelerated processing parameter comprises:
acquiring a first library file of an application program from a real accelerator library;
and generating a task to be processed according to the first library file and the accelerated processing parameter.
4. The method of claim 1, wherein after sending the task to be processed to an accelerator so that the accelerator processes the task to be processed, the method further comprises:
acquiring a task processing result corresponding to the task to be processed;
performing API packaging processing on the task processing result to obtain an accelerated processing response;
and sending the accelerated processing response to the client.
5. The method according to any one of claims 1 to 4,
after receiving the acceleration processing request sent by the client, the method further includes:
determining whether the accelerated processing request is a target accelerated processing request of a queue to be processed, wherein the target accelerated processing request is the accelerated processing request immediately following the last accelerated processing request in the queue to be processed;
if so, adding the accelerated processing request to the queue to be processed;
the processing the accelerated processing request to obtain accelerated processing parameters includes:
processing the first accelerated processing request in the queue to be processed, based on the order of the accelerated processing requests in the queue to be processed, to obtain the accelerated processing parameters.
6. The method of claim 5,
the determining whether the accelerated processing request is a target accelerated processing request of a queue to be processed includes:
determining whether the accelerated processing request is the target accelerated processing request of the queue to be processed according to the sequence number value of the accelerated processing request and the count value of the global counter of the queue to be processed; wherein the count value indicates the sequence number value expected of the next accelerated processing request to be added to the queue to be processed.
7. A method for processing a task, the method comprising:
acquiring an acceleration processing parameter of an application program;
packaging the accelerated processing parameters to obtain an accelerated processing request;
and sending the acceleration processing request to a server side so that the server side generates a task to be processed according to the acceleration processing parameters, and sending the task to be processed to an accelerator for processing.
8. The method of claim 7, wherein the accelerated processing parameters comprise Application Programming Interface (API) parameters, and wherein the accelerated processing request comprises an API request;
the packaging the accelerated processing parameters to obtain an accelerated processing request includes:
acquiring a second library file of the application program from the function agent library;
and performing API packaging on the accelerated processing parameters according to the second library file to obtain an accelerated processing request.
9. The method of claim 8,
before the obtaining the second library file of the application program from the function agent library, the method further includes:
analyzing API function information from a first library file of a real accelerator library of a server;
generating a second library file of the application program according to the API function information;
and storing the second library file of the application program into a function agent library.
10. The method of claim 8,
after the sending the accelerated processing request to the server, the method further includes:
receiving an acceleration processing response returned by the server end aiming at the acceleration processing request;
performing API decapsulation processing on the accelerated processing response to obtain a task processing result;
and returning the task processing result to the application program.
11. The method according to any one of claims 7 to 10,
after the obtaining of the accelerated processing parameters of the application program, the method further includes:
returning a predicted task processing result to the application program before a task processing result corresponding to the accelerated processing parameters is obtained, wherein the application program executes a next task according to the predicted task processing result.
12. The method of claim 11,
the sending the accelerated processing request to a server includes:
determining a count value of a global counter of a queue to be processed, wherein the count value indicates the sequence number value to be assigned to the next accelerated processing request added to the queue to be processed;
determining a sequence number value corresponding to the accelerated processing request according to the count value;
adding the sequence number value to the accelerated processing request;
adding the accelerated processing request to a queue to be processed;
and sending the accelerated processing request to the server side based on the sequence of the accelerated processing requests in the queue to be processed.
13. A method for processing a task, the method comprising:
receiving an acceleration processing request sent by a client;
determining whether the accelerated processing request is a target accelerated processing request of a queue to be processed, wherein the target accelerated processing request is the accelerated processing request immediately following the last accelerated processing request in the queue to be processed;
if so, adding the accelerated processing request to the queue to be processed;
processing a first accelerated processing request in the queue to be processed based on the sequence of the accelerated processing requests in the queue to be processed to obtain accelerated processing parameters;
generating a task to be processed according to the accelerated processing parameter;
and sending the task to be processed to an accelerator so that the accelerator processes the task to be processed.
14. A method for processing a task, the method comprising:
acquiring an acceleration processing parameter of an application program;
returning a predicted task processing result to the application program before a task processing result corresponding to the accelerated processing parameters is obtained, wherein the application program executes a next task according to the predicted task processing result;
packaging the accelerated processing parameters to obtain an accelerated processing request;
adding the accelerated processing request to a queue to be processed;
and sequentially sending the acceleration processing requests in the queue to be processed to a server side based on the sequence of the acceleration processing requests in the queue to be processed, so that the server side generates tasks to be processed according to the acceleration processing requests, and sends the tasks to be processed to an accelerator for processing.
15. A method for processing a task, the method comprising:
receiving an Artificial Intelligence (AI) processing request sent by a client;
processing the AI processing request to obtain an AI processing parameter;
generating an AI task to be processed according to the AI processing parameters;
and sending the AI task to be processed to an AI accelerator so that the AI accelerator processes the AI task to be processed to obtain a task processing result corresponding to the AI task to be processed.
16. A task processing apparatus, characterized in that the apparatus comprises:
a receiving module, configured to receive an acceleration processing request sent by a client;
a processing module, configured to process the accelerated processing request to obtain accelerated processing parameters;
a generating module, configured to generate a task to be processed according to the accelerated processing parameters;
and a sending module, configured to send the task to be processed to an accelerator so that the accelerator processes the task to be processed.
17. A task processing apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire the accelerated processing parameters of an application program;
a processing module, configured to package the accelerated processing parameters to obtain an acceleration processing request;
and a sending module, configured to send the acceleration processing request to a server, so that the server generates a task to be processed according to the accelerated processing parameters and sends the task to be processed to an accelerator for processing.
18. A server-side device, comprising:
a processor and a machine-readable storage medium having stored thereon a plurality of computer instructions, wherein the processor, when executing the computer instructions, performs:
receiving an acceleration processing request sent by a client;
processing the accelerated processing request to obtain accelerated processing parameters;
generating a task to be processed according to the accelerated processing parameter;
and sending the task to be processed to an accelerator so that the accelerator processes the task to be processed.
19. A client device, comprising:
a processor and a machine-readable storage medium having stored thereon a plurality of computer instructions, wherein the processor, when executing the computer instructions, performs:
acquiring an acceleration processing parameter of an application program;
packaging the accelerated processing parameters to obtain an accelerated processing request;
and sending the acceleration processing request to a server side so that the server side generates a task to be processed according to the acceleration processing parameters, and sending the task to be processed to an accelerator for processing.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010079161.XA CN113204413A (en) | 2020-02-03 | 2020-02-03 | Task processing method, device and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113204413A (en) | 2021-08-03 |
Family
ID=77024834
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010079161.XA (CN113204413A, pending) | Task processing method, device and equipment | 2020-02-03 | 2020-02-03 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113204413A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114154644A (en) * | 2021-11-30 | 2022-03-08 | 北京航空航天大学 | Machine learning data processing method and device |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105069024A (en) * | 2015-07-16 | 2015-11-18 | 清华大学 | Parallel data acquisition-oriented write access method for distributed file system |
CN105607895A (en) * | 2014-11-21 | 2016-05-25 | 阿里巴巴集团控股有限公司 | Operation method and device of application program on the basis of application program programming interface |
US20180365190A1 (en) * | 2017-06-20 | 2018-12-20 | Netflix, Inc. | Acceleration system for facilitating processing of api calls |
CN109445771A (en) * | 2018-10-25 | 2019-03-08 | 北京和普威视科技股份有限公司 | A kind of off line network layers block code Auto-Generation Tool of CS framework and method |
CN109753322A (en) * | 2017-08-29 | 2019-05-14 | 武汉斗鱼网络科技有限公司 | To the acceleration method and device of application program on a kind of ios platform |
CN109976876A (en) * | 2019-03-20 | 2019-07-05 | 联想(北京)有限公司 | Accelerator manages method and apparatus |
CN109976761A (en) * | 2019-02-27 | 2019-07-05 | 平安科技(深圳)有限公司 | Generation method, device and the terminal device of Software Development Kit |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40056826; Country of ref document: HK |