WO2021143397A1 - Resource reuse method, apparatus and device based on GPU virtualization - Google Patents

Resource reuse method, apparatus and device based on GPU virtualization

Info

Publication number
WO2021143397A1
WO2021143397A1 (application PCT/CN2020/134523)
Authority
WO
WIPO (PCT)
Prior art keywords
resource
call request
api call
framework layer
calculation
Application number
PCT/CN2020/134523
Other languages
French (fr)
Chinese (zh)
Inventor
Junping Zhao (赵军平)
Original Assignee
Alipay (Hangzhou) Information Technology Co., Ltd. (支付宝(杭州)信息技术有限公司)
Application filed by Alipay (Hangzhou) Information Technology Co., Ltd.
Publication of WO2021143397A1

Classifications

    • G06F 9/5077: Logical partitioning of resources; management or configuration of virtualized resources
    • G06F 9/45558: Hypervisor-specific management and integration aspects
    • G06F 9/5027: Allocation of resources to service a request, where the resource is a machine, e.g. CPUs, servers, terminals
    • G06F 2009/45583: Memory management, e.g. access or allocation

Definitions

  • This application relates to the field of computer technology, and in particular to a resource reuse method, device and equipment based on GPU virtualization.
  • A Graphics Processing Unit (GPU) is a microprocessor that can be used for efficient calculation and processing of images and graphics. More and more artificial intelligence technologies are implemented on GPUs. To allocate GPU resources reasonably, GPU virtualization technology emerged. With GPU virtualization, different artificial intelligence (AI) tasks can share the resources of one or more GPUs to perform calculations. This safe and efficient way of managing GPU resources is being used by more and more users. However, when GPU virtualization technology is used to execute AI tasks at present, the operating efficiency of AI tasks based on GPU virtualization still needs to be improved.
  • In view of this, the embodiments of the present application provide a resource reuse method, apparatus and device based on GPU virtualization, which are used to improve the operating efficiency when executing AI tasks based on GPU virtualization technology.
  • An embodiment of this specification provides a resource reuse method based on GPU virtualization, applied to the client in a GPU virtualization system, including: obtaining a first API call request sent by the AI framework layer for creating a first resource; determining the memory address where pre-stored data matching the first resource is located, the data matching the first resource including setting parameters for the first resource; feeding back to the AI framework layer the memory address where the data matching the first resource is located; obtaining a second API call request sent by the AI framework layer for setting the first resource; feeding back to the AI framework layer a message indicating that the setting is successful; obtaining a third API call request sent by the AI framework layer for performing calculation based on the first resource; generating, based on the third API call request, a first calculation instruction for the first resource; and sending the first calculation instruction and the data matching the first resource to a GPU driver.
  • An embodiment of this specification provides a resource reuse apparatus based on GPU virtualization, applied to the client in a GPU virtualization system, including: a first acquisition module, configured to obtain a first API call request sent by the AI framework layer for creating a first resource; a first determining module, configured to determine the memory address where pre-stored data matching the first resource is located, the data matching the first resource including setting parameters for the first resource; a first feedback module, configured to feed back to the AI framework layer the memory address where the data matching the first resource is located; a second acquisition module, configured to obtain a second API call request sent by the AI framework layer for setting the first resource; a second feedback module, configured to feed back to the AI framework layer a message indicating that the setting is successful; a third acquisition module, configured to obtain a third API call request sent by the AI framework layer for performing calculation based on the first resource; a first calculation instruction generation module, configured to generate, based on the third API call request, a first calculation instruction for the first resource; and a first sending module, configured to send the first calculation instruction and the data matching the first resource to a GPU driver.
  • An embodiment of this specification provides a resource reuse device based on GPU virtualization, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can: obtain a first API call request sent by the AI framework layer for creating a first resource; determine the memory address where pre-stored data matching the first resource is located, the data matching the first resource including setting parameters for the first resource; feed back to the AI framework layer the memory address where the data matching the first resource is located; obtain a second API call request sent by the AI framework layer for setting the first resource; feed back to the AI framework layer a message indicating that the setting is successful; obtain a third API call request sent by the AI framework layer for performing calculation based on the first resource; generate, based on the third API call request, a first calculation instruction for the first resource; and send the first calculation instruction and the data matching the first resource to the GPU driver.
  • By pre-storing the setting parameters for the first resource in the client of the GPU virtualization system, the client can respond directly to the first API call request and the second API call request sent by the AI framework layer, without sending them to the GPU driver for processing. The AI framework layer therefore does not need to wait for the GPU driver to feed back the processing results of the first and second API call requests, which reduces the time the AI framework layer spends waiting for request responses and improves the execution efficiency of the AI task.
  • FIG. 1 is a schematic diagram of an application scenario of a resource reuse method based on GPU virtualization in an embodiment of this specification
  • FIG. 2 is a schematic flowchart of a resource reuse method based on GPU virtualization provided by an embodiment of this specification
  • FIG. 3 is a schematic diagram of a scenario in which an address pointer of data corresponding to a second resource is written into a queue provided in an embodiment of this specification;
  • FIG. 4 is a schematic structural diagram of a resource multiplexing device based on GPU virtualization corresponding to FIG. 2 provided by an embodiment of this specification;
  • FIG. 5 is a schematic structural diagram of a resource multiplexing device based on GPU virtualization corresponding to FIG. 2 provided by an embodiment of this specification.
  • FIG. 1 is a schematic diagram of an application scenario of a resource reuse method based on GPU virtualization provided by an embodiment of this specification.
  • As shown in FIG. 1, the AI framework layer 1011 and the client 1012 of the GPU virtualization system can be deployed on the user's terminal device 101. The AI framework layer 1011 can be used to build various models (for example, convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory networks (LSTM), and generative adversarial networks (GAN)) and to control these models to run on the CPU or GPU. In practical applications, the AI framework layer can be implemented using TensorFlow, PyTorch, or Caffe2.
  • The client 1012 of the GPU virtualization system can interact with the server 102 of the GPU virtualization system and with the GPU 104 that provides resources, so as to realize the discovery, application, access, and built-in optimization of virtual GPU resources. The client 1012 can also record the resources and status information required within one iteration cycle of a model, and reuse the recorded resources and status information to reduce the number of API call requests sent to the GPU driver 1041 in the GPU 104.
  • The GPU 104 may include a GPU driver 1041 and GPU hardware 1042. The GPU driver 1041 responds to the API call requests sent by the client 1012. The GPU hardware 1042 can be implemented using, for example, an NVIDIA P100 GPU, an NVIDIA Tesla V100, or a GeForce GTX 1080.
  • The server 102 of the GPU virtualization system is responsible for GPU service and virtualization management. Specifically, the server 102 can divide and pre-allocate virtual GPU resources according to a configuration strategy, save the mapping relationship between virtual and physical resources, and report GPU resource call requests to the GPU resource scheduler 103. The GPU resource scheduler 103 responds to GPU resource call requests and implements the scheduling and allocation of resources in the GPU 104. In practical applications, the GPU resource scheduler 103 can be implemented using K8S or Kubemaker.
  • The inventor found through research that when AI tasks use algorithms such as deep neural networks (DNN), the operators required to perform each round of iterative operations are usually the same. Therefore, the resources and status information required by each operator within one iteration cycle of the AI task can be obtained and cached, and these resources and status information can be reused during each iteration cycle, thereby greatly reducing API call operations and optimizing the operating efficiency when executing AI tasks based on GPU virtualization technology.
  • FIG. 2 is a schematic flowchart of a resource reuse method based on GPU virtualization provided by an embodiment of this specification. From a program point of view, the execution subject of the process can be the client in the GPU virtualization system. As shown in FIG. 2, the process may include step 202 to step 216.
  • Step 202 Obtain a first API call request sent by the AI framework layer for creating a first resource.
  • the first API call request may be used to create a resource descriptor corresponding to the first resource.
  • the first API call request may include multiple API call requests for synchronous calls.
  • the first API call request may include an API call request for creating an input data descriptor and an output data descriptor.
  • Step 204 Determine the memory address where the pre-stored data matching the first resource is located; the data matching the first resource includes setting parameters for the first resource.
  • In the embodiment of this specification, the terminal device where the client 1012 of the GPU virtualization system is located may pre-store data matching the first resource; the data matching the first resource may be the data generated after the properties of the resource descriptor of the first resource have been set. Therefore, when the client 1012 receives the first API call request sent by the AI framework layer for creating the first resource, it does not need to send the first API call request to the GPU driver to create the data corresponding to the first resource, and only needs to determine the memory address where the pre-stored data matching the first resource is located.
  • In practice, the first N iteration cycles executed after an AI task is started are usually a warm-up phase. The AI task can use the warm-up phase to build computation graphs, allocate resources, and find the best operators. Accordingly, the client 1012 of the GPU virtualization system may obtain and store the data matching the first resource during the warm-up phase, so as to facilitate subsequent use.
  • Step 206 Feed back to the AI framework layer the memory address where the data matching the first resource is located.
  • Specifically, a message indicating that the resource descriptor corresponding to the first resource has been created successfully can be fed back to the AI framework layer; this message can carry the memory address where the determined data matching the first resource is located.
  • Step 208 Obtain a second API call request sent by the AI framework layer for setting the first resource.
  • The second API call request may include multiple synchronously called API call requests, and may be used to request the setting of attributes of the resource descriptor corresponding to the first resource, such as its shape, padding, data type, and data alignment.
  • Step 210 Feed back a message indicating that the setting is successful to the AI framework layer.
  • As mentioned above, the user's terminal device pre-stores the data matching the first resource, that is, the data generated after the attributes of the resource descriptor of the first resource were set. Therefore, when the client 1012 of the GPU virtualization system obtains the second API call request for setting the first resource, it does not need to send the second API call request to the GPU driver for processing, but can directly feed back to the AI framework layer a message indicating that the attributes of the resource descriptor of the first resource were set successfully. This reduces the time the AI framework layer waits for the GPU driver to feed back a response to the second API call request, thereby helping to improve the execution efficiency of the AI task.
  • Step 212 Obtain a third API call request sent by the AI framework layer for calculation based on the first resource.
  • Step 214 Generate a first calculation instruction for the first resource based on the third API call request.
  • Step 216 Send the first calculation instruction and the data matching the first resource to the GPU driver.
  • In the embodiment of this specification, the client 1012 of the GPU virtualization system may send the data matching the first resource (that is, the data generated after the attributes of the resource descriptor of the first resource were set) to the GPU driver together with the first calculation instruction. The GPU driver can then configure the resource and execute the calculation task according to the received first calculation instruction and the data matching the first resource. Therefore, there is no need to send the first API call request and the second API call request of the AI framework layer to the GPU driver; this fast path is sketched in code below.
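The fast path of steps 202 to 216 can be illustrated as follows. This is a minimal sketch under assumed names (Client, cached_params, gpu_driver, resource_key); it is not the patented implementation. A real client would intercept concrete GPU API calls (for example, descriptor creation and setting calls) rather than generic method calls, and `id(data)` merely stands in for an actual memory address.

```python
# Minimal sketch of the client fast path (steps 202 to 216).
# All names here are hypothetical assumptions, not the patented code.

class Client:
    def __init__(self, cached_params, gpu_driver):
        # cached_params: resource key -> setting parameters recorded during
        # the warm-up phase (the "data matching the first resource").
        self.cached_params = cached_params
        self.gpu_driver = gpu_driver

    def on_create(self, resource_key):
        # Steps 202-206: answer locally instead of forwarding to the driver;
        # return the memory address of the pre-stored matching data.
        data = self.cached_params[resource_key]
        return id(data)  # stands in for the memory address

    def on_set(self, resource_key, attrs):
        # Steps 208-210: the attributes were already applied in warm-up,
        # so acknowledge success without involving the GPU driver.
        return "SET_OK"

    def on_compute(self, resource_key, kernel_args):
        # Steps 212-216: build the calculation instruction and send it to
        # the GPU driver together with the cached setting parameters.
        instruction = ("COMPUTE", resource_key, kernel_args)
        self.gpu_driver.submit(instruction, self.cached_params[resource_key])
        return "LAUNCHED"
```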
  • In summary, by pre-storing the setting parameters for the first resource, the client can respond directly to the first API call request and the second API call request sent by the AI framework layer, without sending them to the GPU driver for processing. The AI framework layer therefore does not need to wait for the GPU driver to feed back the processing results of the first and second API call requests, which reduces the time the AI framework layer spends waiting for request responses and improves the efficiency of AI task execution.
  • the data matching the first resource can be obtained in the warm-up phase after the AI task is started, and stored, so as to be used in the subsequent iterative process.
  • the embodiment of this specification provides an implementation manner for obtaining data matching the first resource in the warm-up phase of the AI task.
  • Before step 202, the method may also include the following steps: in the warm-up phase after the AI task is started, obtaining a fourth API call request sent by the AI framework layer for creating a second resource; creating the second resource and storing the data corresponding to the second resource at a memory address; feeding back the memory address to the AI framework layer; obtaining a fifth API call request sent by the AI framework layer for setting the second resource; setting the data at the memory address based on the fifth API call request; feeding back to the AI framework layer a message indicating that the setting is successful; obtaining a sixth API call request sent by the AI framework layer for performing calculation based on the second resource; generating, based on the sixth API call request, a second calculation instruction for the second resource; and sending the second calculation instruction to the GPU driver.
  • In the embodiment of this specification, the AI framework layer generates API call requests for creating resources, setting resources, and performing calculations for each operator in the AI task, so that the GPU virtualization system executes the task. During the warm-up phase, the client generates and stores the resource setting parameters corresponding to each operator.
  • Specifically, after obtaining the fourth API call request sent by the AI framework layer for creating the second resource, the client of the GPU virtualization system may respond to it by creating the data corresponding to the second resource in the memory of the device where the client is located, and determining the storage address of that data as the memory address where the data corresponding to the second resource is located. The client may then send the AI framework layer a message indicating that the creation is successful and carrying the memory address. When the client receives the fifth API call request sent by the AI framework layer for setting the second resource, the client can respond to it by setting the data corresponding to the second resource stored on the client's device, obtaining the setting parameters of the second resource, and feeding back to the AI framework layer a message indicating that the setting is successful. A sketch of this warm-up path follows below.
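The following sketch illustrates the warm-up path under the same hypothetical names as the sketch above; a real client would forward concrete GPU API calls and record the parameters they carry.

```python
# Sketch of the warm-up phase (fourth/fifth/sixth API call requests):
# the client keeps the GPU driver in sync while recording the setting
# parameters locally for reuse in later iterations. Hypothetical names.

class WarmupClient:
    def __init__(self, gpu_driver):
        self.gpu_driver = gpu_driver
        self.cached_params = {}  # resource key -> recorded setting data

    def on_create(self, resource_key):
        # Create the data corresponding to the second resource in the
        # memory of the client's device, and mirror the call to the driver.
        self.cached_params[resource_key] = {}
        self.gpu_driver.create(resource_key)
        return id(self.cached_params[resource_key])  # memory address

    def on_set(self, resource_key, attrs):
        # Record the setting parameters locally and forward the setting.
        self.cached_params[resource_key].update(attrs)
        self.gpu_driver.set(resource_key, attrs)
        return "SET_OK"

    def on_compute(self, resource_key, kernel_args):
        # Generate the second calculation instruction and send it on.
        self.gpu_driver.submit(("COMPUTE", resource_key, kernel_args))
        return "LAUNCHED"
```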
  • In the embodiment of this specification, the memory address where the pre-stored data matching the first resource is located, as determined in step 204, may be the same as the memory address where the data corresponding to the second resource is located, as determined by the client of the GPU virtualization system. Likewise, the fourth API call request may be the same as the first API call request, and the fifth API call request may be the same as the second API call request.
  • In the warm-up phase, the client of the GPU virtualization system may also send the fourth API call request and the fifth API call request to the GPU driver, so that the GPU driver generates the data corresponding to the second resource (that is, the data containing the setting parameters of the second resource), which enables the GPU driver to execute the second calculation instruction.
  • Since the data corresponding to the second resource generated by the GPU driver is usually stored in the GPU cache, the AI framework layer will, after the GPU driver successfully responds to the second calculation instruction, send the client an instruction to delete the data corresponding to the second resource from the GPU cache, so as to avoid occupying the GPU cache.
  • Based on this, after sending the second calculation instruction to the GPU driver, the following steps may further be included: obtaining a seventh API call request sent by the AI framework layer for deleting the data corresponding to the second resource; retaining the data corresponding to the second resource; and feeding back to the AI framework layer a message indicating that the deletion is successful.
  • In the embodiment of this specification, the client of the GPU virtualization system may forward the seventh API call request sent by the AI framework layer to the GPU driver, so that the GPU driver deletes the data corresponding to the second resource from the GPU cache. However, since the device where the client is located also stores data corresponding to the second resource, and in order to reuse that data in subsequent iterations, the client will, after receiving the seventh API call request, retain the data corresponding to the second resource stored on its device, while feeding back to the AI framework layer a message indicating that the deletion is successful, so that the AI framework layer can proceed to process other operators in the AI task.
  • By having the client of the GPU virtualization system retain the data corresponding to the second resource on its device, the data corresponding to the second resource (that is, the data matching the first resource) is pre-stored, which facilitates the subsequent execution of iterative calculations in the AI task. Moreover, the client can feed back the deletion-success message to the AI framework layer based on the data corresponding to the second resource stored on its own device, without the AI framework layer waiting for the GPU driver to return the processing result of the seventh API call request; this reduces the waiting time of the AI framework layer and helps improve the execution efficiency of the AI task, as in the sketch below.
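A sketch of this deletion handling, continuing the hypothetical WarmupClient above: the client forwards the deletion so the GPU cache is freed, but keeps its local copy.

```python
# Sketch of handling the seventh API call request (deletion), as a
# continuation of the hypothetical WarmupClient sketch above.

class RetainingClient(WarmupClient):
    def on_delete(self, resource_key):
        # Forward the deletion so the GPU driver frees its cached copy...
        self.gpu_driver.delete(resource_key)
        # ...but retain the local copy for reuse in later iterations,
        # while still reporting success to the AI framework layer.
        assert resource_key in self.cached_params
        return "DELETE_OK"
```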
  • In the embodiment of this specification, after creating the second resource, the following steps may further be included: determining the calculation flow corresponding to the fourth API call request, and writing the address pointer of the storage address of the data corresponding to the second resource into the queue corresponding to that calculation flow. Since an AI task is usually implemented with one or more stream computing tasks, and stream computing tasks have strict requirements on the execution order of each operator, in the warm-up phase after the AI task is started the execution order of each operator included in one complete iteration of the AI task can be determined first, together with the stream computing task corresponding to each operator. Then, according to the determined execution order and the corresponding stream computing tasks, the address pointer of the storage address of the resource setting parameters corresponding to each operator is written into the queue of the stream computing task corresponding to that operator, so as to facilitate the subsequent iterative process.
  • FIG. 3 is a schematic diagram of a scenario in which an address pointer of a storage address of a resource setting parameter is written into a queue provided in an embodiment of this specification.
  • As shown in FIG. 3, the circular queue 301 contains the operators that need to be executed in one complete iteration; among them, the operator at position 3011 in the circular queue 301 (that is, OP3) is the operator of the stream computing task currently being executed.
  • the first queue 302 contains address pointers of storage addresses of resource setting parameters corresponding to OP1 and OP2.
  • the second queue 303 contains the address pointer of the storage address of the resource setting parameter corresponding to OP3. It can be seen that the stream computing tasks corresponding to OP1 and OP2 are the same, and the stream computing tasks corresponding to OP1 (or OP2) and OP3 are different.
  • When the AI framework layer requests the calculation of the OP4 operator (the operator at position 3012 in the circular queue 301), and the stream computing task (that is, calculation flow) corresponding to the OP4 operator is the same as the stream computing task corresponding to the OP3 operator, the address pointer of the storage address of the resource setting parameters corresponding to the OP4 operator can also be written into the second queue, so as to obtain the updated second queue 304. A sketch of this per-flow recording follows below.
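A sketch of the per-flow recording (hypothetical names; `stream_id` stands for the calculation flow an operator belongs to):

```python
from collections import defaultdict, deque

# Sketch of recording, per calculation flow (stream), the address pointers
# of the resource setting parameters in operator execution order, as in
# FIG. 3. All names are hypothetical.

stream_queues = defaultdict(deque)  # stream id -> queue of address pointers

def record_pointer(stream_id, address_pointer):
    # Called during warm-up in operator execution order, so each queue ends
    # up ordered like the steps of one complete iteration of its flow.
    stream_queues[stream_id].append(address_pointer)
```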
  • In the embodiment of this specification, step 204 of determining the memory address where the pre-stored data matching the first resource is located may specifically include the following steps: determining the calculation flow corresponding to the first API call request; and reading the address pointer stored at the head of the queue corresponding to the calculation flow, where the address pointer points to the memory address where the data matching the first resource is located.
  • Specifically, the operator for which the first API call request creates a resource can be determined, and the calculation flow (that is, the stream computing task) corresponding to that operator can be used as the calculation flow corresponding to the first API call request. In the warm-up phase, the address pointer of the storage address of the resource setting parameters corresponding to each operator was written into the queue of the calculation flow corresponding to that operator; therefore, the address pointer of the memory address where the data matching the first resource is located can be read from the queue of the calculation flow corresponding to the first API call request.
  • In the embodiment of this specification, after the address pointer stored at the head of the queue is read, the address pointer can be deleted from the head of the queue and rewritten to the end of the queue, so that the address pointer then stored at the head of the queue is the one required by the next operation of the calculation flow corresponding to the queue. Specifically, after reading the address pointer stored at the head of the queue, the method may further include: deleting the address pointer from the head of the queue, and writing the address pointer to the end of the queue. This rotation makes it convenient for the client of the GPU virtualization system to reuse the pre-stored resource setting parameters when performing the iterative calculations in the AI task.
  • In the embodiment of this specification, the queue is read on a first-in, first-out basis; that is, the data written into the queue first is read first. The address pointers are stored in the queue in the order of the steps within each iteration. Therefore, once the address pointers have been stored in the queue, reuse only requires reading the first address pointer from the queue when a new round of iterative calculation starts; within the same round, when the address pointers of the storage addresses of the reused data are needed in the execution order of subsequent steps, they only need to be read from the queue in sequence. This simplifies the handling of the mapping relationship between the sequence of steps and the reused data, as sketched below.
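Continuing the sketch above, the first-in, first-out read plus head-to-tail rotation can look like this:

```python
def reuse_pointer(stream_id):
    # Later iterations (step 204): read the address pointer at the head of
    # this calculation flow's queue, then rotate it to the tail so the next
    # operator of the same flow finds its pointer at the head.
    queue = stream_queues[stream_id]
    address_pointer = queue.popleft()  # read and delete from the head
    queue.append(address_pointer)      # rewrite to the end of the queue
    return address_pointer
```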
  • In the embodiment of this specification, before step 202 of obtaining the first API call request sent by the AI framework layer for creating the first resource, the method may further include: judging whether the calculation of the current round of the iterative process is completed, and obtaining a judgment result.
  • Accordingly, step 202 of obtaining the first API call request sent by the AI framework layer for creating the first resource may specifically include: when the judgment result indicates that the calculation of the current round of the iterative process is completed, obtaining the first API call request sent by the AI framework layer for creating the first resource.
  • In the embodiment of this specification, one round of the iterative process in an AI task may refer to one round of processing the neural network model using the forward propagation algorithm and the backpropagation algorithm.
  • The embodiment of this specification provides an implementation for judging whether the calculation of the current round of the iterative process is completed. In the forward propagation process, the first address pointer corresponding to the storage address of the output result of the first layer of the model in the AI task can be recorded; in the backward gradient propagation process, the second address pointer corresponding to the storage address of the current input data can be monitored, and it is determined whether the second address pointer is the same as the first address pointer. For example, if the first layer of the model in the AI task is a convolution calculation, the input used when calculating the gradient of the last convolution in backpropagation should be the same as the output of the first layer of the model. Therefore, the second address pointer corresponding to the storage address of the input data currently used when computing the gradient can be compared with the first address pointer corresponding to the storage address of the output result of the first layer of the model. If they are the same, the calculation of the current iteration round is complete, and the operators of the next iteration round can be calculated; that is, step 202 can be executed, ensuring correct operation of the iteration loop in the AI task, as sketched below.
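A sketch of this boundary check (hypothetical names; the pointers stand for the storage addresses described above):

```python
# Sketch of iteration-boundary detection: record the address pointer of
# the first layer's forward output; during backpropagation, the round is
# complete when the current input's address pointer equals it.

class IterationBoundaryDetector:
    def __init__(self):
        self.first_layer_output_ptr = None

    def on_forward_first_layer(self, output_ptr):
        # First address pointer: storage address of the first layer output.
        self.first_layer_output_ptr = output_ptr

    def iteration_finished(self, current_input_ptr):
        # Second address pointer: storage address of the current input in
        # the backward gradient propagation; equality marks the boundary.
        return current_input_ptr == self.first_layer_output_ptr
```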
  • The resource reuse method based on GPU virtualization provided by the embodiment of this specification reuses the pre-stored resource setting data, so that the client of the GPU virtualization system can respond first to the API call requests sent by the AI framework layer for creating and setting resources, instead of forwarding them to the GPU driver. This applies, for example, to running a CNN model (for example, AlexNet).
  • In addition, the resource reuse method based on GPU virtualization provided by the embodiment of this specification can be deployed flexibly, supports bare-metal, container, or virtual machine operation, is cloud-friendly, and has good applicability.
  • FIG. 4 is a schematic structural diagram of a resource multiplexing device based on GPU virtualization corresponding to FIG. 2 provided by an embodiment of this specification, and the device can be applied to a client in a GPU virtualization system.
  • the device may include the following modules.
  • the first obtaining module 402 is configured to obtain the first API call request sent by the AI framework layer for creating the first resource.
  • the first determining module 404 is configured to determine a memory address where the pre-stored data matching the first resource is located; the data matching the first resource includes setting parameters for the first resource.
  • the first feedback module 406 is configured to feed back to the AI framework layer the memory address where the data matching the first resource is located.
  • the second obtaining module 408 is configured to obtain a second API call request sent by the AI framework layer for setting the first resource.
  • the second feedback module 410 is configured to feed back a message indicating that the setting is successful to the AI framework layer.
  • the third obtaining module 412 is configured to obtain a third API call request sent by the AI framework layer for calculation based on the first resource.
  • the first calculation instruction generating module 414 is configured to generate a first calculation instruction for the first resource based on the third API call request.
  • the first sending module 416 is configured to send the first calculation instruction and the data matching the first resource to the GPU driver.
  • The resource reuse apparatus based on GPU virtualization pre-stores the setting parameters for the first resource, so that the client in the GPU virtualization system can respond directly to the first API call request and the second API call request sent by the AI framework layer, without sending them to the GPU driver for processing. Therefore, the AI framework layer does not need to wait for the GPU driver to feed back the processing results of the first and second API call requests, which reduces the time the AI framework layer spends waiting for request responses and improves the execution efficiency of the AI task.
  • the device may further include: a fourth acquisition module, configured to acquire a fourth API call request sent by the AI framework layer for creating a second resource in a warm-up phase after the AI task is started.
  • the creation module is configured to create the second resource and store the data corresponding to the second resource in the memory address.
  • the third feedback module is configured to feed back the memory address to the AI framework layer.
  • the fifth acquisition module is configured to acquire the fifth API call request sent by the AI framework layer for setting the second resource.
  • the setting module is configured to set the data in the memory address based on the fifth API call request.
  • the fourth feedback module is used to feed back a message indicating that the setting is successful to the AI framework layer.
  • the sixth acquisition module is configured to acquire the sixth API call request sent by the AI framework layer for calculation based on the second resource.
  • the second calculation instruction generation module is configured to generate a second calculation instruction for the second resource based on the sixth API call request.
  • the second sending module is configured to send the second calculation instruction to the GPU driver.
  • the device may further include: a seventh obtaining module, configured to obtain a seventh API call request sent by the AI framework layer for deleting data corresponding to the second resource.
  • the reservation module is used to reserve the data corresponding to the second resource.
  • the fifth feedback module is used to feed back a message indicating successful deletion to the AI framework layer.
  • Optionally, the first determining module 404 may be specifically configured to determine the calculation flow corresponding to the first API call request, and to read the address pointer stored at the head of the queue corresponding to the calculation flow, where the address pointer points to the memory address where the data matching the first resource is located.
  • the device may further include: a deletion module, configured to delete the address pointer from the head of the queue.
  • the first writing module is used to write the address pointer to the end of the queue.
  • the device may further include: a second determining module, configured to determine the calculation flow corresponding to the fourth API call request.
  • the second writing module is configured to write the address pointer of the storage address of the data corresponding to the second resource into the queue corresponding to the calculation stream.
  • the device may further include: a judgment module for judging whether the calculation of the current round of iterative process is completed, and obtaining the judgment result.
  • the first obtaining module is specifically configured to obtain the first API call request sent by the AI framework layer for creating the first resource when the judgment result indicates that the calculation of the current iteration process is completed.
  • Optionally, the judgment module may be specifically configured to: in the forward propagation process, record the first address pointer corresponding to the storage address of the output result of the first layer of the model in the AI task; in the backward gradient propagation process, monitor the second address pointer corresponding to the storage address of the current input data; and determine whether the second address pointer is the same as the first address pointer.
  • the embodiment of this specification also provides a device corresponding to the above method.
  • FIG. 5 is a schematic structural diagram of a resource multiplexing device based on GPU virtualization corresponding to FIG. 2 provided by an embodiment of this specification.
  • The device 500 may include: at least one processor 510; and a memory 530 communicatively connected to the at least one processor, where the memory 530 stores instructions 520 executable by the at least one processor 510, and the instructions are executed by the at least one processor 510 so that the at least one processor 510 can: obtain the first API call request sent by the AI framework layer for creating the first resource; determine the memory address where the pre-stored data matching the first resource is located, the data matching the first resource including the setting parameters for the first resource; and feed back to the AI framework layer the memory address where the data matching the first resource is located.
  • The resource reuse device based on GPU virtualization pre-stores the setting parameters for the first resource, so that the client in the GPU virtualization system carried on the device can respond first to the first API call request and the second API call request sent by the AI framework layer, without sending them to the GPU driver for processing. Therefore, the AI framework layer does not need to wait for the GPU driver to feed back the processing results of the first and second API call requests, which reduces the time the AI framework layer spends waiting for request responses and improves the execution efficiency of the AI task.
  • An improvement of a technology can be clearly distinguished as a hardware improvement (for example, an improvement of a circuit structure such as a diode, a transistor, or a switch) or a software improvement (an improvement of a method flow). However, with the development of technology, the improvement of many of today's method flows can be regarded as a direct improvement of the hardware circuit structure. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module.
  • For example, a programmable logic device (PLD) (for example, a field programmable gate array (FPGA)) is an integrated circuit whose logic functions are determined by the user's programming of the device. The source code before compilation is written in a specific programming language called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), HDCal, JHDL, Lava, Lola, MyHDL, PALASM, and RHDL; at present, VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used.
  • The controller can be implemented in any suitable manner. For example, the controller can take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller can also be implemented as part of the memory's control logic. In addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller realizes the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Therefore, such a controller can be regarded as a hardware component, and the devices included in it for realizing various functions can also be regarded as structures within the hardware component. Or even, the devices for realizing various functions can be regarded both as software modules for implementing the method and as structures within the hardware component.
  • a typical implementation device is a computer.
  • the computer may be, for example, a personal computer, a laptop computer, a cell phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or Any combination of these devices.
  • The embodiments of the present invention can be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, which implements the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operating steps are executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
  • In a typical configuration, the computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM).
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology. Information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device. According to the definition herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
  • This application may be described in the general context of computer-executable instructions executed by a computer, such as a program module.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • This application can also be practiced in distributed computing environments. In these distributed computing environments, tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in local and remote computer storage media including storage devices.

Abstract

Disclosed in embodiments of the present specification are a resource reuse method, apparatus and device based on GPU virtualization. The scheme comprises: pre-storing a setting parameter of a first resource in a client, so that the client can locally process a first API call request sent by an AI framework layer for creating the first resource and a second API call request for setting the first resource, without forwarding them to a GPU driver; and enabling the client, upon obtaining a third API call request sent by the AI framework layer for performing calculation on the basis of the first resource, to send a generated first calculation instruction for the first resource and the pre-stored setting parameter of the first resource to the GPU driver, thereby executing an AI task by utilizing GPU virtualization technology.

Description

Resource reuse method, apparatus and device based on GPU virtualization

Technical field

This application relates to the field of computer technology, and in particular to a resource reuse method, apparatus and device based on GPU virtualization.

Background

A Graphics Processing Unit (GPU) is a microprocessor that can be used for efficient calculation and processing of images and graphics. More and more artificial intelligence technologies are implemented on GPUs. To allocate GPU resources reasonably, GPU virtualization technology emerged. With GPU virtualization, different artificial intelligence (AI) tasks can share the resources of one or more GPUs to perform calculations. This safe and efficient way of managing GPU resources is being used by more and more users. However, when GPU virtualization technology is used to execute AI tasks at present, the operating efficiency of AI tasks based on GPU virtualization still needs to be improved.
Summary of the invention

In view of this, the embodiments of the present application provide a resource reuse method, apparatus and device based on GPU virtualization, which are used to improve the operating efficiency when executing AI tasks based on GPU virtualization technology.

To solve the above technical problems, the embodiments of this specification are implemented as follows.

An embodiment of this specification provides a resource reuse method based on GPU virtualization, applied to the client in a GPU virtualization system, including: obtaining a first API call request sent by the AI framework layer for creating a first resource; determining the memory address where pre-stored data matching the first resource is located, the data matching the first resource including setting parameters for the first resource; feeding back to the AI framework layer the memory address where the data matching the first resource is located; obtaining a second API call request sent by the AI framework layer for setting the first resource; feeding back to the AI framework layer a message indicating that the setting is successful; obtaining a third API call request sent by the AI framework layer for performing calculation based on the first resource; generating, based on the third API call request, a first calculation instruction for the first resource; and sending the first calculation instruction and the data matching the first resource to a GPU driver.

An embodiment of this specification provides a resource reuse apparatus based on GPU virtualization, applied to the client in a GPU virtualization system, including: a first acquisition module, configured to obtain a first API call request sent by the AI framework layer for creating a first resource; a first determining module, configured to determine the memory address where pre-stored data matching the first resource is located, the data matching the first resource including setting parameters for the first resource; a first feedback module, configured to feed back to the AI framework layer the memory address where the data matching the first resource is located; a second acquisition module, configured to obtain a second API call request sent by the AI framework layer for setting the first resource; a second feedback module, configured to feed back to the AI framework layer a message indicating that the setting is successful; a third acquisition module, configured to obtain a third API call request sent by the AI framework layer for performing calculation based on the first resource; a first calculation instruction generation module, configured to generate, based on the third API call request, a first calculation instruction for the first resource; and a first sending module, configured to send the first calculation instruction and the data matching the first resource to a GPU driver.

An embodiment of this specification provides a resource reuse device based on GPU virtualization, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can: obtain a first API call request sent by the AI framework layer for creating a first resource; determine the memory address where pre-stored data matching the first resource is located, the data matching the first resource including setting parameters for the first resource; feed back to the AI framework layer the memory address where the data matching the first resource is located; obtain a second API call request sent by the AI framework layer for setting the first resource; feed back to the AI framework layer a message indicating that the setting is successful; obtain a third API call request sent by the AI framework layer for performing calculation based on the first resource; generate, based on the third API call request, a first calculation instruction for the first resource; and send the first calculation instruction and the data matching the first resource to the GPU driver.

The above technical solution adopted in the embodiments of this specification can achieve the following beneficial effects: by pre-storing the setting parameters for the first resource in the client of the GPU virtualization system, the client can respond directly to the first API call request and the second API call request sent by the AI framework layer, without sending them to the GPU driver for processing; the AI framework layer therefore does not need to wait for the GPU driver to feed back the processing results of the first and second API call requests, which reduces the time the AI framework layer spends waiting for request responses and improves the execution efficiency of the AI task.
Description of the drawings

The drawings described here are used to provide a further understanding of this application and constitute a part of this application. The exemplary embodiments of this application and their descriptions are used to explain this application and do not constitute an improper limitation of this application. In the drawings:

FIG. 1 is a schematic diagram of an application scenario of a resource reuse method based on GPU virtualization in an embodiment of this specification;

FIG. 2 is a schematic flowchart of a resource reuse method based on GPU virtualization provided by an embodiment of this specification;

FIG. 3 is a schematic diagram of a scenario in which an address pointer of data corresponding to a second resource is written into a queue, provided by an embodiment of this specification;

FIG. 4 is a schematic structural diagram of a resource reuse apparatus based on GPU virtualization corresponding to FIG. 2, provided by an embodiment of this specification;

FIG. 5 is a schematic structural diagram of a resource reuse device based on GPU virtualization corresponding to FIG. 2, provided by an embodiment of this specification.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions of this application will be described clearly and completely below with reference to specific embodiments of this application and the corresponding drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of this application. Based on the embodiments of this application, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of this application.
FIG. 1 is a schematic diagram of an application scenario of a resource reuse method based on GPU virtualization provided by an embodiment of this specification. As shown in FIG. 1, the AI framework layer 1011 and the client 1012 of the GPU virtualization system can run on a user's terminal device 101. The AI framework layer 1011 can be used to build various models (for example, convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory networks (LSTM), and generative adversarial networks (GAN)) and to control the execution of these models on a CPU or GPU. In practice, the AI framework layer can be implemented with TensorFlow, PyTorch, Caffe2, or the like.
The client 1012 of the GPU virtualization system can interact with the server 102 of the GPU virtualization system and with the GPU 104 that provides the resources, thereby implementing discovery, application, access, and built-in optimization of virtual GPU resources. The client 1012 can also record the resources and state information required within one iteration cycle of a model, and reuse the recorded resources and state information to reduce the number of API call requests sent to the GPU driver 1041 in the GPU 104.
The GPU 104 can include a GPU driver 1041 and GPU hardware 1042. The GPU driver 1041 can respond to API call requests sent by the client 1012, while the GPU hardware 1042 can be implemented with, for example, an NVIDIA P100 GPU, an NVIDIA Tesla V100, or a GeForce GTX 1080.
The server 102 of the GPU virtualization system can be responsible for GPU services and virtualization management. Specifically, the server 102 can partition and pre-allocate virtual GPU resources according to a configuration policy, maintain the mapping between virtual and physical resources, and report GPU resource call requests to the GPU resource scheduler 103. The GPU resource scheduler 103 can respond to the GPU resource call requests and implement scheduling and allocation of the resources in the GPU 104. In practice, the GPU resource scheduler 103 can be implemented with K8S or Kubemaker.
Research has found that when virtualized GPU resources are used to execute an AI task, the client in the GPU virtualization system needs to request the resources required by each operator from the GPU driver through application programming interfaces (APIs). For example, for the convolution operator in an AI task, as many as 14 APIs are called. Table 1 shows the API information corresponding to the convolution operator.
[Table 1: the APIs called by the convolution operator. In the original publication this table is reproduced as an image (PCTCN2020134523-appb-000001); its row-by-row contents are not recoverable here.]
From Table 1, it can be seen that of the 14 API calls the client in the GPU virtualization system must perform for the convolution operator, 13 are synchronous calls and 1 is asynchronous. For each synchronous API call, the client forwards the operation data corresponding to the API to the GPU driver, waits for the GPU driver to finish processing, and only then reports the successful call back to the AI framework layer. In other words, the AI framework layer can issue the next API call only after receiving the result of the previous call from the client, which adds considerable latency to the AI task. Moreover, an AI task typically runs tens of thousands to hundreds of thousands of iterations, and every iteration invokes these synchronous APIs again, which severely degrades the computational efficiency of the AI task.
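For ease of understanding, this cost pattern can be modeled with the minimal C++ sketch below. The sketch is illustrative only and is not part of the claimed method: the 50-microsecond round trip and the loop bounds are assumptions, and the real client-driver transport is not specified in this document; only the figure of 13 synchronous calls per convolution operator comes from Table 1.

```cpp
#include <chrono>
#include <iostream>
#include <thread>

// Simulate one synchronous API call: the client forwards the request to the
// GPU driver and blocks until the driver replies (latency value is assumed).
void sync_api_call(std::chrono::microseconds driver_round_trip) {
    std::this_thread::sleep_for(driver_round_trip);  // framework stalls here
}

int main() {
    using namespace std::chrono;
    const int sync_calls_per_conv = 13;  // from Table 1
    const microseconds rtt{50};          // assumed client->driver round trip
    const int simulated_iterations = 3;  // real tasks run 10^4..10^5 rounds

    auto t0 = steady_clock::now();
    for (int i = 0; i < simulated_iterations; ++i)
        for (int c = 0; c < sync_calls_per_conv; ++c)
            sync_api_call(rtt);
    auto t1 = steady_clock::now();
    std::cout << "time blocked on synchronous calls: "
              << duration_cast<microseconds>(t1 - t0).count() << " us\n";
}
```

Because the blocked time scales with the number of iterations, even a small per-call round trip accumulates into a substantial share of total training time.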
In view of the above problem, the inventor found through research that when an AI task uses an algorithm such as a deep neural network (DNN), the operators executed in each round of iteration are usually the same. Therefore, by obtaining and caching the resources and state information required by each operator within one iteration cycle of the AI task, and reusing these resources and state information in subsequent iterations, the number of API call operations can be greatly reduced, thereby optimizing the runtime efficiency of AI tasks executed on GPU virtualization technology.
FIG. 2 is a schematic flowchart of a resource reuse method based on GPU virtualization provided by an embodiment of this specification. From a program perspective, the execution subject of the process can be the client in the GPU virtualization system. As shown in FIG. 2, the process can include steps 202 to 216.
Step 202: Obtain a first API call request sent by the AI framework layer for creating a first resource.
In this embodiment of the specification, the first API call request can be used to create a resource descriptor corresponding to the first resource. In practice, the first API call request can include multiple synchronously invoked API call requests; for example, it can include an API call request for creating an input data descriptor, an API call request for creating an output data descriptor, an API call request for creating a weight data descriptor, and an API call request for creating a convolution descriptor.
Step 204: Determine the memory address of pre-stored data matching the first resource, where the data matching the first resource contains setting parameters for the first resource.
In this embodiment of the specification, the terminal device on which the client 1012 of the GPU virtualization system runs can pre-store data matching the first resource; this data can be the data generated after the attributes of the resource descriptor of the first resource were set. Therefore, when the client 1012 receives the first API call request sent by the AI framework layer for creating the first resource, it does not need to forward the request to the GPU driver to create the data corresponding to the first resource; it only needs to determine the memory address of the pre-stored data matching the first resource.
In practice, the first N iterations executed after an AI task starts usually constitute a warm-up phase, during which the AI task builds the computation graph, allocates resources, and searches for the best operators. In this embodiment of the specification, the client 1012 of the GPU virtualization system can use this warm-up phase to obtain and store the data matching the first resource for later use.
Step 206: Feed back to the AI framework layer the memory address of the data matching the first resource.
In this embodiment of the specification, after the memory address of the pre-stored data matching the first resource is determined, a message indicating that the resource descriptor corresponding to the first resource was created successfully can be fed back to the AI framework layer; this message can carry the determined memory address of the data matching the first resource.
Step 208: Obtain a second API call request sent by the AI framework layer for setting the first resource.
In this embodiment of the specification, the second API call request can include multiple synchronously invoked API call requests; the second API call request can be used to request setting of attributes of the resource descriptor corresponding to the first resource, such as shape, padding, data type, and data alignment.
Step 210: Feed back to the AI framework layer a message indicating that the setting succeeded.
In this embodiment of the specification, the user's terminal device pre-stores the data matching the first resource, that is, the data generated after the attributes of the resource descriptor of the first resource were set. Therefore, when the client 1012 of the GPU virtualization system receives the second API call request for setting the first resource, it does not need to forward the request to the GPU driver for processing; instead, it can directly feed back to the AI framework layer a message indicating that the attributes of the resource descriptor of the first resource were set successfully. This reduces the time the AI framework layer would otherwise spend waiting for the GPU driver's response to the second API call request, which helps improve the execution efficiency of the AI task.
Step 212: Obtain a third API call request sent by the AI framework layer for performing a calculation based on the first resource.
Step 214: Generate, based on the third API call request, a first calculation instruction for the first resource.
Step 216: Send the first calculation instruction and the data matching the first resource to the GPU driver.
In this embodiment of the specification, the client 1012 of the GPU virtualization system can send the data matching the first resource (that is, the data generated after the attributes of the resource descriptor of the first resource were set) to the GPU driver. The GPU driver can then configure the resource and execute the computation task according to the received first calculation instruction for the first resource and the matching data. As a result, neither the first API call request nor the second API call request of the AI framework layer needs to be forwarded to the GPU driver.
In this embodiment of the specification, by pre-storing the setting parameters of the first resource at the client of the GPU virtualization system, the client can respond directly to the first and second API call requests sent by the AI framework layer, without forwarding them to the GPU driver for processing. The AI framework layer therefore does not have to wait for the GPU driver's processing results for the first and second API call requests, which reduces the time the AI framework layer spends waiting for responses and improves the execution efficiency of the AI task.
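For ease of understanding, the client-side fast path of steps 202 to 216 can be sketched in C++ as follows. The sketch is illustrative only: all type, function, and command names are invented, and the actual client-driver transport and data layout are not specified in this document.

```cpp
#include <cstdint>
#include <deque>
#include <iostream>

// Cached per-operator setup data recorded during warm-up. The layout is a
// placeholder; the document only says it holds the descriptor settings.
struct ResourceData { uintptr_t addr; };

std::deque<ResourceData*> stream_queue;  // queue of the relevant compute stream

// Stand-in for the unspecified client->driver transport.
void send_to_driver(uint32_t cmd, const ResourceData* data) {
    std::cout << "driver <- cmd " << cmd << ", data @ 0x"
              << std::hex << data->addr << std::dec << "\n";
}

// Steps 202-206: answer the create request from the cache and feed back the
// memory address of the matching data; nothing is forwarded to the driver.
ResourceData* on_create_request() { return stream_queue.front(); }

// Steps 208-210: acknowledge the set request locally, since the cached data
// already contains the setting parameters.
bool on_set_request(ResourceData*) { return true; }

// Steps 212-216: only the compute request reaches the GPU driver, together
// with the cached setup data.
void on_compute_request(ResourceData* d) {
    const uint32_t kComputeCmd = 1;  // hypothetical command id
    send_to_driver(kComputeCmd, d);
}

int main() {
    static ResourceData conv{0x1000};  // pretend warm-up stored this entry
    stream_queue.push_back(&conv);
    ResourceData* d = on_create_request();
    if (on_set_request(d)) on_compute_request(d);
}
```

Under these assumptions, thirteen of the fourteen calls of Table 1 would be answered locally and only the compute step crosses into the driver.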
Based on the method of FIG. 2, the embodiments of this specification further provide some specific implementations of the method, which are described below.
In this embodiment of the specification, the data matching the first resource can be obtained and stored during the warm-up phase after the AI task starts, for use in subsequent iterations. This embodiment provides an implementation for obtaining the data matching the first resource during the warm-up phase of the AI task. Specifically, before step 202, the method can further include the following steps: during the warm-up phase after the AI task starts, obtaining a fourth API call request sent by the AI framework layer for creating a second resource; creating the second resource and storing the data corresponding to the second resource at the memory address; feeding back the memory address to the AI framework layer; obtaining a fifth API call request sent by the AI framework layer for setting the second resource; setting the data at the memory address based on the fifth API call request; feeding back to the AI framework layer a message indicating that the setting succeeded; obtaining a sixth API call request sent by the AI framework layer for performing a calculation based on the second resource; generating, based on the sixth API call request, a second calculation instruction for the second resource; and sending the second calculation instruction to the GPU driver.
In this embodiment of the specification, during the warm-up phase after the AI task starts, the AI framework layer generates API call requests for creating resources, setting resources, and performing calculations for each operator of the AI task, so that the client of the GPU virtualization system generates and stores the setting parameters of the resources corresponding to each operator.
Specifically, after obtaining the fourth API call request sent by the AI framework layer for creating the second resource, the client of the GPU virtualization system can, in response to the fourth API call request, create the data corresponding to the second resource in the memory of the device where the client is located, and use the storage address of that data as the memory address of the data corresponding to the second resource. The client can then send the AI framework layer a message indicating successful creation that carries this memory address.
When the client of the GPU virtualization system receives the fifth API call request sent by the AI framework layer for setting the second resource, the client can, in response to the fifth API call request, set the data corresponding to the second resource stored on the device where the client is located, obtain the setting parameters of the second resource, and feed back to the AI framework layer a message indicating that the setting succeeded.
In this embodiment of the specification, when the operator corresponding to the second resource and the operator corresponding to the first resource are the same operator in different iterations, the memory address of the pre-stored data matching the first resource determined in step 204 can be the same as the memory address of the data corresponding to the second resource determined by the client of the GPU virtualization system. Correspondingly, the fourth API call request can be the same as the first API call request, and the fifth API call request can be the same as the second API call request.
In this embodiment of the specification, the client of the GPU virtualization system can also forward the fourth and fifth API call requests to the GPU driver, so that the GPU driver generates the data corresponding to the second resource (that is, data containing the setting parameters of the second resource), which enables the GPU driver to execute the second calculation instruction.
Since the data corresponding to the second resource generated by the GPU driver is usually stored in the GPU cache, to avoid occupying the GPU cache, the AI framework sends an instruction to the client after the GPU driver has successfully responded to the second calculation instruction; this instruction is used to delete the data corresponding to the second resource from the GPU cache.
Therefore, after obtaining the sixth API call request sent by the AI framework layer for performing a calculation based on the second resource, the method can further include the following steps: obtaining a seventh API call request sent by the AI framework layer for deleting the data corresponding to the second resource; retaining the data corresponding to the second resource; and feeding back to the AI framework layer a message indicating that the deletion succeeded.
In this embodiment of the specification, the client of the GPU virtualization system can forward the seventh API call request, sent by the AI framework layer for deleting the data corresponding to the second resource, to the GPU driver, so that the GPU driver deletes the data corresponding to the second resource from the GPU cache. However, the device where the client is located also stores the data corresponding to the second resource, and this copy must remain reusable in subsequent iterations. Therefore, after receiving the seventh API call request, the client retains the data corresponding to the second resource stored on its device and feeds back a message indicating successful deletion to the AI framework layer, so that the AI framework layer can proceed to process the other operators of the AI task.
In this embodiment of the specification, by having the client of the GPU virtualization system retain the data corresponding to the second resource on its device, the data corresponding to the second resource (that is, the data matching the first resource) is pre-stored for use in the subsequent iterative computations of the AI task. Moreover, because the client can feed back a deletion-success message to the AI framework layer based on its own record of the stored data, the AI framework layer does not have to wait for the GPU driver's processing result for the seventh API call request, which reduces waiting time at the AI framework layer and helps improve the execution efficiency of the AI task.
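For ease of understanding, the warm-up-phase bookkeeping described above (the fourth, fifth, and seventh API call requests) can be sketched in C++ as follows. The names and the record layout are invented for the sketch; forwarding to the GPU driver is indicated only in comments, since the transport is not specified in this document.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Client-side record of one resource created during warm-up.
// Field names are assumptions made for this sketch.
struct ResourceRecord {
    std::vector<uint8_t> settings;  // parameters applied by the set request
    bool retained = false;
};

std::unordered_map<uint64_t, ResourceRecord> client_store;
uint64_t next_handle = 1;

// Fourth API call request: create the local copy (the same request is also
// forwarded to the GPU driver, not shown) and feed its address back.
uint64_t on_warmup_create() {
    uint64_t h = next_handle++;
    client_store.emplace(h, ResourceRecord{});
    return h;
}

// Fifth API call request: record the setting parameters in the local copy.
void on_warmup_set(uint64_t h, const std::vector<uint8_t>& params) {
    client_store[h].settings = params;
}

// Seventh API call request: the delete is forwarded so the driver frees its
// GPU-cache copy (not shown), but the client-side copy is retained for reuse
// and a deletion-success message is returned at once.
bool on_warmup_delete(uint64_t h) {
    client_store[h].retained = true;
    return true;  // fed back to the AI framework layer as "deleted"
}

int main() {
    uint64_t h = on_warmup_create();
    on_warmup_set(h, {1, 2, 3});  // stand-ins for shape, padding, dtype...
    on_warmup_delete(h);
}
```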
In this embodiment of the specification, to facilitate the use of the data corresponding to the second resource when the AI task is subsequently executed, the following steps can further be included after retaining the data corresponding to the second resource: determining the compute stream corresponding to the fourth API call request; and writing the address pointer of the storage address of the data corresponding to the second resource into the queue corresponding to that compute stream.
In this embodiment of the specification, one or more stream computing tasks are usually used to implement an AI task, and a stream computing task imposes strict requirements on the execution order of the operators. Therefore, in the warm-up phase after the AI task starts, the execution order of the operators contained in one complete iteration of the AI task can first be determined, along with the stream computing task corresponding to each operator. Then, according to the determined execution order and the corresponding stream computing tasks, the address pointers of the storage addresses of the resource setting parameters corresponding to the operators are written into the queues of the stream computing tasks corresponding to those operators, for use in subsequent iterations.
For ease of understanding, the process of writing the address pointer of the storage address of a resource setting parameter into a queue is illustrated with an example. Assume the operators to be executed in one complete iteration of an AI task are, in order: OP1, OP2, OP3, and OP4. FIG. 3 is a schematic diagram of a scenario, provided in an embodiment of this specification, in which the address pointer of the storage address of a resource setting parameter is written into a queue. As shown in FIG. 3(a), the ring queue 301 contains the operators to be executed in one complete iteration, where the operator at position 3011 in the ring queue 301 (that is, OP3) is the operator currently being executed by the stream computing task. The first queue 302 contains the address pointers of the storage addresses of the resource setting parameters corresponding to OP1 and OP2. The second queue 303 contains the address pointer of the storage address of the resource setting parameter corresponding to OP3. It follows that OP1 and OP2 correspond to the same stream computing task, while OP1 (or OP2) and OP3 correspond to different stream computing tasks.
After the stream computing task completes the calculation of operator OP3, the AI framework layer can request the calculation of operator OP4. As shown in FIG. 3(b), the operator at position 3012 in the ring queue 301 (that is, OP4) is the operator currently being executed by the stream computing task. If it is determined that the stream computing task (that is, the compute stream) corresponding to operator OP4 is the same as the stream computing task corresponding to operator OP3, the address pointer of the storage address of the resource setting parameter corresponding to operator OP4 can also be written into the second queue, yielding the updated second queue 304.
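For ease of understanding, the per-stream bookkeeping in this example can be sketched in C++ as follows. The stream identifiers and data addresses are invented, and a standard FIFO container stands in for the queue implementation, which this document does not specify.

```cpp
#include <cstdint>
#include <deque>
#include <map>

// One FIFO queue of data-address pointers per compute stream.
std::map<int, std::deque<uintptr_t>> queues_by_stream;

void record(int stream_id, uintptr_t data_addr) {
    queues_by_stream[stream_id].push_back(data_addr);
}

int main() {
    record(0, 0xA1);  // OP1 -> first queue (stream 0)
    record(0, 0xA2);  // OP2 -> same stream, appended behind OP1
    record(1, 0xA3);  // OP3 -> a different stream, so the second queue
    record(1, 0xA4);  // OP4 shares OP3's stream: second queue becomes {OP3, OP4}
}
```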
In this embodiment of the specification, step 204 of determining the memory address of the pre-stored data matching the first resource can specifically include the following steps: determining the compute stream corresponding to the first API call request; and reading the address pointer stored at the head of the queue corresponding to that compute stream, where the address pointer points to the memory address of the data matching the first resource.
In this embodiment of the specification, the operator corresponding to the resource to be created by the first API call request can be determined, and the compute stream (that is, the stream computing task) corresponding to that operator can be used as the compute stream corresponding to the first API call request.
In this embodiment of the specification, because during the warm-up phase of the AI task the address pointers of the storage addresses of the setting parameters of the resources corresponding to the operators were written, in the operators' execution order, into the queues of the compute streams corresponding to those operators, the address pointer of the memory address of the data matching the first resource can be read from the queue of the compute stream corresponding to the first API call request.
In practice, each time an address pointer is read from a queue, it can be deleted from the queue and rewritten at the tail of the queue, so that the address pointer stored at the head of the queue is always the one needed for the next run of the compute stream corresponding to that queue.
Therefore, after reading the address pointer stored at the head of the queue corresponding to the compute stream, the method can further include: deleting the address pointer from the head of the queue; and writing the address pointer to the tail of the queue.
In this embodiment of the specification, after the address pointer stored at the head of the queue is read, it is deleted from the head and written to the tail of the queue, which allows the client of the GPU virtualization system to reuse the pre-stored resource setting parameters when executing the iterative computations of the AI task.
Because the address pointers are stored in a queue, they are read on a first-in-first-out basis: the data written into the queue first is read first, in order. Moreover, the address pointers are stored in the queue in the order of the steps of each iteration. Therefore, once the address pointers are stored in a queue, reuse is straightforward: at the start of a new round of iterative calculation, the first address pointer is read from the queue, and within the same iteration, whenever the address pointer of the storage address of the reused data is needed in step-execution order, the pointers stored in the queue are simply read in sequence. This simplifies the handling of the mapping between the step order and the reused data.
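For ease of understanding, the read-then-rotate rule can be sketched in C++ as follows; the queue contents reuse the invented addresses of the OP3/OP4 example above.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Read the pointer at the head, delete it, and rewrite it at the tail, so
// consecutive reads walk the iteration's steps in order and the queue is
// back in its starting state when the round completes.
uintptr_t read_and_rotate(std::deque<uintptr_t>& q) {
    uintptr_t p = q.front();
    q.pop_front();
    q.push_back(p);
    return p;
}

int main() {
    std::deque<uintptr_t> q{0xA3, 0xA4};  // a stream's queue after warm-up
    assert(read_and_rotate(q) == 0xA3);   // iteration k: OP3, then OP4
    assert(read_and_rotate(q) == 0xA4);
    assert(read_and_rotate(q) == 0xA3);   // iteration k+1 starts at OP3 again
}
```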
However, after the address pointers are stored in a queue, it must be ensured that the first address pointer stored in the queue is the one read at the beginning of each round of iterative calculation. This can be achieved as follows.
Specifically, in practice, it is necessary to confirm that the current round of the iterative process has finished before executing, at the start of the next round, the computation tasks corresponding to the operators of the next round, so as to avoid errors.
Therefore, before step 202 of obtaining the first API call request sent by the AI framework layer for creating the first resource, the method can further include: judging whether the current round of the iterative process has finished its calculation, to obtain a judgment result.
Step 202 of obtaining the first API call request sent by the AI framework layer for creating the first resource can then specifically include: when the judgment result indicates that the calculation of the current round of the iterative process has finished, obtaining the first API call request sent by the AI framework layer for creating the first resource.
In this embodiment of the specification, when the model used by the AI task is a neural network model, one round of the iterative process of the AI task can refer to one pass of processing the neural network model with the forward propagation algorithm and the backpropagation algorithm, respectively.
On this basis, this embodiment of the specification provides an implementation for judging whether the current round of the iterative process has finished its calculation.
Specifically, the first address pointer corresponding to the storage address of the output result of the first layer of the model in the AI task can be recorded; during backward gradient propagation, the second address pointer corresponding to the storage address of the current input data is monitored; and it is judged whether the second address pointer is the same as the first address pointer.
For example, assume the first layer of the model in the AI task is a convolution. When the backpropagation algorithm is used, the input used when computing the gradient of the last convolution should be the same as the output of the first layer of the model. Therefore, the second address pointer corresponding to the storage address of the input data currently used for the gradient computation can be compared with the first address pointer corresponding to the storage address of the output result of the first layer of the model. If they are the same, the calculation of the current round of the iterative process has finished, and the operators of the next round can be computed, that is, step 202 can be executed, thereby ensuring the correct operation of the iterative loop of the AI task.
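For ease of understanding, the pointer-comparison check can be sketched in C++ as follows; the addresses are invented, and the hook names are placeholders for wherever the client records the first layer's output and observes backward-propagation inputs.

```cpp
#include <cstdint>
#include <iostream>

uintptr_t first_layer_output_addr = 0;  // recorded during the forward pass

void on_first_layer_output(uintptr_t addr) { first_layer_output_addr = addr; }

// Called while monitoring backward gradient propagation: the round is
// complete when the current input address equals the recorded output
// address of the model's first layer.
bool iteration_finished(uintptr_t backward_input_addr) {
    return backward_input_addr == first_layer_output_addr;
}

int main() {
    on_first_layer_output(0x2000);
    std::cout << iteration_finished(0x3000) << "\n";  // 0: still mid-iteration
    std::cout << iteration_finished(0x2000) << "\n";  // 1: iteration complete
}
```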
The embodiments of this specification provide a resource reuse method based on GPU virtualization. When an AI task is executed, pre-stored resource setting data can be reused so that the client of the GPU virtualization system can handle the API call requests sent by the AI framework layer up front, eliminating about 80% of the synchronous API calls the AI framework layer would otherwise need to issue to the GPU driver. This significantly reduces the performance loss of GPU virtualization technology, as well as the resource and time consumption of AI task execution. Experiments show that when a CNN model (for example, AlexNet) is run in TensorFlow using the GPU-virtualization-based resource reuse method provided by the embodiments of this application, running efficiency improves by 11% compared with the prior art. In addition, the method can be deployed flexibly, supports running on bare metal, in containers, or in virtual machines, is cloud-friendly, and has good applicability.
Based on the same idea, an embodiment of this specification further provides an apparatus corresponding to the above method. FIG. 4 is a schematic structural diagram of a resource reuse apparatus based on GPU virtualization corresponding to FIG. 2, provided by an embodiment of this specification; the apparatus can be applied to a client in a GPU virtualization system. As shown in FIG. 4, the apparatus can include the following modules.
A first obtaining module 402, configured to obtain a first API call request sent by the AI framework layer for creating a first resource.
A first determining module 404, configured to determine the memory address of pre-stored data matching the first resource, where the data matching the first resource contains setting parameters for the first resource.
A first feedback module 406, configured to feed back to the AI framework layer the memory address of the data matching the first resource.
A second obtaining module 408, configured to obtain a second API call request sent by the AI framework layer for setting the first resource.
A second feedback module 410, configured to feed back to the AI framework layer a message indicating that the setting succeeded.
A third obtaining module 412, configured to obtain a third API call request sent by the AI framework layer for performing a calculation based on the first resource.
A first calculation instruction generation module 414, configured to generate, based on the third API call request, a first calculation instruction for the first resource.
A first sending module 416, configured to send the first calculation instruction and the data matching the first resource to the GPU driver.
In this embodiment of the specification, by pre-storing the setting parameters of the first resource, the resource reuse apparatus based on GPU virtualization enables the client of the GPU virtualization system to respond directly to the first and second API call requests sent by the AI framework layer, without forwarding them to the GPU driver for processing. The AI framework layer therefore does not have to wait for the GPU driver's processing results for these two requests, which reduces the time the AI framework layer spends waiting for responses and improves the execution efficiency of AI tasks.
In this embodiment of the specification, the apparatus can further include: a fourth obtaining module, configured to obtain, during the warm-up phase after the AI task starts, a fourth API call request sent by the AI framework layer for creating a second resource; a creation module, configured to create the second resource and store the data corresponding to the second resource at the memory address; a third feedback module, configured to feed back the memory address to the AI framework layer; a fifth obtaining module, configured to obtain a fifth API call request sent by the AI framework layer for setting the second resource; a setting module, configured to set the data at the memory address based on the fifth API call request; a fourth feedback module, configured to feed back to the AI framework layer a message indicating that the setting succeeded; a sixth obtaining module, configured to obtain a sixth API call request sent by the AI framework layer for performing a calculation based on the second resource; a second calculation instruction generation module, configured to generate, based on the sixth API call request, a second calculation instruction for the second resource; and a second sending module, configured to send the second calculation instruction to the GPU driver.
In this embodiment of the specification, the apparatus can further include: a seventh obtaining module, configured to obtain a seventh API call request sent by the AI framework layer for deleting the data corresponding to the second resource; a retention module, configured to retain the data corresponding to the second resource; and a fifth feedback module, configured to feed back to the AI framework layer a message indicating that the deletion succeeded.
In this embodiment of the specification, the first determining module 404 can be specifically configured to: determine the compute stream corresponding to the first API call request; and read the address pointer stored at the head of the queue corresponding to that compute stream, where the address pointer points to the memory address of the data matching the first resource.
In this embodiment of the specification, the apparatus can further include: a deletion module, configured to delete the address pointer from the head of the queue; and a first writing module, configured to write the address pointer to the tail of the queue.
In this embodiment of the specification, the apparatus can further include: a second determining module, configured to determine the compute stream corresponding to the fourth API call request; and a second writing module, configured to write the address pointer of the storage address of the data corresponding to the second resource into the queue corresponding to that compute stream.
In this embodiment of the specification, the apparatus can further include: a judgment module, configured to judge whether the current round of the iterative process has finished its calculation, to obtain a judgment result. The first obtaining module is specifically configured to obtain the first API call request sent by the AI framework layer for creating the first resource when the judgment result indicates that the calculation of the current round of the iterative process has finished.
In this embodiment of the specification, the judgment module can be specifically configured to: record the first address pointer corresponding to the storage address of the output result of the first layer of the model in the AI task; during backward gradient propagation, monitor the second address pointer corresponding to the storage address of the current input data; and judge whether the second address pointer is the same as the first address pointer.
Based on the same idea, an embodiment of this specification further provides a device corresponding to the above method.
FIG. 5 is a schematic structural diagram of a resource reuse device based on GPU virtualization corresponding to FIG. 2, provided by an embodiment of this specification. As shown in FIG. 5, the device 500 can include: at least one processor 510; and a memory 530 communicatively connected to the at least one processor, where the memory 530 stores instructions 520 executable by the at least one processor 510, the instructions being executed by the at least one processor 510 to enable the at least one processor 510 to: obtain a first API call request sent by the AI framework layer for creating a first resource; determine the memory address of pre-stored data matching the first resource, where the data matching the first resource contains setting parameters for the first resource; feed back to the AI framework layer the memory address of the data matching the first resource; obtain a second API call request sent by the AI framework layer for setting the first resource; feed back to the AI framework layer a message indicating that the setting succeeded; obtain a third API call request sent by the AI framework layer for performing a calculation based on the first resource; generate, based on the third API call request, a first calculation instruction for the first resource; and send the first calculation instruction and the data matching the first resource to the GPU driver.
In this embodiment of the specification, by pre-storing the setting parameters of the first resource, the resource reuse device based on GPU virtualization enables the client of the GPU virtualization system carried on the device to respond directly to the first and second API call requests sent by the AI framework layer, without forwarding them to the GPU driver for processing. The AI framework layer therefore does not have to wait for the GPU driver's processing results for these two requests, which reduces the time the AI framework layer spends waiting for responses and improves the execution efficiency of AI tasks.
In the 1990s, whether a technical improvement was an improvement in hardware (for example, an improvement in a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement in a method flow) could be clearly distinguished. With the development of technology, however, improvements to many of today's method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. Designers program a digital system onto a single PLD by themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually making integrated circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development; the original code to be compiled must be written in a particular programming language called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. A person skilled in the art will also appreciate that a hardware circuit implementing a logical method flow can be readily obtained simply by logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller can be implemented in any suitable manner. For example, the controller can take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicone Labs C8051F320. A memory controller can also be implemented as part of the control logic of a memory. A person skilled in the art also knows that, besides implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for implementing various functions can also be regarded as structures within the hardware component. Or even, the means for implementing various functions can be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, apparatuses, modules, or units described in the above embodiments can be specifically implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer can be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described in terms of functions divided into various units. Of course, when this application is implemented, the functions of the units can be implemented in one or more pieces of software and/or hardware.
A person skilled in the art should understand that the embodiments of the present invention can be provided as a method, a system, or a computer program product. Therefore, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
内存可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。The memory may include non-permanent memory in a computer-readable medium, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer readable media.
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数 据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带,磁带式磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括暂存电脑可读媒体(transitory media),如调制的数据信号和载波。Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology. Information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, Magnetic cartridges, magnetic tape storage or other magnetic storage devices or any other non-transmission media can be used to store information that can be accessed by computing devices. According to the definition in this article, computer-readable media does not include transitory media, such as modulated data signals and carrier waves.
还需要说明的是,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。It should also be noted that the terms "include", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity or equipment including a series of elements not only includes those elements, but also includes Other elements that are not explicitly listed, or also include elements inherent to such processes, methods, commodities, or equipment. If there are no more restrictions, the element defined by the sentence "including a..." does not exclude the existence of other identical elements in the process, method, commodity, or equipment that includes the element.
本申请可以在由计算机执行的计算机可执行指令的一般上下文中描述,例如程序模块。一般地,程序模块包括执行特定任务或实现特定抽象数据类型的例程、程序、对象、组件、数据结构等等。也可以在分布式计算环境中实践本申请,在这些分布式计算环境中,由通过通信网络而被连接的远程处理设备来执行任务。在分布式计算环境中,程序模块可以位于包括存储设备在内的本地和远程计算机存储介质中。This application may be described in the general context of computer-executable instructions executed by a computer, such as a program module. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types. This application can also be practiced in distributed computing environments. In these distributed computing environments, tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in local and remote computer storage media including storage devices.
本说明书中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于系统实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。The various embodiments in this specification are described in a progressive manner, and the same or similar parts between the various embodiments can be referred to each other, and each embodiment focuses on the difference from other embodiments. In particular, as for the system embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for related parts, please refer to the part of the description of the method embodiment.
以上所述仅为本申请的实施例而已,并不用于限制本申请。对于本领域技术人员来说,本申请可以有各种更改和变化。凡在本申请的精神和原理之内所作的任何修改、等同替换、改进等,均应包含在本申请的权利要求范围之内。The above descriptions are only examples of the present application, and are not used to limit the present application. For those skilled in the art, this application can have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of this application shall be included in the scope of the claims of this application.

Claims (17)

1. A resource reuse method based on GPU virtualization, applied to a client in a GPU virtualization system, comprising:
    obtaining a first API call request, sent by an AI framework layer, for creating a first resource;
    determining a memory address where pre-stored data matching the first resource is located, the data matching the first resource containing setting parameters for the first resource;
    feeding back, to the AI framework layer, the memory address where the data matching the first resource is located;
    obtaining a second API call request, sent by the AI framework layer, for setting the first resource;
    feeding back, to the AI framework layer, a message indicating that the setting succeeded;
    obtaining a third API call request, sent by the AI framework layer, for performing a calculation based on the first resource;
    generating, based on the third API call request, a first calculation instruction for the first resource; and
    sending the first calculation instruction and the data matching the first resource to a GPU driver.
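(Illustrative sketch only, not part of the claims: the reuse-phase flow of claim 1 could be approximated in Python roughly as below. All names here, such as ReuseClient and GpuDriver, are hypothetical shorthand for the claimed client that intercepts the create, set, and compute API calls from the AI framework layer.)

    from collections import deque

    class GpuDriver:
        # Hypothetical stand-in for the real GPU driver interface.
        def submit(self, instruction, data_addr):
            print("driver received", instruction, "with data at", data_addr)

    class ReuseClient:
        def __init__(self):
            # One deque of address pointers per calculation stream; each
            # pointer references pre-stored data that already contains the
            # resource's setting parameters.
            self.stream_queues = {}

        def on_create(self, stream_id):
            # First API call request: create nothing; feed back the memory
            # address of matching pre-stored data instead.
            return self.stream_queues[stream_id][0]

        def on_set(self, addr, params):
            # Second API call request: the cached data already holds the
            # setting parameters, so only a success message is fed back.
            return "SET_OK"

        def on_compute(self, driver, addr, args):
            # Third API call request: generate the calculation instruction
            # and send it, together with the matched data, to the GPU driver.
            driver.submit({"op": "compute", "args": args}, addr)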
2. The method according to claim 1, wherein before obtaining the first API call request sent by the AI framework layer for creating the first resource, the method further comprises:
    in a warm-up phase after an AI task is started, obtaining a fourth API call request, sent by the AI framework layer, for creating a second resource;
    creating the second resource, and storing data corresponding to the second resource at the memory address;
    feeding back the memory address to the AI framework layer;
    obtaining a fifth API call request, sent by the AI framework layer, for setting the second resource;
    setting the data at the memory address based on the fifth API call request;
    feeding back, to the AI framework layer, a message indicating that the setting succeeded;
    obtaining a sixth API call request, sent by the AI framework layer, for performing a calculation based on the second resource;
    generating, based on the sixth API call request, a second calculation instruction for the second resource; and
    sending the second calculation instruction to the GPU driver.
3. The method according to claim 2, wherein after obtaining the sixth API call request sent by the AI framework layer for performing a calculation based on the second resource, the method further comprises:
    obtaining a seventh API call request, sent by the AI framework layer, for deleting the data corresponding to the second resource;
    retaining the data corresponding to the second resource; and
    feeding back, to the AI framework layer, a message indicating that the deletion succeeded.
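(Continuing the same hypothetical sketch, and reusing its deque import: the warm-up path of claims 2 and 3 performs the creation once for real, caches the address of the resulting data, and later answers the deletion request with success while retaining the data. Here real_api stands in for the underlying vendor API and is an assumption, not something named in the source.)

    def warmup_create(client, stream_id, real_api, params):
        # Fourth and fifth API call requests, executed for real during the
        # warm-up phase after the AI task starts; the data address is cached
        # on the queue of the corresponding calculation stream.
        addr = real_api.create_resource()
        real_api.set_resource(addr, params)
        client.stream_queues.setdefault(stream_id, deque()).append(addr)
        return addr

    def on_delete(client, addr):
        # Seventh API call request: feed back a success message, but retain
        # the data at addr so that later create requests can reuse it.
        return "DELETE_OK"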
4. The method according to claim 1, wherein determining the memory address where the pre-stored data matching the first resource is located specifically comprises:
    determining a calculation stream corresponding to the first API call request; and
    reading, from a queue corresponding to the calculation stream, an address pointer stored at the head of the queue, the address pointer pointing to the memory address where the data matching the first resource is located.
5. The method according to claim 4, wherein after reading the address pointer stored at the head of the queue from the queue corresponding to the calculation stream, the method further comprises:
    deleting the address pointer from the head of the queue; and
    writing the address pointer to the tail of the queue.
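(A minimal sketch of the head-to-tail rotation in claims 4 and 5, assuming Python's collections.deque as the per-stream queue:)

    def next_reusable_address(queue):
        # Read the address pointer stored at the head of the queue, delete
        # it from the head, and write it back at the tail, so the cached
        # resources of one calculation stream are reused in round-robin
        # order across iterations.
        addr = queue[0]
        queue.popleft()
        queue.append(addr)
        return addr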
6. The method according to claim 3, wherein after retaining the data corresponding to the second resource, the method further comprises:
    determining a calculation stream corresponding to the fourth API call request; and
    writing an address pointer of the storage address of the data corresponding to the second resource into a queue corresponding to the calculation stream.
7. The method according to claim 1, wherein before obtaining the first API call request sent by the AI framework layer for creating the first resource, the method further comprises:
    judging whether the calculation of a current round of an iterative process is completed, to obtain a judgment result;
    and wherein obtaining the first API call request sent by the AI framework layer for creating the first resource specifically comprises:
    when the judgment result indicates that the calculation of the current round of the iterative process is completed, obtaining the first API call request sent by the AI framework layer for creating the first resource.
8. The method according to claim 7, wherein judging whether the calculation of the current round of the iterative process is completed specifically comprises:
    recording a first address pointer corresponding to a storage address of an output result of a first layer of a model in the AI task;
    during backward gradient propagation, monitoring a second address pointer corresponding to a storage address of current input data; and
    judging whether the second address pointer is the same as the first address pointer.
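(Hypothetical sketch of claims 7 and 8: backward gradient propagation consumes layer outputs in reverse order, so the first layer's forward output is the last tensor read; seeing its address again as the current backward input therefore marks the end of the round.)

    class IterationBoundaryDetector:
        def __init__(self):
            self.first_layer_output_ptr = None

        def record_first_layer_output(self, ptr):
            # First address pointer: where the model's first layer stored
            # its forward output result in this round.
            self.first_layer_output_ptr = ptr

        def round_completed(self, backward_input_ptr):
            # Second address pointer, monitored during backward propagation;
            # equality with the recorded pointer means the round is complete.
            return backward_input_ptr == self.first_layer_output_ptr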
9. A resource reuse apparatus based on GPU virtualization, applied to a client in a GPU virtualization system, comprising:
    a first obtaining module, configured to obtain a first API call request, sent by an AI framework layer, for creating a first resource;
    a first determining module, configured to determine a memory address where pre-stored data matching the first resource is located, the data matching the first resource containing setting parameters for the first resource;
    a first feedback module, configured to feed back, to the AI framework layer, the memory address where the data matching the first resource is located;
    a second obtaining module, configured to obtain a second API call request, sent by the AI framework layer, for setting the first resource;
    a second feedback module, configured to feed back, to the AI framework layer, a message indicating that the setting succeeded;
    a third obtaining module, configured to obtain a third API call request, sent by the AI framework layer, for performing a calculation based on the first resource;
    a first calculation instruction generation module, configured to generate, based on the third API call request, a first calculation instruction for the first resource; and
    a first sending module, configured to send the first calculation instruction and the data matching the first resource to a GPU driver.
10. The apparatus according to claim 9, further comprising:
    a fourth obtaining module, configured to obtain, in a warm-up phase after an AI task is started, a fourth API call request, sent by the AI framework layer, for creating a second resource;
    a creating module, configured to create the second resource and store data corresponding to the second resource at the memory address;
    a third feedback module, configured to feed back the memory address to the AI framework layer;
    a fifth obtaining module, configured to obtain a fifth API call request, sent by the AI framework layer, for setting the second resource;
    a setting module, configured to set the data at the memory address based on the fifth API call request;
    a fourth feedback module, configured to feed back, to the AI framework layer, a message indicating that the setting succeeded;
    a sixth obtaining module, configured to obtain a sixth API call request, sent by the AI framework layer, for performing a calculation based on the second resource;
    a second calculation instruction generation module, configured to generate, based on the sixth API call request, a second calculation instruction for the second resource; and
    a second sending module, configured to send the second calculation instruction to the GPU driver.
11. The apparatus according to claim 10, further comprising:
    a seventh obtaining module, configured to obtain a seventh API call request, sent by the AI framework layer, for deleting the data corresponding to the second resource;
    a retaining module, configured to retain the data corresponding to the second resource; and
    a fifth feedback module, configured to feed back, to the AI framework layer, a message indicating that the deletion succeeded.
12. The apparatus according to claim 9, wherein the first determining module is specifically configured to:
    determine a calculation stream corresponding to the first API call request; and
    read, from a queue corresponding to the calculation stream, an address pointer stored at the head of the queue, the address pointer pointing to the memory address where the data matching the first resource is located.
13. The apparatus according to claim 12, further comprising:
    a deleting module, configured to delete the address pointer from the head of the queue; and
    a first writing module, configured to write the address pointer to the tail of the queue.
14. The apparatus according to claim 11, further comprising:
    a second determining module, configured to determine a calculation stream corresponding to the fourth API call request; and
    a second writing module, configured to write an address pointer of the storage address of the data corresponding to the second resource into a queue corresponding to the calculation stream.
15. The apparatus according to claim 9, further comprising:
    a judging module, configured to judge whether the calculation of a current round of an iterative process is completed, to obtain a judgment result;
    wherein the first obtaining module is specifically configured to obtain the first API call request, sent by the AI framework layer, for creating the first resource when the judgment result indicates that the calculation of the current round of the iterative process is completed.
16. The apparatus according to claim 15, wherein the judging module is specifically configured to:
    record a first address pointer corresponding to a storage address of an output result of a first layer of a model in the AI task;
    during backward gradient propagation, monitor a second address pointer corresponding to a storage address of current input data; and
    judge whether the second address pointer is the same as the first address pointer.
17. A resource reuse device based on GPU virtualization, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor, wherein
    the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
    obtain a first API call request, sent by an AI framework layer, for creating a first resource;
    determine a memory address where pre-stored data matching the first resource is located, the data matching the first resource containing setting parameters for the first resource;
    feed back, to the AI framework layer, the memory address where the data matching the first resource is located;
    obtain a second API call request, sent by the AI framework layer, for setting the first resource;
    feed back, to the AI framework layer, a message indicating that the setting succeeded;
    obtain a third API call request, sent by the AI framework layer, for performing a calculation based on the first resource;
    generate, based on the third API call request, a first calculation instruction for the first resource; and
    send the first calculation instruction and the data matching the first resource to a GPU driver.
PCT/CN2020/134523 2020-01-14 2020-12-08 Resource reuse method, apparatus and device based on gpu virtualization WO2021143397A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010037822.2 2020-01-14
CN202010037822.2A CN110851285B (en) 2020-01-14 2020-01-14 Resource multiplexing method, device and equipment based on GPU virtualization

Publications (1)

Publication Number Publication Date
WO2021143397A1 (en)

Family

ID=69610693

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/134523 WO2021143397A1 (en) 2020-01-14 2020-12-08 Resource reuse method, apparatus and device based on gpu virtualization

Country Status (2)

Country Link
CN (1) CN110851285B (en)
WO (1) WO2021143397A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110851285B (en) * 2020-01-14 2020-04-24 Alipay (Hangzhou) Information Technology Co., Ltd. Resource multiplexing method, device and equipment based on GPU virtualization
CN111427702A (en) * 2020-03-12 2020-07-17 Beijing Mininglamp Software System Co., Ltd. Artificial intelligence AI system and data processing method
EP4195045A4 (en) * 2020-08-14 2023-09-27 Huawei Technologies Co., Ltd. Data interaction method between main cpu and npu, and computing device
CN112035220A (en) * 2020-09-30 2020-12-04 Beijing Baidu Netcom Science and Technology Co., Ltd. Processing method, device and equipment for operation task of development machine and storage medium

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN104216783A (en) * 2014-08-20 2014-12-17 Shanghai Jiao Tong University Method for automatically managing and controlling virtual GPU (Graphics Processing Unit) resource in cloud gaming
CN105242957A (en) * 2015-09-28 2016-01-13 Guangzhou Yunzhuo Information Technology Co., Ltd. Method and system for cloud computing system to allocate GPU resources to virtual machine
CN110058926A (en) * 2018-01-18 2019-07-26 EMC IP Holding Company LLC Method, device and computer-readable medium for processing GPU tasks
CN110851285A (en) * 2020-01-14 2020-02-28 Alipay (Hangzhou) Information Technology Co., Ltd. Resource multiplexing method, device and equipment based on GPU virtualization

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN108108248A (en) * 2017-12-28 2018-06-01 Zhengzhou Yunhai Information Technology Co., Ltd. CPU+GPU cluster management method, device and equipment for realizing target detection

Also Published As

Publication number Publication date
CN110851285A (en) 2020-02-28
CN110851285B (en) 2020-04-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20913375; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20913375; Country of ref document: EP; Kind code of ref document: A1)