WO2021159820A1 - Data transmission and task processing methods, apparatuses and devices - Google Patents

Data transmission and task processing methods, apparatuses and devices Download PDF

Info

Publication number
WO2021159820A1
WO2021159820A1 (PCT/CN2020/132846; CN2020132846W)
Authority
WO
WIPO (PCT)
Prior art keywords
address
virtual address
physical memory
request
gpu
Prior art date
Application number
PCT/CN2020/132846
Other languages
French (fr)
Chinese (zh)
Inventor
赵军平 (Junping Zhao)
Original Assignee
支付宝(杭州)信息技术有限公司 (Alipay (Hangzhou) Information Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司 (Alipay (Hangzhou) Information Technology Co., Ltd.)
Publication of WO2021159820A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/10 Address translation
    • G06F 12/109 Address translation for multiple virtual address spaces, e.g. segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0866 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, for peripheral storage systems, e.g. disk cache
    • G06F 12/0873 Mapping of cache memory to specific storage devices or parts thereof

Definitions

  • This application relates to the field of computer technology, and in particular to data transmission and task processing methods, apparatuses, and devices.
  • Deep Learning is widely used in the field of Artificial Intelligence (AI).
  • Typical DL tasks require strong computing power, so most such tasks currently run on acceleration devices such as the Graphics Processing Unit (GPU). Accelerator chips, typified by graphics processors, are an important foundation for the development and deployment of AI.
  • The embodiments of the present application provide data transmission and task processing methods, apparatuses, and devices for improving the efficiency of GPU resource virtualization.
  • An embodiment of this specification provides a data transmission method, applied to a server in a GPU virtualization system, including: obtaining a data transmission request sent by a client; obtaining a first virtual address from the data transmission request; obtaining the physical memory address corresponding to the first virtual address; determining a second virtual address corresponding to the physical memory address based on the mapping relationship between physical memory addresses and virtual addresses; obtaining the GPU address allocated for the data transmission request; generating a data copy instruction from the second virtual address to the GPU address; and calling an interface of the GPU driver to execute the data copy instruction.
  • An embodiment of this specification provides a task processing method, applied to a server in a GPU virtualization system, including: obtaining a task calculation request sent by a client; obtaining a first virtual address from the task calculation request; obtaining the physical memory address corresponding to the first virtual address; determining a second virtual address corresponding to the physical memory address based on the mapping relationship between physical memory addresses and virtual addresses; obtaining the GPU address allocated for the task calculation request; generating a data copy instruction from the second virtual address to the GPU address; calling an interface of the GPU driver to execute the data copy instruction; sending the task calculation request to the GPU; after the GPU completes the calculation task corresponding to the task calculation request, generating processing state information corresponding to the task calculation request; and storing the processing state information.
  • An embodiment of this specification provides a data transmission method, applied to a client in a GPU virtualization system, including: obtaining a data transmission request sent by an application; obtaining a first virtual address from the data transmission request; determining the physical memory address corresponding to the first virtual address based on the mapping relationship between physical memory addresses and virtual addresses; and sending the data transmission request and the physical memory address to the server, so that the server performs data transmission according to the data transmission request and the physical memory address, wherein the server determines a second virtual address corresponding to the physical memory address based on the mapping relationship between physical memory addresses and virtual addresses; obtains the GPU address allocated for the data transmission request; generates a data copy instruction from the second virtual address to the GPU address; and calls an interface of the GPU driver to execute the data copy instruction.
  • An embodiment of this specification provides a task processing method, applied to a client in a GPU virtualization system, including: obtaining a task processing request sent by an application; forwarding the task processing request so that the server can obtain it; issuing a synchronization request when the processing status information of the task processing request sent by the server is obtained; obtaining the first virtual address of the task processing request when the success notification of the synchronization request sent by the server is obtained; determining the physical memory address corresponding to the first virtual address based on the mapping relationship between physical memory addresses and virtual addresses; reading the calculation result of the task processing request from the physical memory address corresponding to the first virtual address; and sending the calculation result of the task processing request to the application.
  • A data transmission device includes: a data transmission request acquisition module, configured to acquire a data transmission request sent by a client; a first virtual address acquisition module, configured to acquire the first virtual address in the data transmission request; a physical memory address acquisition module, configured to acquire the physical memory address corresponding to the first virtual address; a second virtual address determination module, configured to determine the second virtual address corresponding to the physical memory address based on the mapping relationship between physical memory addresses and virtual addresses; a GPU address acquisition module, configured to acquire the GPU address allocated for the data transmission request; a data copy instruction generation module, configured to generate a data copy instruction from the second virtual address to the GPU address; and an interface calling module, configured to call the interface of the GPU driver to execute the data copy instruction.
  • A task processing device includes: a task calculation request obtaining module, configured to obtain a task calculation request sent by a client; and a first virtual address obtaining module, configured to obtain the first virtual address in the task calculation request.
  • A data transmission device includes: a data transmission request obtaining module, configured to obtain a data transmission request sent by an application; a first virtual address obtaining module, configured to obtain the first virtual address in the data transmission request; a physical memory address determination module, configured to determine the physical memory address corresponding to the first virtual address based on the mapping relationship between physical memory addresses and virtual addresses; and a sending module, configured to send the data transmission request and the physical memory address to the server, so that the server performs data transmission according to the data transmission request and the physical memory address, wherein the server determines the second virtual address corresponding to the physical memory address based on the mapping relationship between physical memory addresses and virtual addresses; obtains the GPU address allocated for the data transmission request; generates a data copy instruction from the second virtual address to the GPU address; and calls the interface of the GPU driver to execute the data copy instruction.
  • A task processing device includes: a task processing request acquisition module, configured to acquire a task processing request sent by an application; a task processing request forwarding module, configured to forward the task processing request so that the server can obtain it; a synchronization request sending module, configured to send a synchronization request when the processing status information of the task processing request sent by the server is obtained; a first virtual address obtaining module, configured to obtain the first virtual address of the task processing request after the success notification of the synchronization request sent by the server is obtained; a physical memory address determination module, configured to determine the physical memory address corresponding to the first virtual address based on the mapping relationship between physical memory addresses and virtual addresses; a calculation result reading module, configured to read the calculation result of the task processing request from the physical memory address corresponding to the first virtual address; and a calculation result sending module, configured to send the calculation result of the task processing request to the application.
  • A data transmission device includes: at least one processor; and a memory communicatively connected to the at least one processor, the memory storing an instruction executable by the at least one processor, the instruction being executed by the at least one processor so that the at least one processor can: obtain a data transmission request sent by a client; obtain the first virtual address in the data transmission request; obtain the physical memory address corresponding to the first virtual address; determine the second virtual address corresponding to the physical memory address based on the mapping relationship between physical memory addresses and virtual addresses; obtain the GPU address allocated for the data transmission request; generate a data copy instruction from the second virtual address to the GPU address; and call the interface of the GPU driver to execute the data copy instruction.
  • A task processing device includes: at least one processor; and a memory communicatively connected to the at least one processor, the memory storing an instruction executable by the at least one processor, the instruction being executed by the at least one processor so that the at least one processor can: obtain a task calculation request sent by a client; obtain the first virtual address in the task calculation request; obtain the physical memory address corresponding to the first virtual address; determine the second virtual address corresponding to the physical memory address based on the mapping relationship between physical memory addresses and virtual addresses; obtain the GPU address allocated for the task calculation request; generate a data copy instruction from the second virtual address to the GPU address; and call the interface of the GPU driver to execute the data copy instruction.
  • A data transmission device includes: at least one processor; and a memory communicatively connected to the at least one processor, the memory storing an instruction executable by the at least one processor, the instruction being executed by the at least one processor so that the at least one processor can: obtain a data transmission request sent by an application; obtain the first virtual address in the data transmission request; determine the physical memory address corresponding to the first virtual address based on the mapping relationship between physical memory addresses and virtual addresses; and send the data transmission request and the physical memory address to the server, so that the server performs data transmission according to the data transmission request and the physical memory address, wherein the server determines the second virtual address corresponding to the physical memory address based on the mapping relationship between physical memory addresses and virtual addresses; obtains the GPU address allocated for the data transmission request; generates a data copy instruction from the second virtual address to the GPU address; and calls an interface of the GPU driver to execute the data copy instruction.
  • A task processing device includes: at least one processor; and a memory communicatively connected to the at least one processor, the memory storing an instruction executable by the at least one processor, the instruction being executed by the at least one processor so that the at least one processor can: obtain a task processing request sent by an application; forward the task processing request so that the server can obtain it; issue a synchronization request when the processing status information of the task processing request sent by the server is obtained; obtain the first virtual address of the task processing request when the success notification of the synchronization request sent by the server is obtained; determine the physical memory address corresponding to the first virtual address based on the mapping relationship between physical memory addresses and virtual addresses; read the calculation result of the task processing request from the physical memory address corresponding to the first virtual address; and send the calculation result of the task processing request to the application.
  • An embodiment of the present specification provides a computer-readable medium having computer-readable instructions stored thereon, the computer-readable instructions being executable by a processor to implement the foregoing methods.
  • The above technical solution adopted in the embodiments of this specification can achieve the following beneficial effects. The physical memory address is mapped both to the first virtual address of the client and to the second virtual address of the server; that is, the client and the server share the same physical memory, and the generated data copy instruction copies the data in the physical memory address directly to the GPU address. Because the first virtual address of the client and the second virtual address of the server are retained, the original program is not changed, achieving transparency. Moreover, only one copy of the data, from the physical memory address to the GPU address, takes place, which reduces the number of memory copies. Therefore, there is no need to allocate temporary memory on the client and the server to store copied data, which significantly improves memory utilization, effectively reduces cost, and improves the efficiency of GPU resource virtualization.
  • Figure 1 is a schematic diagram of the GPU software virtualization process based on request-data forwarding (multiple memory copies);
  • FIG. 2 is a schematic structural diagram of the overall GPU virtualization module based on transparent memory sharing provided by an embodiment of this specification;
  • FIG. 3 is a schematic flowchart of a data transmission method provided by an embodiment of this specification;
  • FIG. 4 is a schematic flowchart of a task processing method provided by an embodiment of this specification;
  • FIG. 5 is a schematic diagram of multi-queue management provided by an embodiment of this specification.
  • FIG. 6 is a schematic flowchart of another data transmission method provided by an embodiment of this specification.
  • FIG. 7 is a schematic flowchart of another task processing method provided by an embodiment of the specification.
  • FIG. 8 is a schematic structural diagram of a data transmission device corresponding to FIG. 3 provided by an embodiment of this specification.
  • FIG. 9 is a schematic structural diagram of a data transmission device corresponding to FIG. 3 provided by an embodiment of this specification.
  • Transparent memory sharing: on the data plane, a specially designed and optimized way of sharing data, aimed at requiring no changes to existing programs, no data movement, and transparent virtualization.
  • This solution uses a transparent memory sharing mechanism to implement GPU control requests and data exchange.
  • FIG. 2 shows a schematic structural diagram of the overall GPU virtualization module based on transparent memory sharing provided by an embodiment of this specification. As shown in FIG. 2:
  • Models and applications: models such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Generative Adversarial Networks (GAN), etc.
  • Applications include model training or online model services.
  • AI framework layer: common DL frameworks, such as TensorFlow, PyTorch, Caffe2, etc.
  • GPU Server: responsible for GPU service and virtualization management; a long-running daemon running on top of the GPU driver.
  • A GPU server has one service instance (which can be packaged and run in Docker). According to configuration strategies (such as environment variables or configuration files), it divides and pre-allocates virtual GPU resources, saves the mapping relationship between virtual and physical resources, and reports to the cluster scheduler (such as K8S, Kubemaker).
  • The client lib (for example, packaged together with the application model as a Docker image) is responsible for the discovery of, application for, access to, and necessary built-in optimization of virtual GPU resources, and records the correspondence between virtual and physical resources.
  • The client exports the GPU access API to the application, such as Nvidia CUDA (only the internal resource application and low-level implementation are replaced, for decoupling).
  • One server or one physical GPU can run multiple clients.
  • A transparent shared memory module and an efficient GPU request processing module need to be deployed on both the server and the client.
  • The client and server in the embodiments of this specification refer to the client and server in the GPU virtualization system, which differ from the client and server in the conventional sense. They are modeled after the functions and roles of conventional clients and servers, are constructed in software, and are not physical entities.
  • Scheduler: the GPU resource scheduler within the cluster, such as K8S. The client application first needs to apply for GPU resources from the scheduler, and the scheduler is then responsible for scheduling execution.
  • FIG. 3 is a schematic flowchart of a data transmission method provided by an embodiment of this specification. From a program perspective, the execution subject of the process may be the server in the GPU virtualization system.
  • As shown in FIG. 3, the process may include steps 302 to 314.
  • Step 302: Obtain a data transmission request sent by the client.
  • The data transmission request is initiated by an application on the client; the client receives the data transmission request initiated by the application and then forwards it to the server.
  • The data transmission request can be an independent data request or a subtask of a task processing request. For example, if the GPU is required to complete a calculation task, the data to be calculated must first be transmitted to the GPU address, and the GPU then performs calculations on the data at the GPU address. In that case, the data transmission request in step 302 may be a data-transmission subtask of the foregoing calculation task.
  • Step 304: Obtain the first virtual address in the data transmission request.
  • A virtual address (Virtual Address) identifies a non-physical address.
  • A data transmission request often includes the direction of the transfer, that is, the data is transmitted from one address to another address.
  • A data request can only use virtual addresses and cannot include actual physical addresses. This prevents the client from directly accessing physical addresses and performing unsafe operations such as data tampering. Therefore, the address in the data transmission request is a virtual address.
  • The term "first virtual address" is used only to distinguish it from other virtual addresses; "first" has no other meaning.
  • Step 306: Obtain the physical memory address corresponding to the first virtual address.
  • The physical memory address corresponding to the first virtual address needs to be determined according to the mapping relationship between virtual addresses and physical memory addresses.
  • The physical memory address is an address in a memory pool set aside for storing shared data, and the specific location can be represented by an offset.
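The offset representation can be sketched as follows. This is an illustrative model only: names like `SharedMemoryPool` are assumptions, not from the application, and a real implementation would back the pool with OS shared memory (e.g. `shm_open`/`mmap`) rather than a `bytearray`.

```python
class SharedMemoryPool:
    """Illustrative model of the shared memory pool described above.

    A bytearray stands in for the shared physical pages; a real system
    would map an OS shared-memory segment instead.
    """

    def __init__(self, size):
        self.base = bytearray(size)  # stand-in for the shared physical pages
        self.next_offset = 0

    def allocate(self, length):
        """Reserve `length` bytes and return their offset into the pool."""
        if self.next_offset + length > len(self.base):
            raise MemoryError("shared pool exhausted")
        offset = self.next_offset
        self.next_offset += length
        return offset

    def write(self, offset, data):
        self.base[offset:offset + len(data)] = data

    def read(self, offset, length):
        return bytes(self.base[offset:offset + length])


pool = SharedMemoryPool(1024)
offset = pool.allocate(5)  # a "physical memory address" expressed as an offset
pool.write(offset, b"hello")
```

Because both the client and the server map the same pool, an offset is meaningful to both sides even though their virtual base addresses differ.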
  • The mapping between virtual addresses and physical memory addresses can be saved in a table stored on the client.
  • Alternatively, the client may send the physical memory address corresponding to the first virtual address to the server.
  • The server sends a request to the client to obtain the physical memory address corresponding to the first virtual address; after receiving the request, the client returns the physical memory address corresponding to the first virtual address to the server.
  • Step 308: Determine a second virtual address corresponding to the physical memory address based on the mapping relationship between the physical memory address and the virtual address.
  • In order to achieve no changes to existing programs, no data movement, and transparent memory sharing, after the server obtains the actual physical memory address of the data, it also needs to convert it into a virtual address that can be used and recognized within the server process.
  • The term "second virtual address" is likewise used only for identification. The second virtual address appears only in the program within the server's process and has nothing to do with the client.
  • The correspondence between a physical memory address and a second virtual address is unique; that is, a physical memory address corresponds to a unique second virtual address, so the second virtual address corresponding to the physical memory address can be determined based on the mapping relationship between physical memory addresses and virtual addresses.
  • The mapping relationship between physical memory addresses and virtual addresses can be represented by a mapping table.
  • The server queries whether the physical memory address is included in the mapping table. If it is found, the physical memory address has already been mapped on the server side and no further mapping is needed. If it is not found, the physical memory address has not yet been mapped on the server side, so it needs to be mapped to obtain the second virtual address.
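The map-once behavior described above can be sketched like this. `ServerAddressMapper` and `_map_into_process` are hypothetical names; in a real server, an OS mapping primitive (e.g. `mmap` of the shared segment) would take the place of `_map_into_process`, and the address constants are illustrative.

```python
class ServerAddressMapper:
    """Maps each physical memory address to a server-side (second) virtual
    address exactly once, caching the result in a mapping table."""

    def __init__(self):
        self.mapping_table = {}  # physical memory address -> second virtual address
        self._next_virtual = 0x7F0000000000  # illustrative virtual address base

    def _map_into_process(self, physical_address):
        # Stand-in for the real mapping primitive (e.g. mmap of the shared
        # segment into the server process); returns a fresh virtual address.
        virtual_address = self._next_virtual
        self._next_virtual += 0x1000
        return virtual_address

    def second_virtual_address(self, physical_address):
        # Query the mapping table first; map only if the address is absent.
        if physical_address not in self.mapping_table:
            self.mapping_table[physical_address] = self._map_into_process(physical_address)
        return self.mapping_table[physical_address]


mapper = ServerAddressMapper()
first_lookup = mapper.second_virtual_address(0x5000)
```

Repeated lookups for the same physical memory address return the cached second virtual address, so each shared region is mapped into the server process only once.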
  • Step 310: Obtain the GPU address allocated for the data transmission request.
  • The data in the physical memory address is copied to the GPU address to realize data calculation on the GPU. Therefore, a corresponding GPU address also needs to be allocated for the transmitted data.
  • The server can issue a GPU address allocation instruction according to the data transmission request, and the GPU driver calls the interface to complete the GPU address allocation. The GPU driver then sends the allocated GPU address to the server.
  • The server may also obtain the GPU address allocated for the data transmission request in other ways, which is not specifically limited in the embodiments of this specification.
  • The data transmission request may also include the length of the data.
  • The GPU address may be allocated by the server or by another execution subject.
  • Step 312: Generate a data copy instruction from the second virtual address to the GPU address.
  • The data copy instruction can be generated according to the source address and the destination address of the data.
  • Step 314: Call the interface of the GPU driver to execute the data copy instruction.
  • The data copy instruction generated in step 312 requires the GPU driver to complete the interaction with the GPU. Therefore, the interface of the GPU driver needs to be called to execute the data copy instruction, that is, to complete the data copy from the physical memory address to the GPU address.
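Steps 302 to 314 can be summarized in one hypothetical handler. Every name here is an assumption for illustration; the stub classes stand in for the client's address table, the server's mapper, and the real GPU driver interface (which would be something like a cudaMemcpy-style call).

```python
def handle_data_transmission(request, client_table, mapper, gpu_driver):
    """Hypothetical server-side handler tracing steps 302-314.

    request      -- carries the client's first virtual address and data length
    client_table -- first virtual address -> physical memory address (step 306)
    mapper       -- yields the server's second virtual address (step 308)
    gpu_driver   -- allocates GPU memory and executes the copy (steps 310-314)
    """
    first_virtual = request["first_virtual_address"]            # step 304
    physical = client_table[first_virtual]                      # step 306
    second_virtual = mapper.second_virtual_address(physical)    # step 308
    gpu_address = gpu_driver.allocate(request["length"])        # step 310
    instruction = ("copy", second_virtual, gpu_address, request["length"])  # step 312
    gpu_driver.execute(instruction)                             # step 314
    return gpu_address


class _StubMapper:
    def second_virtual_address(self, physical_address):
        return physical_address + 0x1000  # illustrative fixed offset


class _StubGpuDriver:
    def __init__(self):
        self.executed = []

    def allocate(self, length):
        return 0xD000  # illustrative GPU address

    def execute(self, instruction):
        self.executed.append(instruction)


driver = _StubGpuDriver()
gpu_addr = handle_data_transmission(
    {"first_virtual_address": 0xA000, "length": 64},
    {0xA000: 0x5000},
    _StubMapper(),
    driver,
)
```

Note that the handler never touches the data itself: only one copy occurs, from the shared physical memory (via the second virtual address) to the GPU address.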
  • The method in FIG. 3 maps the physical memory address to the first virtual address of the client and the second virtual address of the server respectively; that is, the client and the server share the same physical memory, and the generated data copy instruction copies the data in the physical memory address directly to the GPU address. Because the first virtual address of the client and the second virtual address of the server are retained, the original program is not changed and transparency is achieved. Moreover, only one data copy, from the physical memory address to the GPU address, takes place, which reduces the number of memory copies; therefore, there is no need to allocate temporary memory on the client and server to store copied data, which significantly improves utilization, effectively reduces cost, and improves the efficiency of GPU resource virtualization.
  • The method may further include: determining whether the physical memory address is stored in a mapping table; if not, generating a second virtual address corresponding to the physical memory address and storing the mapping relationship between the physical memory address and the second virtual address in the mapping table. Determining the second virtual address corresponding to the physical memory address specifically includes: if the physical memory address is stored in the mapping table, obtaining the second virtual address corresponding to the physical memory address.
  • When the server determines the second virtual address corresponding to the physical memory address, it first needs to determine whether the physical memory address exists in the mapping table. If it exists, the virtual address mapping for that physical memory address has already been completed on the server; in other words, the physical memory is mapped only once on the server. If the physical memory address is not in the mapping table, the server has not yet mapped it, so the server performs the mapping operation to generate a second virtual address for the physical memory address and then stores the mapping relationship between the physical memory address and the second virtual address.
  • FIG. 4 is a schematic flowchart of a task processing method provided by an embodiment of this specification. From a program perspective, the execution subject of the process may be the server in the GPU virtualization system. As shown in FIG. 4, the process may include steps 402 to 420.
  • Step 402: Obtain the task calculation request sent by the client.
  • The task calculation request can correspond to various calculation tasks, such as matrix multiplication, convolution, and so on.
  • The task calculation request is initiated by the application; after the client obtains it, the client forwards it to the server.
  • Step 404: Obtain the first virtual address in the task calculation request.
  • The task calculation request can include some information related to the calculation data. However, the task calculation request does not directly contain the data itself; instead, it records the address where the data is stored. Since the actual physical memory address cannot be exposed to the client, the address is expressed as a virtual address. For example, if matrix A and matrix B are to be multiplied, the first virtual addresses are the addresses where matrix A and matrix B are stored. There may be one or multiple first virtual addresses; the number depends on the specific task calculation request.
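For the matrix-multiplication example, such a request might look like the following. This is a purely illustrative structure; the field names and address values are assumptions, not taken from the application.

```python
# Hypothetical task calculation request for C = A x B. The request carries
# only first virtual addresses of the operands, never the matrix data itself;
# all field names and address values below are illustrative assumptions.
task_calculation_request = {
    "operation": "matmul",
    "first_virtual_addresses": {
        "matrix_a": 0x7F00A0000000,  # client-side view of where A is stored
        "matrix_b": 0x7F00B0000000,  # client-side view of where B is stored
    },
    "result_virtual_address": 0x7F00C0000000,  # where the result will land
}
```

Here two first virtual addresses are carried because the operation has two operands; a convolution might carry addresses for the input tensor and the kernel instead.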
  • Step 406: Obtain the physical memory address corresponding to the first virtual address.
  • After the first virtual address is determined in step 404, the physical memory address corresponding to the first virtual address also needs to be determined. For details, refer to step 306 in the first embodiment.
  • Step 408: Determine a second virtual address corresponding to the physical memory address based on the mapping relationship between the physical memory address and the virtual address.
  • Step 410: Obtain the GPU address allocated for the task calculation request.
  • the data in the physical memory address is copied to the GPU address to implement GPU data calculation. Therefore, it is also necessary to allocate a corresponding GPU address for the transmitted data.
• the server can issue a GPU address allocation instruction according to the data transmission request, and the GPU driver calls the interface to complete the GPU address allocation. The GPU driver then sends the allocated GPU address to the server.
  • the server may also obtain the GPU address allocated for the data transmission request through other methods, which is not specifically limited in the embodiment of this specification.
  • the data transmission request may also include the length of the data.
  • the GPU address allocation subject can be the server or other execution subjects.
  • the GPU address allocated for the task calculation request may include the storage address of the calculation data, and may also include the storage address of the calculation result.
• GPU addresses can also be allocated according to the size of the data.
  • Step 412 Generate a data copy instruction from the second virtual address to the GPU address.
  • Step 414 Call the interface of the GPU driver to execute the data copy instruction.
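Steps 410 to 414 can be sketched as building a host-to-device copy instruction and handing it to the driver interface. The `FakeDriver` below simulates device memory with a Python dict; in a real deployment the call would be the GPU driver's host-to-device copy interface (a cudaMemcpy-style API), which this sketch only imitates, and all names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class CopyInstruction:
    src_vaddr: int   # second virtual address (server-side view of shared memory)
    dst_gpu: int     # GPU address allocated for the task calculation request
    length: int      # number of bytes to copy

class FakeDriver:
    """Stand-in for the GPU driver interface; device memory is a dict."""
    def __init__(self):
        self.device_mem = {}
        self.next_gpu_addr = 0x1000

    def alloc(self, length):
        # Allocate a GPU address region of the requested size (step 410).
        addr = self.next_gpu_addr
        self.next_gpu_addr += length
        return addr

    def execute_copy(self, instr, host_mem):
        # Copy `length` bytes from the host (shared) memory to device memory.
        data = host_mem[instr.src_vaddr:instr.src_vaddr + instr.length]
        self.device_mem[instr.dst_gpu] = bytes(data)

host_mem = bytearray(b"matrix-A-bytes")              # shared memory, simplified
driver = FakeDriver()
gpu_addr = driver.alloc(len(host_mem))               # step 410: allocate GPU address
instr = CopyInstruction(0, gpu_addr, len(host_mem))  # step 412: build copy instruction
driver.execute_copy(instr, host_mem)                 # step 414: driver executes the copy
```

The key point the sketch captures is that the copy reads directly from the shared memory region named by the second virtual address, so no intermediate staging buffer is involved.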
  • Step 416 Send the task calculation request to the GPU.
  • Step 418 After the GPU completes the calculation task corresponding to the task calculation request, generate processing state information corresponding to the task calculation request.
• the GPU obtains the data at the GPU address, performs calculations according to the task calculation request, and stores the calculation result at the GPU address allocated for the calculation result. The GPU then notifies the server of the completion status of the calculation, and the server generates processing status information corresponding to the task calculation request.
  • Step 420 Store the processing status information.
• the server can actively send the processing status information to the client, or store the information so that the client can conveniently query it.
• the method in FIG. 4 maps the physical memory address to the first virtual address of the client and the second virtual address of the server respectively; that is, the client and the server share the same physical memory, and the generated data copy instruction copies the data directly from the physical memory address to the GPU address. Because the first virtual address of the client and the second virtual address of the server are retained, the original program is not changed, and transparency is achieved. Moreover, only one data copy, from the physical memory address to the GPU address, is performed, which reduces the number of memory copies. Therefore, there is no need to allocate temporary memory for the client and the server to store copied data, which significantly improves resource utilization, effectively reduces cost, and improves the efficiency of GPU resource virtualization.
  • the first virtual address may have different functions, for example, one type is used to store calculation data, and the other type is used to store calculation results.
• when the first virtual address includes a calculation data acquisition virtual address and a calculation result storage virtual address, the determining the second virtual address may specifically include: determining the second virtual address corresponding to the first physical memory address.
• the obtaining the GPU address allocated for the task calculation request specifically includes: obtaining the calculation data storage GPU address and the calculation result storage GPU address allocated for the task calculation request; and the generating a data copy instruction from the second virtual address to the GPU address specifically includes: generating a data copy instruction from the second virtual address to the calculation data storage GPU address.
  • the calculation data storage GPU address is used to store the data copied from the physical memory address, that is, to store the source data for calculation.
  • the calculation result storage GPU address is used to store the calculation result. After the GPU completes the calculation task, it temporarily stores the calculation result in the calculation result storage GPU address, and when the client calls it, it copies the data to the corresponding physical memory address.
• after generating the processing status information corresponding to the task calculation request, the method may further include: when the calculation result synchronization request sent by the client is obtained, obtaining the second physical memory address corresponding to the calculation result storage virtual address; determining, based on the mapping relationship between the physical memory address and the virtual address, the third virtual address corresponding to the second physical memory address; generating a data copy instruction for copying the calculation result from the calculation result storage GPU address to the third virtual address; and calling the interface of the GPU driver to execute the data copy instruction.
• before synchronization, the client or application cannot obtain the calculation result. Therefore, after the client learns that the GPU has completed the calculation task, it issues a calculation result synchronization request; that is, the calculation result in the GPU address is copied to the physical memory address for the client application to read.
  • Copying the calculation result from the GPU address to the physical memory address is the opposite process of copying the calculation data from the physical memory address to the GPU address. Since the server initiates the data copy request, it is necessary to determine which virtual address corresponds to the server side of the physical memory address where the calculation result is stored, that is, the third virtual address.
• when the first virtual address includes the calculation data acquisition virtual address and the calculation result storage virtual address, the second physical memory address corresponding to the calculation result storage virtual address may be determined based on the mapping relationship between the physical memory address and the virtual address.
  • the server determines the third virtual address corresponding to the second physical memory address according to the mapping relationship.
  • a data copy instruction for copying the calculation result from the calculation result storage GPU address to the third virtual address is generated, and the GPU driver interface is called to execute the data copy instruction.
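The two-step lookup used during result synchronization (calculation result storage virtual address → second physical memory address → third virtual address) can be sketched with two dictionaries standing in for the client-side and server-side mapping tables. The table contents and function name are illustrative assumptions.

```python
# Client-side table: first virtual addresses -> physical memory addresses.
client_map = {0xA000: 0x2000}   # 0xA000 = calculation result storage virtual address
# Server-side table: physical memory addresses -> server-side virtual addresses.
server_map = {0x2000: 0xB000}

def third_virtual_address(result_vaddr):
    # Step 1: resolve the second physical memory address from the client mapping.
    phys = client_map[result_vaddr]
    # Step 2: resolve the server-side (third) virtual address for that physical address.
    return server_map[phys]

third = third_virtual_address(0xA000)
# A device-to-host copy instruction would then use `third` as its destination,
# writing the result directly into the shared physical memory.
```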
• the obtaining the task calculation request sent by the client may specifically include: obtaining the task calculation request sent by the client from a submission queue, the submission queue containing a plurality of unprocessed task calculation requests submitted by the client; after generating the processing status information corresponding to the task calculation request, the method further includes: sending the processing status information corresponding to the task calculation request to a completion queue, the completion queue containing a plurality of pieces of processing status information submitted by the server that have not been read by the client.
• when the client submits a GPU request, it puts the request in the submission queue and then returns (for example, for asynchronous requests); a worker thread is then responsible for sending the request to the server, or the server actively queries the request queue for new requests. After the server receives a request, it executes the processing and puts the processing result on the completion queue.
  • the client can asynchronously query the processing status information in the completion queue.
  • Multiple task calculation requests are stored in the submission queue, and they are sorted according to the time when they enter the submission queue.
• the task calculation request that enters the submission queue first is obtained by the server first, and the task calculation request that enters the submission queue later is obtained by the server later. For example, if task 1, task 2, and task 3 are submitted to the submission queue one after another, the server will first obtain task 1, then task 2, and finally task 3 from the submission queue.
• the same principle applies to completion queues.
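The FIFO behavior of the submission and completion queues described above can be sketched with `collections.deque`: the server drains requests in arrival order and posts status entries to the completion queue in the same order. The task names are illustrative.

```python
from collections import deque

submission_queue = deque()
completion_queue = deque()

# Client submits task 1, task 2, task 3 in order and returns immediately.
for task in ("task-1", "task-2", "task-3"):
    submission_queue.append(task)

# Server drains the submission queue (FIFO) and posts status to the completion queue.
processed = []
while submission_queue:
    task = submission_queue.popleft()      # earliest-submitted task first
    processed.append(task)
    completion_queue.append((task, "done"))

# Client asynchronously polls the completion queue, also in FIFO order.
first_status = completion_queue.popleft()
```

In the actual system the queues would live in the shared memory region so that request messages themselves need no extra copies; the deque here only models the ordering.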
  • This solution also combines the queuing mechanism with transparent shared memory, that is, all requests from the client and the server are allocated on the shared memory to avoid memory copies of the request message (request) encountered when the request is forwarded.
• This method proposes using an efficient software method to make GPU hardware as efficient and lossless as the CPU, thereby significantly improving utilization and effectively reducing cost. In the exclusive-use case, it can also optimize performance; for large-scale deployment, a pure-software virtualization method simplifies operation, maintenance, and management.
  • FIG. 6 is a schematic flowchart of another data transmission method provided by an embodiment of this specification. From a program point of view, the execution subject of the process can be the client applied to the GPU virtualization system. As shown in FIG. 6, the process may include step 602 to step 608.
  • Step 602 Obtain a data transmission request sent by the application.
• the application and the client reside together, and the data transmission request sent by the application is transmitted through the client.
  • Step 604 Obtain the first virtual address in the data transmission request.
  • the data address in the data transmission request is a virtual address, and the client first needs to obtain the first virtual address, and then perform related operations.
  • Step 606 Determine the physical memory address corresponding to the first virtual address based on the mapping relationship between the physical memory address and the virtual address.
  • the physical memory address corresponding to the first virtual address needs to be determined according to the relationship between the virtual address and the physical memory address.
  • the relationship between the virtual address and the physical memory address can be saved in a table and stored on the client. After obtaining the first virtual address, the client can query the stored correspondence between the first virtual address and the physical memory address, and then determine the physical memory address corresponding to the first virtual address.
  • Step 608 Send the data transmission request and the physical memory address to the server, so that the server performs data transmission according to the data transmission request and the physical memory address, wherein the server is based on the physical memory address A mapping relationship with a virtual address, determining a second virtual address corresponding to the physical memory address; acquiring a GPU address allocated for the data transmission request; generating a data copy instruction from the second virtual address to the GPU address; Call the GPU driver interface to execute the data copy instruction.
  • the data transmission method provided in the third embodiment and the data transmission method provided in the first embodiment are respectively described from the perspective of the client and the server, and many of the contents are similar.
• the method in FIG. 6 maps the physical memory address to the first virtual address of the client and the second virtual address of the server respectively; that is, the client and the server share the same physical memory, and the generated data copy instruction copies the data directly from the physical memory address to the GPU address. Because the first virtual address of the client and the second virtual address of the server are retained, the original program is not changed, and transparency is achieved. Moreover, only one data copy, from the physical memory address to the GPU address, is performed, which reduces the number of memory copies. Therefore, there is no need to allocate temporary memory for the client and the server to store copied data, which significantly improves resource utilization, effectively reduces cost, and improves the efficiency of GPU resource virtualization.
• before the acquiring the data transmission request sent by the application, the method may further include: acquiring the memory allocation request sent by the application; acquiring the data in the memory allocation request and storing the data to a first physical memory address; mapping the first physical memory address to the process space of the application to generate a first virtual address corresponding to the first physical memory address; and sending the first virtual address to the application and storing the mapping relationship between the first physical memory address and the first virtual address.
  • the application initiates a memory allocation request, such as calling malloc(len).
• the client obtains the memory allocation request sent by the application and allocates memory of the required length from the memory pool, for example, a segment of the memory pool with a starting offset and length L. The client then maps the memory into the process space of the application to obtain the mapped virtual address H. The virtual address H and its location information in the memory pool (the offset) are stored in the mapping table, which can be implemented as a hash table. Finally, the virtual address H is returned to the application so that the application can perform normal data reading and writing.
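The allocation flow above, carving (offset, length L) out of a memory pool, mapping it into the application's process space to get virtual address H, and recording H → offset in a hash table, can be sketched as follows. A plain `bytearray` stands in for the shared memory pool and a dict for the hash table; real code would map OS shared memory into the process instead, and the class and address constants are illustrative.

```python
class MemoryPool:
    """Sketch of the client-side pool allocator backing malloc(len)."""

    def __init__(self, size):
        self.pool = bytearray(size)   # stand-in for the shared physical memory pool
        self.next_offset = 0          # simple bump allocator
        self.mapping = {}             # hash table: virtual address H -> (offset, length)

    def malloc(self, length):
        # Carve a segment (offset, length) out of the pool.
        offset = self.next_offset
        self.next_offset += length
        # "Map" it into the process space; here H is simply derived from the offset.
        h = 0x7F0000000000 + offset
        self.mapping[h] = (offset, length)
        return h                      # returned to the application for normal R/W

    def resolve(self, h):
        # Later lookups (e.g. step 606) translate H back to its pool location.
        return self.mapping[h]

pool = MemoryPool(1 << 20)
h = pool.malloc(256)
offset, length = pool.resolve(h)
```

Because the application only ever sees H, it reads and writes through an ordinary pointer while the client can still recover the pool offset, which is what makes the sharing transparent.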
  • the determining the physical memory address corresponding to the first virtual address may specifically include: determining the physical memory address corresponding to the first virtual address according to the mapping relationship.
  • FIG. 7 is a schematic flowchart of another task processing method provided by an embodiment of this specification. From a program point of view, the execution subject of the process can be the client applied to the GPU virtualization system. As shown in FIG. 7, the process may include step 702 to step 714.
  • Step 702 Obtain a task processing request sent by the application.
  • Step 704 Forward the task processing request so that the server can obtain it.
  • Step 706 When the processing status information of the task processing request sent by the server is obtained, a synchronization request is issued.
  • Step 708 After obtaining the successful notification of the synchronization request sent by the server, obtain the first virtual address of the task processing request.
  • Step 710 Determine the physical memory address corresponding to the first virtual address based on the mapping relationship between the physical memory address and the virtual address.
  • Step 712 Read the calculation result of the task processing request from the physical memory address corresponding to the first virtual address.
  • Step 714 Send the calculation result of the task processing request to the application.
  • the first virtual address includes a virtual address for obtaining calculation data and a virtual address for storing calculation results.
  • the obtaining the first virtual address of the task processing request may specifically include: obtaining a virtual address for storing a calculation result of the task processing request.
• the determining the physical memory address corresponding to the first virtual address may specifically include: determining the physical memory address corresponding to the calculation result storage virtual address.
  • the forwarding the task processing request may specifically include: sending the task processing request to a submission queue, so that the server can obtain the task processing request from the submission queue, and the submission queue Contains multiple unprocessed task calculation requests submitted by the client.
• the sending a synchronization request may specifically include: querying the completion queue, and sending a synchronization request when the processing status information of the task processing request sent by the server is queried; the completion queue contains a plurality of pieces of processing status information submitted by the server that have not been read by the client.
• the obtaining the first virtual address of the task processing request may specifically include: querying the completion queue, and acquiring the first virtual address of the task processing request after the success notification of the synchronization request sent by the server is queried.
• when the client submits a GPU request, it puts the request in the submission queue and then returns (for example, for asynchronous requests); a worker thread is then responsible for sending the request to the server, or the server actively queries the request queue for new requests. After the server receives a request, it executes the processing and puts the processing result on the completion queue.
  • the client can asynchronously query the processing status information in the completion queue.
  • This solution also combines the queuing mechanism with transparent shared memory, that is, all requests from the client and the server are allocated on the shared memory to avoid memory copies of the request message (request) encountered when the request is forwarded.
• This method proposes using an efficient software method to make GPU hardware as efficient and lossless as the CPU, thereby significantly improving utilization and effectively reducing cost. In the exclusive-use case, it can also optimize performance; for large-scale deployment, a pure-software virtualization method simplifies operation, maintenance, and management.
  • the execution subject is a machine equipped with a client and a server.
• the method may include the following steps: the client obtains the task calculation request sent by the application, the client having a virtual memory sharing function; the client sends the task calculation request to the submission queue; the server obtains the task calculation request from the submission queue, the server having a virtual memory sharing function; the server obtains the calculation data acquisition virtual address and the calculation result storage virtual address in the task calculation request; the server obtains the first physical memory address corresponding to the calculation data acquisition virtual address; the server determines the second virtual address corresponding to the first physical memory address; the server obtains the GPU address allocated for the task calculation request; the server generates a data copy instruction from the second virtual address to the GPU address, so as to call an interface to execute the data copy from the physical memory address to the GPU address; the server sends the task calculation request to the GPU; and after the GPU completes the calculation task corresponding to the task calculation request, the server generates processing status information corresponding to the task calculation request.
• the program adopted by the method provided in the embodiments of this specification runs in user mode and can be applied in user space. There are multiple implementation methods for different scenarios, allowing flexible deployment, summarized as follows:
• Both the server and the client run on the host OS (such as Linux), and the server takes over all GPU access through the GPU driver, including exclusive use of a certain GPU0 or shared use of GPU1 based on configuration. If the client and the server are on the same machine, the communication can be IPC (such as UNIX socket, pipe, or shmem); if they are not on the same machine, socket/RDMA communication is used.
• In a containerized environment, the server can run in a containerized manner, take over the physical GPU, and export virtual GPU resources.
  • the client (such as K8S pod) runs on the same physical machine and is connected to the server.
  • the communication between the client and the server can be IPC or network.
• In a typical virtual machine environment, the GPU is passed through to a specific virtual machine; the server or client is then started in the VM guest OS, after which the setup is equivalent to a bare-metal environment.
• High performance: a transparent memory sharing mechanism is used to avoid additional memory copies, and polling-based multi-queue request processing can efficiently respond to the high-frequency request calls of typical deep learning tasks. Compared with known methods, performance is significantly improved; software virtualization using this method can achieve no performance loss, and its virtualization efficiency is significantly better than known industrial/academic hardware and software virtualization solutions.
  • multiple request queues are provided for each device, including submission and completion queues, to improve scalability and cope with concurrent access by multiple cards.
• the method supports a variety of deployment environments, can interface with all known AI frameworks and models, and is transparent and non-intrusive; the core method can be independent of the GPU device and can also support other acceleration devices such as Ali AI chips.
  • FIG. 8 is a schematic structural diagram of a data transmission device corresponding to FIG. 3 provided by an embodiment of the specification.
  • the device may include: a data transmission request obtaining module 801, configured to obtain a data transmission request sent by a client; and a first virtual address obtaining module 802, configured to obtain a first virtual address in the data transmission request.
  • physical memory address obtaining module 803, configured to obtain the physical memory address corresponding to the first virtual address
  • second virtual address determining module 804, configured to determine the physical memory based on the mapping relationship between the physical memory address and the virtual address The second virtual address corresponding to the address
  • the GPU address acquisition module 805, which is used to acquire the GPU address allocated for the data transmission request
• the data copy instruction generation module 806, which is used to generate a data copy instruction from the second virtual address to the GPU address
• the interface calling module 807, which is used to call the GPU driver interface to execute the data copy instruction.
  • the device may further include: a judging module, configured to judge whether the physical memory address is stored in a mapping table.
  • the second virtual address generating module is configured to, if not, generate a second virtual address corresponding to the physical memory address, and store the mapping relationship between the physical memory address and the second virtual address in the mapping table.
  • the second virtual address determining module 804 may be specifically configured to: if yes, obtain the second virtual address corresponding to the physical memory address.
• the embodiment of this specification also provides a task processing device corresponding to FIG. 4. The device includes: a task calculation request obtaining module, configured to obtain a task calculation request sent by a client; a first virtual address obtaining module, configured to obtain the first virtual address in the task calculation request; a physical memory address obtaining module, configured to obtain the physical memory address corresponding to the first virtual address; a second virtual address determining module, configured to determine, based on the mapping relationship between the physical memory address and the virtual address, the second virtual address corresponding to the physical memory address;
• a first GPU address obtaining module, configured to obtain the GPU address allocated for the task calculation request;
• a data copy instruction generation module, configured to generate a data copy instruction from the second virtual address to the GPU address; a first GPU driver interface calling module, configured to call the GPU driver interface to execute the data copy instruction;
• a task calculation request sending module, configured to send the task calculation request to the GPU;
• a processing state information generation module, configured to generate processing state information corresponding to the task calculation request after the GPU completes the calculation task corresponding to the task calculation request;
• a processing state information storage module, configured to store the processing state information.
  • the first virtual address includes a calculation data acquisition virtual address and a calculation result storage virtual address
  • the first virtual address acquisition module may be specifically used to: acquire the calculation data acquisition virtual address in the task calculation request
  • the physical memory address obtaining module may be specifically used to: obtain the first physical memory address corresponding to the virtual address obtained by the calculation data;
  • the second virtual address determining module may be specifically used to: determine the first physical The second virtual address corresponding to the memory address.
  • the GPU address obtaining module may be specifically used to: obtain the calculation data storage GPU address and the calculation result storage GPU address allocated for the task calculation request; the generation from the second virtual address to the The data copy instruction of the GPU address specifically includes: generating a data copy instruction from the second virtual address to the computing data storage GPU address.
  • the device may further include: a second physical memory address obtaining module, configured to obtain the second physical memory corresponding to the virtual address of the calculation result when the calculation result synchronization request sent by the client is obtained Address; the third virtual address acquisition module, used to determine the third virtual address corresponding to the second physical memory address based on the mapping relationship between the physical memory address and the virtual address; the second data copy instruction generation module, used to generate the calculation The result is copied from the calculation result storage GPU address to the data copy instruction of the third virtual address; the second GPU driver interface calling module is used to call the GPU driver interface to execute the data copy instruction.
  • the task calculation request obtaining module may be specifically used to: obtain a task calculation request sent by the client from a submission queue; the submission queue contains a plurality of unprocessed submissions submitted by the client Task calculation request; after generating the processing status information corresponding to the task calculation request, the device may further include: a processing status information sending module, configured to send the processing status information corresponding to the task calculation request to the completion queue, so The completion queue contains a plurality of processing status information submitted by the server but not read by the client.
• the embodiment of this specification also provides a data transmission device corresponding to FIG. 6, including: a data transmission request obtaining module, configured to obtain a data transmission request sent by an application; a first virtual address obtaining module, configured to obtain the first virtual address in the data transmission request; a physical memory address determining module, configured to determine the physical memory address corresponding to the first virtual address based on the mapping relationship between the physical memory address and the virtual address; and a data transmission request and physical memory address sending module, configured to send the data transmission request and the physical memory address to the server, so that the server performs data transmission according to the data transmission request and the physical memory address, wherein the server determines, based on the mapping relationship between the physical memory address and the virtual address, the second virtual address corresponding to the physical memory address; obtains the GPU address allocated for the data transmission request; generates a data copy instruction from the second virtual address to the GPU address; and calls the GPU driver interface to execute the data copy instruction.
  • the device may further include: a memory allocation request acquiring module, configured to acquire the memory allocation request sent by the application; and a data acquiring module, configured to acquire the memory allocation The data in the request; a data storage module for storing the data in a first physical memory address; a first virtual address generating module for mapping the first physical memory address to the process space of the application to generate A first virtual address corresponding to the first physical memory address; a storage module, configured to send the first virtual address to the application, and store the mapping relationship between the physical memory address and the first virtual address.
  • the physical memory address determining module may be specifically configured to determine the physical memory address corresponding to the first virtual address according to the mapping relationship.
  • The embodiment of the present specification also provides a task processing device corresponding to FIG. 7, including: a task processing request acquisition module, configured to acquire a task processing request sent by an application; a task processing request forwarding module, configured to forward the task processing request so that the server can acquire it; a synchronization request sending module, configured to send a synchronization request when the processing status information of the task processing request sent by the server is acquired; a first virtual address acquisition module, configured to acquire the first virtual address of the task processing request after the success notification of the synchronization request sent by the server is acquired; a physical memory address determining module, configured to determine, based on the mapping relationship between physical memory addresses and virtual addresses, the physical memory address corresponding to the first virtual address; a calculation result reading module, configured to read the calculation result of the task processing request from the physical memory address corresponding to the first virtual address; and a calculation result sending module, configured to send the calculation result of the task processing request to the application.
  • The first virtual address includes a calculation data acquisition virtual address and a calculation result storage virtual address. The first virtual address acquisition module may be specifically configured to acquire the calculation result storage virtual address of the task processing request, and the physical memory address determining module may be specifically configured to determine the physical memory address corresponding to the calculation result storage virtual address.
  • The task processing request forwarding module may be specifically configured to send the task processing request to a submission queue so that the server can acquire the task processing request from the submission queue; the submission queue contains multiple unprocessed task calculation requests submitted by the client.
  • The synchronization request sending module may be specifically configured to query the completion queue and send a synchronization request when the processing status information of the task processing request sent by the server is found; the completion queue contains multiple pieces of processing status information submitted by the server and not yet read by the client.
  • The first virtual address acquisition module may be specifically configured to query the completion queue and acquire the first virtual address of the task processing request after the success notification of the synchronization request sent by the server is found.
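The submission-queue/completion-queue exchange above can be sketched as follows. This is an illustrative assumption, not the patent's code: `queue.Queue` stands in for whatever shared-memory queues the real client and server would use, and the request fields are hypothetical.

```python
from queue import Queue

submission_queue = Queue()   # unprocessed task calculation requests from the client
completion_queue = Queue()   # processing status information written by the server

# Client side: forward the task processing request into the submission queue.
request = {"task_id": 1, "first_virtual_address": 0x7F00_0000_0000}
submission_queue.put(request)

# Server side (simulated): take the request, process it, publish its status.
task = submission_queue.get()
completion_queue.put({"task_id": task["task_id"], "status": "done",
                      "first_virtual_address": task["first_virtual_address"]})

# Client side: query the completion queue, then use the first virtual
# address it carries to locate the calculation result.
status = completion_queue.get()
assert status["status"] == "done"
result_va = status["first_virtual_address"]
```

The queues decouple the two sides: the client never blocks on the GPU directly, and the server batches whatever requests are pending.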
  • the embodiment of this specification also provides a device corresponding to the above method.
  • FIG. 9 is a schematic structural diagram of a data transmission device corresponding to FIG. 3 provided by an embodiment of this specification.
  • The device 900 may include: at least one processor 910; and a memory communicatively connected with the at least one processor 910, where the memory stores instructions 920 executable by the at least one processor 910, and the instructions 920 are executed by the at least one processor 910 so that the at least one processor 910 can: obtain the data transmission request sent by the client; obtain the first virtual address in the data transmission request; obtain the physical memory address corresponding to the first virtual address; determine, based on the mapping relationship between physical memory addresses and virtual addresses, the second virtual address corresponding to the physical memory address; obtain the GPU address allocated for the data transmission request; generate a data copy instruction from the second virtual address to the GPU address; and call an interface of the GPU driver to execute the data copy instruction.
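The server-side flow of translating the client's first virtual address to a physical memory address, then to the server's own second virtual address, and finally copying toward an allocated GPU address can be sketched in plain Python. All tables, addresses, and the `gpu_copy` function are hypothetical stand-ins for the real GPU driver interface, chosen only to make the address chain visible.

```python
client_virt_to_phys = {0x7F00_0000_0000: 0x1000}   # client-side mapping relationship
phys_to_server_virt = {0x1000: 0x5500_0000_0000}   # server-side mapping relationship

gpu_memory = {}   # simulated GPU address space

def gpu_copy(src_virtual: int, gpu_addr: int, payload: bytes) -> None:
    """Stand-in for calling the GPU driver's copy interface."""
    gpu_memory[gpu_addr] = (src_virtual, payload)

def handle_data_transmission(request: dict) -> int:
    first_va = request["first_virtual_address"]      # from the data transmission request
    phys = client_virt_to_phys[first_va]             # physical memory address
    second_va = phys_to_server_virt[phys]            # second virtual address
    gpu_addr = 0xA000                                # GPU address allocated for this request
    gpu_copy(second_va, gpu_addr, request["data"])   # execute the data copy instruction
    return gpu_addr

addr = handle_data_transmission(
    {"first_virtual_address": 0x7F00_0000_0000, "data": b"tensor"})
assert gpu_memory[addr] == (0x5500_0000_0000, b"tensor")
```

The point of the two-step translation is that client and server share the same physical pages while each keeps its own virtual view, so no extra host-to-host copy is needed before the host-to-GPU copy.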
  • the embodiment of this specification also provides a task processing device corresponding to FIG. 4.
  • The device may include: at least one processor; and a memory communicatively connected with the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can: obtain a task calculation request sent by a client; obtain a first virtual address in the task calculation request; obtain a physical memory address corresponding to the first virtual address; determine, based on the mapping relationship between physical memory addresses and virtual addresses, the second virtual address corresponding to the physical memory address; obtain the GPU address allocated for the task calculation request; generate a data copy instruction from the second virtual address to the GPU address; call an interface of the GPU driver to execute the data copy instruction; send the task calculation request to the GPU; when the GPU completes the calculation task corresponding to the task calculation request, generate the processing status information corresponding to the task calculation request; and store the processing status information.
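The tail of that flow, generating and storing processing status information once the GPU finishes, can be sketched as below. This is a hedged illustration with hypothetical names; `run_on_gpu` merely simulates a calculation so the status-handling step has something to record.

```python
status_store = {}   # stands in for the server's stored processing status information

def run_on_gpu(task: dict) -> bytes:
    """Stand-in for sending the task calculation request to the GPU."""
    return task["data"][::-1]   # pretend computation: reverse the bytes

def process_task(task: dict) -> None:
    result = run_on_gpu(task)
    # When the GPU completes the calculation task, generate and store the
    # processing status information corresponding to the task calculation request.
    status_store[task["task_id"]] = {"status": "completed", "result": result}

process_task({"task_id": 7, "data": b"abc"})
assert status_store[7]["status"] == "completed"
```

The stored status is what the client later discovers by querying the completion queue before issuing its synchronization request.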
  • the embodiment of this specification also provides a data transmission device corresponding to FIG. 6.
  • The device may include: at least one processor; and a memory communicatively connected with the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can: obtain a data transmission request sent by an application; obtain a first virtual address in the data transmission request; determine, based on the mapping relationship between physical memory addresses and virtual addresses, the physical memory address corresponding to the first virtual address; and send the data transmission request and the physical memory address to the server, so that the server performs data transmission according to the data transmission request and the physical memory address, where the server determines the second virtual address corresponding to the physical memory address based on the mapping relationship between physical memory addresses and virtual addresses, obtains the GPU address allocated for the data transmission request, generates a data copy instruction from the second virtual address to the GPU address, and calls an interface of the GPU driver to execute the data copy instruction.
  • the embodiment of this specification also provides a task processing device corresponding to FIG. 7.
  • The device may include: at least one processor; and a memory communicatively connected with the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can: obtain the task processing request sent by the application; forward the task processing request so that the server can obtain it; issue a synchronization request when the processing status information of the task processing request sent by the server is obtained; obtain the first virtual address of the task processing request after the success notification of the synchronization request sent by the server is obtained; determine, based on the mapping relationship between physical memory addresses and virtual addresses, the physical memory address corresponding to the first virtual address; read the calculation result of the task processing request from the physical memory address corresponding to the first virtual address; and send the calculation result of the task processing request to the application.
  • An embodiment of the present specification provides a computer-readable medium having computer-readable instructions stored thereon, and the computer-readable instructions can be executed by a processor to implement any of the methods described above.
  • The improvement of a technology can be clearly distinguished as a hardware improvement (for example, an improvement in a circuit structure such as a diode, a transistor, or a switch) or a software improvement (an improvement in a method flow). However, the improvement of many methods and processes today can be regarded as a direct improvement of the hardware circuit structure: designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that the improvement of a method flow cannot be realized by a hardware entity module.
  • For example, a Programmable Logic Device (PLD) (for example, a Field Programmable Gate Array (FPGA)) can be programmed using a Hardware Description Language (HDL). There is not just one HDL but many kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), HDCal, JHDL, Lava, Lola, MyHDL, PALASM, and RHDL; at present, VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used.
  • the controller can be implemented in any suitable manner.
  • The controller can take the form of, for example, a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the memory control logic.
  • In addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller realizes the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Therefore, such a controller can be regarded as a hardware component, and the devices included in it for realizing various functions can also be regarded as structures within the hardware component. Or even, the devices for realizing various functions can be regarded both as software modules for implementing the method and as structures within the hardware component.
  • a typical implementation device is a computer.
  • The computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or any combination of these devices.
  • The embodiments of the present invention can be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
  • the computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer-readable medium, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology.
  • the information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
  • This application may be described in the general context of computer-executable instructions executed by a computer, such as a program module.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • This application can also be practiced in distributed computing environments. In these distributed computing environments, tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in local and remote computer storage media including storage devices.

Abstract

Data transmission and task processing methods, apparatuses and devices. A solution comprises: obtaining a data transmission request sent by a client (302); obtaining a first virtual address in the data transmission request (304); obtaining a physical memory address corresponding to the first virtual address (306); on the basis of a mapping relationship between the physical memory address and the virtual address, determining a second virtual address corresponding to the physical memory address (308); obtaining a GPU address allocated for the data transmission request (310); generating a data copying instruction from the second virtual address to the GPU address (312); and calling an interface driven by the GPU to execute the data copying instruction (314).

Description

Method, device, and equipment for data transmission and task processing
Technical Field
This application relates to the field of computer technology, and in particular to a data transmission and task processing method, device, and equipment.
Background
Deep learning (DL) is widely used in the field of artificial intelligence (AI). AI, and deep learning in particular, is now widely applied in scenarios such as payment (face recognition), loss assessment (image recognition), and interaction and customer service (speech recognition), and has achieved remarkable results. Typical DL tasks require strong computing power, so most such tasks currently run on acceleration devices such as the Graphics Processing Unit (GPU). Accelerator chips, represented by graphics processors, are an important guarantee for the development and deployment of AI. However, GPUs generally suffer from low average utilization in use.
A solution for increasing the average utilization of the GPU is therefore needed.
Summary of the Invention
In view of this, the embodiments of the present application provide a data transmission and task processing method, device, and equipment for improving the efficiency of GPU resource virtualization.
An embodiment of this specification provides a data transmission method applied to the server in a GPU virtualization system, including: obtaining a data transmission request sent by a client; obtaining a first virtual address in the data transmission request; obtaining the physical memory address corresponding to the first virtual address; determining, based on the mapping relationship between physical memory addresses and virtual addresses, a second virtual address corresponding to the physical memory address; obtaining a GPU address allocated for the data transmission request; generating a data copy instruction from the second virtual address to the GPU address; and calling an interface of the GPU driver to execute the data copy instruction.
An embodiment of this specification provides a task processing method applied to the server in a GPU virtualization system, including: obtaining a task calculation request sent by a client; obtaining a first virtual address in the task calculation request; obtaining the physical memory address corresponding to the first virtual address; determining, based on the mapping relationship between physical memory addresses and virtual addresses, a second virtual address corresponding to the physical memory address; obtaining a GPU address allocated for the task calculation request; generating a data copy instruction from the second virtual address to the GPU address; calling an interface of the GPU driver to execute the data copy instruction; sending the task calculation request to the GPU; when the GPU completes the calculation task corresponding to the task calculation request, generating the processing status information corresponding to the task calculation request; and storing the processing status information.
An embodiment of this specification provides a data transmission method applied to the client in a GPU virtualization system, including: obtaining a data transmission request sent by an application; obtaining a first virtual address in the data transmission request; determining, based on the mapping relationship between physical memory addresses and virtual addresses, the physical memory address corresponding to the first virtual address; and sending the data transmission request and the physical memory address to the server, so that the server performs data transmission according to the data transmission request and the physical memory address, where the server determines a second virtual address corresponding to the physical memory address based on the mapping relationship between physical memory addresses and virtual addresses, obtains a GPU address allocated for the data transmission request, generates a data copy instruction from the second virtual address to the GPU address, and calls an interface of the GPU driver to execute the data copy instruction.
An embodiment of this specification provides a task processing method applied to the client in a GPU virtualization system, including: obtaining a task processing request sent by an application; forwarding the task processing request so that the server can obtain it; issuing a synchronization request when the processing status information of the task processing request sent by the server is obtained; obtaining the first virtual address of the task processing request after the success notification of the synchronization request sent by the server is obtained; determining, based on the mapping relationship between physical memory addresses and virtual addresses, the physical memory address corresponding to the first virtual address; reading the calculation result of the task processing request from the physical memory address corresponding to the first virtual address; and sending the calculation result of the task processing request to the application.
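The client-side data transmission method can be sketched as follows. This is an illustrative assumption rather than the patent's code: the client resolves the application's first virtual address to a physical memory address via its stored mapping and forwards both to the server, instead of copying the data across the client/server boundary itself. Names and addresses are hypothetical.

```python
virt_to_phys = {0x7F00_0000_0000: 0x2000}   # client's stored mapping relationship

sent_to_server = []   # stands in for the client -> server channel

def send_to_server(request: dict, phys_addr: int) -> None:
    """Stand-in for transmitting the request and physical address to the server."""
    sent_to_server.append((request, phys_addr))

def handle_app_request(request: dict) -> None:
    first_va = request["first_virtual_address"]
    phys = virt_to_phys[first_va]   # determine the physical memory address
    send_to_server(request, phys)   # server performs the GPU copy from here

handle_app_request({"first_virtual_address": 0x7F00_0000_0000, "op": "h2d_copy"})
assert sent_to_server[0][1] == 0x2000
```

Only the address crosses the boundary; the payload stays in the shared physical pages, which is what avoids the extra copy that plain request forwarding would incur.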
An embodiment of this specification provides a data transmission device, including: a data transmission request acquisition module, configured to obtain a data transmission request sent by a client; a first virtual address acquisition module, configured to obtain a first virtual address in the data transmission request; a physical memory address acquisition module, configured to obtain the physical memory address corresponding to the first virtual address; a second virtual address determining module, configured to determine, based on the mapping relationship between physical memory addresses and virtual addresses, a second virtual address corresponding to the physical memory address; a GPU address acquisition module, configured to obtain a GPU address allocated for the data transmission request; a data copy instruction generation module, configured to generate a data copy instruction from the second virtual address to the GPU address; and an interface calling module, configured to call an interface of the GPU driver to execute the data copy instruction.
An embodiment of this specification provides a task processing device, including: a task calculation request acquisition module, configured to obtain a task calculation request sent by a client; a first virtual address acquisition module, configured to obtain a first virtual address in the task calculation request; a physical memory address acquisition module, configured to obtain the physical memory address corresponding to the first virtual address; a second virtual address determining module, configured to determine, based on the mapping relationship between physical memory addresses and virtual addresses, a second virtual address corresponding to the physical memory address; a GPU address acquisition module, configured to obtain a GPU address allocated for the task calculation request; a data copy instruction generation module, configured to generate a data copy instruction from the second virtual address to the GPU address; a GPU driver interface calling module, configured to call an interface of the GPU driver to execute the data copy instruction; a task calculation request sending module, configured to send the task calculation request to the GPU; a processing status information generation module, configured to generate the processing status information corresponding to the task calculation request after the GPU completes the calculation task corresponding to the task calculation request; and a processing status information storage module, configured to store the processing status information.
An embodiment of this specification provides a data transmission device, including: a data transmission request acquisition module, configured to obtain a data transmission request sent by an application; a first virtual address acquisition module, configured to obtain a first virtual address in the data transmission request; a physical memory address determining module, configured to determine, based on the mapping relationship between physical memory addresses and virtual addresses, the physical memory address corresponding to the first virtual address; and a data transmission request and physical memory address sending module, configured to send the data transmission request and the physical memory address to the server, so that the server performs data transmission according to the data transmission request and the physical memory address, where the server determines a second virtual address corresponding to the physical memory address based on the mapping relationship between physical memory addresses and virtual addresses, obtains a GPU address allocated for the data transmission request, generates a data copy instruction from the second virtual address to the GPU address, and calls an interface of the GPU driver to execute the data copy instruction.
An embodiment of this specification provides a task processing device, including: a task processing request acquisition module, configured to obtain a task processing request sent by an application; a task processing request forwarding module, configured to forward the task processing request so that the server can obtain it; a synchronization request sending module, configured to issue a synchronization request when the processing status information of the task processing request sent by the server is obtained; a first virtual address acquisition module, configured to obtain the first virtual address of the task processing request after the success notification of the synchronization request sent by the server is obtained; a physical memory address determining module, configured to determine, based on the mapping relationship between physical memory addresses and virtual addresses, the physical memory address corresponding to the first virtual address; a calculation result reading module, configured to read the calculation result of the task processing request from the physical memory address corresponding to the first virtual address; and a calculation result sending module, configured to send the calculation result of the task processing request to the application.
An embodiment of this specification provides a data transmission equipment, including: at least one processor; and a memory communicatively connected with the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can: obtain a data transmission request sent by a client; obtain a first virtual address in the data transmission request; obtain the physical memory address corresponding to the first virtual address; determine, based on the mapping relationship between physical memory addresses and virtual addresses, a second virtual address corresponding to the physical memory address; obtain a GPU address allocated for the data transmission request; generate a data copy instruction from the second virtual address to the GPU address; and call an interface of the GPU driver to execute the data copy instruction.
An embodiment of this specification provides a task processing equipment, including: at least one processor; and a memory communicatively connected with the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can: obtain a task calculation request sent by a client; obtain a first virtual address in the task calculation request; obtain the physical memory address corresponding to the first virtual address; determine, based on the mapping relationship between physical memory addresses and virtual addresses, a second virtual address corresponding to the physical memory address; obtain a GPU address allocated for the task calculation request; generate a data copy instruction from the second virtual address to the GPU address; call an interface of the GPU driver to execute the data copy instruction; send the task calculation request to the GPU; when the GPU completes the calculation task corresponding to the task calculation request, generate the processing status information corresponding to the task calculation request; and store the processing status information.
An embodiment of this specification provides a data transmission equipment, including: at least one processor; and a memory communicatively connected with the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can: obtain a data transmission request sent by an application; obtain a first virtual address in the data transmission request; determine, based on the mapping relationship between physical memory addresses and virtual addresses, the physical memory address corresponding to the first virtual address; and send the data transmission request and the physical memory address to the server, so that the server performs data transmission according to the data transmission request and the physical memory address, where the server determines a second virtual address corresponding to the physical memory address based on the mapping relationship between physical memory addresses and virtual addresses, obtains a GPU address allocated for the data transmission request, generates a data copy instruction from the second virtual address to the GPU address, and calls an interface of the GPU driver to execute the data copy instruction.
An embodiment of this specification provides a task processing equipment, including: at least one processor; and a memory communicatively connected with the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can: obtain the task processing request sent by the application; forward the task processing request so that the server can obtain it; issue a synchronization request when the processing status information of the task processing request sent by the server is obtained; obtain the first virtual address of the task processing request after the success notification of the synchronization request sent by the server is obtained; determine, based on the mapping relationship between physical memory addresses and virtual addresses, the physical memory address corresponding to the first virtual address; read the calculation result of the task processing request from the physical memory address corresponding to the first virtual address; and send the calculation result of the task processing request to the application.
本说明书实施例提供的一种计算机可读介质,其上存储有计算机可读指令,所述计算机可读指令可被处理器执行以实现上述方法。An embodiment of the present specification provides a computer-readable medium having computer-readable instructions stored thereon, and the computer-readable instructions can be executed by a processor to implement the foregoing method.
本说明书实施例采用的上述至少一个技术方案能够达到以下有益效果:通过将物理内存地址与客户端的第一虚拟地址、服务端的第二虚拟地址分别进行映射,即客户端与服务端共享相同的物理内存,生成数据拷贝指令直接将物理内存地址中的数据拷贝到GPU地址中。因为保留了客户端的第一虚拟地址和服务端的第二虚拟地址,即实现了不对原有的程序进行改动,实现了透明化。而且,只经历了一次从物理内存地址到GPU地址的数据拷贝,减少了数据内存拷贝的次数,因此无需为客户端和服务端分配临时内存来存储拷贝的数据,显著提高利用率,有效降低成本,提高了GPU资源虚拟化的效率。The above-mentioned at least one technical solution adopted in the embodiments of this specification can achieve the following beneficial effects: by mapping the physical memory address with the first virtual address of the client and the second virtual address of the server respectively, that is, the client and the server share the same physical In the memory, the data copy instruction is generated to directly copy the data in the physical memory address to the GPU address. Because the first virtual address of the client and the second virtual address of the server are retained, the original program is not changed, and transparency is realized. Moreover, only one copy of the data from the physical memory address to the GPU address has been experienced, which reduces the number of data memory copies. Therefore, there is no need to allocate temporary memory for the client and server to store the copied data, which significantly improves the utilization rate and effectively reduces the cost. , Improve the efficiency of GPU resource virtualization.
Description of the drawings
The drawings described here are used to provide a further understanding of this application and constitute a part of this application. The exemplary embodiments of this application and their descriptions are used to explain this application and do not constitute an improper limitation of this application. In the drawings:
Figure 1 is a schematic flowchart of GPU software virtualization based on request/data forwarding (multiple memory copies);
Figure 2 is a schematic diagram of the overall module structure of GPU virtualization based on transparent memory sharing provided by an embodiment of this specification;
Figure 3 is a schematic flowchart of a data transmission method provided by an embodiment of this specification;
Figure 4 is a schematic flowchart of a task processing method provided by an embodiment of this specification;
Figure 5 is a schematic diagram of multi-queue management provided by an embodiment of this specification;
Figure 6 is a schematic flowchart of another data transmission method provided by an embodiment of this specification;
Figure 7 is a schematic flowchart of another task processing method provided by an embodiment of this specification;
Figure 8 is a schematic structural diagram of a data transmission apparatus, corresponding to Figure 3, provided by an embodiment of this specification;
Figure 9 is a schematic structural diagram of a data transmission device, corresponding to Figure 3, provided by an embodiment of this specification.
Detailed description
To make the purposes, technical solutions, and advantages of this application clearer, the technical solutions of this application are described clearly and completely below in conjunction with specific embodiments of this application and the corresponding drawings. Obviously, the described embodiments are only some of the embodiments of this application rather than all of them. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of this application.
Transparent memory sharing: a data-sharing method on the data plane, specially designed and optimized so that existing programs need no modification, no data is moved, and virtualization remains transparent.
Known software virtualization usually requires the client to intercept GPU requests and then forward the requests (for example, resource applications, task submissions, and so on) to the server. After receiving a request, the server performs the necessary control, sends the request to the GPU driver, and finally forwards the result back to the client, as shown in Figure 1. In this process, the biggest performance constraint is that all GPU requests (commands, parameters) and data must undergo multiple memory copies: the GPU request and data are first copied to the client, then copied from the client to the server, and finally copied from the server into the GPU address. Although this approach achieves virtualization, compared with the native, non-virtualized approach of copying the GPU request and data directly into the GPU address, it incurs two additional data memory copies. If the memory copies are carried over the network, data-processing efficiency suffers, which fundamentally limits the performance of software virtualization; the two extra copies also incur considerable CPU and memory overhead.
To improve virtualization efficiency, this solution adopts a transparent memory sharing mechanism to implement GPU control requests and data exchange.
The technical solutions provided by the embodiments of this application are described in detail below with reference to the drawings.
Figure 2 is a schematic diagram of the overall module structure of GPU virtualization based on transparent memory sharing provided by an embodiment of this specification. As shown in Figure 2:
Models and applications: models such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory networks (LSTM), Generative Adversarial Networks (GAN), and so on. Applications include model training and online model serving.
AI framework layer: common DL frameworks, such as TensorFlow, PyTorch, Caffe2, etc.
Server: responsible for GPU service and virtualization management; a long-running daemon on top of the GPU driver. Usually one GPU server has one service instance (which can be packaged and run in Docker). According to configuration policies (for example, environment variables or configuration files), it partitions and pre-allocates virtual GPU resources, maintains the mapping relationship between virtual and physical resources, and reports to the cluster scheduler (such as K8S, Kubemaker).
Client: a client library shipped with the application model (for example, packaged together as a Docker image), responsible for the discovery, application, and access of virtual GPU resources and for necessary built-in optimizations, and for recording the correspondence between virtual and physical resources. The client exports a GPU access API, such as Nvidia CUDA, to the application (internally it merely decouples resource application from the underlying implementation). One server (or one physical GPU) can run multiple clients.
Among these components, to improve virtualization efficiency, a transparent shared-memory module and an efficient GPU request-processing module (send, process, return result) must be deployed on both the server and the client; this is the core of this solution. In addition, the client and server in the embodiments of this specification are the client and server of the GPU virtualization system, which differ from the client and server in the conventional sense: they are constructed in software, modeled on the functions and roles of conventional clients and servers, and are not physical entities.
Scheduler: a cluster-wide GPU resource scheduler, such as K8S. A client application first applies to the scheduler for GPU resources, and the scheduler is then responsible for scheduling execution.
Embodiment 1
Figure 3 is a schematic flowchart of a data transmission method provided by an embodiment of this specification. From a program perspective, the execution subject of the process may be the server in a GPU virtualization system.
As shown in Figure 3, the process may include steps 302 to 314.
Step 302: obtain a data transmission request sent by the client.
In the embodiments of this specification, the data transmission request is initiated by an application on the client; the client receives the request initiated by the application and forwards it to the server.
The data transmission request may be an independent data request or a subtask within a task processing request. For example, if the GPU is required to complete a computing task, the data to be computed must first be transmitted to a GPU address, and the GPU then computes on the data at that GPU address. The data transmission request in step 302 may then be the data-transmission subtask of that computing task.
Step 304: obtain the first virtual address in the data transmission request.
A virtual address identifies an address that is not a physical entity address.
A data transmission request usually includes the direction of the transfer, that is, transferring data from one address to another. However, on the client or in a client application, a data request can only use virtual addresses and cannot include actual physical addresses. This prevents the client from referencing physical addresses directly and performing unsafe operations such as data tampering. Therefore, the address in the data transmission request is a virtual address; it is called the first virtual address here only to distinguish it from other virtual addresses, and "first" carries no other meaning.
Step 306: obtain the physical memory address corresponding to the first virtual address.
To determine the actual address of the data, the physical memory address corresponding to the first virtual address must be determined from the relationship between virtual addresses and physical memory addresses. This physical memory address is an address in a memory pool set aside specifically for storing shared data, and the specific location can be expressed as an offset.
The relationship between virtual addresses and physical memory addresses can be kept in a table stored on the client. When forwarding the data transmission request, the client may also send the physical memory address corresponding to the first virtual address to the server. Alternatively, the server sends the client a request to "obtain the physical memory address corresponding to the first virtual address", and after receiving the request the client forwards that physical memory address to the server.
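The client-side bookkeeping described above amounts to a small translation table that is consulted when a request is forwarded. The following sketch is illustrative only; the class name, request layout, and address values are assumptions, not the implementation of this specification:

```python
class ClientAddressTable:
    """Client-side map from first virtual addresses to physical memory
    addresses (offsets into the shared memory pool). Illustrative sketch."""

    def __init__(self):
        self._table = {}  # first virtual address -> physical memory address

    def register(self, first_vaddr, phys_addr):
        # Record the mapping when the shared buffer is set up.
        self._table[first_vaddr] = phys_addr

    def forward_request(self, request):
        # When forwarding a data transmission request, attach the physical
        # memory address corresponding to the request's first virtual address.
        phys_addr = self._table[request["first_vaddr"]]
        return {**request, "phys_addr": phys_addr}


table = ClientAddressTable()
table.register(first_vaddr=0x7F00, phys_addr=0x1000)  # pool offset 0x1000
forwarded = table.forward_request({"first_vaddr": 0x7F00, "length": 4096})
```

In this sketch the first variant is shown (the client attaches the physical address when forwarding); the second variant would invoke the same lookup only after the server asks for it.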
Step 308: determine, based on the mapping relationship between physical memory addresses and virtual addresses, the second virtual address corresponding to the physical memory address.
To leave existing programs unmodified, move no data, and achieve transparent memory sharing, the server, after obtaining the physical memory address actually corresponding to the data, must also translate it into a virtual address that can be used and recognized within the server's process, identified here as the second virtual address. The second virtual address appears only in the program of the server's process and has nothing to do with the client. Moreover, within the server's program the correspondence between a physical memory address and a second virtual address is unique, that is, one physical memory address corresponds to exactly one second virtual address, so the second virtual address corresponding to the physical memory address can be determined from the mapping relationship between physical memory addresses and virtual addresses. This mapping relationship can be represented as a mapping table: the server queries whether the table contains the physical memory address. If it is found, the physical memory address has already been mapped on the server and no further mapping is needed; if it is not found, the physical memory address has not yet been mapped on the server, so a mapping is performed to obtain the second virtual address.
Step 310: obtain the GPU address allocated for the data transmission request.
The data transmission request copies the data at the physical memory address into a GPU address so that the GPU can compute on it. Therefore, a corresponding GPU address must also be allocated for the transmitted data.
The server can issue a GPU address allocation instruction according to the data transmission request, and the GPU driver calls its interface to complete the GPU address allocation. The GPU driver then sends the allocated GPU address to the server. Of course, the server may also obtain the GPU address allocated for the data transmission request through other methods, which the embodiments of this specification do not specifically limit.
The data transmission request may also include the length of the data, and the subject that allocates the GPU address may be the server or another execution subject.
Step 312: generate a data copy instruction from the second virtual address to the GPU address.
Determining the second virtual address fixes the source address of the data, and determining the GPU address fixes its destination address; the data copy instruction can therefore be generated from the source address and the destination address.
Step 314: call the interface of the GPU driver to execute the data copy instruction.
The data copy instruction generated in step 312 requires the GPU driver to carry out the interaction with the GPU. Therefore, the interface of the GPU driver must be called to execute the data copy instruction, that is, to complete the data copy from the physical memory address to the GPU address.
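Steps 312 and 314 can be summarized in a minimal sketch. The `GpuDriver` interface below is a hypothetical stand-in for the real driver API (in practice this would be something like a CUDA host-to-device copy); it exists only to make the two-step structure concrete:

```python
from dataclasses import dataclass


@dataclass
class CopyInstruction:
    src_vaddr: int     # second virtual address (server's view of shared memory)
    dst_gpu_addr: int  # GPU address allocated for the request
    length: int


class GpuDriver:
    """Hypothetical driver interface; a real server would invoke an actual
    host-to-device copy through the GPU driver here."""

    def __init__(self):
        self.executed = []

    def copy_to_device(self, instr):
        self.executed.append(instr)  # pretend the copy ran
        return True


def transmit(second_vaddr, gpu_addr, length, driver):
    # Step 312: generate the copy instruction from source and destination.
    instr = CopyInstruction(second_vaddr, gpu_addr, length)
    # Step 314: call the driver interface to execute it.
    return driver.copy_to_device(instr)


driver = GpuDriver()
ok = transmit(second_vaddr=0x2000, gpu_addr=0xD000, length=4096, driver=driver)
```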
In the method of Figure 3, the physical memory address is mapped to the client's first virtual address and to the server's second virtual address respectively, that is, the client and the server share the same physical memory, and the generated data copy instruction copies the data at the physical memory address directly into the GPU address. Because the client's first virtual address and the server's second virtual address are retained, the original program requires no modification, achieving transparency. Moreover, the data undergoes only a single copy, from the physical memory address to the GPU address, which reduces the number of data memory copies; no temporary memory needs to be allocated on the client or the server to store copied data, which significantly improves utilization, effectively reduces cost, and improves the efficiency of GPU resource virtualization.
Based on the method of Figure 3, the embodiments of this specification also provide some specific implementations of the method, described below.
Specifically, before determining the second virtual address corresponding to the physical memory address, the method may further include: judging whether the physical memory address is stored in a mapping table; if not, generating a second virtual address corresponding to the physical memory address, and storing the mapping relationship between the physical memory address and the second virtual address in the mapping table. Determining the second virtual address corresponding to the physical memory address then specifically includes: if so, obtaining the second virtual address corresponding to the physical memory address.
In one or more embodiments of this specification, when the server determines the second virtual address corresponding to a physical memory address, it first determines whether the physical memory address exists in the mapping table. If it exists, the mapping of the physical memory address to a server-side virtual address has already been completed; in other words, a physical memory address is mapped only once on the server. If the physical memory address does not exist in the mapping table, it has not yet been mapped by the server, so a mapping operation can be performed to generate a second virtual address for the physical memory address, and the mapping relationship between the physical memory address and the second virtual address is then stored.
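The map-once behavior described above is a lookup-or-create operation on the server's mapping table. The sketch below illustrates the logic; the address generation is a placeholder assumption (a real server would obtain the second virtual address from an mmap-style mapping of the shared physical memory, not from a counter):

```python
class ServerMappingTable:
    """Server-side map from physical memory addresses to second virtual
    addresses; each physical address is mapped at most once (sketch)."""

    def __init__(self):
        self._phys_to_vaddr = {}
        self._next_vaddr = 0x2000  # placeholder base address for illustration

    def second_vaddr_for(self, phys_addr):
        if phys_addr in self._phys_to_vaddr:
            # Already in the mapping table: reuse the existing mapping.
            return self._phys_to_vaddr[phys_addr]
        # Not yet mapped on the server: create the mapping exactly once
        # and store it in the table for later lookups.
        vaddr = self._next_vaddr
        self._next_vaddr += 0x1000
        self._phys_to_vaddr[phys_addr] = vaddr
        return vaddr


table = ServerMappingTable()
first = table.second_vaddr_for(0x1000)
second = table.second_vaddr_for(0x1000)  # same physical address: same mapping
```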
Embodiment 2
Figure 4 is a schematic flowchart of a task processing method provided by an embodiment of this specification. From a program perspective, the execution subject of the process may be the server in a GPU virtualization system. As shown in Figure 4, the process may include steps 402 to 420.
Step 402: obtain a task calculation request sent by the client.
The task calculation request may be any of various computing tasks, such as matrix multiplication, convolution, and so on. The task calculation request is initiated by an application; after the client obtains it, the client forwards it to the server.
Step 404: obtain the first virtual address in the task calculation request.
The task calculation request may include some information related to the data to be computed; however, the request does not directly contain the data itself but instead records the addresses at which the data is stored. Since actual physical memory addresses cannot appear on the client, the addresses are expressed as virtual addresses. For example, for multiplying matrix A by matrix B, the first virtual addresses are the addresses at which matrix A and matrix B are stored. There may be one first virtual address or several; the number depends on the specific task calculation request.
Step 406: obtain the physical memory address corresponding to the first virtual address.
After the first virtual address is determined in step 404, the physical memory address corresponding to the first virtual address must also be determined; for details, refer to step 306 of Embodiment 1.
Step 408: determine, based on the mapping relationship between physical memory addresses and virtual addresses, the second virtual address corresponding to the physical memory address.
Step 410: obtain the GPU address allocated for the task calculation request.
For the task calculation request, the data at the physical memory address is copied into a GPU address so that the GPU can compute on it. Therefore, a corresponding GPU address must also be allocated for the transmitted data.
The server can issue a GPU address allocation instruction according to the request, and the GPU driver calls its interface to complete the GPU address allocation. The GPU driver then sends the allocated GPU address to the server. Of course, the server may also obtain the allocated GPU address through other methods, which the embodiments of this specification do not specifically limit.
The request may also include the length of the data, and the subject that allocates the GPU address may be the server or another execution subject.
It should be noted that the GPU addresses allocated for the task calculation request may include storage addresses for the computation data and may also include a storage address for the calculation result. GPU addresses may also be allocated according to the size of the data.
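Size-based allocation of the two address classes noted above can be sketched as follows. The bump-pointer allocator and the base address are illustrative assumptions standing in for whatever allocation scheme the GPU driver actually provides:

```python
class GpuAllocator:
    """Toy size-based bump allocator returning separate GPU addresses for
    computation data and for the calculation result (illustrative)."""

    def __init__(self, base=0xD000_0000):
        self._next = base

    def _alloc(self, size):
        addr = self._next
        self._next += size
        return addr

    def alloc_for_task(self, data_size, result_size):
        return {
            "data_gpu_addr": self._alloc(data_size),      # holds copied source data
            "result_gpu_addr": self._alloc(result_size),  # holds the GPU's result
        }


alloc = GpuAllocator()
addrs = alloc.alloc_for_task(data_size=8192, result_size=4096)
```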
Step 412: generate a data copy instruction from the second virtual address to the GPU address.
Step 414: call the interface of the GPU driver to execute the data copy instruction.
Step 416: send the task calculation request to the GPU.
Step 418: after the GPU completes the calculation task corresponding to the task calculation request, generate processing status information corresponding to the task calculation request.
The GPU obtains the data at the GPU address, performs the computation according to the task calculation request, and stores the result at the GPU address allocated for the calculation result. The GPU then notifies the server that the computation is complete, and the server generates the processing status information corresponding to the task calculation request.
Step 420: store the processing status information.
The server may actively send the processing status information to the client, or it may store the information so that the client can query it.
In the method of Figure 4, the physical memory address is mapped to the client's first virtual address and to the server's second virtual address respectively, that is, the client and the server share the same physical memory, and the generated data copy instruction copies the data at the physical memory address directly into the GPU address. Because the client's first virtual address and the server's second virtual address are retained, the original program requires no modification, achieving transparency. Moreover, the data undergoes only a single copy, from the physical memory address to the GPU address, which reduces the number of data memory copies; no temporary memory needs to be allocated on the client or the server to store copied data, which significantly improves utilization, effectively reduces cost, and improves the efficiency of GPU resource virtualization.
在说明书的一个或者多个实施例中,第一虚拟地址可以有不同的作用,比如,一类用来存储计算数据,一类用来存储计算结果。具体的,所述第一虚拟地址包括计算数据获取虚拟地址和计算结果存放虚拟地址,所述获取所述任务计算请求中的第一虚拟地址,具体可以包括:获取所述任务计算请求中的计算数据获取虚拟地址;所述获取所述第一虚拟地址对应的物理内存地址,具体可以包括:获取所述计算数据获取虚拟地址对应的第一物理内存地址;所述确定所述物理内存地址对应的第二虚拟地址,具体可以包括:确定所述第一物理内存地址对应的第二虚拟地址。In one or more embodiments of the specification, the first virtual address may have different functions, for example, one type is used to store calculation data, and the other type is used to store calculation results. Specifically, the first virtual address includes a calculation data acquisition virtual address and a calculation result storage virtual address, and the acquisition of the first virtual address in the task calculation request may specifically include: acquiring the calculation in the task calculation request Data acquisition virtual address; said acquiring the physical memory address corresponding to the first virtual address may specifically include: acquiring the calculation data acquiring the first physical memory address corresponding to the virtual address; said determining the physical memory address corresponding to the The second virtual address may specifically include: determining the second virtual address corresponding to the first physical memory address.
由于第一虚拟地址包括两类,因此在获取第一虚拟地址的时候,要判断此虚拟地址是用来获取计算数据的,还是用来存储计算结果的。如果第一虚拟地址的种类分错了,那么就会导致任务技术请求失败。例如,矩阵A与矩阵B相乘,矩阵A存储在虚拟地址A中,矩阵B存储在虚拟地址B中,计算结果C存储在虚拟地址C中,如果,从虚拟地址A和虚拟地址C中获取数据,然后再进行乘法运算,显然计算结果是错误的。Since the first virtual address includes two types, when obtaining the first virtual address, it is necessary to determine whether the virtual address is used to obtain calculation data or to store calculation results. If the type of the first virtual address is wrong, it will cause the task technology request to fail. For example, matrix A is multiplied by matrix B, matrix A is stored in virtual address A, matrix B is stored in virtual address B, and the calculation result C is stored in virtual address C. If, get from virtual address A and virtual address C Data, and then multiply, obviously the calculation result is wrong.
Since the first virtual address includes multiple types, classification is also required when allocating GPU addresses for the task calculation request. Specifically, acquiring the GPU address allocated for the task calculation request includes: acquiring the calculation data storage GPU address and the calculation result storage GPU address allocated for the task calculation request. Generating the data copy instruction from the second virtual address to the GPU address includes: generating a data copy instruction from the second virtual address to the calculation data storage GPU address.

The calculation data storage GPU address is used to store the data copied from the physical memory address, i.e., the source data of the calculation. The calculation result storage GPU address is used to store the calculation result. After the GPU completes the calculation task, it temporarily stores the result at the calculation result storage GPU address; once the client issues a call, the data is copied to the corresponding physical memory address.

In one or more embodiments of the present specification, after the processing status information corresponding to the task calculation request is generated, the method may further include: when a calculation result synchronization request sent by the client is received, acquiring the second physical memory address corresponding to the calculation result storage virtual address; determining, based on the mapping relationship between physical memory addresses and virtual addresses, the third virtual address corresponding to the second physical memory address; generating a data copy instruction for copying the calculation result from the calculation result storage GPU address to the third virtual address; and invoking the interface of the GPU driver to execute the data copy instruction.

Although the GPU has completed the calculation task, the client or the application cannot yet obtain the calculation result. Therefore, after learning that the GPU has finished the calculation task, the client issues a calculation result synchronization request, i.e., a request to copy the calculation result from the GPU address to the physical memory address so that the client's application can read it.

Copying the calculation result from the GPU address to the physical memory address is the reverse of copying calculation data from the physical memory address to the GPU address. Since the server initiates this data copy, it must determine which virtual address on the server side corresponds to the physical memory address where the calculation result is stored, i.e., the third virtual address.
In the task calculation request, the first virtual address includes the calculation data acquisition virtual address and the calculation result storage virtual address. Based on the mapping relationship between physical memory addresses and virtual addresses, the second physical memory address corresponding to the calculation result storage virtual address can be determined. The server then determines, according to the mapping relationship, the third virtual address corresponding to the second physical memory address, generates a data copy instruction for copying the calculation result from the calculation result storage GPU address to the third virtual address, and invokes the interface of the GPU driver to execute the data copy instruction.
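The result read-back path just described can be sketched as follows. All table names, addresses, and helpers are hypothetical stand-ins; the real system would resolve addresses through its own mapping tables and invoke the GPU driver to perform the copy:

```python
# Illustrative sketch of the result read-back path.
# client-side table: calculation result storage virtual address -> physical offset
client_vaddr_to_phys = {"vaddr_C": 0x2000}
# server-side table: physical offset -> server virtual address
phys_to_server_vaddr = {0x2000: "svaddr_C"}

gpu_memory = {"gpu_result_addr": [1.0, 2.0, 3.0]}  # simulated GPU memory
host_memory = {}                                    # simulated shared host memory

def sync_result(result_vaddr, gpu_addr):
    """Resolve the third virtual address and copy the result back to host memory."""
    phys = client_vaddr_to_phys[result_vaddr]   # second physical memory address
    server_vaddr = phys_to_server_vaddr[phys]   # third virtual address
    # the generated copy instruction: GPU address -> third virtual address
    host_memory[server_vaddr] = list(gpu_memory[gpu_addr])
    return server_vaddr

third_vaddr = sync_result("vaddr_C", "gpu_result_addr")
```

Because the third virtual address and the client's result-storage virtual address map to the same physical memory, the client can read the result without any further copy.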
To improve scalability, for example with multiple GPU cards in a single machine, a multi-queue request management method is proposed: each GPU maintains multiple request queues, including a submission queue (SubmitQ) and a completion queue (CompleteQ), as shown in Fig. 5. Specifically, acquiring the task calculation request sent by the client may include: acquiring the task calculation request sent by the client from a submission queue, where the submission queue contains multiple unprocessed task calculation requests submitted by the client. After generating the processing status information corresponding to the task calculation request, the method further includes: sending the processing status information corresponding to the task calculation request to a completion queue, where the completion queue contains multiple pieces of processing status information submitted by the server that have not yet been read by the client.

When the client submits a GPU request, it places the request in the submission queue and returns immediately (for example, for an asynchronous request); a worker thread is then responsible for delivering the request to the server, or the server actively polls the submission queue for new requests. After receiving a request, the server processes it and places the result in the completion queue. The client can asynchronously query the completion queue for processing status information.
Multiple task calculation requests are stored in the submission queue, ordered by the time they entered it: a request that entered the submission queue earlier is fetched by the server first, and one that entered later is fetched afterwards. For example, if task 1, task 2, and task 3 are submitted to the submission queue in that order, the server fetches task 1 first, then task 2, and finally task 3. The completion queue works on the same principle.
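The FIFO behavior of the two queues can be sketched minimally as follows (queue contents and task names are illustrative assumptions):

```python
from collections import deque

# Minimal sketch of the per-GPU queue pair.
submit_q = deque()    # SubmitQ: unprocessed requests, FIFO order
complete_q = deque()  # CompleteQ: status entries not yet read by the client

# client submits three tasks in order
for task in ("task1", "task2", "task3"):
    submit_q.append(task)

# server drains the submission queue in arrival order and posts statuses
served = []
while submit_q:
    req = submit_q.popleft()   # earliest request first
    served.append(req)
    complete_q.append((req, "done"))
```

The client would then pop entries from `complete_q` in the same arrival order to learn each task's processing status.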
This solution further combines the queue mechanism with transparent shared memory: all requests of the client and the server are allocated in shared memory, which avoids the memory copy of the request message that would otherwise occur when a request is forwarded.
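A minimal sketch of this idea, assuming a POSIX-style shared memory segment and an invented request format; the real system's segment layout and message encoding are not specified in the source:

```python
from multiprocessing import shared_memory

# Sketch: build the request message directly in shared memory, so that handing
# it to the server requires no copy of the request body. Size and message
# format are illustrative assumptions.
shm = shared_memory.SharedMemory(create=True, size=64)
try:
    request = b"MATMUL vaddr_A vaddr_B vaddr_C"
    shm.buf[:len(request)] = request       # client writes the request in place

    # "server side": attach to the same segment and read the request as-is
    peer = shared_memory.SharedMemory(name=shm.name)
    received = bytes(peer.buf[:len(request)])
    peer.close()
finally:
    shm.close()
    shm.unlink()
```

Because both endpoints view the same segment, only a reference (the segment name or an offset) needs to be passed between them, which is precisely the request-message copy the scheme avoids.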
This method proposes using an efficient software approach to make GPU hardware as efficiently and losslessly shareable as a CPU, thereby significantly improving utilization and effectively reducing cost. In the exclusive-use case, performance can additionally be optimized; in large-scale deployments, the pure-software virtualization approach simplifies operation, maintenance, and management.
Embodiment 3

Fig. 6 is a schematic flowchart of another data transmission method provided by an embodiment of the present specification. From a program perspective, the execution subject of the flow may be a client applied in a GPU virtualization system. As shown in Fig. 6, the flow may include steps 602 to 608.

Step 602: Acquire a data transmission request sent by an application.

In this embodiment, the application and the client reside together, and the data transmission request sent by the application is transmitted through the client.
Step 604: Acquire the first virtual address in the data transmission request.

The data address in the data transmission request is a virtual address; the client first needs to acquire the first virtual address before performing the subsequent operations.

Step 606: Determine, based on the mapping relationship between physical memory addresses and virtual addresses, the physical memory address corresponding to the first virtual address.

To determine the actual location of the data, the physical memory address corresponding to the first virtual address must be determined from the relationship between virtual addresses and physical memory addresses. This relationship may be kept in a table stored on the client. After acquiring the first virtual address, the client can look up the stored correspondence between the first virtual address and physical memory addresses to determine the physical memory address corresponding to the first virtual address.

Step 608: Send the data transmission request and the physical memory address to the server, so that the server performs the data transmission according to the data transmission request and the physical memory address. The server determines, based on the mapping relationship between physical memory addresses and virtual addresses, the second virtual address corresponding to the physical memory address; acquires the GPU address allocated for the data transmission request; generates a data copy instruction from the second virtual address to the GPU address; and invokes the interface of the GPU driver to execute the data copy instruction.
The data transmission method of Embodiment 3 and that of Embodiment 1 describe the same scheme from the perspectives of the client and the server respectively, so much of their content is similar; for details not explained in Embodiment 3, refer to the explanations in Embodiment 1.

In the method of Fig. 6, the physical memory address is mapped both to the first virtual address of the client and to the second virtual address of the server, i.e., the client and the server share the same physical memory, and the generated data copy instruction copies the data at the physical memory address directly to the GPU address. Because the first virtual address of the client and the second virtual address of the server are both preserved, the original program needs no modification, achieving transparency. Moreover, only a single data copy from the physical memory address to the GPU address takes place, reducing the number of memory copies; no temporary memory needs to be allocated for the client or the server to hold copied data, which significantly improves utilization, effectively reduces cost, and improves the efficiency of GPU resource virtualization.

In one or more embodiments of the present specification, before the data transmission request sent by the application is acquired, the method may further include: acquiring a memory allocation request sent by the application; acquiring the data in the memory allocation request; storing the data at a first physical memory address; mapping the first physical memory address into the process space of the application to generate a first virtual address corresponding to the first physical memory address; and sending the first virtual address to the application while storing the mapping relationship between the physical memory address and the first virtual address.
The application initiates a memory allocation request, for example by calling malloc(len). The client receives the memory allocation request and allocates, from a memory pool, a region satisfying the length requirement, for example a segment of length L starting at offset within the pool. It then maps this memory into the application's process space to obtain the mapped virtual address H. The virtual address H and the location information within the pool (the offset) are recorded in a mapping table, which may be implemented as a hash table. Finally, the virtual address H is returned to the application so that the application can read and write data normally.
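The allocation flow above can be sketched as follows. The pool size, the bump-pointer allocation strategy, the synthetic handle standing in for the mapped virtual address H, and the `my_malloc` helper are all illustrative assumptions:

```python
import mmap

# Sketch of client-side allocation from a shared pool.
POOL_SIZE = 4096
pool = mmap.mmap(-1, POOL_SIZE)   # anonymous mapping stands in for the shared pool
vaddr_table = {}                  # hash table: virtual address H -> pool offset
next_offset = 0

def my_malloc(length):
    """Allocate `length` bytes from the pool and record the H -> offset mapping."""
    global next_offset
    if next_offset + length > POOL_SIZE:
        raise MemoryError("pool exhausted")
    offset = next_offset
    next_offset += length
    # In the real system H is the address obtained by mapping this pool region
    # into the application's process space; a synthetic handle stands in here.
    handle = f"H@{offset}"
    vaddr_table[handle] = offset
    return handle

h = my_malloc(16)
pool.seek(vaddr_table[h])
pool.write(b"A" * 16)             # application reads/writes via H as usual
```

The server side would hold its own table keyed by the same pool offsets, which is what lets both sides resolve a shared physical location from their respective virtual addresses.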
In one or more embodiments of the present specification, determining the physical memory address corresponding to the first virtual address may specifically include: determining the physical memory address corresponding to the first virtual address according to the mapping relationship.

Embodiment 4

Fig. 7 is a schematic flowchart of another task processing method provided by an embodiment of the present specification. From a program perspective, the execution subject of the flow may be a client applied in a GPU virtualization system. As shown in Fig. 7, the flow may include steps 702 to 714.
Step 702: Acquire a task processing request sent by an application.

Step 704: Forward the task processing request so that the server can acquire it.

Step 706: When the processing status information of the task processing request sent by the server is acquired, issue a synchronization request.

Step 708: After the success notification of the synchronization request sent by the server is acquired, acquire the first virtual address of the task processing request.

Step 710: Determine, based on the mapping relationship between physical memory addresses and virtual addresses, the physical memory address corresponding to the first virtual address.

Step 712: Read the calculation result of the task processing request from the physical memory address corresponding to the first virtual address.

Step 714: Send the calculation result of the task processing request to the application.
Optionally, the first virtual address includes a calculation data acquisition virtual address and a calculation result storage virtual address.

Acquiring the first virtual address of the task processing request may specifically include: acquiring the calculation result storage virtual address of the task processing request.

Determining the physical memory address corresponding to the first virtual address may specifically include: determining the physical memory address corresponding to the calculation result storage virtual address.

Optionally, forwarding the task processing request may specifically include: sending the task processing request to a submission queue so that the server acquires the task processing request from the submission queue, where the submission queue contains multiple unprocessed task calculation requests submitted by the client.

Issuing a synchronization request when the processing status information of the task processing request sent by the server is acquired specifically includes: querying the completion queue and, when the processing status information of the task processing request sent by the server is found, issuing a synchronization request, where the completion queue contains multiple pieces of processing status information submitted by the server that have not yet been read by the client.

Acquiring the first virtual address of the task processing request after the success notification of the synchronization request sent by the server is acquired specifically includes: querying the completion queue and, when the success notification of the synchronization request sent by the server is found, acquiring the first virtual address of the task processing request.
When the client submits a GPU request, it places the request in the submission queue and returns immediately (for example, for an asynchronous request); a worker thread is then responsible for delivering the request to the server, or the server actively polls the submission queue for new requests. After receiving a request, the server processes it and places the result in the completion queue. The client can asynchronously query the completion queue for processing status information.

This solution further combines the queue mechanism with transparent shared memory: all requests of the client and the server are allocated in shared memory, which avoids the memory copy of the request message that would otherwise occur when a request is forwarded.

This method proposes using an efficient software approach to make GPU hardware as efficiently and losslessly shareable as a CPU, thereby significantly improving utilization and effectively reducing cost. In the exclusive-use case, performance can additionally be optimized; in large-scale deployments, the pure-software virtualization approach simplifies operation, maintenance, and management.
Embodiment 5
Another task processing method provided by an embodiment of the present specification is executed by a machine hosting both a client and a server. The method may include the following steps: the client acquires a task calculation request sent by an application, the client having a virtual memory sharing function; the client sends the task calculation request to a submission queue; the server, which also has a virtual memory sharing function, acquires the task calculation request from the submission queue; the server acquires the calculation data acquisition virtual address and the calculation result storage virtual address in the task calculation request; the server acquires the first physical memory address corresponding to the calculation data acquisition virtual address; the server determines the second virtual address corresponding to the first physical memory address; the server acquires the GPU address allocated for the task calculation request; the server generates a data copy instruction from the second virtual address to the GPU address, so as to invoke an interface to execute the data copy from the physical memory address to the GPU address; the server sends the task calculation request to the GPU; after the GPU completes the calculation task corresponding to the task calculation request, the server generates the processing status information corresponding to the task calculation request and sends it to a completion queue; the client queries the completion queue for the processing status information corresponding to the task calculation request and, on finding it, issues a synchronization request to the submission queue; on acquiring the synchronization request from the submission queue, the server acquires the second physical memory address corresponding to the calculation result storage virtual address; the server determines the third virtual address corresponding to the second physical memory address; the server generates an instruction for copying the calculation result from the GPU address to the third virtual address, so as to invoke an interface to execute the data copy from the GPU address to the second physical memory address; when the data copy is complete, the server sends a synchronization completion notification to the completion queue; and when the synchronization completion notification is found in the completion queue, the client acquires the calculation result from the second physical memory address and sends it to the application.
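The end-to-end flow of this embodiment can be condensed into the following sketch, in which plain Python dictionaries and deques stand in for the shared physical memory, the two mapping tables, GPU memory, and the two queues; every name and the "double each element" task are illustrative assumptions:

```python
from collections import deque

# Compact end-to-end sketch of Embodiment 5.
submit_q, complete_q = deque(), deque()
phys_mem = {0x1000: [1, 2, 3], 0x2000: None}       # shared physical memory
client_map = {"vaddr_in": 0x1000, "vaddr_out": 0x2000}  # client virtual -> physical
server_map = {0x1000: "sv_in", 0x2000: "sv_out"}        # physical -> server virtual
gpu_mem = {}

# client: submit the task calculation request
submit_q.append({"op": "double", "in_phys": client_map["vaddr_in"],
                 "out_phys": client_map["vaddr_out"]})

# server: fetch the request, copy data to the GPU, "compute", post the status
req = submit_q.popleft()
second_vaddr = server_map[req["in_phys"]]           # second virtual address
gpu_mem["gpu_in"] = list(phys_mem[req["in_phys"]])  # one copy: host -> GPU
gpu_mem["gpu_out"] = [2 * v for v in gpu_mem["gpu_in"]]
complete_q.append((req["op"], "done"))

# client: sees the status, issues a synchronization request
_, status = complete_q.popleft()
submit_q.append({"op": "sync", "out_phys": client_map["vaddr_out"]})

# server: copy the result back through the third virtual address
sync = submit_q.popleft()
third_vaddr = server_map[sync["out_phys"]]          # third virtual address
phys_mem[sync["out_phys"]] = list(gpu_mem["gpu_out"])  # GPU -> host
complete_q.append(("sync", "done"))

result = phys_mem[client_map["vaddr_out"]]          # client reads the result
```

Each direction involves exactly one host/GPU copy; the host side never duplicates the data, because the client and the server address the same physical memory through their respective virtual addresses.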
The program adopted by the methods provided in the embodiments of the present specification runs in user mode and can be applied in user space. There are multiple implementations for different scenarios, allowing flexible deployment. These are summarized as follows:

1. Bare-metal environment (no virtualization technology): both the server and the client run on the host OS (e.g., Linux). The server takes over all GPU access through the GPU driver and, depending on configuration, may use a given GPU0 exclusively or share GPU1. If the client and the server are on the same machine, communication can use IPC (e.g., UNIX socket, pipe, or shmem); if they are not on the same machine, socket/RDMA communication is used.

2. Containerized environment: in a container environment, the server can run in containerized form, taking over the physical GPUs and exporting virtual GPU resources. Clients (e.g., K8S pods) run on the same physical machine and link to the server; client-server communication can use IPC or the network.

3. Virtual machine environment: in a typical virtual machine environment, a GPU is passed through to a specific virtual machine; the server or client is then started inside the VM guest OS, after which the setup is equivalent to the bare-metal environment.
The technical effects achievable by the above solution are as follows:

1. High performance: the transparent memory sharing mechanism avoids extra memory copies, and polling-based multi-queue request processing can efficiently handle the high-frequency request calls of typical deep learning tasks. Performance is significantly improved over known methods: software virtualization using this method can achieve virtually no performance loss, and its virtualization efficiency is clearly better than known hardware and software virtualization solutions from industry and academia.

2. Low overhead: thanks to the transparent shared memory mechanism, no temporary memory needs to be allocated, greatly reducing memory overhead; efficient lock-free polling also keeps CPU overhead low (constant overhead).

3. Scalability: owing to the above efficiency and low overhead, the solution can handle concurrent access to multiple GPU cards in a single machine.

4. Transparent and non-intrusive: existing applications need no modification or recompilation, API-level compatibility is maintained, and the core framework can be conveniently extended to support other heterogeneous acceleration devices, such as NPUs.

5. Based on the transparent memory sharing above, multiple request queues, including submission and completion queues, are provided for each device, improving scalability and handling concurrent access by multiple cards.

6. Low overhead: runtime extra memory allocation is greatly reduced, and a single CPU core suffices to support multi-card concurrency.

7. Universality, flexibility, and extensibility: the method supports multiple deployment environments, can interface with all known AI frameworks and models, and is transparent and non-intrusive; the core method is independent of GPU devices and can also support other acceleration devices, such as Alibaba's AI chips.
Based on the same idea, the embodiments of the present specification further provide apparatuses corresponding to the above methods. Fig. 8 is a schematic structural diagram of a data transmission apparatus, corresponding to Fig. 3, provided by an embodiment of the present specification. As shown in Fig. 8, the apparatus may include: a data transmission request acquisition module 801, configured to acquire a data transmission request sent by a client; a first virtual address acquisition module 802, configured to acquire the first virtual address in the data transmission request; a physical memory address acquisition module 803, configured to acquire the physical memory address corresponding to the first virtual address; a second virtual address determination module 804, configured to determine, based on the mapping relationship between physical memory addresses and virtual addresses, the second virtual address corresponding to the physical memory address; a GPU address acquisition module 805, configured to acquire the GPU address allocated for the data transmission request; a data copy instruction generation module 806, configured to generate a data copy instruction from the second virtual address to the GPU address; and an interface invocation module 807, configured to invoke the interface of the GPU driver to execute the data copy instruction.

Optionally, the apparatus may further include: a judgment module, configured to judge whether the physical memory address is stored in a mapping table;

a second virtual address generation module, configured to, if not, generate a second virtual address corresponding to the physical memory address and store the mapping relationship between the physical memory address and the second virtual address in the mapping table.

The second virtual address determination module 804 may be specifically configured to: if so, acquire the second virtual address corresponding to the physical memory address.
本说明书实施例还提供了对应于图4的一种任务处理装置,所述装置包括:任务计算请求获取模块,用于获取客户端发送的任务计算请求;第一虚拟地址获取模块,用于获取所述任务计算请求中的第一虚拟地址;物理内存地址获取模块,用于获取所述第一虚拟地址对应的物理内存地址;第二虚拟地址确定模块,用于基于物理内存地址与虚拟地址的映射关系,确定所述物理内存地址对应的第二虚拟地址;第一GPU地址获取模块,用于获取为所述任务计算请求分配的GPU地址;数据拷贝指令生成模块,用于生成从所述第二虚拟地址至所述GPU地址的数据拷贝指令;第一GPU驱动接口调用模块,用于调用GPU驱动的接口执行所述数据拷贝指令;任务计算请求发送模块,用于将所述任务计算请求发送至GPU;处理状态信息生成模块,用于当所述GPU完成所述任务计算请求对应的计算任务后,生成所述任务计算请求对应的处理状态信息;处理状态信息存储模块,用于存储所述处理状态信息。The embodiment of this specification also provides a task processing device corresponding to FIG. 4, the device includes: a task calculation request obtaining module, configured to obtain a task calculation request sent by a client; and a first virtual address obtaining module, configured to obtain The first virtual address in the task calculation request; a physical memory address obtaining module for obtaining the physical memory address corresponding to the first virtual address; a second virtual address determining module for calculating the physical memory address and the virtual address The mapping relationship is used to determine the second virtual address corresponding to the physical memory address; the first GPU address obtaining module is used to obtain the GPU address allocated for the task calculation request; the data copy instruction generation module is used to generate the slave 2. The data copy instruction from the virtual address to the GPU address; the first GPU driver interface calling module is used to call the GPU driver interface to execute the data copy instruction; the task calculation request sending module is used to send the task calculation request To the GPU; a processing state information generation module, used to generate processing state information corresponding to the task calculation request after the GPU completes the calculation task corresponding to the task calculation request; processing state information storage module, used to store the Processing status information.
可选的,所述第一虚拟地址包括计算数据获取虚拟地址和计算结果存放虚拟地址,所述第一虚拟地址获取模块,具体可以用于:获取所述任务计算请求中的计算数据获取虚拟地址;所述物理内存地址获取模块,具体可以用于:获取所述计算数据获取虚拟地址对应的第一物理内存地址;所述第二虚拟地址确定模块,具体可以用于:确定所述第一物理内存地址对应的第二虚拟地址。Optionally, the first virtual address includes a calculation data acquisition virtual address and a calculation result storage virtual address, and the first virtual address acquisition module may be specifically used to: acquire the calculation data acquisition virtual address in the task calculation request The physical memory address obtaining module may be specifically used to: obtain the first physical memory address corresponding to the virtual address obtained by the calculation data; the second virtual address determining module may be specifically used to: determine the first physical The second virtual address corresponding to the memory address.
Optionally, the GPU address obtaining module may be specifically configured to obtain a calculation-data storage GPU address and a calculation-result storage GPU address allocated for the task calculation request; and the generating a data copy instruction from the second virtual address to the GPU address specifically includes: generating a data copy instruction from the second virtual address to the calculation-data storage GPU address.
Optionally, the apparatus may further include: a second physical memory address obtaining module, configured to obtain, when a calculation result synchronization request sent by the client is obtained, a second physical memory address corresponding to the calculation-result storage virtual address; a third virtual address obtaining module, configured to determine, based on the mapping relationship between physical memory addresses and virtual addresses, a third virtual address corresponding to the second physical memory address; a second data copy instruction generating module, configured to generate a data copy instruction for copying a calculation result from the calculation-result storage GPU address to the third virtual address; and a second GPU driver interface calling module, configured to call the interface of the GPU driver to execute the data copy instruction.
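The optional result sync-back path can be illustrated with a small sketch: translate the result-storage virtual address to its second physical address, derive the server-side third virtual address, and copy the result out of GPU memory. Every name and the dict stand-ins are assumptions for illustration only:

```python
def sync_result(result_va, va_to_pa, pa_to_server_va, gpu_memory, host_memory,
                result_gpu_addr):
    """Handle a calculation-result synchronization request (illustrative):
    result-storage VA -> second physical address -> third virtual address,
    then copy the result from GPU memory back to host memory."""
    pa = va_to_pa[result_va]                  # second physical memory address
    third_va = pa_to_server_va[pa]            # third virtual address
    host_memory[third_va] = gpu_memory[result_gpu_addr]   # GPU -> host copy
    return host_memory[third_va]
```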
Optionally, the task calculation request obtaining module may be specifically configured to obtain the task calculation request sent by the client from a submission queue, where the submission queue contains multiple unprocessed task calculation requests submitted by the client. After the processing state information corresponding to the task calculation request is generated, the apparatus may further include: a processing state information sending module, configured to send the processing state information corresponding to the task calculation request to a completion queue, where the completion queue contains multiple pieces of processing state information submitted by the server and not yet read by the client.
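The submission/completion queue exchange described above can be modeled minimally as follows. The queue entries' field names and the immediate "done" state are assumptions; a real server would dispatch each request to the GPU before posting its processing state:

```python
from collections import deque

# Submission queue: client -> server, unprocessed task calculation requests.
# Completion queue: server -> client, processing state not yet read by the client.
submission_queue = deque()
completion_queue = deque()

def client_submit(request):
    submission_queue.append(request)

def server_poll():
    """Drain the submission queue and post one state entry per handled request."""
    while submission_queue:
        request = submission_queue.popleft()
        # ... dispatch the request to the GPU here ...
        completion_queue.append({"req_id": request["req_id"], "state": "done"})

def client_check():
    """Read (and consume) one piece of processing state information, if any."""
    return completion_queue.popleft() if completion_queue else None
```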
The embodiments of this specification further provide a data transmission apparatus corresponding to FIG. 6, including: a data transmission request obtaining module, configured to obtain a data transmission request sent by an application; a first virtual address obtaining module, configured to obtain a first virtual address in the data transmission request; a physical memory address determining module, configured to determine, based on a mapping relationship between physical memory addresses and virtual addresses, a physical memory address corresponding to the first virtual address; and a data transmission request and physical memory address sending module, configured to send the data transmission request and the physical memory address to a server, so that the server performs data transmission according to the data transmission request and the physical memory address, where the server determines, based on a mapping relationship between physical memory addresses and virtual addresses, a second virtual address corresponding to the physical memory address; obtains a GPU address allocated for the data transmission request; generates a data copy instruction from the second virtual address to the GPU address; and calls an interface of a GPU driver to execute the data copy instruction.
Optionally, before the obtaining a data transmission request sent by an application, the apparatus may further include: a memory allocation request obtaining module, configured to obtain a memory allocation request sent by the application; a data obtaining module, configured to obtain data in the memory allocation request; a data storage module, configured to store the data at a first physical memory address; a first virtual address generating module, configured to map the first physical memory address into a process space of the application to generate a first virtual address corresponding to the first physical memory address; and a storage module, configured to send the first virtual address to the application and store a mapping relationship between the physical memory address and the first virtual address.
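The allocation path above (store the data at a first physical address, map it into the application's process space, return the first virtual address, and keep the mapping) can be sketched like this. Addresses here are synthetic integers, not real physical memory; a real implementation would allocate shared memory and mmap it into the application's process space:

```python
import itertools

_next_pa = itertools.count(0x1000, 0x1000)
physical_memory = {}   # "physical" address -> stored data
va_to_pa = {}          # stored mapping: first virtual address -> physical address

def alloc_and_map(data):
    """Store data at a first physical address, map it into the process space,
    remember the mapping, and hand the first virtual address to the application."""
    pa = next(_next_pa)
    physical_memory[pa] = data
    va = pa | 0x7F0000000000          # synthetic first virtual address
    va_to_pa[va] = pa
    return va

def resolve(va):
    """Determine the physical memory address from the stored mapping."""
    return va_to_pa[va]
```

With the mapping stored at allocation time, the later lookup in the data transmission path reduces to a single table query, which is what `resolve` models.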
Optionally, the physical memory address determining module may be specifically configured to determine, according to the mapping relationship, the physical memory address corresponding to the first virtual address.
The embodiments of this specification further provide a task processing apparatus corresponding to FIG. 7, including: a task processing request obtaining module, configured to obtain a task processing request sent by an application; a task processing request forwarding module, configured to forward the task processing request so that a server can obtain it; a synchronization request sending module, configured to issue a synchronization request when processing state information of the task processing request sent by the server is obtained; a first virtual address obtaining module, configured to obtain a first virtual address of the task processing request after a success notification of the synchronization request sent by the server is obtained; a physical memory address determining module, configured to determine, based on a mapping relationship between physical memory addresses and virtual addresses, a physical memory address corresponding to the first virtual address; a calculation result reading module, configured to read a calculation result of the task processing request from the physical memory address corresponding to the first virtual address; and a calculation result sending module, configured to send the calculation result of the task processing request to the application.
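The client-side sequence this apparatus implements (forward the request, wait for its processing state, synchronize, then read the result through the VA-to-PA mapping) can be sketched with a stub server. All names are illustrative assumptions:

```python
class StubServer:
    """Stand-in for the server; completes every task immediately so the
    client-side control flow can be exercised on its own."""

    def submit(self, task):
        pass                                  # request forwarded for the server

    def wait_for_status(self, task_id):
        return "done"                         # processing state information

    def sync(self, task_id):
        return True                           # success notification of the sync

def client_run(task, server, va_to_pa, physical_memory):
    server.submit(task)                       # forward the task processing request
    if server.wait_for_status(task["id"]) == "done":
        if server.sync(task["id"]):           # issue the synchronization request
            pa = va_to_pa[task["result_va"]]  # result-storage VA -> physical addr
            return physical_memory[pa]        # read the calculation result
    return None
```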
Optionally, the first virtual address includes a calculation-data acquisition virtual address and a calculation-result storage virtual address; the first virtual address obtaining module may be specifically configured to obtain the calculation-result storage virtual address of the task processing request; and the physical memory address determining module may be specifically configured to determine a physical memory address corresponding to the calculation-result storage virtual address.
Optionally, the task processing request forwarding module may be specifically configured to send the task processing request to a submission queue so that the server obtains the task processing request from the submission queue, where the submission queue contains multiple unprocessed task calculation requests submitted by the client; the synchronization request sending module may be specifically configured to query a completion queue and issue a synchronization request when the processing state information of the task processing request sent by the server is found, where the completion queue contains multiple pieces of processing state information submitted by the server and not yet read by the client; and the first virtual address obtaining module may be specifically configured to query the completion queue and obtain the first virtual address of the task processing request after the success notification of the synchronization request sent by the server is found.
Based on the same idea, the embodiments of this specification further provide devices corresponding to the foregoing methods.
FIG. 9 is a schematic structural diagram of a data transmission device corresponding to FIG. 3 according to an embodiment of this specification. As shown in FIG. 9, the device 900 may include: at least one processor 910; and a memory 930 communicatively connected to the at least one processor, where the memory 930 stores instructions 920 executable by the at least one processor 910, and the instructions are executed by the at least one processor 910 to enable the at least one processor 910 to: obtain a data transmission request sent by a client; obtain a first virtual address in the data transmission request; obtain a physical memory address corresponding to the first virtual address; determine, based on a mapping relationship between physical memory addresses and virtual addresses, a second virtual address corresponding to the physical memory address; obtain a GPU address allocated for the data transmission request; generate a data copy instruction from the second virtual address to the GPU address; and call an interface of a GPU driver to execute the data copy instruction.
The embodiments of this specification further provide a task processing device corresponding to FIG. 4. The device may include: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to: obtain a task calculation request sent by a client; obtain a first virtual address in the task calculation request; obtain a physical memory address corresponding to the first virtual address; determine, based on a mapping relationship between physical memory addresses and virtual addresses, a second virtual address corresponding to the physical memory address; obtain a GPU address allocated for the task calculation request; generate a data copy instruction from the second virtual address to the GPU address; call an interface of a GPU driver to execute the data copy instruction; send the task calculation request to a GPU; after the GPU completes a calculation task corresponding to the task calculation request, generate processing state information corresponding to the task calculation request; and store the processing state information.
The embodiments of this specification further provide a data transmission device corresponding to FIG. 6. The device may include: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to: obtain a data transmission request sent by an application; obtain a first virtual address in the data transmission request; determine, based on a mapping relationship between physical memory addresses and virtual addresses, a physical memory address corresponding to the first virtual address; and send the data transmission request and the physical memory address to a server, so that the server performs data transmission according to the data transmission request and the physical memory address, where the server determines, based on a mapping relationship between physical memory addresses and virtual addresses, a second virtual address corresponding to the physical memory address; obtains a GPU address allocated for the data transmission request; generates a data copy instruction from the second virtual address to the GPU address; and calls an interface of a GPU driver to execute the data copy instruction.
The embodiments of this specification further provide a task processing device corresponding to FIG. 7. The device may include: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to: obtain a task processing request sent by an application; forward the task processing request so that a server can obtain it; issue a synchronization request when processing state information of the task processing request sent by the server is obtained; after a success notification of the synchronization request sent by the server is obtained, obtain a first virtual address of the task processing request; determine, based on a mapping relationship between physical memory addresses and virtual addresses, a physical memory address corresponding to the first virtual address; read a calculation result of the task processing request from the physical memory address corresponding to the first virtual address; and send the calculation result of the task processing request to the application.
An embodiment of this specification further provides a computer-readable medium storing computer-readable instructions, where the computer-readable instructions can be executed by a processor to implement any one of the foregoing methods.
In the 1990s, an improvement to a technology could clearly be classified as either a hardware improvement (for example, an improvement to circuit structures such as diodes, transistors, and switches) or a software improvement (an improvement to a method flow). With the development of technology, however, improvements to many of today's method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (PLD) (for example, a field programmable gate array (FPGA)) is an integrated circuit whose logic functions are determined by a user's programming of the device. A designer programs to "integrate" a digital system onto a single PLD, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this kind of programming is now mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development, and the original code to be compiled must likewise be written in a specific programming language, known as a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. A person skilled in the art should also understand that a hardware circuit implementing a logic method flow can easily be obtained simply by performing slight logic programming on the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. A person skilled in the art also knows that, in addition to implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller implements the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the apparatuses included in it for implementing various functions can also be regarded as structures within the hardware component. Or, the apparatuses for implementing various functions can even be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, apparatuses, modules, or units described in the foregoing embodiments may be specifically implemented by a computer chip or an entity, or by a product having certain functions. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the foregoing apparatuses are described by dividing their functions into various units. Of course, when this application is implemented, the functions of the units may be implemented in one or more pieces of software and/or hardware.
A person skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or the other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, where the instruction apparatus implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or the other programmable device to produce computer-implemented processing, and the instructions executed on the computer or the other programmable device thereby provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and a memory.
The memory may include non-persistent storage in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and information storage can be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity, or device. In the absence of further limitation, an element defined by the phrase "including a/an ..." does not exclude the presence of additional identical elements in the process, method, commodity, or device that includes the element.
This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. This application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts of the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiment is described relatively simply because it is substantially similar to the method embodiment, and for related parts, reference may be made to the description of the method embodiment.
The foregoing descriptions are merely embodiments of this application and are not intended to limit this application. For a person skilled in the art, this application may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall fall within the scope of the claims of this application.

Claims (22)

  1. A data transmission method, applied to a server in a GPU virtualization system, the method comprising:
    obtaining a data transmission request sent by a client;
    obtaining a first virtual address in the data transmission request;
    obtaining a physical memory address corresponding to the first virtual address;
    determining, based on a mapping relationship between physical memory addresses and virtual addresses, a second virtual address corresponding to the physical memory address;
    obtaining a GPU address allocated for the data transmission request;
    generating a data copy instruction from the second virtual address to the GPU address; and
    calling an interface of a GPU driver to execute the data copy instruction.
  2. The method according to claim 1, before the determining a second virtual address corresponding to the physical memory address, further comprising:
    determining whether the physical memory address is stored in a mapping table; and
    if not, generating a second virtual address corresponding to the physical memory address, and storing a mapping relationship between the physical memory address and the second virtual address in the mapping table;
    wherein the determining a second virtual address corresponding to the physical memory address specifically comprises:
    if yes, obtaining the second virtual address corresponding to the physical memory address.
  3. A task processing method, applied to a server in a GPU virtualization system, the method comprising:
    acquiring a task computation request sent by a client;
    acquiring a first virtual address in the task computation request;
    acquiring a physical memory address corresponding to the first virtual address;
    determining, based on a mapping relationship between physical memory addresses and virtual addresses, a second virtual address corresponding to the physical memory address;
    acquiring a GPU address allocated for the task computation request;
    generating a data copy instruction from the second virtual address to the GPU address;
    invoking an interface of a GPU driver to execute the data copy instruction;
    sending the task computation request to a GPU;
    after the GPU completes a computation task corresponding to the task computation request, generating processing status information corresponding to the task computation request; and
    storing the processing status information.
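The server-side sequence of claim 3 can be sketched with every hardware interaction injected as a callable, so the control flow itself stays visible. All parameter names (`phys_of_vaddr`, `alloc_gpu_addr`, `gpu_driver`) are illustrative stand-ins, not the patent's interfaces.

```python
def handle_task_request(request, phys_of_vaddr, second_vaddr_of_phys,
                        alloc_gpu_addr, gpu_driver):
    """Claim 3 sketch: resolve addresses, copy input to the GPU, run the
    task, and return processing status information to be stored."""
    first_vaddr = request["first_vaddr"]
    phys = phys_of_vaddr(first_vaddr)          # first vaddr -> physical address
    second_vaddr = second_vaddr_of_phys(phys)  # physical -> second (server) vaddr
    gpu_addr = alloc_gpu_addr(request)         # GPU address for this request
    gpu_driver.copy(second_vaddr, gpu_addr)    # host -> device data copy
    gpu_driver.run(request, gpu_addr)          # hand the request to the GPU
    status = {"request": request["id"], "state": "done"}
    return status                              # processing status information
```

With stub callables this runs end to end, which makes the claimed ordering (resolve, copy, compute, report) easy to check in isolation.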
  4. The method according to claim 3, wherein the first virtual address comprises a computation-data acquisition virtual address and a computation-result storage virtual address, and the acquiring a first virtual address in the task computation request specifically comprises:
    acquiring the computation-data acquisition virtual address in the task computation request;
    the acquiring a physical memory address corresponding to the first virtual address specifically comprises:
    acquiring a first physical memory address corresponding to the computation-data acquisition virtual address; and
    the determining a second virtual address corresponding to the physical memory address specifically comprises:
    determining a second virtual address corresponding to the first physical memory address.
  5. The method according to claim 4, wherein the acquiring a GPU address allocated for the task computation request specifically comprises:
    acquiring a computation-data storage GPU address and a computation-result storage GPU address allocated for the task computation request; and
    the generating a data copy instruction from the second virtual address to the GPU address specifically comprises:
    generating a data copy instruction from the second virtual address to the computation-data storage GPU address.
  6. The method according to claim 5, after the generating processing status information corresponding to the task computation request, further comprising:
    when a computation-result synchronization request sent by the client is acquired, acquiring a second physical memory address corresponding to the computation-result storage virtual address;
    determining, based on the mapping relationship between physical memory addresses and virtual addresses, a third virtual address corresponding to the second physical memory address;
    generating a data copy instruction for copying a computation result from the computation-result storage GPU address to the third virtual address; and
    invoking an interface of the GPU driver to execute the data copy instruction.
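Claim 6's result path (a device-to-host copy into a freshly resolved third virtual address) can be sketched as below. `gpu_copy_to_host` and the mapping dictionary are hypothetical stand-ins for the GPU driver interface and the mapping table, not the patent's own names.

```python
def sync_result(result_gpu_addr, result_phys_addr, phys_to_vaddr, gpu_copy_to_host):
    """Claim 6 sketch: on a computation-result synchronization request,
    resolve the result's physical address to a third (server-side) virtual
    address, then copy the result out of GPU memory into that address."""
    third_vaddr = phys_to_vaddr[result_phys_addr]   # mapping-table lookup
    gpu_copy_to_host(result_gpu_addr, third_vaddr)  # device -> host copy
    return third_vaddr
```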
  7. The method according to claim 3, wherein the acquiring a task computation request sent by a client specifically comprises:
    acquiring the task computation request sent by the client from a submission queue, the submission queue containing a plurality of unprocessed task computation requests submitted by the client; and
    after the generating processing status information corresponding to the task computation request, the method further comprises:
    sending the processing status information corresponding to the task computation request to a completion queue, the completion queue containing a plurality of pieces of processing status information submitted by the server but not yet read by the client.
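The submission/completion queue pair of claim 7 can be sketched with in-process queues. This is an analogy: the names are illustrative, and the patent's queues are shared between the client and server processes rather than living inside one Python process.

```python
from queue import Queue

# Claim 7 sketch: a submission queue of unprocessed client requests and a
# completion queue of status entries not yet read by the client.
submission_queue: Queue = Queue()
completion_queue: Queue = Queue()

def client_submit(request):
    submission_queue.put(request)    # client enqueues an unprocessed request

def server_process_one(compute):
    request = submission_queue.get()  # server drains the submission queue
    compute(request)                  # run the computation task on the GPU
    completion_queue.put({"request": request, "state": "done"})  # publish status
```

The decoupling matters: the client never blocks on the GPU directly, it only polls the completion queue for status entries.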
  8. A data transmission method, applied to a client in a GPU virtualization system, the method comprising:
    acquiring a data transmission request sent by an application;
    acquiring a first virtual address in the data transmission request;
    determining, based on a mapping relationship between physical memory addresses and virtual addresses, a physical memory address corresponding to the first virtual address; and
    sending the data transmission request and the physical memory address to a server, so that the server performs data transmission according to the data transmission request and the physical memory address, wherein the server determines, based on a mapping relationship between physical memory addresses and virtual addresses, a second virtual address corresponding to the physical memory address; acquires a GPU address allocated for the data transmission request; generates a data copy instruction from the second virtual address to the GPU address; and invokes an interface of a GPU driver to execute the data copy instruction.
  9. The method according to claim 8, before the acquiring a data transmission request sent by an application, further comprising:
    acquiring a memory allocation request sent by the application;
    acquiring data in the memory allocation request;
    storing the data at a first physical memory address;
    mapping the first physical memory address into a process space of the application to generate a first virtual address corresponding to the first physical memory address; and
    sending the first virtual address to the application, and storing a mapping relationship between the first physical memory address and the first virtual address.
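The allocation step of claim 9 can be sketched with an anonymous `mmap` region. This is an analogy, not the patent's mechanism: user space cannot normally address physical memory directly, so the offset inside the shared region stands in for the physical memory address, and the `("proc-vaddr", offset)` tuple stands in for the first virtual address handed back to the application.

```python
import mmap

class ClientAllocator:
    """Claim 9 sketch: store the request's data, record a 'physical' address,
    and hand the application a first virtual address mapped onto it."""

    def __init__(self, size=4096):
        self._region = mmap.mmap(-1, size)   # anonymous shared backing store
        self._next_offset = 0
        self.vaddr_of_phys = {}              # "physical addr" -> first virtual addr

    def allocate(self, data):
        phys = self._next_offset             # offset plays the physical address
        self._region.seek(phys)
        self._region.write(data)             # store the data at that address
        self._next_offset += len(data)
        first_vaddr = ("proc-vaddr", phys)   # mapping into the app's process space
        self.vaddr_of_phys[phys] = first_vaddr
        return first_vaddr                   # returned to the application
```

Storing the address pair at allocation time is what lets claim 10 later resolve the first virtual address back to its physical address with a plain table lookup.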
  10. The method according to claim 9, wherein the determining a physical memory address corresponding to the first virtual address specifically comprises:
    determining the physical memory address corresponding to the first virtual address according to the mapping relationship.
  11. A task processing method, applied to a client in a GPU virtualization system, the method comprising:
    acquiring a task processing request sent by an application;
    forwarding the task processing request for acquisition by a server;
    when processing status information of the task processing request sent by the server is acquired, issuing a synchronization request;
    after a success notification of the synchronization request sent by the server is acquired, acquiring a first virtual address of the task processing request;
    determining, based on a mapping relationship between physical memory addresses and virtual addresses, a physical memory address corresponding to the first virtual address;
    reading a computation result of the task processing request from the physical memory address corresponding to the first virtual address; and
    sending the computation result of the task processing request to the application.
  12. The method according to claim 11, wherein the first virtual address comprises a computation-data acquisition virtual address and a computation-result storage virtual address;
    the acquiring a first virtual address of the task processing request specifically comprises:
    acquiring the computation-result storage virtual address of the task processing request; and
    the determining a physical memory address corresponding to the first virtual address specifically comprises:
    determining a physical memory address corresponding to the computation-result storage virtual address.
  13. The method according to claim 11, wherein the forwarding the task processing request specifically comprises:
    sending the task processing request to a submission queue, so that the server acquires the task processing request from the submission queue, the submission queue containing a plurality of unprocessed task computation requests submitted by the client;
    the issuing a synchronization request when processing status information of the task processing request sent by the server is acquired specifically comprises:
    querying a completion queue, and issuing the synchronization request when the processing status information of the task processing request sent by the server is found, the completion queue containing a plurality of pieces of processing status information submitted by the server but not yet read by the client; and
    the acquiring a first virtual address of the task processing request after a success notification of the synchronization request sent by the server is acquired specifically comprises:
    querying the completion queue, and acquiring the first virtual address of the task processing request after the success notification of the synchronization request sent by the server is found.
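The client side of claim 13 (poll the completion queue, issue the synchronization request on seeing "done", read the result on the sync success notification) can be sketched as below. The entry shapes, state strings, and callbacks are illustrative assumptions, not the patent's wire format.

```python
from collections import deque

def poll_and_read(completion_queue, request_id, send_sync, read_result):
    """Claim 13 sketch: the client queries the completion queue; on its
    request's 'done' status it issues a synchronization request, and on the
    subsequent sync success notification it reads the computation result."""
    while completion_queue:
        entry = completion_queue.popleft()
        if entry.get("request") != request_id:
            continue                         # status for some other request
        if entry.get("state") == "done":
            send_sync(request_id)            # issue the synchronization request
        elif entry.get("state") == "sync-ok":
            return read_result(request_id)   # read from the mapped memory
    return None
```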
  14. A data transmission apparatus, comprising:
    a data transmission request acquisition module, configured to acquire a data transmission request sent by a client;
    a first virtual address acquisition module, configured to acquire a first virtual address in the data transmission request;
    a physical memory address acquisition module, configured to acquire a physical memory address corresponding to the first virtual address;
    a second virtual address determination module, configured to determine, based on a mapping relationship between physical memory addresses and virtual addresses, a second virtual address corresponding to the physical memory address;
    a GPU address acquisition module, configured to acquire a GPU address allocated for the data transmission request;
    a data copy instruction generation module, configured to generate a data copy instruction from the second virtual address to the GPU address; and
    an interface invocation module, configured to invoke an interface of a GPU driver to execute the data copy instruction.
  15. A task processing apparatus, comprising:
    a task computation request acquisition module, configured to acquire a task computation request sent by a client;
    a first virtual address acquisition module, configured to acquire a first virtual address in the task computation request;
    a physical memory address acquisition module, configured to acquire a physical memory address corresponding to the first virtual address;
    a second virtual address determination module, configured to determine, based on a mapping relationship between physical memory addresses and virtual addresses, a second virtual address corresponding to the physical memory address;
    a GPU address acquisition module, configured to acquire a GPU address allocated for the task computation request;
    a data copy instruction generation module, configured to generate a data copy instruction from the second virtual address to the GPU address;
    a GPU driver interface invocation module, configured to invoke an interface of a GPU driver to execute the data copy instruction;
    a task computation request sending module, configured to send the task computation request to a GPU;
    a processing status information generation module, configured to generate, after the GPU completes a computation task corresponding to the task computation request, processing status information corresponding to the task computation request; and
    a processing status information storage module, configured to store the processing status information.
  16. A data transmission apparatus, comprising:
    a data transmission request acquisition module, configured to acquire a data transmission request sent by an application;
    a first virtual address acquisition module, configured to acquire a first virtual address in the data transmission request;
    a physical memory address determination module, configured to determine, based on a mapping relationship between physical memory addresses and virtual addresses, a physical memory address corresponding to the first virtual address; and
    a data transmission request and physical memory address sending module, configured to send the data transmission request and the physical memory address to a server, so that the server performs data transmission according to the data transmission request and the physical memory address, wherein the server determines, based on a mapping relationship between physical memory addresses and virtual addresses, a second virtual address corresponding to the physical memory address; acquires a GPU address allocated for the data transmission request; generates a data copy instruction from the second virtual address to the GPU address; and invokes an interface of a GPU driver to execute the data copy instruction.
  17. A task processing apparatus, comprising:
    a task processing request acquisition module, configured to acquire a task processing request sent by an application;
    a task processing request forwarding module, configured to forward the task processing request for acquisition by a server;
    a synchronization request sending module, configured to issue a synchronization request when processing status information of the task processing request sent by the server is acquired;
    a first virtual address acquisition module, configured to acquire a first virtual address of the task processing request after a success notification of the synchronization request sent by the server is acquired;
    a physical memory address determination module, configured to determine, based on a mapping relationship between physical memory addresses and virtual addresses, a physical memory address corresponding to the first virtual address;
    a computation result reading module, configured to read a computation result of the task processing request from the physical memory address corresponding to the first virtual address; and
    a computation result sending module, configured to send the computation result of the task processing request to the application.
  18. A data transmission device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor, wherein
    the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
    acquire a data transmission request sent by a client;
    acquire a first virtual address in the data transmission request;
    acquire a physical memory address corresponding to the first virtual address;
    determine, based on a mapping relationship between physical memory addresses and virtual addresses, a second virtual address corresponding to the physical memory address;
    acquire a GPU address allocated for the data transmission request;
    generate a data copy instruction from the second virtual address to the GPU address; and
    invoke an interface of a GPU driver to execute the data copy instruction.
  19. A task processing device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor, wherein
    the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
    acquire a task computation request sent by a client;
    acquire a first virtual address in the task computation request;
    acquire a physical memory address corresponding to the first virtual address;
    determine, based on a mapping relationship between physical memory addresses and virtual addresses, a second virtual address corresponding to the physical memory address;
    acquire a GPU address allocated for the task computation request;
    generate a data copy instruction from the second virtual address to the GPU address;
    invoke an interface of a GPU driver to execute the data copy instruction;
    send the task computation request to a GPU;
    after the GPU completes a computation task corresponding to the task computation request, generate processing status information corresponding to the task computation request; and
    store the processing status information.
  20. A data transmission device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor, wherein
    the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
    acquire a data transmission request sent by an application;
    acquire a first virtual address in the data transmission request;
    determine, based on a mapping relationship between physical memory addresses and virtual addresses, a physical memory address corresponding to the first virtual address; and
    send the data transmission request and the physical memory address to a server, so that the server performs data transmission according to the data transmission request and the physical memory address, wherein the server determines, based on a mapping relationship between physical memory addresses and virtual addresses, a second virtual address corresponding to the physical memory address; acquires a GPU address allocated for the data transmission request; generates a data copy instruction from the second virtual address to the GPU address; and invokes an interface of a GPU driver to execute the data copy instruction.
  21. A task processing device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor, wherein
    the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
    acquire a task processing request sent by an application;
    forward the task processing request for acquisition by a server;
    when processing status information of the task processing request sent by the server is acquired, issue a synchronization request;
    after a success notification of the synchronization request sent by the server is acquired, acquire a first virtual address of the task processing request;
    determine, based on a mapping relationship between physical memory addresses and virtual addresses, a physical memory address corresponding to the first virtual address;
    read a computation result of the task processing request from the physical memory address corresponding to the first virtual address; and
    send the computation result of the task processing request to the application.
  22. A computer-readable medium having computer-readable instructions stored thereon, wherein the computer-readable instructions are executable by a processor to implement the method according to any one of claims 1 to 13.
PCT/CN2020/132846 2020-02-11 2020-11-30 Data transmission and task processing methods, apparatuses and devices WO2021159820A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010086948.9 2020-02-11
CN202010086948.9A CN111309649B (en) 2020-02-11 2020-02-11 Data transmission and task processing method, device and equipment

Publications (1)

Publication Number Publication Date
WO2021159820A1

Family

ID=71145245

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/132846 WO2021159820A1 (en) 2020-02-11 2020-11-30 Data transmission and task processing methods, apparatuses and devices

Country Status (2)

Country Link
CN (1) CN111309649B (en)
WO (1) WO2021159820A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111309649B (en) * 2020-02-11 2021-05-25 支付宝(杭州)信息技术有限公司 Data transmission and task processing method, device and equipment
CN112925737B (en) * 2021-03-30 2022-08-05 上海西井信息科技有限公司 PCI heterogeneous system data fusion method, system, equipment and storage medium
CN114359015B (en) * 2021-12-08 2023-08-04 北京百度网讯科技有限公司 Data transmission method, device and graphic processing server
CN114741214B (en) * 2022-04-01 2024-02-27 新华三技术有限公司 Data transmission method, device and equipment
CN114884881B (en) * 2022-05-12 2023-07-07 福建天晴在线互动科技有限公司 Data compression transmission method and terminal

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2005036405A1 (en) * 2003-10-08 2005-04-21 Unisys Corporation Computer system para-virtualization using a hypervisor that is implemented in a partition of the host system
CN107025183A (en) * 2012-08-17 2017-08-08 英特尔公司 Shared virtual memory
CN108804199A (en) * 2017-05-05 2018-11-13 龙芯中科技术有限公司 Graphics processor virtual method and device
CN111309649A (en) * 2020-02-11 2020-06-19 支付宝(杭州)信息技术有限公司 Data transmission and task processing method, device and equipment

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN102521015B (en) * 2011-12-08 2014-03-26 华中科技大学 Equipment virtualization method under embedded platform
CN103559078B (en) * 2013-11-08 2017-04-26 华为技术有限公司 GPU (Graphics Processing Unit) virtualization realization method as well as vertex data caching method and related device
JP6846537B2 (en) * 2016-12-27 2021-03-24 深▲せん▼前海達闥雲端智能科技有限公司Cloudminds (Shenzhen) Robotics Systems Co., Ltd. Display methods, equipment and electronics for multi-operating systems
CN107193759A (en) * 2017-04-18 2017-09-22 上海交通大学 The virtual method of device memory administrative unit

Also Published As

Publication number Publication date
CN111309649A (en) 2020-06-19
CN111309649B (en) 2021-05-25


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20918696

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20918696

Country of ref document: EP

Kind code of ref document: A1