WO2015067043A1

WO2015067043A1 - Gpu virtualization realization method as well as vertex data caching method and related device

Info

Publication number: WO2015067043A1
Application number: PCT/CN2014/079557
Authority: WO
Inventors: 徐利成
Original assignee: 华为技术有限公司
Priority date: 2013-11-08
Filing date: 2014-06-10
Publication date: 2015-05-14
Also published as: CN103559078A; WO2015067043A9; CN103559078B

Abstract

Disclosed in the present invention are a GPU (Graphic Processing Unit) virtualization realization method as well as a vertex data caching method and a related device. The method comprises: a graphics client intercepting vertex array class commands; caching vertex data to create a first cache area, sending a synchronization command to a graphics server to create a second cache area, and the second cache area and the first cache area forming a mapping relationship of the vertex data; querying in local data, packing and sending the vertex array class commands to the graphics server to render a picture according to the vertex data of the second cache area and the packed vertex array class commands if one piece of vertex data consistent with the intercepted vertex data exists in the local data, and if there is no vertex data consistent with the intercepted vertex data in the local data, resolving and sending the vertex array class commands to the graphics server to render the picture according to the resolved vertex array class commands. By doing as above, the present invention enables a drastic reduction of the delay and the bandwidth of transmission paths, and also reduces CPU (Central Processing Unit) consumption of memory sharing, increases VM (Virtual Machine) density and lowers cost.

Description

GPU virtualization implementation method and vertex data caching method and related device The application is filed on November 08, 2013, the Chinese Patent Office, application number 201310554845.0, the invention name is "GPU virtualization implementation method and vertex data caching method and related device" The priority of the Chinese Patent Application, the entire contents of which is incorporated herein by reference.

Technical field

The present invention relates to the field of virtualization technologies, and in particular, to a GPU virtualization implementation method and a vertex data buffering method and related apparatus. Background technique

GPU (Graphic Processing Unit) is mainly used for floating point arithmetic and parallel computing, and is often used for professional graphics operations. GPU virtualization technology is to allow virtualized instances running on data center servers to share the same block or multiple GPU processors for graphics operations. From the perspective of products that have already been implemented, the virtualization solution based on DirectX 3d is relatively mature, and its performance and experience are close to the level of physical machines. In the more widely used HD graphics field, most of them are used. 3D software is more based on the OpenGL (Open Graphics Library) specification, which is the most difficult application problem for enterprises.

Implementation of GPU Virtualization Technology Based on Opengl Directive There is an open source code Chromium. Chromium essentially implements a cross-network remote rendering process. In the Chromium architecture, vertex arrays allow Opengl drivers to get properties such as vertices, colors, normal vectors, etc. directly from the application's memory. The use of vertex arrays minimizes the overhead of function calls and reduces the amount of data that must be packed into the command cache in the display driver. However, during the remote rendering process, the vertex array pointers intercepted from the application layer are allocated on the graphics client. If the vertex array pointer is directly transmitted to the graphics server for use, an error will be generated. Chromium decomposes a glArrayElement instruction call into an equivalent glVertex3f, glNormal3f, glColor3f, or glTexCoord2f call, which converts the glArrayElement's pass pointer class parameter instructions into a series of passed-valued class parameter instructions. The number of decoded instructions is the number of instructions before the decomposition. 100 Multiple times, the amount of data transmitted by the network will increase abruptly, which will generate a large amount of delay, occupy the bandwidth of the transmission channel, increase the consumption of the CPU by the memory sharing, and cause the VM (Virtual Machine ware, virtual machine) to have low density and high cost. . Summary of the invention

Embodiments of the present invention provide a GPU virtualization implementation method and a vertex data caching method and related apparatus, which can greatly reduce the bandwidth of the delay and the transmission channel, reduce the CPU consumption of the memory sharing, increase the VM density, and reduce the cost.

The first aspect provides a GPU virtualization implementation method, comprising: a graphics client intercepting a vertex array class instruction; performing vertex data caching to create a first buffer area, sending a synchronization instruction to a graphics server to create a second buffer area, and a second buffer The region and the first buffer area form a mapping relationship of vertex data, and the vertex data is obtained from the vertex array class instruction, including the vertex array pointer and the vertex array length; the query is performed in the local data, if there is a vertex data and the intercepted in the local data If the vertex data is consistent, the vertex array class instruction is packaged and sent to the graphics server, so that the graphics server renders the image according to the vertex data of the second buffer area and the packed vertex array class instruction, and if not, the vertex array class instruction is decomposed. And sending to the graphics server, so that the graphics server renders the image according to the decomposed vertex array class instruction, wherein the local data is vertex data pre-existing in the graphics client, and the vertex data can be sent and used for the graphics server without decomposition.

In a first possible implementation manner of the first aspect, the method further includes: the graphics client receiving the image sent by the graphics server through the data channel and pasting the graphic device interface; and redirecting the vertex array class instruction to the TC through the graphic device interface The end executes the vertex array class instruction and generates a screen.

In a second possible implementation manner of the first aspect, performing vertex data caching to create the first buffer area includes: if the newly added vertex data is historical data, but the cached first buffer area is released or its vertex array length is Need to update to a larger value, create a temporary buffer; copy the newly added vertex data into the temporary buffer; copy the vertex data from the temporary buffer to the first buffer.

In a third possible implementation of the first aspect, the vertex data is buffered to create a first buffer, the synchronization instruction is sent to the graphics server to create a second buffer, and the second buffer forms vertex data with the first buffer. The mapping relationship includes: performing vertex data caching and creating the first a buffer area; sending a synchronization instruction to the graphics server to create a second buffer area, the synchronization instruction includes a vertex array pointer, and the second buffer area forms a mapping relationship with the first buffer area by the vertex array pointer.

In a fourth possible implementation of the first aspect, the first buffer area is located in the graphics client.

In a fifth possible implementation of the first aspect, the first buffer area is located in the shared memory. The second aspect provides a GPU virtualization implementation method, including: receiving a synchronization instruction and creating a second buffer area for vertex data caching, and forming a mapping relationship between the second buffer area and the first buffer area of the graphics client to form vertex data, and vertex data. Include a vertex array pointer and a vertex array length; determine, according to the vertex array pointer, whether the second buffer area has a corresponding vertex data cached, and if so, the receiving graphics client sends the packed vertex array class instruction through the data channel, and according to the second The vertex data of the buffer and the packed vertex array class instruction render the image for transmission to the graphics client; if not, receive the decomposed vertex array class instruction sent by the graphics client, and according to the decomposed vertex array class The instruction renders the image for transmission to the graphics client.

In a first possible implementation manner of the second aspect, the receiving the synchronization instruction and creating the second buffer area for performing vertex data buffering, the mapping relationship between the second buffer area and the first buffer area of the graphics client forming vertex data includes: receiving a synchronization instruction sent by the graphics client, wherein the synchronization instruction includes a vertex array pointer; a second buffer area is created according to the synchronization instruction to perform vertex data buffering, and the second buffer area forms vertex data through the vertex array pointer and the first buffer area of the graphics client. Mapping relationship.

In a second possible implementation of the second aspect, the second buffer is located in the graphics server.

In a third possible implementation of the second aspect, the second buffer is located in the shared memory. The third aspect provides a method for buffering vertex data in a GPU, comprising: creating a first buffer area by a graphics client to perform vertex data caching, wherein the vertex data includes a vertex array pointer and a vertex array length; and sending a synchronization instruction to the graphics server The synchronization instruction includes a vertex array pointer; the second buffer area is created by the graphics server according to the synchronization instruction, and the vertex data is buffered, and the second buffer area forms a mapping relationship between the vertex data and the first buffer area by the vertex array pointer.

In a first possible implementation of the third aspect, the vertex data cache is cached. The unit mode learns, predicts, and corrects the vector, including vertex array pointers and the learning, prediction, and correction of vertex array lengths.

In a second possible implementation of the third aspect, the buffer unit mode includes: indicating a first address of the apex array and a length of each byte; and drawing the geometric unit according to the offset of the first address.

In a third possible implementation of the third aspect, the learning, predicting, and correcting the vertex array pointer includes: obtaining a vertex array class instruction; using a vertex array pointer for a hash lookup; determining whether the hit is true, and if so, setting the current The cache data pointer is used to draw the vertex pointer; if not, the vertex array pointer and related feature information are added to the Hashtable; the cached data pointer is transparently transmitted.

In a fourth possible implementation of the third aspect, the learning, predicting, and correcting the vertex array length includes: obtaining a vertex instruction; determining whether the vertex data has been cached, and if so, determining whether the vertex buffer data exists locally. In the data, if yes, pass the vertex pointer transparently, if not, decompose the vertex pointer; if the vertex data is not cached, determine whether the vertex array length needs to be updated, if necessary, update the vertex array length, if not needed Then, the vertex pointer is decomposed, wherein the local data is vertex data pre-existing in the graphics client, and the vertex data can be sent and used for the graphics server without being decomposed.

The fourth aspect provides a GPU graphics client, including an instruction acquisition module, a first cache module, a query module, and a sending module, wherein: the instruction acquisition module is configured to intercept a vertex array class instruction; and the first cache module is configured to perform vertex data caching. To create a first buffer area, send a synchronization instruction to the graphics server to create a second buffer area, the second buffer area forms a mapping relationship with the first buffer area, and the vertex data is obtained from the vertex array class instruction, including the vertex array pointer. And the vertex array length; the query module is used to query in the local data. If there is a vertex data in the local data that is consistent with the intercepted vertex data, the sending module packages and sends the vertex array class instruction to the graphics server to make the graphics server Rendering the image according to the vertex data of the second buffer area and the packed vertex array class instruction. If not, the sending module decomposes the vertex array class instruction and sends the instruction to the graphics server, so that the graphics server renders according to the decomposed vertex array class instruction. Image, where local data To pre-exist the vertex data of the graphics client, the vertex data can be sent and used for the graphics server without decomposition.

In a first possible implementation manner of the fourth aspect, the graphics client further includes a first receiving module and a graphic device interface, where: the first receiving module is configured to receive the image through the data channel and Paste to the graphics device interface; the graphics device interface redirects vertex array class instructions to the TC side to execute vertex array class instructions and generate a screen shot.

In a second possible implementation manner of the fourth aspect, the sending module further sends a synchronization instruction to the graphics server, where the synchronization instruction includes a vertex array pointer, and the first buffer area forms vertex data through the vertex array pointer and the second buffer area of the graphics server. Mapping relationship.

In a third possible implementation manner of the fourth aspect, if the newly added vertex data is historical data, but the cached first buffer area is released or its vertex array length needs to be updated to a larger value, the first cache is used. The module is also used to: create a temporary buffer; copy the newly added vertex data into the temporary buffer; copy the vertex data from the temporary buffer to the first buffer.

A fifth aspect provides a GPU graphics server, including a second cache module, a second receiving module, and a rendering module, wherein: the second cache module is configured to create a second buffer area for vertex data caching, and the second buffer area and the graphics client The first buffer area of the end forms a mapping relationship of vertex data, the vertex data includes a vertex array pointer and a vertex array length; and the second receiving module is configured to determine, according to the vertex array pointer, whether the second buffer area has a corresponding vertex data cached, and if so, Receiving the packed vertex array class instruction sent by the graphics client, and the rendering module renders the image according to the vertex data of the second buffer area and the packed vertex array class instruction to send to the graphics client; if not, the second receiving module Receiving the decomposed vertex array class instruction sent by the graphics client, and the rendering module renders the image according to the decomposed vertex array class instruction for sending to the graphics client.

In a first possible implementation manner of the fifth aspect, the second cache module further receives a synchronization instruction sent by the graphics client, where the synchronization instruction includes a vertex array pointer, and the second cache module creates a second buffer area according to the synchronization instruction. The vertex data is cached, and the second buffer area forms a mapping relationship between the vertex data and the first buffer area of the graphics client by the vertex array pointer.

A sixth aspect provides an apparatus for buffering vertex data in a GPU, comprising: a first cache module, configured to create a first buffer area on a graphics client, and perform vertex data caching, wherein the vertex data includes a vertex array pointer and a vertex array length. a sending module, configured to send a synchronization instruction to the graphics server, wherein the synchronization instruction includes a vertex array pointer; the second cache module is configured to create a second buffer area according to the synchronization instruction by the graphics server, perform vertex data buffering, and the second buffer area The mapping relationship between the vertex data and the first buffer area is formed by the vertex array pointer.

In a first possible implementation manner of the sixth aspect, the first cache module is configured by using a cache unit The formula is the learning, prediction, and correction of the vertex array pointer and the vertex array length.

In a second possible implementation of the sixth aspect, the cache unit mode includes indicating a first address of the vertex array and a length of each byte; the geometric unit is drawn according to the offset of the first address.

In a third possible implementation manner of the sixth aspect, when the vertex array pointer is learned, predicted, and corrected, the first cache module is configured to: obtain a vertex array class instruction;

Hash lookup; determine whether a hit, if yes, set to the current cache data pointer for drawing vertex pointers; if not, add vertex array pointers and related feature information to the Hashtable; pass the cached data pointer.

In a fourth possible implementation manner of the sixth aspect, for learning, predicting, and correcting the length of the vertex array, the first cache module is configured to: obtain a vertex instruction; determine whether the vertex data has been cached, and if so, Determine whether the vertex buffer data exists in the local data, if so, transparently draw the vertex pointer, if not, decompose the vertex pointer; if the vertex data is not cached, determine whether the vertex array length needs to be updated, if necessary, The vertex array length is updated, and if not required, the vertex pointer is decomposed, wherein the local data is vertex data pre-existing in the graphics client, and the vertex data can be sent and used for the graphics server without decomposition.

The invention intercepts the vertex array class instruction through the graphics client; performs vertex data buffering to create the first buffer area, sends the synchronization instruction to the graphics server to create the second buffer area, and the second buffer area forms a mapping with the first buffer area to form vertex data. Relationship; query in local data, if there is a vertex data in the local data that is consistent with the intercepted vertex data, the vertex array class instruction is packaged and sent to the graphics server, so that the graphics server according to the vertex data of the second buffer area and The packaged vertex array class instruction renders the image. If it does not exist, the vertex array class instruction is decomposed and sent to the graphics server, so that the graphics server renders the image according to the decomposed vertex array class instruction; the second buffer area and the first buffer After the region forms the mapping relationship of the vertex data, it does not need to decompose the vertex array class instruction, which can solve the problem that the vertex array class instruction used in the graphics server can directly generate errors, so that even if there are still some vertex array class instructions Decompose, but the total number of instructions to be transmitted is large In order to reduce, the time required for transmitting all instructions is reduced, and the bandwidth consumption is also reduced, so that the bandwidth of the delay and the transmission channel can be greatly reduced, the CPU consumption of the memory sharing is reduced, the VM density is increased, and the cost is reduced. DRAWINGS In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention. Other drawings may also be obtained from those of ordinary skill in the art in light of the inventive work. among them:

1 is a schematic structural diagram of a system for implementing GPU virtualization according to a first embodiment of the present invention; FIG. 2 is a schematic flowchart of a method for implementing GPU virtualization according to a first embodiment of the present invention; FIG. 3 is a GPU of a second embodiment of the present invention; FIG. 4 is a schematic flowchart of a method for virtualizing a vertex data in a GPU according to a first embodiment of the present invention; FIG.

5 is a schematic diagram showing a structure of a cache unit mode of a method for buffering vertex data in a GPU according to a first embodiment of the present invention;

6 is a flow chart showing a method for learning, predicting, and correcting a vertex array pointer in a method for buffering vertex data in a GPU according to a first embodiment of the present invention;

7 is a flow chart showing a method for learning, predicting, and correcting a vertex array length in a method for buffering vertex data in a GPU according to a first embodiment of the present invention;

8 is a flow chart showing the process of updating the vertices of the vertices in the method of vertice data buffering in the GPU according to the first embodiment of the present invention;

9 is a schematic structural diagram of a GPU graphics client according to a first embodiment of the present invention;

10 is a schematic structural diagram of a GPU graphics server according to a first embodiment of the present invention; FIG. 11 is a schematic structural diagram of an apparatus for buffering vertex data in a GPU according to a first embodiment of the present invention;

12 is a schematic structural diagram of a GPU graphics client according to a second embodiment of the present invention; FIG. 13 is a schematic structural diagram of a GPU graphics server according to a second embodiment of the present invention; FIG. 14 is a system for implementing GPU virtualization according to a second embodiment of the present invention; Schematic diagram of the structure. detailed description

The invention will now be described in detail in conjunction with the drawings and embodiments.

First, please refer to FIG. 1. FIG. 1 is a schematic structural diagram of a system for implementing GPU virtualization according to a first embodiment of the present invention. As shown in FIG. 1, the GPU virtualization implementation system 10 includes a graphics client 11, a graphics server 12, a data channel 13, a graphics card 14, and a TC (Thin Client, thin client). End 15 , wherein the graphics client 11 includes a GDI (Graphic Device Interface) 110. The graphics client 11 is connected to the graphics server 12 via a data channel 13, the graphics card 14 is connected to the graphics server 12, and the TC terminal 15 is connected to the graphics device interface 110 of the graphics client 11.

In the present embodiment, the graphics client 11 intercepts the vertex array class instructions, creates the first buffer area 111, performs vertex data buffering, and sends synchronization instructions to the graphics server 12 via the data channel 13. The vertex data is obtained from the vertex array class instruction, including the vertex array pointer and the vertex array length, and the synchronization instruction includes the vertex array pointer and the content of the vertex array. After receiving the synchronization instruction, the graphics server 12 creates a second buffer area 121, and the second buffer area 121 establishes a mapping relationship between the vertex data and the first buffer area 111 through the vertex array pointer. In this embodiment, the creation of the first buffer area 111 and the second buffer area 121 is finally performed according to the intercepted vertex array class instruction, which is a continuous process. The graphics client 11 also queries in the local data. If there is a vertex data in the local data that is consistent with the intercepted vertex data, the vertex array class instruction is cache optimized, that is, the vertex array class instruction is packaged and sent to The graphics server, the graphics server renders the image according to the vertex data of the second buffer area and the packed vertex array class instruction; if not, the vertex array class instruction is decomposed and sent to the graphics server, and the graphics server according to the decomposed vertex array The class instructions render the picture, wherein the local data is vertex data pre-existing in the graphics client 11, the vertex data being sent and used for the graphics server 12 without decomposition. The rendered image can be, but is not limited to, a three-dimensional image, or a two-dimensional image, and the image can be a combination of one or more images or a part of a complete image. Specifically, the graphics client 11 learns, predicts, and corrects the vertex array pointer and the vertex array length by using the cache unit mode as a carrier, thereby determining whether the cached vertex data exists in the local data, and if present, the vertex array class instruction. Cache optimization, if it does not exist, the vertex array class instruction is decomposed, that is, the vertex instruction of the passed value class is used, and the vertex data is saved in the Hashtable for the next cache optimization. In GPU virtualization technology, the number of decomposed instructions is more than 100 times the number of pre-decomposition instructions, which causes the amount of data transmitted by the network to increase abruptly, which in turn generates a large amount of delay and occupies the bandwidth of the transmission channel. In this embodiment, when the intercepted vertex data is consistent with the local data, the vertex array class instruction is cache-optimized, so that the vertex array class instruction is not required to be decomposed, and the vertex array class directly transmitted through the graphics server 12 can be solved. The instruction will generate an error, so even if there are still some vertex array class instructions to be divided Solution, but the total number of instructions to be transmitted is greatly reduced, which reduces the time required to transmit all instructions and reduces the bandwidth usage. Therefore, while ensuring the consistency of the cached vertex data, the time can be greatly reduced. Extend the bandwidth of the transmission channel, reduce the CPU consumption of memory sharing, increase the VM density, and reduce the cost.

In this embodiment, there is a vertex data in the local data that is consistent with the intercepted vertex data. When the intercepted vertex data exists in the local data, the graphics client 11 packages the vertex array class instruction and sends it to the graphics server through the data channel 13. 12, the graphics server 12 unpacks the vertex array class instruction and sends it to the graphics card 14 to render the image; when the intercepted vertex data does not exist in the local data, the graphics client 11 sends the decomposed vertex array class instruction through the data channel 13 To the graphics server 12, the graphics server 12 is then sent to the graphics card 44 to render the picture. The graphics server 12 copies the picture into the memory through the screen capture and sends it to the graphics client 11 through the data channel 13, the graphics client 11 receives the picture and pastes it to the graphics device interface 110, and the graphics device interface 110 re-points the vertex array class instruction. The TC end 15 is directed to execute vertex array class instructions and generate a screen shot. The data channel 13 may be TCP/IP (Transmission Control Protocol/Internet Protocol), SR-IOV (Single-Root I/O Virtualization, Single Direct I/O Virtualization RDMA (Remote Direct) Memory Access, remote memory direct access) and any of the shared memory.

2 is a schematic flow chart of a method for implementing GPU virtualization according to a first embodiment of the present invention. As shown in FIG. 2, the graphics client 11 shown in FIG. 1 is specifically described as a main body. The GPU virtualization implementation method in this embodiment includes:

510: The graphics client 11 intercepts the vertex array class instruction. Specifically, the TC terminal 15 sends a 3D instruction to the graphics device interface 110 of the graphics client 11 through mouse and keyboard redirection, and the graphics client 11 is driven by the OpenGL ICD (Interface Control Document) of the graphics device interface 110. 3D instructions can be intercepted. The 3D instructions include glGet* return-transfer instructions, glSwapBuffer and other instructions that need to be sent immediately, vertex array-like instructions with pointer parameters, and instructions that can be aggregated and packaged. In this embodiment, it is mainly processed for vertex array class instructions with pointer parameters.

511: Perform vertex data caching to create a first buffer area 111, send a synchronization instruction to the graphics server 12 to create a second buffer area 121, and the second buffer area 121 forms a mapping relationship with the first buffer area 111 to form vertex data, and the vertex data is from Obtained in the vertex array class instruction, including the vertex array Pointer and vertex array length. Specifically, the graphics client 11 creates a first buffer area 111 for buffering vertex data, and simultaneously sends a synchronization instruction to the graphics server 12 through the data channel 13, the synchronization instruction including the vertex array pointer and the contents of the vertex array, and the vertex array pointer. A mapping relationship is established with the vertex data of the second buffer area of the graphics server 12. In this embodiment, the creation of the first buffer area 111 and the second buffer area 121 is finally performed according to the intercepted vertex array class instruction, which is a continuous process. If the newly added vertex data is historical data, but the cached first buffer area has been released or its vertex array length needs to be updated to a larger value, the graphics client 11 also updates the vertex array length to create a temporary buffer area, which will The newly added vertex data is copied into the temporary buffer area, and then the entire vertex data is copied from the temporary buffer area to the first buffer area 111. Upon receiving the synchronization instruction, the graphics server 12 immediately creates the second buffer area 121, copies the contents of the vertex array from the synchronization instruction, and buffers the vertex data. In this way, the first buffer area 111 and the second buffer area 121 establish a mapping relationship by the vertex array pointer, thereby ensuring the consistency of the cached vertex data. In this embodiment, the first buffer area may be located in the graphics client 11 or in the shared memory.

S12: Query in the local data, if there is a vertex data in the local data that is consistent with the intercepted vertex data, the vertex array class instruction is packaged and sent to the graphics server 12, so that the graphics server 12 according to the vertices of the second buffer area The data and packed vertex array class instructions render the image, if not, the vertex array class instructions are decomposed and sent to the graphics server 12 to cause the graphics server 12 to render the image based on the decomposed vertex array class instructions. The rendered image can be, but is not limited to, a three-dimensional image, or a two-dimensional image, and the image can be a combination of one or more images or a part of a complete image. The local data is the vertex data pre-existing in the graphics client 11, and the vertex data can be sent and used for the graphics server 12 without being decomposed. Specifically, the process of vertex array caching is a process of predicting data, and the prediction result may be correct or wrong, so the data verification process is indispensable. Each time before using vertex data, it is necessary to query in the local data, that is, the graphics client 11 uses the cache unit mode as a carrier to learn, predict, and correct the vertex array pointer and the vertex array length to determine whether the cached vertex data is The local data is in the middle. If it exists, the intercepted vertex data can be cache-optimized, that is, according to the characteristics of the vertex array class instruction, the corresponding packaging processing is performed; if it does not exist, the cache optimization cannot be performed, and only the vertex array can be obtained. The class instruction is decomposed, using the vertex instruction of the passed value class, and the vertex data is saved as historical data. Hashtable for the next cache optimization.

In this embodiment, the data channel 13 can be any of TCP/IP, SR-IOV, RDMA, and shared memory. The picture is compressed by the graphics server 12 to generate a compressed code stream, and the graphics client 11 receives the compressed code stream through the data channel 13 and decompresses it. The graphics client 11 then calls the bitblt() interface to paste the image into the graphics area of the 3D application of the graphics device interface 110, and redirects the vertex array class instructions to the TC terminal 15 via the graphics device interface 110 to execute the vertex array class instructions and generate Screen shot.

In this embodiment, by creating the first buffer area 111 in the graphics client 11, the second buffer area 121 is created in the graphics server 12, and the second buffer area 121 and the first buffer area 111 form a mapping of vertex data through the vertex array pointer. Relationship, and when the intercepted vertex data exists in the local data, the cache optimization of the vertex data is performed, so that the vertex array class instruction is not needed to be decomposed, and the vertex array class instruction used in the graphics server 12 to directly transmit the error may be generated. The problem is that even if there are still some vertex array class instructions that need to be decomposed, the total number of instructions to be transferred is greatly reduced, which reduces the time required to transfer all the instructions, and also reduces the bandwidth usage, so it can be greatly reduced. Delay and bandwidth of the transmission channel reduce the CPU consumption of memory sharing, increase VM density, and reduce cost.

FIG. 3 is a schematic flowchart diagram of a method for implementing GPU virtualization according to a second embodiment of the present invention. As shown in FIG. 3, the graphics server 12 shown in FIG. 1 is specifically described as a main body. The GPU virtualization implementation method in this embodiment includes:

S20: Receive a synchronization instruction and create a second buffer area 121 for vertex data buffering, and the second buffer area 121 forms a mapping relationship with the first buffer area 111 of the graphics client 11 to form vertex data, and the vertex data includes a vertex array pointer and a vertex array. length. Specifically, the graphics server 12 receives the synchronization instructions sent by the graphics client 11. The synchronization instruction includes the contents of the vertex array pointer and the vertex array. The graphics server 12 creates a second buffer area 121 according to the synchronization instruction to perform vertex data buffering, and forms a mapping relationship between the vertex data and the first buffer area 111 of the graphics client 11 through the vertex array pointer, so that the vertex array class instruction can be cached. Optimization, so that there is no need to decompose the vertex array class instructions, which can solve the problem that the vertex array class instruction used in the graphics server 12 will directly generate errors, so that even if there are still some vertex array class instructions to be decomposed, the total The number of instructions to be transmitted is greatly reduced, thereby reducing the time required to transfer all instructions and reducing the bandwidth usage, thus ensuring cached vertex data. At the same time, it can greatly reduce the bandwidth of the delay and transmission channel, and reduce the memory sharing pair.

CPU consumption increases VM density and reduces costs. In this embodiment, the creation of the first buffer area 111 and the second buffer area 121 is ultimately performed based on the intercepted vertex array class instructions, which is a continuous process. The second buffer area may be located in the graphics server 12 or in the shared memory.

S21: determining, according to the vertex array pointer, whether the second buffer area 121 is buffered with corresponding vertex data, and if so, receiving the packed vertex array class instruction sent by the graphics client 11, and according to the vertex data of the second buffer area 121 and The packed vertex array class instruction renders the image for sending to the graphics client 11, and if not, receives the decomposed vertex array class instruction sent by the graphics client 11, and renders the image according to the decomposed vertex array class instruction. To send to the graphics client 11.

In this embodiment, when the second buffer area 121 caches the vertex data corresponding to the vertex array pointer, the graphics server 12 receives the vertex array class instruction sent by the graphics client 11 through the data channel 13, and according to the characteristics of the vertex array class instruction itself. It is correspondingly unpacked. The graphics server 12 then sends the unwrapped vertex array class instructions to the graphics card 14. When the second buffer area 121 does not cache the vertex data corresponding to the vertex array pointer, the graphics server 12 receives the decomposed vertex array class instruction sent by the graphics client 11 and sends it to the graphics card 14. The graphics card 14 executes the vertices array class instruction and renders the image and saves it in the video memory. The rendered image may be, but not limited to, a three-dimensional image or a two-dimensional image, and the image may be a combination of one or more images or a part of a complete image. The graphics server 12 copies the pictures into the memory through screen capture. Since the picture is relatively large, the graphics server 12 compresses the picture, and then sends the compressed code stream to the graphics client 11 through the transmission channel 13 so that the graphics client 11 decompresses the compressed code stream and vertices through the graphics device interface 110. The array class instructions are redirected to the TC side 15 to execute the vertex array class instructions and generate a screen shot.

4 is a flow chart showing a method of vertex data buffering in a GPU according to a first embodiment of the present invention. As shown in FIG. 4, the method for buffering vertex data in the GPU of this embodiment includes:

S30: Create a first buffer area 111 through the graphics client 11 to perform vertex data caching, wherein the vertex data includes a vertex array pointer and a vertex array length.

In this embodiment, the vertex data buffer is learned, predicted, and corrected using the cache unit mode as a carrier, including vertex array pointers and learning, prediction, and correction of vertex array lengths. Therefore, the choice of cache unit mode is the primary problem to solve the vertex data cache, which is mainly a A question of granularity considerations. When you choose a large-grained mode, the overhead of finding, correcting, etc. is small, but the content is subject to change and the overall performance is affected. For example, a large-grained mode can consider caching in units of frames, so that not only vertex data can be cached, but also 3D instructions can be cached, but the data between each frame will always be different and the difference will be large, and the difference processing will lead to performance degradation. . Choosing a small-grained mode, the content of the cache changes little and relatively stable, but the overhead of finding, correcting, etc. will be relatively large. In the embodiment of the present invention, the structure of the cache unit mode is as shown in FIG. In the OpenGL specification, the role of gl*P ₀ inter is to indicate the first address of the vertex array and the length of each byte. The subsequent vertex instructions glDrawArray/glDrawElements are all based on the offset of the first address of the vertex array to draw the geometry unit. Until the next gPPointer instruction appears, indicating the end of a cache unit mode. Among them, gPPointer uses the gl vertexPointer/ glNormalPointer or glInterLeavedArrays _{0 in} Figure 5 to use this mode for vertex data caching, moderate granularity, small overhead, and good cache content stability.

As shown in Figure 6, the methods for learning, predicting, and correcting vertex array pointers include:

S40: Intercept the gPPointer instruction. The vertex array pointer can be obtained from the gPPointer directive. S41: Use the vertex array pointer as a Hash lookup.

542: Determine if a hit. If yes, execute S43; if no, execute S44. Specifically, it is judged whether the obtained vertex array pointer is the same as the pre-stored vertex array pointer in the Hashtable.

543: Set to the current vertex array pointer for use by drawing vertex instructions.

S44: Add the vertex array pointer and related feature information to the Hashtable.

S45: Transparent transmission of the gPPointer instruction.

Thus, the correction of a vertex array pointer in a cache unit mode is completed. The above process is repeated until the correction of all vertex array pointers in the cache unit mode is completed. After that, the learning, prediction and correction of the vertex array length are performed, that is, the correction of the vertex instruction is completed, so that the geometric unit is drawn based on the offset of the vertex array first address. Specifically, as shown in FIG. 7, the method of learning, predicting, and correcting the length of the Array of vertices includes:

S50 : Intercept the glDraw Array instruction. The glDraw Array directive includes the glDrawArray/glDrawElement directive in Figure 5. The length of the vertex array is available in the glDraw Arrays/ glDrawElements directive.

S51: Whether the vertex data has been cached. If no, execute S52; if yes, execute S53.

552: Whether the vertex array length needs to be updated. If yes, execute S54; if no, execute S55.

553: Whether the vertex data exists in the local data. If not, execute S55; if yes, execute S56. The local data is non-vertex data pre-existing in the graphics client, and the vertex data can be sent and used for the graphics server 12 without being decomposed.

554: Update the vertex array length. The specific method is shown in Figure 8 below.

555: The glDrawArray instruction is decomposed. It can be seen that if the intercepted vertex data does not exist in the local data, or the intercepted vertex data is not cached, the cache optimization cannot be performed, and only the glDrawArray instruction can be decomposed, using the vertex instruction of the passed value class, and The vertex data is saved as historical data in the Hashtable for the next cache optimization.

556: Transparent pass glDrawArray instruction. That is, if the intercepted vertex data exists in the local data, the cache optimization can be performed. The above process is repeated until the correction of all the vertex instructions of the cache unit mode is completed. The learning, prediction, and correction of vertex array pointers and vertex array lengths of Figures 6 and 7 are then repeated to complete the caching of vertex data for all cache unit modes. In the learning, prediction and correction of the vertex array pointer and the vertex array length, it is judged whether the cached vertex data exists in the local data, and if so, the vertex array class instruction is cache-optimized, and if not, the vertex array class instruction is decomposed. That is, use the vertex instruction of the pass-value class, and save the vertex data in the Hashtable for the next cache optimization. In this embodiment, after the cache optimization of the vertex array class instruction is performed, the vertex array class instruction does not need to be decomposed, and the problem that the vertex array class instruction directly used in the graphics server 12 generates an error may be solved, even if Some vertex array class instructions need to be decomposed, but the total number of instructions to be transferred is greatly reduced, which reduces the time required to transfer all instructions and reduces the bandwidth consumption, thus greatly reducing the delay and transmission channel. The bandwidth reduces the CPU consumption of memory sharing, increases VM density, and reduces costs.

531: Send a synchronization instruction to the graphics server 12, wherein the synchronization instruction includes a vertex array pointer.

532: Create a second buffer area 121 according to the synchronization instruction by the graphics server 12 to perform vertex data buffering, and the second buffer area 121 forms a mapping relationship between the vertex data and the first buffer area 111 by using the vertex array pointer. As can be seen from the above, by performing a traversal according to the structure of the cache unit mode, the vertex array pointer and the vertex array length can be learned, so that the second buffer area 121 can be created. The graphics server 12 also copies the contents of the vertex array from the synchronization instructions in the second buffer area 121.

In this embodiment, if the newly added vertex data is historical data, but the cached first buffer area has been released or its vertex array length needs to be updated to a larger value, in order to ensure learning, prediction and correction of the vertex array The reliability of pointer and vertex array lengths requires updating the vertex array length. Specifically, as shown in FIG. 8, it is assumed that when traversing the (k-1)th cache unit mode, the vertex array length needs to be updated to a larger value, including:

560: Update the vertex array length. Specifically, when traversing the (k-1)th cache unit mode, the vertex array pointer of the cache unit mode is first recorded in the first buffer area, and is updated when the vertex array length needs to be updated to a larger value. .

561: Copy the newly added vertex data into the temporary buffer. Specifically, the temporary buffer area is first created, and the newly added data is instantly copied into the temporary buffer area. When the (k-1)th cache unit mode is traversed, the temporary buffer area has already cached the historical data because It is an instant copy, so this copying process is reliable.

562: Create a buffer for the previous mode. Specifically, in order to prevent the temporary buffer area data from being overwritten, the last cache unit mode is to ensure that the vertex data transfer of the temporary buffer area is completed before the (k)th cache unit mode traversal. Therefore, at the beginning of the (k)th cache unit mode, the buffer area of the last cache unit mode is created, that is, the buffer area of the (k-1)th cache unit mode. The vertex data of the temporary buffer area is copied as a whole to the buffer area of the (k-1)th cache unit mode. The buffer area of the (k-1)th cache unit mode and the buffer area of the (k)th cache unit mode are referred to as the first buffer area 111.

563: Send a synchronization command to the graphics server 12. The aforementioned S60-S63 are all completed by the graphics client 11.

564: Create a second buffer area 121. Specifically, the graphics server 12 creates the second buffer area 121 according to the synchronization instruction, and forms a mapping relationship with the first buffer area 111 of the graphics client 11 through the vertex array pointer of the graphics client 11, thereby ensuring the consistency of the cached vertex data. Sex.

In this embodiment, the first cache area 111 is created by the graphics client 11, the vertex data is buffered, and the synchronization instruction is sent to the graphics server 12 to create the second buffer area 121, the first buffer area 111 and the second buffer area. 121 through the vertex array pointer to form a mapping of vertex data off Therefore, the vertex array class instruction can be cache-optimized, so that the vertex array class instruction does not need to be decomposed, and the problem that the vertex array class instruction using the direct pass-through in the graphics server 12 can cause an error, even if there is still a part. The vertex array class instructions need to be decomposed, but the total number of instructions to be transferred is greatly reduced, thereby reducing the time required to transfer all instructions and reducing the bandwidth usage, thus ensuring the consistency of the cached vertex data. It can greatly reduce the bandwidth of the delay and transmission channel, reduce the CPU consumption of memory sharing, increase the VM density, and reduce the cost. In this embodiment, the creation of the first buffer area 111 and the second buffer area 121 is finally performed according to the intercepted vertex array class instruction, which is a continuous process.

FIG. 9 is a schematic structural diagram of a GPU graphics client according to a first embodiment of the present invention. As shown in FIG. 9, the GPU virtualization implementation method of the first embodiment is described. The graphics client 11 includes a graphics device interface 110, a first buffer area 111, an instruction acquisition module 112, and a first cache module 113. The query module 114, the sending module 115, and the first receiving module 116.

In this embodiment, the instruction acquisition module 112 is configured to intercept vertex array class instructions. The first cache module 113 is configured to create a first buffer area 111, perform vertex data buffering, and send a synchronization instruction to the graphics server 12 to create a mapping of the second buffer area 121 and the first buffer area 111 to form vertex data. Relationships, vertex data is obtained from vertex array class instructions, including vertex array pointers and vertex array lengths. In this embodiment, the creation of the first buffer area 111 and the second buffer area 121 is ultimately performed according to the intercepted vertex array class instruction, which is a continuous process. The query module 114 is configured to perform a query in the local data. If there is a vertex data in the local data that is consistent with the intercepted vertex data, that is, the intercepted vertex data exists in the local data, the sending module 115 packages and sends the vertex array class instruction. To the graphics server 12, so that the graphics server 12 renders the image according to the vertex data of the second buffer area 121 and the packed vertex array class instruction, that is, cache optimization of the vertex array class instruction, if not, the sending module 115 decomposes the vertex The array class instruction, that is, the drawing vertex instruction using the value class, and saves the vertex data in the Hashtable for the next cache optimization, and sends it to the graphics server 12 to cause the graphics server 12 to render according to the decomposed vertex array class instruction. Out of the picture. The rendered image can be, but is not limited to, a three-dimensional image, or a two-dimensional image, and the image can be a combination of one or more images or a part of a complete image. The local data is the vertex data pre-existing in the graphics client 11, and the vertex data can be sent and used for the graphics server 12 without being decomposed. The first receiving module 116 is configured to receive a picture and paste it to the graphics device interface 110. Graphic device connection Port 110 redirects vertex array class instructions to TC side 15 to execute vertex array class instructions and generate a screen shot.

Further, the sending module 115 further sends a synchronization instruction to the graphics server 12 to create a second buffer area 121. The synchronization instruction includes a vertex array pointer, and the second buffer area 121 forms a mapping relationship between the vertex data and the first buffer area 111 through the vertex array pointer. Therefore, the vertex array class instruction can be cache-optimized, so that the vertex array class instruction does not need to be decomposed, and the problem that the vertex array class instruction using the direct pass-through in the graphics server 12 will generate an error, even if there are still some vertices. Array class instructions need to be decomposed, but the total number of instructions to be transferred is greatly reduced, which reduces the time required to transfer all instructions and reduces the bandwidth usage, thus ensuring the consistency of the cached vertex data. Significantly reduce the bandwidth of the delay and transmission channel, reduce the CPU consumption of memory sharing, increase the VM density, and reduce the cost.

Optionally, if the newly added vertex data is historical data, but the cached first buffer area is released or its vertex array length needs to be updated to a larger value, the first cache module 113 is further configured to create a temporary buffer area. Copy the newly added vertex data to the temporary buffer. The vertex data is then copied from the temporary buffer area to the first buffer area 111 as a whole.

In this embodiment, the picture is compressed by the graphics server 12 to generate a compressed code stream and sent to the graphics client 11. The first receiving module 116 receives the compressed code stream through the data channel 13 and decompresses it, and then calls the bitblt() interface. The picture is pasted to the graphics area of the 3D application of graphics device interface 110, and the vertex array class instructions are redirected to TC end 15 via graphics device interface 110 to execute vertex array class instructions and generate a screen shot.

FIG. 10 is a schematic structural diagram of a GPU graphics server according to a first embodiment of the present invention. As shown in FIG. 10, based on the GPU virtualization implementation method of the first embodiment, the graphics client 12 includes a second buffer area 121, a second cache module 122, a second receiving module 123, and a rendering module 124.

In this embodiment, the second cache module 122 is configured to create a second buffer area 121 for vertex data caching, and the second buffer area 121 forms a mapping relationship with the first buffer area 111 of the graphics client 11 to form vertex data, and vertex data. Includes vertex array pointers and vertex array lengths. In this embodiment, the creation of the first buffer area 111 and the second buffer area 121 is finally performed according to the intercepted vertex array class instruction, which is a continuous process. The second receiving module 123 is configured to determine, according to the vertex array pointer, whether the second buffer area 121 is buffered with corresponding vertex data, and if so, Receiving the packed vertex array class instruction sent by the graphics client 11, and the rendering module 124 renders the image according to the vertex data of the second buffer area 121 and the packed vertex array class instruction for sending to the graphics client 11; if not, then The second receiving module 123 receives the decomposed vertex array class instruction sent by the graphics client 11, and the rendering module 124 renders the image according to the decomposed vertex array class instruction for sending to the graphics client 11.

Optionally, the second receiving module 123 further receives the synchronization instruction sent by the graphics client 11 through the data channel 13, wherein the synchronization instruction includes a vertex array pointer. The second cache module 122 creates a second buffer area 121 according to the synchronization instruction to perform vertex data buffering, and the second buffer area 121 forms a mapping relationship between the vertex data by the vertex array pointer and the first buffer area 111 of the graphics client 11, thereby ensuring the cache. The consistency of the vertex data. And when the vertex data exists in the local data, the cache optimization of the vertex data is performed, so that the vertex array class instruction does not need to be decomposed, so that even if there are still some vertex array class instructions to be decomposed, the direct use in the graphics server 12 can be solved. Pass-through vertex array class instructions can cause errors, so that even if there are still some vertex array class instructions to be decomposed, the total number of instructions to be transferred is greatly reduced, thus reducing the time required to transfer all instructions. The bandwidth consumption is reduced, so the bandwidth of the delay and the transmission channel can be greatly reduced, the CPU consumption of the memory sharing is reduced, the VM density is increased, and the cost is reduced.

In this embodiment, when the second buffer area 121 caches the vertex data corresponding to the vertex array pointer, the second receiving module 123 receives the vertex array class instruction sent by the graphics client 11 through the data channel 13, and instructs itself according to the vertex array class. The feature is unpacked accordingly, and the unwrapped vertex array class instruction is sent to the graphics card 14. When the second buffer area 121 does not cache the vertex data corresponding to the vertex array pointer, the second receiving module 123 receives the decomposed vertex array class instruction sent by the graphics client 11 and sends it to the graphics card 14. The graphics card 14 executes the vertex array class instructions and renders the image, saving it in the video memory. The rendered image may be, but not limited to, a three-dimensional image, or may be a two-dimensional image, and the image may be a combination of one or more images, or may be part of a complete image. The rendering module 124 copies the image into memory through screen capture. Since the picture is relatively large, the rendering module 124 compresses the picture, and then sends the compressed code stream to the graphics client 11 through the transmission channel 13 so that the graphics client 11 decompresses the compressed code stream and vertices through the graphics device interface 110. The array class instructions are redirected to the TC side 15 to execute the vertex array class instructions and generate a screen shot. 11 is a schematic structural diagram of an apparatus for buffering vertex data in a GPU according to a first embodiment of the present invention. 9 and FIG. 10, as shown in FIG. 11, the apparatus 100 for vertex data buffering includes: a first cache module 113, a first buffer area 111, a sending module 115, a second buffer area 121, and a second Cache module 122.

In this embodiment, the first cache module 113 is configured to create a first buffer area 111 for vertex data caching, wherein the vertex data includes a vertex array pointer and a vertex array length. The sending module 115 is configured to send a synchronization instruction to the graphics server 12, wherein the synchronization instruction includes a vertex array pointer. The second cache module 122 is configured to create a second buffer area 121 according to the synchronization instruction, and perform vertex data buffering. The second buffer area 121 forms a mapping relationship with the first buffer area 111 by the vertex array pointer. In the present embodiment, the creation of the first buffer area 111 and the second buffer area 121 is finally performed according to the intercepted vertex array class instruction, which is an ongoing process.

Further, the first cache module 113 uses the cache unit mode as a carrier to learn, predict, and correct the vertex array pointer and the vertex array length. The cache unit mode includes indicating the first address of the vertex array and the length of each byte, and drawing the geometric unit according to the offset of the first address. For the vertex array pointer learning, prediction and correction, the first cache module 113 is used to obtain the vertex array class instruction; the vertex array pointer is used for the hash lookup; whether the hit is determined, and if so, the current cache array pointer is set for drawing. The vertex pointer is used; if not, the vertex array pointer and related feature information is added to the Hashtable; the cached data pointer is transparently transmitted. For learning, predicting, and correcting the length of the vertex array, the first cache module 113 is configured to obtain a vertex instruction; determine whether the intercepted vertex data has been cached, and if yes, determine whether the intercepted vertex buffer data exists in the local data. If yes, pass the vertex pointer transparently. If not, decompose the vertex pointer, that is, use the vertex instruction of the passed value class, and save the vertex data in the Hashtable for the next cache optimization; if the vertex data is not To do the caching, determine whether the vertex array length needs to be updated. If necessary, update the vertex array length. If not, decompose the vertex pointer, that is, use the vertex instruction of the passed value class. The local data is the vertex data pre-existing in the graphics client 11, and the vertex data can be sent and used for the graphics server 12 without being decomposed. Therefore, if the intercepted vertex data does not exist in the local data, or the intercepted vertex data is not cached, the cache optimization cannot be performed, and only the vertex instruction can be decomposed, that is, the vertex instruction of the passed value class is used. If the intercepted vertex data exists in the local data, that is, if there is a vertex data in the local data that is consistent with the intercepted vertex data, then the cache optimization can be performed, thereby eliminating the need to decompose the vertex array. Class instructions can greatly reduce the bandwidth of the delay and transmission channels, reduce the CPU consumption of memory sharing, increase the VM density, and reduce the cost.

In this embodiment, when the vertex array length is updated, the first cache module 113 first creates a temporary buffer area, and instantly copies the newly added data into the temporary buffer area. When the previous cache unit mode is traversed, the temporary buffer area is used. The historical data has already been cached; the buffer of the previous mode is created, and the vertex data of the temporary buffer is transferred to the buffer of the previous mode as a whole before the next cache unit mode is traversed.

In this embodiment, the first cache area 111 is created by the first cache module 113, and the vertex data is buffered. The sending module 115 sends a synchronization instruction to the graphics server 12, and the second cache module 122 creates a second buffer area 121 according to the synchronization instruction. Performing vertex data caching; the second cache module 122 forms a mapping relationship between the vertex data by the vertex array pointer and the first buffer area 111, thereby ensuring the consistency of the cached vertex data, and when the intercepted vertex data exists in the local data, The cache optimization of the vertex data is performed, so that the vertex array class instruction does not need to be decomposed, so that even if there are still some vertex array class instructions to be decomposed, the total number of instructions to be transferred is greatly reduced, so that the graphics server 12 can be solved. The use of directly transparent vertex array class instructions can cause errors, which can greatly reduce the bandwidth of the delay and transmission channels, reduce the CPU consumption of memory sharing, increase the VM density, and reduce the cost.

FIG. 12 is a schematic structural diagram of a GPU graphics client according to a second embodiment of the present invention. As shown in FIG. 12, the GPU graphics client 20 includes a processor 201, a memory 202, a receiver 203, a bus 204, and a transmitter 205. The processor 201, the memory 202, the transmitter 205, and the receiver 203 are connected by a bus 204. Communicate with each other.

Specifically, the receiver 203 is configured to intercept vertex array instructions. The processor 201 is configured to create a first buffer area, the memory 202 buffers the vertex data, the transmitter 205 sends a synchronization instruction to the graphics server to create a second buffer area, and the second buffer area forms a mapping relationship with the first buffer area to form vertex data. . Vertex data is obtained from vertex array class instructions, including vertex array pointers and vertex array lengths. In this embodiment, the creation of the first buffer area and the second buffer area is ultimately performed according to the intercepted vertex array class instruction, which is a continuous process. The processor 201 is further configured to perform a query in the local data. If there is a vertex data in the local data that is consistent with the intercepted vertex data, the transmitter 205 packages and sends the vertex array class instruction to the graphics server, and the processor 201 according to the second The vertex data of the buffer area and the packed vertex array class instruction render the image, that is, the top The dot array class instruction performs cache optimization. If it does not exist, the vertex array class instruction is decomposed, that is, the vertex instruction of the pass value class is used, and the vertex data is saved in the Hashtable for the next cache optimization, and the transmitter 205 sends to The graphics server 12, the processor 201 renders the picture according to the decomposed vertex array class instruction. The rendered image can be, but is not limited to, a three-dimensional image, or a two-dimensional image, and the image can be a combination of one or more images or a part of a complete image. The local data is vertex data pre-existing in the graphics client, and the vertex data can be sent and used for the graphics server without being decomposed.

In this embodiment, the receiver 203 is further configured to receive a picture and paste it to a graphics device interface. The graphical device interface redirects vertex array class instructions to the TC side to execute vertex array class instructions and generate a screen. If the newly added vertex data is historical data, but the cached first buffer area has been released or its vertex array length needs to be updated to a larger value, the processor 201 also creates a temporary buffer area, which will add new vertex data. Copy to the temporary buffer and copy the vertex data from the temporary buffer to the first buffer.

In this embodiment, the transmitter 205 sends a synchronization command to the graphics server to create a second buffer. The synchronization instruction includes a vertex array pointer, and the second buffer area forms a mapping relationship with the first buffer area by the vertex array pointer, so that the vertex array class instruction can be cache-optimized, so that the vertex array class instruction does not need to be decomposed, Solving the problem of using the directly transparent pass-through vertex array class instruction in the graphics server will cause errors, so that even if some vertex array class instructions need to be decomposed, the total number of instructions to be transferred is greatly reduced, thereby reducing the transmission of all instructions. The required time also reduces the bandwidth occupation, thus ensuring the consistency of the cached vertex data, which can greatly reduce the bandwidth of the delay and the transmission channel, reduce the CPU consumption of the memory sharing, increase the VM density, and reduce the cost.

FIG. 13 is a schematic structural diagram of a GPU graphics server according to a second embodiment of the present invention. As shown in FIG. 13, the GPU graphics server 30 includes a processor 301, a memory 302, a receiver 303, and a bus 304. The processor 301, the memory 302, and the receiver 303 are connected by a bus 304 to communicate with each other.

Specifically, the processor 301 is configured to create a second buffer area. The memory 202 caches the vertex data, and the second buffer area forms a mapping relationship with the first buffer area of the graphics client to form vertex data. Vertex data includes vertex array pointers and vertex array lengths. In this embodiment, the creation of the first buffer area and the second buffer area is finally performed according to the intercepted vertex array class instruction, which is a The ongoing process. The processor 301 determines, according to the vertex array pointer, whether the second buffer area is buffered with corresponding vertex data, and if so, the receiver 303 receives the packed vertex array class instruction sent by the graphics client, and the processor 301 is configured according to the second buffer area. The vertex data and the packed vertex array class instructions render the image for transmission to the graphics client; if not, the receiver 303 receives the decomposed vertex array class instruction sent by the graphics client, and the processor 301 is based on the decomposed vertex The array class instruction renders the image for sending to the graphics client.

In this embodiment, the receiver 303 also receives the synchronization instruction sent by the graphics client through the data channel, wherein the synchronization instruction includes a vertex array pointer. The processor 301 creates a second buffer area according to the synchronization instruction to perform vertex data caching, and the second buffer area forms a mapping relationship between the vertex data pointer and the first buffer area of the graphics client to ensure vertex data consistency. And when the intercepted vertex data exists in the local data, the cache optimization of the vertex data is performed, so that the vertex array class instruction does not need to be decomposed, so that even if some vertex array class instructions still need to be decomposed, the total needs to be transmitted. The number of instructions is greatly reduced, so it can solve the problem that the vertex array class instruction used in the graphics server can directly generate errors, which can greatly reduce the bandwidth of the delay and the transmission channel, reduce the CPU consumption of the memory sharing, and improve the VM density. cut costs.

FIG. 14 is a schematic structural diagram of an implementation system of GPU virtualization according to a second embodiment of the present invention. As shown in FIG. 14, the GPU virtualization implementation system 40 of the second embodiment includes a graphics client 41, a graphics server 42, a data channel 43, a graphics card 44, and a TC terminal 45. The graphics client 41 includes a graphics device interface 410. The data channel 43 includes a vertex data buffer 431. The graphics client 41 is connected to the graphics server 42 via a data channel 43, the graphics card 44 is coupled to the graphics server 42, and the TC terminal 45 is coupled to the graphics device interface 410 of the graphics client 41.

In this embodiment, the data channel 43 is a shared memory, and the graphics client 41 and the graphics server 42 share the vertex data buffer 431 in the shared memory to implement vertex data buffering. Specifically, the TC terminal 45 sends a 3D instruction to the graphics device interface 410 of the graphics client 41 through mouse and keyboard redirection, and the graphics client 41 can intercept the 3D instruction through the Opengl ICD driver of the graphics device interface 410, and the 3D instruction includes a vertex. Array class instructions. The graphics client 41 performs vertex data caching in the vertex data buffer 431, and sends synchronization instructions to the graphics server 42 through the data channel 43. The graphics server 42 performs vertex data caching in the vertex data buffer 431 to ensure consistent cached vertex data. Sex. In this embodiment, the vertex data buffer 431 is created the most. It is executed according to the intercepted vertex array class instructions and is a continuous process. The graphics client 41 queries in the local data. If there is a vertex data in the local data that is consistent with the intercepted vertex data, the vertex array class instruction is packaged and sent to the graphics server 42 so that the graphics server 42 is based on the vertex data buffer. The vertice data of 431 and the packed vertex array class instruction render the image, that is, cache optimization of the vertex array class instruction, if not, decompose the vertex array class instruction, that is, use the vertex instruction of the value class, and the vertex is The data is saved in the Hashtable for next cache optimization and sent to the graphics server 42 to cause the graphics server 42 to render the image based on the decomposed vertex array class instructions. The local data is vertex data pre-existing in the graphics client, and the vertex data can be sent and used for the graphics server 42 without being decomposed. Specifically, a vertex data exists in the local data and is consistent with the intercepted vertex data. When the intercepted vertex data exists in the local data, the graphics client 41 packages the vertex array class instruction and sends the data to the graphics server 42 through the data channel 43. The server 42 unpacks the vertex array class instruction and sends it to the graphics card 44 to render the picture; when the intercepted vertex data does not exist in the local data, the graphics client 41 sends the decomposed vertex array class instruction to the graphics server through the data channel 43. 42. The graphics server 42 sends the graphics card 42 to the graphics card to render the picture. The rendered image can be, but is not limited to, a three-dimensional image, or a two-dimensional image, and the image can be a combination of one or more images or a part of a complete image. The graphics server 42 copies the picture into the memory through screen capture and sends it to the graphics client 41 via the data channel 43. The graphics client 41 receives the picture and pastes it to the graphics device interface 410. The graphics device interface 410 places the vertex array class instruction heavy. The TC end 45 is directed to execute vertex array class instructions and generate a screen shot. Among them, the vertex data is obtained from the vertex array class instruction, including the vertex array pointer and the vertex array length. In the present embodiment, the vertex data buffer is implemented by sharing the vertex data buffer 431 in the shared memory between the graphics client 41 and the graphics server 42, ensuring the consistency of the cached vertex data, and the intercepted vertex data exists in In the local data, the cache optimization of the vertex data is performed, so that the vertex array class instruction does not need to be decomposed, so that even if some vertex array class instructions need to be decomposed, the total number of instructions to be transmitted is greatly reduced, so Solving the problem of using the directly transparent pass-through vertex array class instruction in the graphics server 42 will cause an error, so that even if there are still some vertex array class instructions to be decomposed, the total number of instructions to be transmitted is greatly reduced, thereby reducing the transfer of all The time required for the instruction also reduces the bandwidth usage, thus greatly reducing the bandwidth of the delay and the transmission channel, reducing the CPU consumption of the memory sharing, and improving VM density, reducing cost; while reducing the use of cache memory, simplifies the complexity of maintaining the consistency of graphics client 41 and graphics server 42 cache.

In summary, the present invention intercepts vertex array class instructions through a graphics client; performs vertex data caching to create a first buffer area, sends synchronization instructions to a graphics server to create a second buffer area, a second buffer area and a first buffer area. Forming a mapping relationship of vertex data; performing a query in the local data. If a vertex data in the local data is consistent with the intercepted vertex data, the vertex array class instruction is packaged and sent to the graphics server, so that the graphics server is configured according to the second cache. The vertex data of the region and the packed vertex array class instruction render the image. If not, the vertex array class instruction is decomposed and sent to the graphics server, so that the graphics server renders the image according to the decomposed vertex array class instruction; the second cache After the mapping relationship between the region and the first buffer region forms vertex data, it is not necessary to decompose the vertex array class instruction, which can solve the problem that the vertex array class instruction used in the graphics server directly generates a fault, so that even if there is still a part Vertex array class instructions need to be decomposed, but the total need The number of instructions transmitted is greatly reduced, which can greatly reduce the bandwidth of the delay and the transmission channel, reduce the CPU consumption of memory sharing, increase the VM density, and reduce the cost.

The above is only the embodiment of the present invention, and is not intended to limit the scope of the invention, and the equivalent structure or equivalent process transformation of the present invention and the contents of the drawings may be directly or indirectly applied to other related technologies. The scope of the patent protection is included in the scope of patent protection of the present invention.

Claims

Rights request

A method for implementing GPU virtualization, the method comprising: intercepting a vertex array class instruction by a graphics client;

Performing vertex data caching to create a first buffer area, sending a synchronization instruction to the graphics server to create a second buffer area, the second buffer area and the first buffer area forming a mapping relationship of vertex data, the vertex data from the Obtained in the vertex array class instruction, including the vertex array pointer and the vertex array length;

Querying in the local data, if there is a vertex data in the local data that is consistent with the intercepted vertex data, the vertex array class instruction is packaged and sent to the graphics server, so that the graphics server is configured according to The vertex data of the second buffer area and the packed vertex array class instruction render a picture, if not, decompose the vertex array class instruction and send the instruction to the graphics server, so that the graphics server Rendering a picture according to the decomposed vertex array class instruction, wherein the local data is vertex data pre-existing in the graphics client, and the vertex data is sent and used for the graphics server without decomposition.

The method according to claim 1, wherein the method further comprises: receiving, by the graphics client, a picture sent by the graphics server through a data channel and attaching to a graphic device interface;

The vertex array class instructions are redirected to the TC side by the graphics device interface to execute the vertex array class instructions and generate a screen shot.

The method according to claim 1, wherein the performing vertex data buffering to create the first buffer area comprises: if the newly added vertex data is historical data, but the cached first buffer area is released Or its vertex array length needs to be updated to a larger value, then

Create a temporary cache area;

Copying the newly added vertex data into the temporary buffer area;

Copying the vertex data from the temporary buffer to the first buffer.

4. The method of claim 1, wherein the performing vertex data buffering to create a first buffer area, transmitting a synchronization instruction to a graphics server to establish a second buffer area, the second buffer area and the The mapping relationship between the first buffer area and the vertex data includes:

Performing the vertex data cache and creating the first buffer area;

Sending a synchronization instruction to the graphics server to create a second buffer area, the synchronization instruction packet The vertex array pointer is included, and the second buffer area forms a mapping relationship with the first buffer area by using the vertex array pointer to form vertex data.

5. The method of claim 1 wherein the first cache area is located in the graphics client.

6. The method of claim 1 wherein the first buffer area is located in shared memory.

A GPU virtualization implementation method, the method comprising: creating a second buffer area according to the received synchronization instruction to perform vertex data buffering, wherein the second buffer area forms a first buffer area with a graphics client. a mapping relationship of vertex data, the vertex data includes a vertex array pointer and a vertex array length;

Determining, according to the vertex array pointer, whether the second buffer area is buffered with corresponding vertex data, and if so, receiving the packed vertex array class instruction sent by the graphics client, and according to the second buffer area The vertex data and the packed vertex array class instruction renders a picture for transmission to the graphics client, and if not, receives the decomposed vertex array class instruction sent by the graphics client, and according to the The decomposed vertex array class instruction renders the image for transmission to the graphics client.

8. The method of claim 7, wherein the receiving a synchronization instruction and creating a second buffer for vertex data caching, the second buffer forming a vertex with a first buffer of a graphics client The mapping relationship of data includes:

Receiving a synchronization instruction sent by the graphics client, where the synchronization instruction includes a vertex array pointer;

Creating the second buffer area according to the synchronization instruction to perform vertex data caching, and the second buffer area forms the vertex data by using the vertex array pointer and the first buffer area of the graphics client. Mapping relations.

9. The method of claim 7, wherein the second cache area is located in the graphics server.

10. The method of claim 7, wherein the second buffer area is located in shared memory.

A method for buffering vertex data in GPU virtualization, the method comprising: Creating a first buffer area by a graphics client to perform vertex data caching, wherein the vertex data includes a vertex array pointer and a vertex array length;

Sending a synchronization instruction to the graphics server, wherein the synchronization instruction includes the vertex array pointer;

The second buffer area is created by the graphics server according to the synchronization instruction, and the vertex data is buffered. The second buffer area forms a mapping relationship with the first buffer area by using the vertex array pointer.

12. The method of claim 11, wherein the performing vertex data cache is learning, predicting, and correcting using a cache unit mode as a carrier, including vertex array pointers and learning, prediction, and correction of vertex array lengths.

13. The method of claim 12, wherein the cache unit mode comprises:

Indicate the first address of the vertex array and the length of each byte;

A geometry unit is drawn based on the offset of the first address.

14. The method of claim 12, wherein the learning, predicting, and correcting the vertex array pointers comprises:

Obtaining the vertex array class instruction;

Using the vertex array pointer as a Hash lookup;

Determine whether the hit, if yes, set to the current cache data pointer for drawing the vertex pointer; if not, add the vertex array pointer and related feature information to the Hashtable; transparently pass the cached data pointer.

15. The method of claim 12, wherein the learning, predicting, and correcting the length of the vertex array comprises:

Obtaining the vertex instruction of the drawing;

Determining whether the vertex data has been cached, and if so, determining whether the vertex buffer data exists in the local data, and if so, transparently transmitting the drawn vertex pointer; if not, decomposing the drawn vertex pointer; If the vertex data is not cached, it is determined whether the vertex array length needs to be updated, if necessary, the vertex array length is updated, and if not, the drawn vertex pointer is decomposed, wherein the local data is pre-existing The vertex data of the graphics client, the vertex data can be sent and used for the graphics server without decomposition.

16. A GPU graphics client, wherein the graphics client comprises an instruction acquisition module, a first cache module, a query module, and a sending module, wherein:

The instruction acquisition module is configured to intercept a vertex array class instruction;

The first cache module is configured to perform vertex data caching to create a first buffer area, send a synchronization instruction to a graphics server to create a second buffer area, and the second buffer area forms a mapping with vertex data of the first buffer area. a relationship, the vertex data is obtained from the vertex array class instruction, including a vertex array pointer and a vertex array length;

The query module is configured to perform a query on the local data. If a vertex data exists in the local data and the intercepted vertex data is consistent, the sending module packages and sends the vertex array class instruction to the a graphics server, such that the graphics server renders a picture according to the vertex data of the second buffer area and the packaged vertex array class instruction, and if not, the sending module decomposes the vertex array class instruction And sending to the graphics server, so that the graphics server renders a picture according to the decomposed vertex array class instruction, where the local data is pre-existing vertex data of the graphics client, and the vertex data does not need to be decomposed It can be sent and used for the graphics server.

17. The graphics client of claim 16, wherein the graphics client further comprises a first receiving module and a graphics device interface, wherein:

The first receiving module is configured to receive the picture through the data channel and paste the picture to the graphic device interface;

The graphics device interface redirects the vertex array class instructions to the TC terminal to execute the vertex array class instructions and generate a screen shot.

18. The graphics client according to claim 16, wherein the sending module further sends a synchronization instruction to the graphics server, the synchronization instruction includes the vertex array pointer, and the first buffer area passes through The mapping between the vertex array pointer and the second buffer area of the graphics server forms vertex data.

19. The graphics client according to claim 16, wherein if the newly added vertex data is historical data, but the cached first buffer area is released or its vertex array length needs to be updated to a larger value. The first cache module is further configured to:

Create a temporary cache area;

Copying the newly added vertex data into the temporary buffer area; Copying the vertex data from the temporary buffer to the first buffer.

20. A GPU graphics server, the graphics server comprising a second cache module, a second receiving module, and a rendering module, wherein:

The second cache module is configured to create a second buffer area according to the received synchronization instruction to perform vertex data caching, and the second buffer area and the first buffer area of the graphics client form a mapping relationship of vertex data, where the vertex data includes Vertex array pointer and vertex array length;

The second receiving module is configured to determine, according to the vertex array pointer, whether the second buffer area has cached corresponding vertex data, and if so, receive the packaged vertex array class instruction sent by the graphics client, and The rendering module renders a picture to the graphics client according to the vertex data of the second buffer area and the packed vertex array class instruction; if not, the second receiving module receives the The decomposed vertex array class instruction sent by the graphics client, and the rendering module renders the image according to the decomposed vertex array class instruction for sending to the graphics client.

21. The graphics server of claim 20, wherein the second receiving module further receives a synchronization instruction sent by the graphics client, wherein the synchronization instruction comprises a top array pointer;

The second cache module creates the second buffer area according to the synchronization instruction to perform vertex data buffering, and the second buffer area forms the vertex by using the vertex array pointer and the first buffer area of the graphics client. The mapping relationship of data.

22. An apparatus for vertex data caching in GPU virtualization, the apparatus comprising:

a first cache module, configured to create a first buffer area in the graphics client, and perform vertex data caching, where the vertex data includes a vertex array pointer and a vertex array length;

a sending module, configured to send a synchronization instruction to the graphics server, where the synchronization instruction includes the vertex array pointer;

a second cache module, configured to create a second buffer area according to the synchronization instruction by the graphics server to perform vertex data caching, and the second buffer area forms a mapping of vertex data with the first buffer area by using the vertex array pointer relationship.

The device according to claim 22, wherein the first cache module learns, predicts, and verifies the vertex array pointer and the vertex array length by using a cache unit mode as a carrier. Positive.

24. The apparatus of claim 23, wherein the buffer unit pattern comprises a first address indicating a vertex array and a length per byte; and a geometry unit is drawn based on an offset of the first address.

25. The apparatus of claim 23, wherein when the vertex array pointer is learned, predicted, and corrected, the first cache module is configured to:

Obtaining the vertex array class instruction;

Using the vertex array pointer as a Hash lookup;

26. The apparatus of claim 23, wherein the first cache module is for: learning, predicting, and correcting the length of the vertex array:

Obtaining the vertex instruction of the drawing;