CN117290052A - Universal graphic processor virtualization method and system - Google Patents

Universal graphic processor virtualization method and system

Info

Publication number
CN117290052A
Authority
CN
China
Prior art keywords
gpu
graphic
driver
command
virtual machine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311148059.0A
Other languages
Chinese (zh)
Inventor
何德威
余学俊
付席席
王攀攀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
709th Research Institute of CSSC
Original Assignee
709th Research Institute of CSSC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 709th Research Institute of CSSC filed Critical 709th Research Institute of CSSC
Priority to CN202311148059.0A priority Critical patent/CN117290052A/en
Publication of CN117290052A publication Critical patent/CN117290052A/en
Pending legal-status Critical Current

Classifications

    • Section G (PHYSICS), Class G06 (COMPUTING; CALCULATING OR COUNTING), Subclass G06F (ELECTRIC DIGITAL DATA PROCESSING):
    • G06F 9/45558: Hypervisor-specific management and integration aspects
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/544: Buffers; shared memory; pipes
    • G06F 9/546: Message passing systems or structures, e.g. queues
    • G06F 2009/45562: Creating, deleting, cloning virtual machine instances
    • G06F 2209/548: Queue (indexing scheme relating to G06F 9/54)
    • Section Y (general tagging of cross-sectional technologies), Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a general-purpose graphics processor (GPU) virtualization method and system. The method is applied to a physical machine, where the physical machine comprises a plurality of virtual machines, a GPU back-end driver, a graphics framework equipped with a GPU driver, and a GPU; each virtual machine comprises a graphics application and a GPU front-end driver. The method comprises the following steps: the graphics application of any virtual machine receives a graphics command sent by the CPU; the GPU front-end driver of that virtual machine captures the graphics command of the graphics application and writes it into a ring queue in shared memory; the GPU back-end driver reads the graphics command from the ring queue in zero-copy fashion and invokes the graphics framework to operate the GPU driver so that the GPU executes the command. The invention implements GPU virtualization without relying on hardware virtualization support, and therefore has strong adaptability and generality; because commands are forwarded at the driver layer in zero-copy fashion, its virtualization efficiency is high compared with common command-forwarding approaches.

Description

Universal graphic processor virtualization method and system
Technical Field
The invention belongs to the technical field of virtualization, and in particular relates to a general-purpose graphics processor virtualization method and system.
Background
Virtualization in computing generally refers to a resource-management scheme in which the resources of a physical machine are partitioned and multiplexed to virtual machines through software and hardware techniques. By abstracting and emulating a computer's physical resources (CPU, graphics processor, disk space, memory, etc.), those resources are partitioned or combined into one or more computing environments (chiefly virtual machines). This breaks the otherwise rigid boundaries between physical components, so that multiple users (typically multiple virtual machines) can share the computer's hardware resources efficiently and conveniently.
Cloud computing applications and environments built on virtualization technology are developing rapidly, and heterogeneous computing systems built around CPU+GPU (graphics processing unit) combinations have become key to the development of virtualization technology. CPU virtualization is technically mature. GPU virtualization started later; the schemes currently adopted rely mainly on application-layer forwarding or on GPU slicing supported by hardware virtualization, and therefore suffer from low command-forwarding efficiency, dependence on specific hardware virtualization modules, and similar problems.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a general-purpose graphics processor virtualization method and system that solve the problems of existing GPU virtualization schemes: low command-forwarding efficiency and dependence on specific hardware virtualization modules.
To achieve the above object, in a first aspect, the present invention provides a general-purpose graphics processor virtualization method. The method is applied to a physical machine, where the physical machine comprises a plurality of virtual machines, a GPU back-end driver, a graphics framework equipped with a GPU driver, and a GPU; each virtual machine comprises a graphics application and a GPU front-end driver;
the method comprises the following steps:
step S101, the graphics application of any virtual machine receives a graphics command sent by the CPU;
step S102, the GPU front-end driver of that virtual machine captures the graphics command of the graphics application and writes it into a ring queue in shared memory; the shared memory is allocated by the GPU front-end driver during initialization;
step S103, the GPU back-end driver reads the graphics command from the ring queue of the shared memory in zero-copy fashion and invokes the graphics framework to operate the GPU driver so that the GPU executes the graphics command.
In an optional example, in step S102 the GPU front-end driver writes the graphics command into the ring queue of the shared memory as follows:
the GPU front-end driver writes the graphics command into the slot indicated by the front-end flag in the ring queue, and after the write succeeds it increments the front-end flag as the updated value;
correspondingly, in step S103 the GPU back-end driver reads the graphics command from the ring queue of the shared memory in zero-copy fashion as follows:
the GPU back-end driver reads the graphics command from the slot indicated by the back-end flag in the ring queue, and after the read succeeds it increments the back-end flag as the updated value; the back-end flag has the same initial value as the front-end flag.
In an alternative example, what is stored in the ring queue is specifically a pointer to the graphics command.
In an alternative example, the GPU front-end driver of any virtual machine includes a scheduling controller, and the GPU back-end driver includes a scheduling monitor;
the method further comprises the steps of:
the scheduling monitor calculates the frame period required by the graphics commands based on the graphics frame rate;
the scheduling monitor sends the frame period to the scheduling controller;
the scheduling controller controls, based on the frame period, the time at which graphics commands are written into the ring queue, thereby allocating the time during which the virtual machine uses the GPU.
In a second aspect, the present invention provides a general-purpose graphics processor virtualization system comprising a CPU and a physical machine, where the physical machine comprises a plurality of virtual machines, a GPU back-end driver, a graphics framework equipped with a GPU driver, and a GPU; each virtual machine comprises a graphics application and a GPU front-end driver;
the graphics application of any virtual machine is used to receive the graphics command sent by the CPU;
the GPU front-end driver of that virtual machine is used to capture graphics commands of the graphics application and write them into a ring queue in shared memory; the shared memory is allocated by the GPU front-end driver during initialization;
the GPU back-end driver is used to read the graphics commands from the ring queue of the shared memory in zero-copy fashion and to invoke the graphics framework to operate the GPU driver so that the GPU executes the graphics commands.
In an optional example, the GPU front-end driver of any virtual machine is specifically configured to write the graphics command into the slot indicated by the front-end flag in the ring queue and, after the write succeeds, increment the front-end flag as the updated value;
correspondingly, the GPU back-end driver is specifically configured to read the graphics command from the slot indicated by the back-end flag in the ring queue and, after the read succeeds, increment the back-end flag as the updated value; the back-end flag has the same initial value as the front-end flag.
In an alternative example, what is stored in the ring queue is specifically a pointer to the graphics command.
In an alternative example, the GPU front-end driver of any virtual machine includes a scheduling controller, and the GPU back-end driver includes a scheduling monitor;
the scheduling monitor is used to calculate the frame period required by the graphics commands based on the graphics frame rate;
the scheduling monitor is used to send the frame period to the scheduling controller;
the scheduling controller is used to control, based on the frame period, the time at which graphics commands are written into the ring queue, thereby allocating the time during which the virtual machine uses the GPU.
In a third aspect, the present invention provides an electronic device comprising: at least one memory for storing a program; and at least one processor for executing the program stored in the memory, where the program, when executed, implements the method described in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, the present invention provides a computer readable storage medium storing a computer program which, when run on a processor, causes the processor to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
In general, compared with the prior art, the above technical solutions conceived by the present invention have the following beneficial effects:
The invention provides a general-purpose graphics processor virtualization method and system. A GPU front-end driver is added in each virtual machine and a GPU back-end driver is added in the physical machine. The GPU front-end driver captures graphics commands of graphics applications in the virtual machine and writes them into shared memory; using that shared memory, the GPU back-end driver reads the commands directly in zero-copy fashion and then invokes the graphics framework to operate the GPU driver, so that the graphics commands are ultimately executed on the GPU. GPU virtualization is thus achieved without relying on hardware virtualization support and without modifying existing application programs, giving strong adaptability and generality; because the commands are forwarded at the driver layer in zero-copy fashion, virtualization efficiency is high.
Drawings
FIG. 1 is a flow chart of a general purpose graphics processor virtualization method provided by an embodiment of the present invention;
FIG. 2 is a general framework diagram of a general purpose graphics processor virtualization method provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a ring shared memory for zero copy according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the operation of a GPU scheduler according to an embodiment of the present invention;
FIG. 5 is a block diagram of a general purpose graphics processor virtualization system provided by an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
GPU virtualization approaches fall mainly into four categories: device emulation, device passthrough, application-layer forwarding, and GPU slicing supported by hardware virtualization. Their advantages and disadvantages are as follows:
1. Device emulation is currently implemented mainly by having the CPU emulate GPU interface semantics. Its design principle is simple, but the performance loss is too large and virtualization efficiency too low; moreover, GPU specifications differ greatly across vendors, so generality is poor.
2. Device passthrough gives the physical GPU directly and exclusively to a virtual machine. It is efficient and performs well; however, multiple virtual machines cannot use the GPU at the same time, and once the GPU is monopolized even the physical machine cannot control it, so its resources cannot be monitored.
3. Application-layer forwarding captures GPU programming-interface calls at the application interface layer and forwards them to the physical machine or a remote end for execution, thereby realizing GPU virtualization. It adapts well to different GPUs, but command forwarding is inefficient and performance is poor.
4. GPU slicing supported by hardware virtualization uses technologies such as hardware SR-IOV to realize GPU virtualization in the physical sense. Virtualization loss is low and efficiency approaches that of device passthrough; however, the scheme depends on hardware virtualization support from the vendor, and without such hardware support virtualization cannot be realized.
In summary, the present invention provides a general-purpose graphics processor virtualization method. It designs a zero-copy graphics-command forwarding mechanism to realize domestic general-purpose GPU virtualization without depending on GPU hardware virtualization support, and a GPU time-sharing scheduling technique so that multiple virtual machines use the GPU fairly, thereby providing a solution for general-purpose GPU virtualization in domestic cloud environments.
The general-purpose graphics processor virtualization method provided by the invention is applied to a physical machine, where the physical machine comprises a plurality of virtual machines, a GPU back-end driver, a graphics framework equipped with a GPU driver, and a GPU; each virtual machine comprises a graphics application and a GPU front-end driver.
FIG. 1 is a flowchart of a general purpose graphics processor virtualization method according to an embodiment of the present invention, as shown in FIG. 1, the method includes the following steps:
step S101, the graphics application of any virtual machine receives a graphics command sent by the CPU;
step S102, the GPU front-end driver of that virtual machine captures the graphics command of the graphics application and writes it into a ring queue in shared memory; the shared memory is allocated by the GPU front-end driver during initialization;
step S103, the GPU back-end driver reads the graphics command from the ring queue of the shared memory in zero-copy fashion and invokes the graphics framework to operate the GPU driver so that the GPU executes the graphics command.
The GPU driver in step S103 is the original vendor GPU driver. The GPU back-end driver can call the operating system's general standard graphics framework to execute graphics commands without modifying the vendor's GPU driver, thereby achieving GPU virtualization. The invention requires no hardware virtualization support: on the basis of a general-purpose GPU, the corresponding GPU driver only needs to support the operating system's general standard graphics framework. The original GPU driver need not be modified, the GPU need not provide a hardware virtualization module, and the GPU virtualization function is realized simply by adding the GPU front-end driver and the GPU back-end driver.
The GPU front-end driver resides in the virtual machine kernel and is mainly used to capture graphics commands of graphics applications in the virtual machine, which it then writes into a ring of shared memory accessible to both the virtual machine and the physical machine. The GPU back-end driver (in the physical machine kernel) uses that shared memory to read the commands directly in zero-copy fashion and then directly operates the physical kernel driver to execute them. This avoids the cost of memory copies, reduces the number of virtual machine exits caused by peripheral operations, and improves GPU virtualization efficiency.
According to the method provided by the embodiment of the invention, a GPU front-end driver is added in the virtual machine and a GPU back-end driver is added in the physical machine. The GPU front-end driver captures the graphics commands of graphics applications in the virtual machine and writes them into shared memory; the GPU back-end driver reads the commands directly from the shared memory in zero-copy fashion and then invokes the graphics framework to operate the GPU driver, so that the graphics commands are ultimately executed on the GPU. GPU virtualization is thus achieved without relying on hardware virtualization support and without modifying existing applications, with strong adaptability and generality; because commands are forwarded at the driver layer in zero-copy fashion, virtualization efficiency is high compared with common command-forwarding approaches.
Based on the above embodiment, in step S102 the GPU front-end driver writes the graphics command into the ring queue of the shared memory as follows:
the GPU front-end driver writes the graphics command into the slot indicated by the front-end flag in the ring queue, and after the write succeeds it increments the front-end flag as the updated value;
correspondingly, in step S103 the GPU back-end driver reads the graphics command from the ring queue of the shared memory in zero-copy fashion as follows:
the GPU back-end driver reads the graphics command from the slot indicated by the back-end flag in the ring queue, and after the read succeeds it increments the back-end flag as the updated value; the back-end flag has the same initial value as the front-end flag.
It can be understood that the shared memory is organized as a first-in first-out ring queue, and the front-end and back-end drivers read and write the ring according to two flags: a front-end flag (front flag) and a back-end flag (back flag), maintained respectively by the front-end and back-end drivers for the ring shared memory. Both flags start from 0 and increase; when a flag reaches n it wraps back to 0, where n is the length of the ring queue. The front flag may be operated only by the GPU front-end driver, and the back flag only by the GPU back-end driver, which avoids corruption from concurrent updates.
It should be noted that, because user commands arrive randomly, the front flag may be incremented several times in a row (several graphics commands captured) while the back-end driver, blocked responding to or executing earlier commands, has not kept up, so the back flag is incremented fewer times than the front flag (that is, front flag > back flag). As long as the difference between them does not exceed the ring queue length (front flag - back flag < n), program operation is unaffected. Normal operation can therefore be ensured, and loss of commands avoided, by increasing the ring queue length n according to the program's actual operating conditions.
Existing ring queues are generally built in the style of the generic virtio design, in which the front end and back end each construct their own ring: the front end writes only the front-end queue (and reads the back-end queue), while the back end writes only the back-end queue (and reads the front-end queue). This achieves lock-free operation but requires two queues, which occupy a large amount of cache while the CPU runs. The present invention instead designs a single queue holding both front-end and back-end data, and achieves lock-free access through the front-end and back-end flags. This different construction reduces the cache space occupied and improves the cache hit rate.
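The single-queue, two-flag design described above can be modeled as a single-producer/single-consumer ring. The following Python sketch is purely illustrative, not the patent's driver code; the names `RingQueue`, `front_flag`, and `back_flag` are assumptions. Here the flags grow monotonically and the slot index is taken modulo n, which is equivalent to the wrap-to-zero counters described in the text: a write is refused exactly when front flag - back flag would exceed n.

```python
class RingQueue:
    """Illustrative single-producer/single-consumer ring queue.

    front_flag is advanced only by the (front-end) writer and back_flag
    only by the (back-end) reader, so one producer and one consumer need
    no lock. Slot index = counter modulo the queue length n.
    """

    def __init__(self, n):
        self.n = n
        self.slots = [None] * n
        self.front_flag = 0   # end position of commands written by the front end
        self.back_flag = 0    # end position of commands consumed by the back end

    def write(self, command):
        # Full when the writer is a whole lap ahead of the reader.
        if self.front_flag - self.back_flag >= self.n:
            return False      # would overwrite an unread command
        self.slots[self.front_flag % self.n] = command
        self.front_flag += 1  # advance only after a successful write
        return True

    def read(self):
        # Empty when both flags are equal (their shared initial value).
        if self.back_flag == self.front_flag:
            return None
        command = self.slots[self.back_flag % self.n]
        self.back_flag += 1   # advance only after a successful read
        return command
```

Because each side touches only its own flag, the queue is lock-free for this one-writer/one-reader pattern, matching the property claimed for the single shared queue.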
Based on any of the above embodiments, specifically stored in the ring queue is a pointer to the graphics command.
According to the method provided by the embodiment of the invention, the ring queue in shared memory stores pointers to the virtual machine's data rather than the data itself, which saves space.
Based on any of the above embodiments, it is further considered that the GPU virtualization schemes actually adopted at present, based mainly on application-layer forwarding or on GPU slicing supported by hardware virtualization, also face the problem of interference when multiple virtual machines use the GPU simultaneously.
In contrast, in the embodiment of the invention, the front-end driver of the GPU of the virtual machine comprises a scheduling controller, and the rear-end driver of the GPU comprises a scheduling monitor;
the method further comprises the steps of:
the scheduling monitor calculates the frame period required by the graphics commands based on the graphics frame rate;
the scheduling monitor sends the frame period to the scheduling controller;
the scheduling controller controls, based on the frame period, the time at which graphics commands are written into the ring queue, thereby allocating the time during which the virtual machine uses the GPU.
Here, the graphics frame rate is set according to the actual application scenario; it may be, for example, the standard frame rate of graphics content such as movies or animations.
The scheduling controller takes the obtained frame period T as a target parameter and, following a negative-feedback approach, adjusts the time at which graphics commands are written into the ring queue, thereby dynamically allocating the time during which the virtual machine uses the GPU. For example, the time for rendering one frame, i.e. the frame period, is fixed, say 50 ms. The CPU, through the scheduling controller, controls when commands are sent to the GPU back end, i.e. when commands are written into the ring queue; after sending, the remaining time in the period is left to the GPU to render the frame, since otherwise the frame-rate requirement cannot be met.
It should be noted that existing fair-scheduling designs are usually realized by controlling the frame rate, whereas the present invention realizes fair scheduling by controlling the allocated GPU running time. This is logically more direct and involves fewer links in the implementation chain, which reduces development and deployment costs and gives higher efficiency.
According to the method provided by the embodiment of the invention, the GPU time-sharing scheduling technique makes it possible to control how long each virtual machine uses the GPU in a multi-virtual-machine scenario, preventing a single virtual machine from monopolizing the GPU and ensuring that multiple virtual machines share the GPU fairly.
Based on any of the above embodiments, step S103 further includes:
after the GPU executes the graphics command, the resulting return value is returned to the GPU driver in the graphics framework, and the graphics framework returns it to the GPU back-end driver;
the GPU back-end driver writes the return value into the ring queue;
the GPU front-end driver reads the return value from the ring queue in zero-copy fashion and finally returns it to the graphics application.
Here, the graphics framework may be a DRM framework.
Based on any of the above embodiments, the invention provides zero-copy and scheduling techniques for general-purpose graphics processor virtualization. It designs a zero-copy graphics-command forwarding mechanism to realize domestic general-purpose GPU virtualization without depending on GPU hardware virtualization support, and a GPU time-sharing scheduling technique so that multiple virtual machines fairly share and use the physical machine's GPU, thereby providing a solution for general-purpose GPU virtualization in domestic cloud environments. Both the front-end and back-end implementation of the shared ring queue and the fair scheduling of the GPU embody new designs.
Fig. 2 is the overall framework diagram of the general-purpose graphics processor virtualization method according to an embodiment of the present invention. In Fig. 2, the GPU foreground driver is the GPU front-end driver, the GPU background driver is the GPU back-end driver, and the GPU scheduler (divided into a scheduling controller and a scheduling monitor) is the main component of the invention; the rest is the virtualized environment, consisting of the virtual machines and the physical machine's own system.
The scheme shown in Fig. 2 comprises the GPU foreground and background drivers, which together implement GPU virtualization. The GPU foreground driver resides in the virtual machine kernel and mainly captures the graphics commands that graphics applications issue through graphics libraries such as OpenGL, then writes them into a ring of shared memory accessible to both the virtual machine and the physical machine. The GPU background driver (in the physical machine kernel) uses the shared memory to read the commands directly in zero-copy fashion and then directly operates the physical kernel's graphics DRM driver to execute them, avoiding the cost of memory copies.
Fig. 3 is a schematic diagram of the ring shared memory used for zero copy. As shown in Fig. 3, when the foreground driver captures a graphics command, it first fills the corresponding data pointer into the slot pointed to by the foreground flag (which records the end position of the commands sent by the foreground driver) in the shared memory; after the write succeeds, the foreground flag is incremented by 1, indicating that the end position of the sent commands has advanced. The background driver then reads data from the slot pointed to by the background flag (which records the end position of the commands read by the background driver) and uses it to perform the graphics operation; after the read succeeds, the background flag is incremented by 1, indicating that the end position of the read commands has advanced. Zero-copy forwarding of graphics commands between the GPU foreground and background drivers is thus realized, avoiding the cost of copying data, reducing the number of virtual machine exits caused by peripheral operations, and improving GPU virtualization efficiency.
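The zero-copy property comes from passing data pointers through the ring rather than the command bytes themselves. In this minimal Python model (all names are assumptions for illustration; Python object references stand in for the data pointers held in the shared-memory slots), the background side receives the very same buffer the foreground side captured:

```python
# Hypothetical sketch: a ring slot receives a reference to the command
# buffer, not a copy of its contents.
ring = [None] * 4                 # the shared ring; each slot holds a pointer
foreground_flag = 0               # end position of commands sent by foreground
background_flag = 0               # end position of commands read by background

command_buffer = bytearray(b"glDrawArrays 0 3")  # command captured in the VM

# Foreground driver: fill the slot pointed to by the foreground flag,
# then increment the flag after the successful write.
ring[foreground_flag % len(ring)] = command_buffer
foreground_flag += 1

# Background driver: read from the slot pointed to by the background flag,
# then increment the flag after the successful read.
received = ring[background_flag % len(ring)]
background_flag += 1

# Zero copy: both names refer to one and the same buffer object.
assert received is command_buffer
```

In the real drivers the slot would hold an address valid for both guest and host; the identity check here only illustrates that no byte copy occurs in the hand-off.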
Meanwhile, the GPU scheduler (divided into a scheduling controller and a scheduling monitor) shown in Fig. 2 works with the foreground and background drivers mainly to meet the requirement of fairly sharing the GPU among multiple virtual machines. The working principle of the GPU scheduler provided by the embodiment of the present invention is shown in Fig. 4. The scheduling monitor is mainly responsible for computing in real time, from the physical GPU resource allocation, the frame period T required by the graphics commands, according to the formula:

T = 1/fps

where fps is the target graphics frame rate (the number of frames rendered per second, defined by the user). The scheduling monitor then sends the computed frame period T to the scheduling controller in the virtual machine. The controller takes the obtained frame period T as the target parameter and, following a negative-feedback approach, applies the formula:

t_gpu = T - t_cpu

By adjusting the time t_cpu during which the CPU controls command submission, the time t_gpu allotted to the virtual machine for using the GPU is dynamically allocated. This prevents a single virtual machine from occupying too much of the GPU and guarantees fairness when multiple virtual machines share it.
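The frame-period calculation and the negative-feedback time allocation described above can be sketched as a pair of small routines. This is a minimal illustration, not the patent's implementation; the function names are hypothetical.

```python
def frame_period(fps: float) -> float:
    """Frame period T required by the graphics commands: T = 1/fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1.0 / fps

def gpu_time_budget(T: float, t_cpu: float) -> float:
    """Negative feedback: the GPU time allotted to a virtual machine is
    whatever remains of the frame period after the CPU-side command
    submission time, i.e. t_gpu = T - t_cpu (clamped at zero)."""
    return max(0.0, T - t_cpu)

# Example: a 60 fps target with 4 ms spent submitting commands
T = frame_period(60.0)             # period of roughly 16.67 ms
t_gpu = gpu_time_budget(T, 0.004)  # GPU time remaining for this VM
```

If a virtual machine's CPU-side submission time grows, its GPU budget shrinks in the same frame period, which is the mechanism the controller uses to keep any single virtual machine from monopolizing the GPU.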
The specific implementation comprises the following steps:
1. The virtual machine kernel needs to add a GPU foreground driver, as shown in Fig. 2, which includes the normal device registration, discovery, and graphics operation interfaces. The foreground driver replaces the normal GPU driver in the virtual machine and captures the graphics commands of the virtual machine's applications. The foreground driver does not process these commands itself; it writes the received commands into the ring shared memory in zero-copy fashion. The foreground driver is also responsible for initializing the ring shared memory required for zero copy and for calculating the command-submission control time t_cpu.
2. As shown in Fig. 3, a shared memory region is allocated during initialization of the GPU foreground driver, and both the physical machine and the virtual machine access it as a ring queue based on a sharing mechanism provided by the operating system. The memory does not store the command data itself but stores data pointers; by exploiting this indirection, a large amount of memory can be reached while only a small region needs to be allocated. Maintenance of the shared memory relies mainly on two flag bits: frontFlag and backFlag. frontFlag records the end position in the ring queue of commands sent by the foreground driver; when the foreground driver captures a graphics command, it writes the command into the ring shared memory at the position given by frontFlag, and after a successful write increments frontFlag by 1 (the value wraps back to 0 after reaching n). backFlag records the end position of commands read by the background driver; on each background-driver operation, a command is read from the ring shared memory at the position given by backFlag, and after a successful read backFlag is likewise incremented by 1 (the value wraps back to 0 after reaching n). The positions at which the front-end and back-end drivers read and write the ring shared memory therefore depend entirely on the two flag bits frontFlag and backFlag.
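The ring queue with its two wrapping flags can be sketched as follows. This is an illustrative single-process model, not the patent's kernel implementation: the class name is hypothetical, Python references stand in for the data pointers, and one slot is kept empty to distinguish a full queue from an empty one (a common ring-buffer convention, not stated in the patent).

```python
class RingSharedMemory:
    """Sketch of the ring queue described above.

    front_flag: end position of commands written by the foreground driver.
    back_flag:  end position of commands read by the background driver.
    Both start at the same value and wrap back to 0 after reaching n.
    The slots hold pointers to commands, not the command data itself.
    """

    def __init__(self, n: int):
        self.n = n
        self.slots = [None] * n
        self.front_flag = 0  # next write position (foreground driver)
        self.back_flag = 0   # next read position (background driver)

    def write(self, cmd_ptr) -> bool:
        """Foreground driver: store a command pointer, then advance front_flag."""
        nxt = (self.front_flag + 1) % self.n
        if nxt == self.back_flag:          # queue full (one slot kept empty)
            return False
        self.slots[self.front_flag] = cmd_ptr
        self.front_flag = nxt              # increment with wraparound
        return True

    def read(self):
        """Background driver: fetch the pointer at back_flag, then advance it."""
        if self.back_flag == self.front_flag:  # queue empty
            return None
        cmd_ptr = self.slots[self.back_flag]
        self.back_flag = (self.back_flag + 1) % self.n
        return cmd_ptr
```

Because the background driver consumes the pointer directly from the shared slots, no command data is copied between the two drivers, which is the zero-copy property the scheme relies on.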
3. The physical machine kernel needs to add a GPU background driver, as shown in Fig. 2, which reads graphics commands directly from the shared memory in zero-copy fashion and then calls the physical machine's graphics driver functions in the physical machine kernel to operate the actual vendor's GPU driver, thereby executing the user's graphics commands. The original vendor driver can be operated without any modification.
Based on any of the above embodiments, the present invention provides a general purpose graphics processor virtualization system. Fig. 5 is a schematic diagram of a general purpose graphics processor virtualization system according to an embodiment of the present invention. As shown in Fig. 5, the system includes a CPU 510 and a physical machine 520; the physical machine 520 includes a plurality of virtual machines 521, a GPU back-end driver 524, a graphics framework 525 provided with a GPU driver 526, and a GPU 527. Each virtual machine 521 includes a graphics application 522 and a GPU front-end driver 523;
the graphics application 522 of any virtual machine 521 is configured to receive a graphics command sent by the CPU 510;
the GPU front-end driver 523 of any virtual machine 521 is configured to capture a graphics command of the graphics application 522 and write the graphics command into a ring queue of the shared memory; the shared memory is allocated by the GPU front-end driver 523 during initialization;
the GPU back-end driver 524 is configured to read the graphics commands in the ring queue of the shared memory in a zero-copy manner and call the graphics framework 525 to operate the GPU driver 526, so that the GPU 527 executes the graphics commands.
It can be understood that the detailed functional implementation of each module may refer to the description in the foregoing method embodiments and will not be repeated here.
In addition, an embodiment of the present invention provides another general-purpose graphics processor virtualization apparatus, including: a memory and a processor;
the memory is used for storing a computer program;
the processor is configured to implement the method in the above-described embodiments when executing the computer program.
Furthermore, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method in the above embodiments.
Based on the method in the above embodiments, an embodiment of the present invention provides a computer program product, which when run on a processor causes the processor to perform the method in the above embodiments.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A general purpose graphics processor virtualization method, characterized in that the method is applied to a physical machine, wherein the physical machine comprises a plurality of virtual machines, a GPU back-end driver, a graphics framework provided with a GPU driver, and a GPU; each virtual machine comprises a graphics application and a GPU front-end driver;
the method comprises the following steps:
step S101, a graphics application of any virtual machine receives a graphics command sent by a CPU;
step S102, the GPU front-end driver of the virtual machine captures the graphics command of the graphics application and writes the graphics command into a ring queue of a shared memory; the shared memory is allocated by the GPU front-end driver during initialization;
step S103, the GPU back-end driver reads the graphics command in the ring queue of the shared memory in a zero-copy manner and invokes the graphics framework to operate the GPU driver so that the GPU executes the graphics command.
2. The method according to claim 1, wherein in step S102 the GPU front-end driver writes the graphics command into the ring queue of the shared memory, specifically comprising:
the GPU front-end driver writes the graphics command into the space corresponding to the front-end flag bit in the ring queue and, after a successful write, takes the incremented front-end flag bit as the updated front-end flag bit;
correspondingly, in step S103 the GPU back-end driver reads the graphics command in the ring queue of the shared memory in a zero-copy manner, specifically comprising:
the GPU back-end driver reads the graphics command from the space corresponding to the back-end flag bit in the ring queue and, after a successful read, takes the incremented back-end flag bit as the updated back-end flag bit; the initial value of the back-end flag bit is the same as that of the front-end flag bit.
3. The method of claim 2, wherein what is specifically stored in the ring queue is a pointer to the graphics command.
4. The method of claim 1, wherein the GPU front-end driver of any virtual machine comprises a dispatch controller and the GPU back-end driver comprises a dispatch monitor;
the method further comprises the steps of:
the scheduling monitor calculates a frame period required for the graphic command based on the graphic frame rate;
the scheduling monitor transmits the frame period to a scheduling controller;
the scheduling controller controls the time of the graphic command to be written into the annular queue based on the frame period so as to realize the time of distributing the GPU used by any virtual machine.
5. A general purpose graphics processor virtualization system, characterized by comprising a CPU and a physical machine, wherein the physical machine comprises a plurality of virtual machines, a GPU back-end driver, a graphics framework provided with a GPU driver, and a GPU; each virtual machine comprises a graphics application and a GPU front-end driver;
the graphics application of any virtual machine is configured to receive a graphics command sent by the CPU;
the GPU front-end driver of the virtual machine is configured to capture the graphics command of the graphics application and write the graphics command into a ring queue of a shared memory; the shared memory is allocated by the GPU front-end driver during initialization;
the GPU back-end driver is configured to read the graphics command in the ring queue of the shared memory in a zero-copy manner and call the graphics framework to operate the GPU driver so that the GPU executes the graphics command.
6. The system of claim 5, wherein the GPU front-end driver of any virtual machine is specifically configured to write the graphics command into the space corresponding to the front-end flag bit in the ring queue and, after a successful write, take the incremented front-end flag bit as the updated front-end flag bit;
correspondingly, the GPU back-end driver is specifically configured to read the graphics command from the space corresponding to the back-end flag bit in the ring queue and, after a successful read, take the incremented back-end flag bit as the updated back-end flag bit; the initial value of the back-end flag bit is the same as that of the front-end flag bit.
7. The system of claim 6, wherein what is specifically stored in the ring queue is a pointer to the graphics command.
8. The system of claim 5, wherein the GPU front-end driver of any virtual machine comprises a dispatch controller, and the GPU back-end driver comprises a dispatch monitor;
the scheduling monitor is configured to calculate the frame period required by the graphics command based on the graphics frame rate;
the scheduling monitor is configured to send the frame period to the scheduling controller;
the scheduling controller is configured to control, based on the frame period, the time at which graphics commands are written into the ring queue, thereby allocating the time during which the virtual machine uses the GPU.
9. An electronic device, comprising:
at least one memory for storing a program;
at least one processor for executing the program stored in the memory; when the program stored in the memory is executed, the processor is configured to perform the method according to any one of claims 1-4.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when run on a processor, causes the processor to perform the method according to any of claims 1-4.
CN202311148059.0A 2023-09-06 2023-09-06 Universal graphic processor virtualization method and system Pending CN117290052A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311148059.0A CN117290052A (en) 2023-09-06 2023-09-06 Universal graphic processor virtualization method and system


Publications (1)

Publication Number Publication Date
CN117290052A true CN117290052A (en) 2023-12-26

Family

ID=89250879




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination