CN115375530A - Multi-GPU collaborative rendering method, system, device and storage medium - Google Patents


Info

Publication number
CN115375530A
Authority
CN
China
Prior art keywords
rendering, gpu, frame, time, queue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210819159.0A
Other languages
Chinese (zh)
Inventor
黄巍
王必成
聂凯旋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Songying Technology Co ltd
Original Assignee
Beijing Songying Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Songying Technology Co ltd filed Critical Beijing Songying Technology Co ltd
Priority to CN202210819159.0A priority Critical patent/CN115375530A/en
Publication of CN115375530A publication Critical patent/CN115375530A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Abstract

The invention discloses a multi-GPU collaborative rendering method, system, device, and storage medium, which can be widely applied in the field of computer technology.

Description

Multi-GPU collaborative rendering method, system, device and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, a system, an apparatus, and a storage medium for collaborative rendering with multiple GPUs.
Background
Real-time rendering techniques based on video streaming are commonly applied in cloud gaming scenarios. In a cloud gaming scenario, data storage, computation, and rendering are completed in the cloud: after rendering, the cloud stores the output image in video memory, reads the image back into main memory, and encodes it to generate a video stream; the real-time game picture is then streamed to a terminal for display and finally presented to the user.
On the one hand, compared with cloud gaming, fields such as digital twins and industrial simulation are characterized by large scene scale (large model data volume) and high resolution requirements. When real-time rendering is applied to these fields, the computational pressure of GPU rendering is therefore high; once a GPU's load reaches its upper limit the frame rate cannot increase further, and the resource scale-out capability of the cloud environment cannot be fully utilized. On the other hand, because the rendering and encoding stages of existing real-time rendering techniques are executed serially on the same GPU, the rendering and encoding efficiency of video frames is low and difficult to optimize through parallelism.
Disclosure of Invention
The present invention aims to solve at least to some extent one of the technical problems existing in the prior art.
Therefore, an object of the embodiments of the present invention is to provide a method, a system, an apparatus, and a storage medium for multi-GPU collaborative rendering, which reduce the computational pressure of GPU rendering through multi-GPU collaborative rendering, and improve the rendering and encoding efficiency of video frames.
In order to achieve the technical purpose, the technical scheme adopted by the embodiment of the invention comprises the following steps:
in one aspect, an embodiment of the present invention provides a multi-GPU collaborative rendering method, including the following steps:
in response to receiving the rendering task, generating a first rendering job for a frame;
confirming that an idle queue is not empty, and submitting the first rendering job to a first GPU, wherein the first GPU is any one GPU in the idle queue, and the idle queue is used for placing idle GPUs;
transferring the first GPU from the idle queue to a work queue, wherein the work queue is used for placing the GPU which is working;
responding to the first GPU to finish the rendering of the first rendering operation, and reading back frame data, wherein the frame data is data generated after the first GPU renders the first rendering operation;
transferring the first GPU from the work queue to the free queue.
According to the multi-GPU collaborative rendering method, idle GPUs are placed in an idle queue and working GPUs are placed in a work queue. The CPU generates one frame's rendering job and, when the idle queue is not empty, submits it to any GPU in the idle queue for rendering; the CPU then reads back the frame data and encodes it. When the idle queue holds several idle GPUs, rendering jobs for multiple frames are rendered cooperatively and in parallel, which reduces the computational pressure of GPU rendering. Compared with conventional real-time rendering, the method can render more frames in the same period of time and improves the rendering and encoding efficiency of video frames.
In addition, the multi-GPU collaborative rendering method according to the above embodiment of the present invention may further have the following additional technical features:
further, in the multi-GPU collaborative rendering method according to the embodiment of the present invention, the rendering task includes a rendering frame rate, and the method further includes:
calculating a frame interval according to the rendering frame rate;
and setting a plurality of frame submitting moments and a plurality of frame coding moments according to the frame intervals, wherein the intervals between the frame submitting moments are equal to the frame intervals, and the intervals between the frame coding moments are equal to the frame intervals.
Further, in an embodiment of the present invention, the confirming that the free queue is not empty, and submitting the first rendering job to the first GPU includes:
determining whether the idle queue is not empty, and judging whether the current moment reaches a first moment, wherein the first moment is the preset frame submission moment of the first rendering job;
if so, submitting the first rendering job to the first GPU, and setting the frame submission time next to the first time as a second time, wherein the second time is the preset submission time of a second rendering job, and the second rendering job is the next frame rendering job of the first rendering job;
if not, the first rendering operation is submitted to the first GPU when the current time reaches the first time, and the frame submitting time next to the first time is set as the second time.
Further, in an embodiment of the present invention, the determining that the free queue is not empty, and submitting the first rendering job to the first GPU further includes:
generating synchronous data according to the first rendering operation and the second rendering operation, wherein the synchronous data is context data between the first rendering operation and the second rendering operation;
and sending the synchronous data to other GPUs except the first GPU.
Further, in an embodiment of the present invention, after the step of generating the first rendering job for one frame in response to receiving the rendering task, the method further includes:
and confirming that the free queue is empty, and blocking the frame submission process.
Further, in an embodiment of the present invention, the encoding the frame data includes:
judging whether the current moment reaches a third moment, wherein the third moment is the preset frame coding moment of the frame data corresponding to the first rendering operation;
if so, encoding the frame data, and setting the frame encoding time next to the third time as a fourth time, wherein the fourth time is the encoding time of the next frame data of the preset frame data;
if not, encoding the frame data when the current time reaches the third time, and setting the frame encoding time next to the third time as the fourth time.
Further, in one embodiment of the present invention, after the step of transferring the first GPU from the work queue to the free queue, the method further comprises:
and confirming that the frame submitting process is not blocked, and waiting for the generation of the second rendering operation.
On the other hand, an embodiment of the present invention provides a multi-GPU collaborative rendering system, including:
a generating module, configured to generate a first rendering job for a frame in response to receiving a rendering task;
the submitting module is used for confirming that an idle queue is not empty and submitting the first rendering job to a first GPU, wherein the first GPU is any one GPU in the idle queue, and the idle queue is used for placing idle GPUs;
a first transfer module to transfer the first GPU from the idle queue to a work queue, the work queue to place a GPU that is working;
a frame data processing module, configured to respond to that the first GPU completes rendering of the first rendering job, read back frame data, and encode the frame data, where the frame data is data generated after the first GPU renders the first rendering job;
a second transfer module to transfer the first GPU from the work queue to the idle queue.
In another aspect, an embodiment of the present invention provides a multi-GPU collaborative rendering apparatus, including:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the multi-GPU collaborative rendering method.
In another aspect, an embodiment of the present invention provides a storage medium, in which a processor-executable program is stored, where the processor-executable program is used to implement the multi-GPU collaborative rendering method when executed by a processor.
Advantages and benefits of the present invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application:
according to the embodiment of the invention, the idle GPU is placed in the idle queue, the GPU which is working is placed in the work queue, the CPU generates a frame of rendering operation, the idle queue is not empty, the rendering operation is submitted to any GPU in the idle queue for rendering, the CPU reads back frame data and encodes the frame data, the multi-frame rendering operation is cooperatively and parallelly rendered when a plurality of idle GPUs exist in the idle queue, the GPU rendering calculation pressure is reduced, and compared with the traditional real-time rendering technology, the multi-GPU cooperative rendering method can render more frames in the same time period, and the video frame rendering and encoding efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application and the technical solutions in the prior art, the drawings used in describing them are briefly introduced below. It should be understood that the drawings in the following description show only some embodiments of the technical solutions of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart illustrating a method for collaborative rendering with multiple GPUs according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating frame interval control in an embodiment of a multi-GPU collaborative rendering method according to the present invention;
FIG. 3 is a schematic diagram illustrating synchronous data transmission according to an embodiment of a multi-GPU collaborative rendering method;
FIG. 4 is a schematic diagram illustrating an improvement in rendering efficiency according to an embodiment of a multi-GPU collaborative rendering method according to the present invention;
FIG. 5 is a schematic structural diagram illustrating a multi-GPU collaborative rendering system according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a multi-GPU collaborative rendering apparatus according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of the invention and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
As described in the background above, in fields such as digital twins and industrial simulation the computational pressure of GPU rendering is high, the frame rate cannot increase once a single GPU's load reaches its upper limit, and the serial execution of rendering and encoding on the same GPU keeps the rendering and encoding efficiency of video frames low.
Therefore, the invention provides a multi-GPU collaborative rendering method, system, device, and storage medium. Idle GPUs are placed in an idle queue and working GPUs in a work queue; the CPU generates one frame's rendering job and, when the idle queue is not empty, submits it to any GPU in that queue for rendering; after the CPU reads back the frame data, the data is encoded. When the idle queue holds several idle GPUs, rendering jobs for multiple frames are rendered cooperatively and in parallel, reducing the computational pressure of GPU rendering; compared with conventional real-time rendering, the method can render more frames in the same period of time and improves the rendering and encoding efficiency of video frames.
A multi-GPU collaborative rendering method, a system, an apparatus, and a storage medium according to embodiments of the present invention will be described in detail below with reference to the accompanying drawings, and first, a multi-GPU collaborative rendering method according to embodiments of the present invention will be described with reference to the accompanying drawings.
Referring to fig. 1, an embodiment of the present invention provides a multi-GPU collaborative rendering method, and the multi-GPU collaborative rendering method in the embodiment of the present invention may be applied to a terminal, a server, or software running in the terminal or the server. The terminal may be, but is not limited to, a tablet computer, a notebook computer, a desktop computer, and the like. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like. The multi-GPU collaborative rendering method in the embodiment of the invention mainly comprises the following steps:
s101, responding to the received rendering task, and generating a first rendering job of one frame;
the rendering task comprises a rendering frame rate (FPS), wherein the rendering frame rate is the number of frames rendered and played in one second.
Figure BDA0003743482510000051
Wherein, T frame Indicating the frame interval.
Specifically, a frame of rendering job, i.e., the first rendering job, is generated by the CPU according to the rendering task.
In the embodiment of the present invention, after receiving the rendering task, the method further includes the following steps:
1) Calculate the frame interval T_frame according to the rendering frame rate;
2) Set a plurality of frame-submission instants and a plurality of frame-encoding instants according to the frame interval, such that consecutive frame-submission instants are separated by the frame interval, and likewise for consecutive frame-encoding instants.
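The two steps above amount to computing T_frame = 1/FPS and laying out two evenly spaced time grids. A minimal sketch, with illustrative names not taken from the patent; the fixed offset between the two grids is an assumption standing in for the render latency before the first frame can be encoded:

```python
def frame_schedule(fps, n_frames, start=0.0):
    """Compute the frame interval T_frame = 1/FPS and the evenly spaced
    frame-submission and frame-encoding instants used to pace the pipeline."""
    t_frame = 1.0 / fps                                    # frame interval
    submit_times = [start + i * t_frame for i in range(n_frames)]
    # Encoding instants reuse the same interval, shifted by one frame
    # interval (assumed offset; the patent only fixes the spacing).
    encode_times = [t + t_frame for t in submit_times]
    return t_frame, submit_times, encode_times
```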
S102, confirming that the idle queue is not empty, and submitting the first rendering job to a first GPU;
the first GPU is any one GPU in the idle queue, and the idle queue is used for placing idle GPUs.
In an embodiment of the invention, the frame commit process is blocked if it is confirmed that the free queue is empty.
Specifically, referring to fig. 2 and taking three cooperating GPUs as an example, in the embodiment of the present invention the rendering pace must be adjusted before each frame's rendering job is submitted, according to the frame interval and frame-submission instants obtained in step S101, so that frame data is generated in step with the ideal frame interval and the picture transitions between frames appear smoother.
S102 may be further divided into the following steps S1021-S1023:
step S1021, determining whether the idle queue is not empty, and judging whether the current moment reaches a first moment;
and the first moment is the preset submission moment of the first rendering job in the frame submission moments.
Step S1022, if yes, submit the first rendering job to the first GPU, and set the frame submission time next to the first time as a second time.
And the second moment is the preset submission moment of a second rendering job, and the second rendering job is the rendering job of the next frame of the first rendering job.
Step S1023, if not, submitting the first rendering job to the first GPU when the current time reaches the first time, and setting the frame submission time next to the first time as the second time.
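Steps S1021-S1023 reduce to one pacing rule: submit immediately if the submission instant has passed, otherwise wait for it, and in either case advance to the next instant. A hedged sketch (function and parameter names are illustrative; `do_submit` stands in for the actual submission to the first GPU):

```python
import time

def paced_submit(job, submit_time, do_submit, now=time.monotonic):
    """If the current time has reached the frame-submission instant T1,
    submit at once; otherwise sleep until T1 arrives, then submit.
    The caller then sets the next instant T2 = T1 + T_frame."""
    remaining = submit_time - now()
    if remaining > 0:
        time.sleep(remaining)   # wait for the current time to reach T1
    do_submit(job)              # submit the first rendering job
```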
As prior knowledge, within one rendering task the rendering of each frame depends on data from the preceding frames, so that data must be available to avoid missing data when a frame is rendered. Referring to fig. 3, in an embodiment of the present invention, to reduce the amount of data exchanged between the CPU and each GPU, the GPU executing the first rendering job (i.e., the first GPU) is sent the full data of that job, while the GPUs other than the first GPU are sent only the context data between the first rendering job and the next frame's job (i.e., the second rendering job), as follows:
1) Generating synchronous data according to the first rendering operation and the second rendering operation, wherein the synchronous data is context data between the first rendering operation and the second rendering operation;
2) And sending the synchronous data to other GPUs except the first GPU.
Alternatively, the synchronization data may be sent at the same time the first rendering job is submitted, or sent afterwards with a delay.
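The traffic-reduction rule above is simple to state in code. A minimal sketch under stated assumptions: `send` abstracts whatever CPU-to-GPU transfer the system uses, and the job/sync payloads are opaque values here:

```python
def broadcast_sync(first_gpu, gpus, full_job, sync_data, send):
    """The GPU executing the first rendering job receives the full job data;
    every other GPU receives only the inter-frame context (sync) data."""
    for gpu in gpus:
        if gpu == first_gpu:
            send(gpu, full_job)    # full data of the first rendering job
        else:
            send(gpu, sync_data)   # context between first and second jobs
```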
S103, transferring the first GPU from the idle queue to a work queue;
wherein, the work queue is used for placing the GPU which is working.
S104, responding to the first GPU to finish rendering of the first rendering operation, reading back frame data, and encoding the frame data;
wherein the frame data is the data generated after the first GPU renders the first rendering job.
In an embodiment of the present invention, after the first GPU completes rendering of the first rendering job, the generated frame data is stored in a video memory of the first GPU. The CPU can access the video memory of each GPU in the embodiment of the invention, and read back the generated frame data, thereby encoding the frame data.
Optionally, the CPU reads back the rendered frame data, and sequentially sends the frame data to the encoder for encoding according to the read-back order of the frame data.
Specifically, with continued reference to fig. 2, in the embodiment of the present invention, before each frame's data is submitted for encoding, the encoding pace must be adjusted according to the frame interval and frame-encoding instants obtained in step S101, so that the frame rate of the encoded video stream is more stable and a smooth, steady dynamic picture is finally presented. At the same time, adjusting the encoding interval of the frame data reduces the encoder's load and thereby improves encoding efficiency. In an embodiment of the present invention, encoding the frame data specifically includes the following steps:
1) Judging whether the current moment reaches a third moment, wherein the third moment is the preset frame coding moment of the frame data corresponding to the first rendering operation;
2) If so, encoding the frame data, and setting the frame encoding time next to the third time as a fourth time, wherein the fourth time is the preset encoding time of the frame data next to the frame data;
3) If not, encoding the frame data when the current time reaches the third time, and setting the frame encoding time next to the third time as the fourth time.
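The encoding steps pace the encoder the same way submission is paced, while preserving the read-back order noted earlier. A sketch with illustrative names (`encode` stands in for handing a frame to the actual encoder):

```python
import time

def encode_loop(frames, t_frame, encode, now=time.monotonic):
    """Encode frames in read-back order, each no earlier than its
    scheduled frame-encoding instant; instants are T_frame apart."""
    next_t = now()                  # first frame's encoding instant
    for frame in frames:            # read-back order preserved
        delay = next_t - now()
        if delay > 0:
            time.sleep(delay)       # wait for this frame's instant
        encode(frame)
        next_t += t_frame           # advance to the next instant
```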
S105, transferring the first GPU from the work queue to the idle queue.
In an embodiment of the present invention, after the first GPU is transferred from the work queue to the free queue, if the frame submission process is not blocked, the next frame data (i.e., the second rendering job) is waited for to be generated.
As can be seen from the multi-GPU collaborative rendering method described in steps S101-S105, by placing idle GPUs in an idle queue and working GPUs in a work queue, having the CPU generate one frame's rendering job and submit it to any GPU in the idle queue whenever that queue is not empty, and encoding the frame data after the CPU reads it back, rendering jobs for multiple frames are rendered cooperatively and in parallel whenever the idle queue holds several idle GPUs. This reduces the computational pressure of GPU rendering; compared with conventional real-time rendering, the method can render more frames in the same period of time and improves the rendering and encoding efficiency of video frames.
Referring to fig. 4 and taking three GPUs as an example: compared with conventional single-GPU real-time rendering, the multi-GPU collaborative rendering of the embodiment of the present invention can render four frames in a given time period, whereas single-GPU real-time rendering can only render two frames in the same period. It can be understood that the rendering efficiency of video frames can be further improved by deploying more GPUs.
Next, a multi-GPU collaborative rendering system according to an embodiment of the present application will be described with reference to the drawings.
Fig. 5 is a schematic structural diagram of a multi-GPU collaborative rendering system according to an embodiment of the present application.
The system specifically comprises:
a generating module 501, configured to generate a first rendering job for one frame in response to receiving a rendering task;
a submitting module 502, configured to determine that an idle queue is not empty, and submit the first rendering job to a first GPU, where the first GPU is any GPU in the idle queue, and the idle queue is used to place an idle GPU;
a first transfer module 503, configured to transfer the first GPU from the free queue to a work queue, where the work queue is used to place a GPU that is working;
a frame data processing module 504, configured to respond to that the first GPU completes rendering of the first rendering job, read back frame data, and encode the frame data, where the frame data is data generated after the first GPU renders the first rendering job;
a second transfer module 505 to transfer the first GPU from the work queue to the free queue.
It can be seen that the contents in the foregoing method embodiments are all applicable to this system embodiment, the functions specifically implemented by this system embodiment are the same as those in the foregoing method embodiment, and the advantageous effects achieved by this system embodiment are also the same as those achieved by the foregoing method embodiment.
Referring to fig. 6, an embodiment of the present application provides a multi-GPU collaborative rendering apparatus, including:
at least one processor 601;
at least one memory 602 for storing at least one program;
when the at least one program is executed by the at least one processor 601, the at least one processor 601 may implement the multi-GPU co-rendering method described in steps S101-S105.
Similarly, the contents in the method embodiments are all applicable to the apparatus embodiment, the functions specifically implemented by the apparatus embodiment are the same as those in the method embodiments, and the beneficial effects achieved by the apparatus embodiment are also the same as those achieved by the method embodiments.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of the present application are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present application is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion regarding the actual implementation of each module is not necessary for an understanding of the present application. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the present application as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the application, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present application, which are essential or part of the technical solutions contributing to the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several programs for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber device, and a portable Compact Disc Read-Only Memory (CD-ROM). The computer-readable medium could even be paper or another suitable medium upon which the program is printed, since the program can be captured electronically, for instance by optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having appropriate combinational logic gates, a Programmable Gate Array (PGA), a Field-Programmable Gate Array (FPGA), and so on.
In the foregoing description, reference to "one embodiment/example," "another embodiment/example," or "certain embodiments/examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present application have been shown and described, it will be understood by those of ordinary skill in the art that numerous changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the application, the scope of which is defined by the claims and their equivalents.
While the present application has been described with reference to preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the application as defined by the appended claims.

Claims (10)

1. A multi-GPU collaborative rendering method, characterized by comprising the following steps:
generating a first rendering job for a frame in response to receiving a rendering task;
confirming that an idle queue is not empty, and submitting the first rendering job to a first GPU, wherein the first GPU is any GPU in the idle queue, and the idle queue holds idle GPUs;
transferring the first GPU from the idle queue to a work queue, wherein the work queue holds GPUs that are currently working;
in response to the first GPU completing rendering of the first rendering job, reading back frame data and encoding the frame data, wherein the frame data is the data generated by the first GPU rendering the first rendering job; and
transferring the first GPU from the work queue back to the idle queue.
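The queue transitions of claim 1 can be sketched, outside the claims themselves, as a small scheduler. This is an illustrative assumption, not the patented implementation; the class and method names (`GpuScheduler`, `submit`, `complete`) and the callback-based rendering/encoding hooks are invented for the sketch:

```python
from collections import deque


class GpuScheduler:
    """Tracks GPUs in an idle queue and a work queue, as in claim 1."""

    def __init__(self, gpu_ids):
        self.idle_queue = deque(gpu_ids)  # idle GPUs awaiting work
        self.work_queue = set()           # GPUs currently rendering

    def submit(self, render_job):
        """Submit a rendering job to any idle GPU and return its id,
        or None when the idle queue is empty (the caller then blocks,
        as in claim 5)."""
        if not self.idle_queue:
            return None
        gpu = self.idle_queue.popleft()
        self.work_queue.add(gpu)   # transfer: idle queue -> work queue
        render_job(gpu)            # start rendering on that GPU
        return gpu

    def complete(self, gpu, frame_data, encode):
        """Called once `gpu` finishes rendering: encode the read-back
        frame data, then return the GPU to the idle queue."""
        encode(frame_data)
        self.work_queue.remove(gpu)   # transfer: work queue -> idle queue
        self.idle_queue.append(gpu)
```

In this sketch the GPU is handed back to the idle queue only after its frame data has been read back and encoded, mirroring the step order of claim 1.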
2. The multi-GPU collaborative rendering method according to claim 1, wherein the rendering task comprises a rendering frame rate, and the method further comprises:
calculating a frame interval according to the rendering frame rate; and
setting a plurality of frame submission times and a plurality of frame encoding times according to the frame interval, wherein the interval between adjacent frame submission times is equal to the frame interval, and the interval between adjacent frame encoding times is equal to the frame interval.
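The schedule of claim 2 amounts to simple arithmetic: the frame interval is the reciprocal of the frame rate, and the submission and encoding times are equally spaced by that interval. A minimal sketch follows; the one-interval offset between submission and encoding times is an assumption for illustration, since the claim fixes only the spacing, not the offset:

```python
def frame_schedule(frame_rate_hz, num_frames, start_time=0.0):
    """Compute the frame interval from the rendering frame rate and lay
    out equally spaced frame submission and frame encoding times."""
    interval = 1.0 / frame_rate_hz  # frame interval in seconds
    # Submission times are one frame interval apart.
    submit_times = [start_time + i * interval for i in range(num_frames)]
    # Encoding times are likewise one interval apart; here each trails
    # its submission time by one interval (an illustrative assumption).
    encode_times = [t + interval for t in submit_times]
    return interval, submit_times, encode_times
```

For example, a 50 Hz rendering frame rate yields a 20 ms frame interval, so consecutive submission times (and consecutive encoding times) are 20 ms apart.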
3. The method according to claim 2, wherein the confirming that the idle queue is not empty and submitting the first rendering job to the first GPU comprises:
confirming that the idle queue is not empty, and judging whether the current time has reached a first time, wherein the first time is the preset frame submission time of the first rendering job;
if so, submitting the first rendering job to the first GPU, and setting the frame submission time next after the first time as a second time, wherein the second time is the preset submission time of a second rendering job, and the second rendering job is the rendering job for the frame following the first rendering job;
if not, submitting the first rendering job to the first GPU when the current time reaches the first time, and setting the frame submission time next after the first time as the second time.
4. The method according to claim 3, wherein the confirming that the idle queue is not empty and submitting the first rendering job to the first GPU further comprises:
generating synchronization data according to the first rendering job and the second rendering job, wherein the synchronization data is context data between the first rendering job and the second rendering job; and
sending the synchronization data to the GPUs other than the first GPU.
5. The multi-GPU collaborative rendering method according to claim 3, wherein after the step of generating a first rendering job for a frame in response to receiving a rendering task, the method further comprises:
confirming that the idle queue is empty, and blocking the frame submission process.
6. The method according to claim 2, wherein the encoding the frame data comprises:
judging whether the current time has reached a third time, wherein the third time is, among the preset frame encoding times, the encoding time of the frame data corresponding to the first rendering job;
if so, encoding the frame data, and setting the frame encoding time next after the third time as a fourth time, wherein the fourth time is the preset encoding time of the frame data of the next frame;
if not, encoding the frame data when the current time reaches the third time, and setting the frame encoding time next after the third time as the fourth time.
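Claims 3 and 6 share one pacing pattern: act immediately if the scheduled time has already arrived, otherwise wait for it, and in either case advance the schedule by one frame interval. A hedged sketch of that pattern is below; the function name `run_at` and the injectable `clock`/`sleep` hooks (defaulting to `time.monotonic` and `time.sleep`) are illustrative choices, not taken from the patent:

```python
import time


def run_at(scheduled_time, interval, action,
           clock=time.monotonic, sleep=time.sleep):
    """Perform `action` no earlier than `scheduled_time`: if the current
    time has already reached it, act at once; otherwise wait until it
    arrives. Return the next scheduled time, one frame interval later
    (e.g. the "second time" of claim 3 or the "fourth time" of claim 6)."""
    now = clock()
    if now < scheduled_time:
        sleep(scheduled_time - now)  # wait for the scheduled time
    action()                         # submit the job or encode the frame
    return scheduled_time + interval
```

Injecting the clock and sleep functions keeps the sketch testable without real waiting; production code would use the defaults.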
7. The method according to claim 5, wherein after the step of transferring the first GPU from the work queue to the idle queue, the method further comprises:
confirming that the frame submission process is not blocked, and waiting for the second rendering job to be generated.
8. A multi-GPU collaborative rendering system, comprising:
a generating module, configured to generate a first rendering job for a frame in response to receiving a rendering task;
a submitting module, configured to confirm that an idle queue is not empty and submit the first rendering job to a first GPU, wherein the first GPU is any GPU in the idle queue, and the idle queue holds idle GPUs;
a first transfer module, configured to transfer the first GPU from the idle queue to a work queue, wherein the work queue holds GPUs that are currently working;
a frame data processing module, configured to, in response to the first GPU completing rendering of the first rendering job, read back frame data and encode the frame data, wherein the frame data is the data generated by the first GPU rendering the first rendering job; and
a second transfer module, configured to transfer the first GPU from the work queue back to the idle queue.
9. A multi-GPU collaborative rendering apparatus, comprising:
at least one processor; and
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the multi-GPU collaborative rendering method according to any one of claims 1-7.
10. A storage medium having stored therein a processor-executable program, wherein the processor-executable program, when executed by a processor, is used to implement the multi-GPU collaborative rendering method according to any one of claims 1-7.
CN202210819159.0A 2022-07-13 2022-07-13 Multi-GPU collaborative rendering method, system, device and storage medium Pending CN115375530A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210819159.0A CN115375530A (en) 2022-07-13 2022-07-13 Multi-GPU collaborative rendering method, system, device and storage medium


Publications (1)

Publication Number Publication Date
CN115375530A 2022-11-22

Family

ID=84062730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210819159.0A Pending CN115375530A (en) 2022-07-13 2022-07-13 Multi-GPU collaborative rendering method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN115375530A (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006126090A2 (en) * 2005-05-27 2006-11-30 Ati Technologies Inc. APPLYING NON-HOMOGENEOUS PROPERTIES TO MULTIPLE VIDEO PROCESSING UNITS (VPUs)
US20090135180A1 (en) * 2007-11-28 2009-05-28 Siemens Corporate Research, Inc. APPARATUS AND METHOD FOR VOLUME RENDERING ON MULTIPLE GRAPHICS PROCESSING UNITS (GPUs)
US20090201303A1 (en) * 2007-11-23 2009-08-13 Mercury Computer Systems, Inc. Multi-user multi-gpu render server apparatus and methods
US7969444B1 (en) * 2006-12-12 2011-06-28 Nvidia Corporation Distributed rendering of texture data
CN104052803A (en) * 2014-06-09 2014-09-17 国家超级计算深圳中心(深圳云计算中心) Decentralized distributed rendering method and system
CN104952096A (en) * 2014-03-31 2015-09-30 中国电信股份有限公司 CPU and GPU hybrid cloud rendering method, device and system
CN105453130A (en) * 2013-08-12 2016-03-30 三星电子株式会社 Graphics processing apparatus and graphics processing method
CN106575302A (en) * 2014-08-08 2017-04-19 超威半导体公司 Method and system for frame pacing
CN109978751A (en) * 2017-12-28 2019-07-05 辉达公司 More GPU frame renderings
CN111737015A (en) * 2020-08-10 2020-10-02 成都索贝数码科技股份有限公司 Method for increasing number of real-time layers of large-format nonlinear editing based on multiple GPUs
CN112954233A (en) * 2021-01-29 2021-06-11 稿定(厦门)科技有限公司 Video synthesis system and method based on GPU
US20210241416A1 (en) * 2020-02-03 2021-08-05 Sony Interactive Entertainment Inc. System and method for efficient multi-gpu rendering of geometry by subdividing geometry
CN113521735A (en) * 2020-09-02 2021-10-22 北京蔚领时代科技有限公司 Multi-GPU-based real-time rendering method for single-frame picture
JP2022521455A (en) * 2019-01-30 2022-04-08 ソニー・インタラクティブエンタテインメント エルエルシー Scalable game console CPU / GPU design for home game consoles and cloud games
KR20220045286A (en) * 2020-10-05 2022-04-12 서강대학교산학협력단 GPU Scheduling Framework for Accelerating Deep Learning Hyper Parameter Optimization in a Cloud


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XIAOWEI REN ET AL.: "CHOPIN: Scalable Graphics Rendering in Multi-GPU Systems via Parallel Image Composition", 2021 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA), 22 April 2021 (2021-04-22), pages 709 - 722 *
LIU Hongjian: "Design and Implementation of a GPU-Based Cluster Rendering Management System", Journal of Guangzhou Maritime College, vol. 21, no. 01, 30 March 2013 (2013-03-30), pages 32 - 35 *
CAO Min; TIAN Li; ZHU Yonghua: "Optimization of the FMM Near-Field Algorithm on a Hybrid Multi-GPU Architecture", Computer Engineering and Applications, vol. 49, no. 08, 15 April 2013 (2013-04-15), pages 37 - 42 *

Similar Documents

Publication Publication Date Title
CN103838779A (en) Idle computing resource multiplexing type cloud transcoding method and system and distributed file device
Tan et al. Media cloud: When media revolution meets rise of cloud computing
CN111314741B (en) Video super-resolution processing method and device, electronic equipment and storage medium
CN112843676B (en) Data processing method, device, terminal, server and storage medium
WO2023011033A1 (en) Image processing method and apparatus, computer device and storage medium
CN111310744A (en) Image recognition method, video playing method, related device and medium
CN111669595A (en) Screen content coding method, device, equipment and medium
CN100446562C (en) Mutimedium stream system for wireless manual apparatus
US20120265858A1 (en) Streaming portions of a quilted graphic 2d image representation for rendering into a digital asset
CN115375530A (en) Multi-GPU collaborative rendering method, system, device and storage medium
Liu et al. An efficient video steganography method based on HEVC
CN115391053A (en) Online service method and device based on CPU and GPU hybrid calculation
US20120263224A1 (en) Encoding digital assets as an image
CN111510715B (en) Video processing method, system, computer device and storage medium
CN112188235A (en) Media processing mode selection method and media processing method
CN112131423A (en) Picture acquisition method, device and system
Zheng et al. A rate control scheme for distributed high performance video encoding in cloud
WO2023078204A1 (en) Data processing method and apparatus, device, readable storage medium, and program product
CN115866248B (en) Video transcoding method, device, computer equipment and storage medium
WO2024060213A1 (en) Viewport switch latency reduction in live streaming
EP3989566A1 (en) Motion information list construction method in video encoding and decoding, device, and apparatus
WO2024037137A1 (en) Data processing method and apparatus for immersive media, and device, medium and product
CN104219540B (en) Distributing coding/decoding system and method
CN116886982A (en) Video data processing method, device, computer equipment and storage medium
CN117412064A (en) Video decoding method, device, electronic equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination