CN112328532A - Multi-GPU communication method and device, storage medium and electronic device - Google Patents

Multi-GPU communication method and device, storage medium and electronic device

Info

Publication number
CN112328532A
Authority
CN
China
Prior art keywords
gpu
image processing
slave
gpus
processing task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011202092.3A
Other languages
Chinese (zh)
Other versions
CN112328532B (en)
Inventor
龙斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Jingmei Integrated Circuit Design Co ltd
Changsha Jingjia Microelectronics Co ltd
Original Assignee
Changsha Jingmei Integrated Circuit Design Co ltd
Changsha Jingjia Microelectronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Jingmei Integrated Circuit Design Co ltd, Changsha Jingjia Microelectronics Co ltd filed Critical Changsha Jingmei Integrated Circuit Design Co ltd
Priority to CN202011202092.3A priority Critical patent/CN112328532B/en
Publication of CN112328532A publication Critical patent/CN112328532A/en
Application granted granted Critical
Publication of CN112328532B publication Critical patent/CN112328532B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F 15/163 Interprocessor communication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/20 Processor architectures; Processor configuration, e.g. pipelining
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The embodiments of the present application provide a multi-GPU communication method and device, a storage medium, and an electronic device. The method is used for communication between a master GPU and a plurality of slave GPUs, where the master GPU communicates one-to-one with each slave GPU through a high-speed interconnection bus. The method comprises: receiving an image processing task; dividing the image processing task to obtain a plurality of image processing subtasks; distributing the plurality of image processing subtasks according to the number of slave GPUs; receiving the slave GPUs' processing results for the image processing task; and merging and splicing the processing results to obtain a target image. With this scheme, the GPUs cooperatively complete complex drawing and computing tasks.

Description

Multi-GPU communication method and device, storage medium and electronic device
Technical Field
The present application relates to computer image processing technologies, and in particular, to a method and apparatus for multi-GPU communication, a storage medium, and an electronic apparatus.
Background
In the field of image processing, multiple GPUs are commonly used together, each performing a part of an image processing task.
As image processing tasks become more complex, however, the overall performance of the GPUs becomes insufficient.
For the problem in the related art that overall GPU performance is low when processing complex image processing tasks, no effective solution has yet been proposed.
Disclosure of Invention
The embodiment of the application provides a multi-GPU communication method and device, a storage medium and an electronic device, so as to at least solve the problem that the overall performance of a GPU is not high when complex image processing tasks are processed in the related art.
According to a first aspect of embodiments of the present application, there is provided a multi-GPU communication method for performing communication between a master GPU and a plurality of slave GPUs, the master GPU performing one-to-one communication with each of the slave GPUs through a high-speed interconnection bus, the method comprising: receiving an image processing task; dividing the image processing task to obtain a plurality of image processing subtasks; distributing the plurality of image processing subtasks according to the number of the slave GPUs; receiving a processing result of the slave GPU on the image processing task; and merging the processing results and splicing to obtain a target image.
According to a second aspect of the embodiments of the present application, there is provided a method for multi-GPU communication, for performing communication between a master GPU and a plurality of slave GPUs, the master GPU performing one-to-one communication with each of the slave GPUs through a high-speed interconnection bus, the method comprising: receiving a plurality of image processing subtasks obtained by dividing the image processing task by the main GPU; processing the plurality of image processing subtasks according to a preset mode; and sending the processing result of the image processing task to the main GPU.
According to a third aspect of the embodiments of the present application, there is provided a multi-GPU communication apparatus for performing communication between a master GPU and a plurality of slave GPUs, the master GPU performing one-to-one communication with each of the slave GPUs through a high-speed interconnection bus, the apparatus comprising: the first receiving module is used for receiving the image processing task; the segmentation module is used for segmenting the image processing task to obtain a plurality of image processing subtasks; the distribution module is used for distributing the image processing subtasks according to the number of the secondary GPUs; the first receiving module is further configured to receive a processing result of the slave GPU on the image processing task; and the splicing module is used for merging the processing results and splicing to obtain a target image.
According to a fourth aspect of the embodiments of the present application, there is provided a multi-GPU communication apparatus for performing communication between a master GPU and a plurality of slave GPUs, the master GPU performing one-to-one communication with each of the slave GPUs through a high-speed interconnection bus, the apparatus comprising: the second receiving module is used for receiving a plurality of image processing subtasks obtained by dividing the image processing task by the main GPU; the processing module is used for processing the image processing subtasks according to a preset mode; and the sending module is used for sending the processing result of the image processing task to the main GPU.
According to a fifth aspect of embodiments of the present application, there is provided a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above-mentioned method embodiments when executed.
According to a sixth aspect of embodiments of the present application, there is provided an electronic apparatus, comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the steps of any of the above method embodiments.
By adopting the above multi-GPU communication method and device, storage medium, and electronic device, the master GPU receives the drawing tasks sent by the upper-layer system, automatically divides them, distributes the partial drawing tasks to the multiple slave GPUs while managing those slave GPUs' loads, and, after the slave GPUs finish drawing, receives the drawing results and merges and splices them into a whole image for output. This achieves the purpose of the GPUs cooperatively completing complex drawing and computing tasks, and the technical effect of improving overall GPU performance.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram of a system architecture in an embodiment of the present application;
FIG. 2 is a schematic flowchart of a method for multi-GPU communication according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an apparatus for multi-GPU communication according to an embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a method for multi-GPU communication according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an apparatus for multi-GPU communication according to an embodiment of the present disclosure;
FIG. 6 is a diagram illustrating an example of uniform distribution of image processing tasks according to drawing areas;
FIG. 7 is a schematic diagram illustrating non-uniform distribution of image processing tasks according to drawing areas in an embodiment of the present application.
Detailed Description
In the process of implementing the present application, the inventors found that in image processing, graphics are drawn by multiple GPUs and then displayed; however, the multiple GPUs cannot process tasks cooperatively in parallel, which results in low overall GPU performance.
In view of the foregoing problems, an embodiment of the present application provides a multi-GPU communication method for performing communication between a master GPU and a plurality of slave GPUs, where the master GPU performs one-to-one communication with each of the slave GPUs through a high-speed interconnection bus, and the method includes: receiving an image processing task; dividing the image processing task to obtain a plurality of image processing subtasks; distributing the plurality of image processing subtasks according to the number of the slave GPUs; receiving a processing result of the slave GPU on the image processing task; and merging the processing results and splicing to obtain a target image. In the application, a scheme of interconnection and cooperative work of a plurality of GPUs is adopted, and the purpose of cooperatively finishing complex drawing and computing tasks is achieved.
To make the technical solutions and advantages of the embodiments of the present application clearer, the exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. Clearly, the described embodiments are only a part of the embodiments of the present application, not an exhaustive list. It should be noted that, where there is no conflict, the embodiments in the present application and the features in the embodiments may be combined with each other.
Example one
The method provided in the first embodiment of the present application may be executed in a system with multiple GPUs or a system with a similar architecture. Taking a system running on multiple GPUs as an example, FIG. 1 is a schematic structural diagram of a GPU system implementing the multi-GPU communication method of this embodiment. FIG. 1 shows an architecture of one master GPU and three slave GPUs, where the master GPU communicates one-to-one with each slave GPU through a high-speed interconnection bus. The master GPU receives drawing tasks sent from the upper layer over the PCIE bus and transmits them to the three slave GPUs through the high-speed interconnection bus; the three slave GPUs also communicate with one another through the high-speed interconnection bus, and the data required during drawing can be shared between the master GPU and the slave GPUs through the shared RAM and the shared Cache.
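The interconnect topology described above can be sketched as follows. This is a hypothetical illustration, not part of the patent: the function name `build_topology` and the integer GPU ids are assumptions.

```python
# Hypothetical sketch (names are not from the patent) of the FIG. 1
# topology: one master GPU (id 0) and three slave GPUs, with every
# pair of GPUs joined by a one-to-one high-speed interconnection link.
from itertools import combinations

def build_topology(num_slaves=3):
    """Return the set of point-to-point links among master and slaves."""
    gpus = range(num_slaves + 1)  # 0 = master, 1..num_slaves = slaves
    return {frozenset(pair) for pair in combinations(gpus, 2)}

links = build_topology(3)
# 4 GPUs give C(4, 2) = 6 point-to-point links, so the master reaches
# every slave directly and the slaves also reach one another directly.
```

Fully meshing the four GPUs is what lets the slaves exchange shared data among themselves without routing through the master.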
In this embodiment, a method for multi-GPU communication is provided, which is used for performing communication between a master GPU and a plurality of slave GPUs, where the master GPU performs one-to-one communication with each of the slave GPUs through a high-speed interconnection bus, as shown in fig. 2, and the process includes the following steps:
step S201, receiving an image processing task;
step S202, the image processing task is divided to obtain a plurality of image processing subtasks;
step S203, distributing the image processing subtasks according to the number of the slave GPUs;
step S204, receiving the processing result of the image processing task from the GPU;
and S205, merging the processing results and splicing to obtain a target image.
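The steps S201 to S205 above can be sketched as a minimal master-side model. The helper names (`split_task`, `master_flow`) and the representation of an image as a list of rows are illustrative assumptions, not patent terminology.

```python
# Minimal master-side sketch of steps S201-S205. An "image processing
# task" is modelled as a list of pixel rows; each slave GPU is modelled
# as a callable that processes its subtask.

def split_task(task, num_slaves):
    # S202: divide the task into one subtask per slave GPU.
    chunk = (len(task) + num_slaves - 1) // num_slaves
    return [task[i * chunk:(i + 1) * chunk] for i in range(num_slaves)]

def master_flow(task, slaves):
    subtasks = split_task(task, len(slaves))          # S202
    # S203: distribute subtasks according to the number of slave GPUs;
    # S204: receive each slave's processing result.
    results = [slave(sub) for slave, sub in zip(slaves, subtasks)]
    # S205: merge the results and splice them into the target image.
    return [row for part in results for row in part]

# A trivial "slave" that draws by doubling each row value.
draw = lambda sub: [v * 2 for v in sub]
target = master_flow([1, 2, 3, 4, 5, 6], [draw, draw, draw])  # -> [2, 4, 6, 8, 10, 12]
```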
According to the above multi-GPU communication method, the master GPU receives the drawing tasks sent by the upper-layer system, automatically divides them, distributes the partial drawing tasks to the multiple slave GPUs while managing those slave GPUs' loads, and, after the slave GPUs finish drawing, receives the drawing results and merges and splices them into a whole image for output. This achieves the purpose of the GPUs cooperatively completing complex drawing and computing tasks, and the technical effect of improving overall GPU performance.
In step S201, the image processing task is received by the master GPU.
In a specific embodiment, the master GPU receives an image processing task issued by an upper system, i.e., a CPU.
In a preferred embodiment, the image processing task comprises a drawing task.
In step S202, the master GPU divides the image processing task to obtain a plurality of image processing subtasks.
In a specific embodiment, the master GPU divides the drawing task sent by the upper system and distributes it evenly to the plurality of slave GPUs according to their number.
In a preferred embodiment, the transmission involved in distributing the drawing task uniformly to the plurality of slave GPUs is carried out over the high-speed interconnection bus.
In step S203, the master GPU allocates the plurality of image processing subtasks according to the number of the slave GPUs.
In one embodiment, the master GPU receives the upper system drawing task, packages the drawing data required by the slave GPU, and sends the data package to the slave GPU.
In a preferred embodiment, after receiving the data packet, the slave GPU parses it and stores the data at a specified location, then sends a response information packet to the master GPU.
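The packet exchange just described could look like the following sketch. The packet fields and class names are invented for illustration; the patent does not specify a packet format.

```python
# Illustrative model of the data-packet handshake: the master packages
# drawing data for a slave GPU; the slave parses the packet, stores the
# data at a specified location, and returns a response (ACK) packet.
from dataclasses import dataclass, field

@dataclass
class DrawDataPacket:
    slave_id: int
    drawing_data: bytes

@dataclass
class SlaveGPU:
    slave_id: int
    memory: dict = field(default_factory=dict)

    def on_data_packet(self, pkt):
        # Parse the packet and store the data at the specified location.
        self.memory["drawing_data"] = pkt.drawing_data
        # Send a response information packet back to the master GPU.
        return {"type": "ACK", "slave_id": self.slave_id}

slave = SlaveGPU(slave_id=1)
ack = slave.on_data_packet(DrawDataPacket(slave_id=1, drawing_data=b"\x01\x02"))
```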
In step S204, the master GPU receives the slave GPUs' processing results for the image processing task; the master GPU itself also performs part of the processing.
In a preferred embodiment, the image processing task comprises a drawing task.
In one embodiment, the master GPU sends the associated drawing commands to the slave GPU for processing.
In a preferred embodiment, the slave GPU automatically performs a drawing task according to a drawing command and drawing data, and sends a drawing completion signal to the master GPU after the drawing task is completed.
In step S205, the master GPU merges and splices the processing results to obtain a target image.
In a specific embodiment, the slave GPUs' processing results and the master GPU's own image processing result are merged and then spliced to obtain the target image.
In a preferred embodiment, after receiving all the drawing results sent by the slave GPUs, the master GPU splices them together with its own drawing result into a complete image and outputs it to the display port.
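The merge-and-splice step can be illustrated with a small sketch; the assumption here (not stated in the patent) is that each GPU draws a horizontal band of rows and the bands are concatenated in order.

```python
# Minimal sketch of the merge-and-splice step: the master GPU
# concatenates its own drawn band with the bands returned by the slave
# GPUs into one complete frame. The row-band layout is an assumption.

def splice(master_band, slave_bands):
    frame = list(master_band)
    for band in slave_bands:
        frame.extend(band)          # append each slave's rows in order
    return frame

frame = splice([[0, 0]], [[[1, 1]], [[2, 2]], [[3, 3]]])
# frame is the complete 4-row image, ready for output to the display port
```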
In a preferred embodiment, the master GPU sends different drawing data and drawing commands to a slave GPU several times in succession, driving that slave GPU to draw different graphics within the same sub-screen. In terms of display effect, one graphic is drawn in a sub-screen and then further graphics are drawn in it, achieving effects such as overlapping and perspective of multiple graphics.
As an optional implementation manner of the present application, the allocating the plurality of image processing subtasks according to the number of the slave GPUs includes: sending a drawing data packet in the image processing subtask to be distributed by the slave GPU to the slave GPU; and receiving an information response packet when the slave GPU finishes receiving the drawing data packet.
In specific implementation, the master GPU receives the upper-system drawing task, packages the drawing data required by each slave GPU, and sends the data packet to that slave GPU. After receiving the data packet, the slave GPU parses it, stores the data at a specified location, sends a response information packet to the master GPU, and executes its work task.
As an optional implementation manner of the present application, the allocating the plurality of image processing subtasks according to the number of the slave GPUs further includes: sending an image processing task command to the slave GPU; receiving a completion response packet of the image processing task under the condition that the slave GPU completes the image processing task according to the image processing task command and the drawing data packet; sending an image processing task result permission receiving request to the slave GPU; and receiving a processing result of the image processing task according to the permission receiving request.
In specific implementation, the master GPU sends a drawing command to a slave GPU, and the slave GPU automatically executes and completes the drawing task according to the drawing command and the drawing data, sending a drawing completion signal to the master GPU when the task is done.
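The full handshake of this optional implementation can be sketched as a slave-side state machine: drawing-data packet, then ACK; drawing command, then completion response; receive-permission request, then the drawing result. All names below are assumptions for the sketch, not patent terminology.

```python
# Illustrative slave-side state machine for the handshake: data packet
# -> ACK, drawing command -> completion response, result-receive
# permission -> drawing result.

class SlaveProtocol:
    def __init__(self):
        self.drawing_data = None
        self.result = None

    def on_drawing_data(self, data):
        self.drawing_data = data
        return "ACK"                    # information response packet

    def on_drawing_command(self, command):
        # Execute the drawing task using the stored drawing data.
        self.result = [command(v) for v in self.drawing_data]
        return "DONE"                   # task completion response packet

    def on_receive_permission(self):
        # The master has granted permission to send back the result.
        return self.result

slave = SlaveProtocol()
slave.on_drawing_data([1, 2, 3])
slave.on_drawing_command(lambda v: v + 10)   # "draw": add 10 to each value
result = slave.on_receive_permission()       # -> [11, 12, 13]
```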
As an optional implementation manner of the present application, the allocating the plurality of image processing subtasks according to the number of the slave GPUs further includes: dividing a drawing area according to the coordinates of the target image and the number of the slave GPUs and the master GPU to obtain a target area; and sending the drawing data packets and the image processing task commands belonging to different drawing areas to the corresponding slave GPU.
In specific implementation, the master GPU sends the slave GPU a request granting permission to return the drawing result, and the slave GPU then sends the drawing result to the master GPU.
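Dividing the drawing area by the coordinates of the target image can be sketched as follows, covering both the uniform split of FIG. 6 (equal weights) and a non-uniform split like FIG. 7. The weight-based interface and row-band partitioning are assumptions for illustration.

```python
# Sketch of dividing the target image into drawing areas by coordinates
# for the master GPU and slave GPUs. Equal weights reproduce a uniform
# distribution; unequal weights give a non-uniform one.

def divide_rows(height, weights):
    """Split `height` image rows into bands proportional to `weights`."""
    total = sum(weights)
    bands, start = [], 0
    for i, w in enumerate(weights):
        # The last band absorbs rounding so the areas tile the image.
        end = height if i == len(weights) - 1 else start + height * w // total
        bands.append((start, end))
        start = end
    return bands

uniform = divide_rows(1080, [1, 1, 1, 1])   # FIG. 6 style: equal areas
weighted = divide_rows(1080, [1, 2, 2, 1])  # FIG. 7 style: uneven areas
```

A non-uniform split lets the master account for per-GPU load when some drawing areas are more expensive than others.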
As an optional implementation of the present application, the one-to-one communication between the master GPU and each slave GPU through the high-speed interconnection bus includes: one-to-one communication between the slave GPUs themselves through the high-speed interconnection bus; and/or sharing, between the master GPU and each slave GPU, the data required in executing the image processing task through a shared RAM; and/or sharing that data through a shared Cache.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example two
In this embodiment, a method for multi-GPU communication is provided, which is used for performing communication between a master GPU and a plurality of slave GPUs, where the master GPU performs one-to-one communication with each of the slave GPUs through a high-speed interconnection bus, as shown in fig. 4, the process includes the following steps:
step S401, receiving a plurality of image processing subtasks obtained by the main GPU after the image processing task is divided;
step S402, processing the image processing subtasks according to a preset mode;
step S403, sending the processing result of the image processing task to the master GPU.
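The slave-side steps S401 to S403 can be sketched as follows; `process` and `send_to_master` stand in for the preset processing mode and the high-speed-bus transfer, and both names are illustrative assumptions.

```python
# Minimal slave-side sketch of steps S401-S403: receive a subtask from
# the master GPU, process it in the preset manner, and send the result
# back to the master GPU.

def slave_flow(subtask, process, send_to_master):
    result = process(subtask)    # S402: process in the preset manner
    send_to_master(result)       # S403: return the result to the master
    return result

outbox = []                      # stands in for the master's receive queue
slave_flow([3, 1, 2], sorted, outbox.append)   # outbox now holds [1, 2, 3]
```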
According to the above multi-GPU communication method, the master GPU receives the drawing tasks sent by the upper-layer system, automatically divides them, distributes the partial drawing tasks to the multiple slave GPUs while managing those slave GPUs' loads, and, after the slave GPUs finish drawing, receives the drawing results and merges and splices them into a whole image for output. This achieves the purpose of the GPUs cooperatively completing complex drawing and computing tasks, and the technical effect of improving overall GPU performance.
In step S401, the plurality of slave GPUs receive the image processing subtasks obtained by the master GPU dividing the image processing task.
In a specific embodiment, the master GPU divides the drawing task sent by the upper system and distributes it evenly to the plurality of slave GPUs according to their number.
In a preferred embodiment, the transmission involved in distributing the drawing task uniformly to the plurality of slave GPUs is carried out over the high-speed interconnection bus.
In step S402, the plurality of image processing sub-tasks are processed in the plurality of slave GPUs according to a preset manner.
In one embodiment, the master GPU receives the upper system drawing task, packages the drawing data required by the slave GPU, and sends the data package to the slave GPU.
In a preferred embodiment, after receiving the data packet, the slave GPU parses it and stores the data at a specified location, then sends a response information packet to the master GPU.
In step S403, the slave GPU sends the processing result of the image processing task to the master GPU.
In a preferred embodiment, the image processing task comprises a drawing task.
In one embodiment, the master GPU sends the associated drawing commands to the slave GPU for processing.
In a preferred embodiment, the slave GPU automatically performs a drawing task according to a drawing command and drawing data, and sends a drawing completion signal to the master GPU after the drawing task is completed.
In a preferred embodiment, after receiving all the drawing results sent by the slave GPUs, the master GPU splices them together with its own drawing result into a complete image and outputs it to the display port.
In a preferred embodiment, the master GPU sends different drawing data and drawing commands to a slave GPU several times in succession, driving that slave GPU to draw different graphics within the same sub-screen. In terms of display effect, one graphic is drawn in a sub-screen and then further graphics are drawn in it, achieving effects such as overlapping and perspective of multiple graphics.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
EXAMPLE III
In this embodiment, a multi-GPU communication device is also provided, and the device is used for implementing the above embodiments and preferred embodiments, which have already been described and will not be described again. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 3 is a block diagram of an apparatus for multi-GPU communication according to an embodiment of the present invention, configured to perform communication between a master GPU and a plurality of slave GPUs, the master GPU performing one-to-one communication with each of the slave GPUs through a high-speed interconnection bus, as shown in fig. 3, the apparatus including:
a first receiving module 30, configured to receive an image processing task;
a segmentation module 31, configured to segment the image processing task to obtain a plurality of image processing subtasks;
an allocation module 32, configured to allocate the plurality of image processing subtasks according to the number of the slave GPUs;
the first receiving module 30 is further configured to receive a processing result of the slave GPU on the image processing task;
and the splicing module 33 is configured to merge and splice the processing results to obtain a target image.
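The module decomposition of FIG. 3 can be sketched as a class whose methods correspond to the patent's modules; the module names come from the patent, but the Python interface is an illustrative assumption.

```python
# The FIG. 3 apparatus decomposed into its modules. Each module wraps
# one step of the master-side method (distribution and receiving are
# elided; the segmentation and splicing modules are shown).

class MultiGPUCommApparatus:
    def __init__(self, num_slaves):
        self.num_slaves = num_slaves

    def segmentation_module(self, task):
        # Divide the image processing task into one subtask per slave.
        n = self.num_slaves
        k = (len(task) + n - 1) // n
        return [task[i * k:(i + 1) * k] for i in range(n)]

    def splicing_module(self, results):
        # Merge the processing results and splice the target image.
        return [row for part in results for row in part]

apparatus = MultiGPUCommApparatus(num_slaves=2)
subtasks = apparatus.segmentation_module([1, 2, 3, 4])   # [[1, 2], [3, 4]]
```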
The image processing task is received by the master GPU in the first receiving module 30.
In a specific embodiment, the master GPU receives an image processing task issued by an upper system, i.e., a CPU.
In a preferred embodiment, the image processing task comprises a drawing task.
In the segmentation module 31, the main GPU segments the image processing task to obtain a plurality of image processing subtasks.
In a specific embodiment, the master GPU divides the drawing task sent by the upper system and distributes it evenly to the plurality of slave GPUs according to their number.
In a preferred embodiment, the transmission involved in distributing the drawing task uniformly to the plurality of slave GPUs is carried out over the high-speed interconnection bus.
The master GPU in the allocation module 32 allocates the plurality of image processing subtasks according to the number of the slave GPUs.
In one embodiment, the master GPU receives the upper system drawing task, packages the drawing data required by the slave GPU, and sends the data package to the slave GPU.
In a preferred embodiment, after receiving the data packet, the slave GPU parses it and stores the data at a specified location, then sends a response information packet to the master GPU.
In the first receiving module 30, the master GPU receives the processing result of the image processing task from the slave GPU, and the master GPU also performs the processing of the image processing task.
In a preferred embodiment, the image processing task comprises a drawing task.
In one embodiment, the master GPU sends the associated drawing commands to the slave GPU for processing.
In a preferred embodiment, the slave GPU automatically performs a drawing task according to a drawing command and drawing data, and sends a drawing completion signal to the master GPU after the drawing task is completed.
And the master GPU in the stitching module 33 merges the processing results and then stitches the processing results to obtain a target image.
In a specific embodiment, the slave GPUs' processing results and the master GPU's own image processing result are merged and then spliced to obtain the target image.
In a preferred embodiment, after receiving all the drawing results sent by the slave GPUs, the master GPU splices them together with its own drawing result into a complete image and outputs it to the display port.
In a preferred embodiment, the master GPU sends different drawing data and drawing commands to a slave GPU several times in succession, driving that slave GPU to draw different graphics within the same sub-screen. In terms of display effect, one graphic is drawn in a sub-screen and then further graphics are drawn in it, achieving effects such as overlapping and perspective of multiple graphics.
Example four
In this embodiment, a multi-GPU communication device is also provided, and the device is used for implementing the above embodiments and preferred embodiments, which have already been described and will not be described again. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 5 is a block diagram of an apparatus for multi-GPU communication according to an embodiment of the present invention, configured to perform communication between a master GPU and a plurality of slave GPUs, the master GPU performing one-to-one communication with each of the slave GPUs through a high-speed interconnection bus, as shown in fig. 5, the apparatus including:
a second receiving module 50, configured to receive a plurality of image processing subtasks obtained by splitting the image processing task by the master GPU;
a processing module 51, configured to process the multiple image processing sub-tasks according to a preset manner;
a sending module 52, configured to send the processing result of the image processing task to the master GPU.
In the slave GPU, the second receiving module 50 receives the plurality of image processing subtasks obtained by the master GPU dividing the image processing task.
In a specific embodiment, the master GPU partitions the drawing task sent by the upper-layer system and distributes it evenly among the plurality of slave GPUs according to their number.
In a preferred embodiment, when the drawing task is distributed to the plurality of slave GPUs, transmission is carried out over the high-speed interconnection bus.
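The even distribution described above can be sketched as follows; the function and variable names are illustrative assumptions, not from the patent, and a plain list stands in for the drawing subtasks carried over the bus:

```python
# Hypothetical sketch of the even split: the master GPU deals its list of
# drawing subtasks round-robin across the N slave GPUs so that each slave
# receives an (almost) equal share of the work.
def split_evenly(tasks, num_slaves):
    """Distribute tasks across num_slaves buckets as evenly as possible."""
    buckets = [[] for _ in range(num_slaves)]
    for i, task in enumerate(tasks):
        buckets[i % num_slaves].append(task)  # round-robin assignment
    return buckets
```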
In the slave GPU, the processing module 51 processes the image processing subtasks in a preset manner.
In one embodiment, the master GPU receives the upper-layer system drawing task, packages the drawing data required by the slave GPU, and sends the data packet to the slave GPU.
In a preferred embodiment, the slave GPU parses the data packet after receiving it and stores the data at a specified location; after reception is complete, it sends a response information packet to the master GPU.
In the slave GPU, the sending module 52 sends the processing result of the image processing task to the master GPU.
In a preferred embodiment, the image processing task comprises a drawing task.
In one embodiment, the master GPU sends the associated drawing commands to the slave GPU for processing.
In a preferred embodiment, the slave GPU automatically performs a drawing task according to a drawing command and drawing data, and sends a drawing completion signal to the master GPU after the drawing task is completed.
In a preferred embodiment, after receiving all the drawing results sent by the slave GPUs, the master GPU splices them together with its own drawing result into a complete image and outputs the complete image to the display port.
In a preferred embodiment, the master GPU repeatedly and continuously sends different drawing data and drawing commands to the slave GPU, driving the slave GPU to draw different graphics in the same sub-screen. In terms of display effect, one graphic is drawn first in the sub-screen and further graphics are then drawn over it, achieving effects such as overlapping and perspective of multiple graphics.
To better understand the flow of the multi-GPU communication method, the technical solutions are explained below with reference to preferred embodiments; the embodiments of the present invention are not limited to these technical solutions.
In the application, the master GPU packs the working data to be executed by the slave GPU and sends the data packet directly to the slave GPU through the high-speed inter-chip interconnection bus; the slave GPU parses the data packet after receiving it and executes the work task; after the task is finished, the slave GPU packs the task result data and returns it to the master GPU through the high-speed inter-chip interconnection bus.
In the following, a drawing task is taken as an example of the image processing task, and a GPU system including one master GPU and three slave GPUs is described in detail.
The master GPU divides the drawing task sent by the upper-layer system and distributes it evenly among the plurality of slave GPUs according to their number, with transmission over the high-speed interconnection bus. The method specifically comprises the following steps:
step S1, the master GPU receives the upper-layer system drawing task, packs the drawing data needed by each slave GPU, and sends the data packet to the slave GPU;
step S2, the slave GPU parses the data packet after receiving it and stores the data to a specified location; after reception is complete, it sends a response information packet to the master GPU and executes the work task;
step S3, the master GPU sends a command packet containing the drawing command to the slave GPU;
step S4, the slave GPU automatically executes and completes the drawing task according to the drawing command and drawing data, and sends a drawing-completion response packet to the master GPU after the task is finished;
step S5, the master GPU sends the slave GPU a permission-to-send packet, allowing the slave GPU to transmit the requested drawing-result data packet;
step S6, the slave GPU sends a data packet containing the drawing result to the master GPU;
and step S7, after receiving all the drawing results sent by the slave GPUs, the master GPU splices them together with its own drawing result into a complete image and outputs it to the display port.
In particular, steps S2 to S6 may be repeated irregularly or multiple times; that is, different drawing data and drawing commands may be sent to the slave GPU repeatedly and continuously, driving the slave GPU to draw different graphics in the same sub-frame. In other words, within the same picture one graphic is drawn first and other graphics are then drawn over it, realizing effects such as overlapping and perspective of multiple graphics.
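The exchange in steps S1 to S6 can be sketched as a simple message simulation; the class, method names, and the "ACK"/"DONE" strings below are illustrative assumptions standing in for the real response and completion packets carried over the interconnect bus:

```python
# Illustrative simulation of the S1-S6 exchange for one slave GPU.
class SlaveGPU:
    def __init__(self):
        self.data = None
        self.result = None

    def receive_data(self, packet):
        # S1/S2: store the drawing data and acknowledge reception
        self.data = packet
        return "ACK"

    def execute(self, command):
        # S3/S4: perform the drawing task, then signal completion
        self.result = f"drawn({command}:{self.data})"
        return "DONE"

    def send_result(self):
        # S5/S6: return the drawing result once the master permits it
        return self.result

def master_round(slave, data, command):
    """One master-side round trip; the master then splices results (S7)."""
    assert slave.receive_data(data) == "ACK"
    assert slave.execute(command) == "DONE"
    return slave.send_result()
```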
The above drawing data packet format sequentially includes: a header serving as a start delimiter, a packet attribute field, a destination address, a data length, data, a check code, and a packet tail serving as an end delimiter.
The header is a string of consecutive specific characters that does not otherwise appear consecutively in the data packet, so as to avoid parsing errors when the packet is received.
The packet attribute field indicates the format of the data in the current data packet, and may be command, data, or response.
Command indicates that the current data packet carries a drawing command, which the slave device must automatically parse and execute.
Data indicates to the receiving GPU that the current transmission carries drawing data or a drawing result; the receiving device only needs to parse the data packet and store its contents to the specified destination address.
Response means that the current data packet carries response information from the receiving device, specifically including reception-completion and verification information indicating that the slave GPU has finished receiving. When the check of the received data fails, the transmitting device must decide whether to retransmit.
The permission to send refers to permission of the current device to receive data.
The state information refers to the state information of the current receiving device.
The destination address refers to a destination address where data in the current data packet needs to be stored.
The data length refers to the amount of actual data in the current data packet.
The data refers to the data bits in the data packet; their specific meaning is interpreted according to the packet attribute field.
The check code is required because an external interconnection bus is used and errors are difficult to avoid during data transmission, so the data must be checked.
The packet tail, like the header, is a string of consecutive specific characters and marks the end of the data packet.
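A minimal sketch of this packet layout follows. The field widths (1-byte attribute, 4-byte destination address, 2-byte length), the delimiter bytes, and the XOR check are assumptions for illustration, since the patent fixes the field order but not concrete sizes; a real implementation would also guarantee that the header sequence cannot occur inside the packet body, which this sketch does not enforce:

```python
import struct

# Assumed start/end delimiters (the patent only requires distinctive
# character strings, not these particular bytes).
HEADER, TAIL = b"\xAA\x55\xAA\x55", b"\x55\xAA\x55\xAA"

def checksum(payload: bytes) -> int:
    """Simple XOR over the body; a placeholder for the real check code."""
    c = 0
    for b in payload:
        c ^= b
    return c

def pack_packet(attr: int, dest: int, data: bytes) -> bytes:
    # header | attribute | destination address | length | data | check | tail
    body = struct.pack("<BIH", attr, dest, len(data)) + data
    return HEADER + body + bytes([checksum(body)]) + TAIL

def parse_packet(pkt: bytes):
    assert pkt.startswith(HEADER) and pkt.endswith(TAIL), "bad delimiters"
    body, check = pkt[4:-5], pkt[-5]
    assert checksum(body) == check, "check code mismatch"
    attr, dest, length = struct.unpack("<BIH", body[:7])
    return attr, dest, body[7:7 + length]
```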
As shown in fig. 6, the master GPU divides the drawing area into four regions by image coordinates according to the number of GPUs, and sends the drawing data and drawing commands belonging to each region to the designated slave GPU; after a slave GPU completes its drawing task, it sends the drawing result of the region it manages to the master GPU. The master GPU collects the drawing results of all the slave GPUs and splices them together with its own drawing result, restoring the image required by the upper-layer system for display output.
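The coordinate-based split of fig. 6 can be sketched as below, where one master plus three slave GPUs each take one quadrant; the function name and the (x0, y0, x1, y1) rectangle convention are illustrative assumptions:

```python
# Divide the image into four coordinate regions, one per GPU
# (master + three slaves), as in the four-GPU example of fig. 6.
def split_into_quadrants(width, height):
    hw, hh = width // 2, height // 2
    return [
        (0, 0, hw, hh),           # e.g. region kept by the master GPU
        (hw, 0, width, hh),       # slave GPU 1
        (0, hh, hw, height),      # slave GPU 2
        (hw, hh, width, height),  # slave GPU 3
    ]
```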
As shown in fig. 7, the drawing task is assigned according to image density and complexity. Because real drawing tasks are not uniformly distributed across the image area, most of the work may at times be concentrated in a specific region; the master GPU must wait for all the slave GPUs and itself to finish drawing before collecting and splicing the final display image, so the overall drawing time, and hence drawing performance, is determined by the most complex region. The master GPU therefore arranges the drawing work of each slave GPU according to drawing complexity. As shown in fig. 7, when the drawing tasks are concentrated in the upper half of the image, the master GPU autonomously allocates all three slave GPUs to draw the upper half simultaneously, balancing the drawing load of each GPU overall and thereby improving overall performance.
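One plausible way to realize this complexity-based balancing, sketched under the assumption that each region carries an estimated drawing cost (the patent does not specify the balancing algorithm), is a greedy assignment to the currently least-loaded GPU:

```python
import heapq

# Greedy load balancing: each region goes to whichever GPU currently has
# the least accumulated estimated work, so a complex area ends up spread
# over several GPUs, as in the fig. 7 example.
def balance(regions, num_gpus):
    """regions: list of (name, estimated_cost). Returns per-GPU name lists."""
    heap = [(0, i) for i in range(num_gpus)]  # (accumulated cost, gpu id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_gpus)]
    # Assign the costliest regions first for a tighter balance.
    for name, cost in sorted(regions, key=lambda r: -r[1]):
        load, gpu = heapq.heappop(heap)
        assignment[gpu].append(name)
        heapq.heappush(heap, (load + cost, gpu))
    return assignment
```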
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
S1, receiving an image processing task;
S2, dividing the image processing task to obtain a plurality of image processing subtasks;
S3, distributing the image processing subtasks according to the number of the slave GPUs;
S4, receiving the processing result of the image processing task from the slave GPUs;
and S5, merging the processing results and splicing to obtain the target image.
And/or:
S1, receiving a plurality of image processing subtasks obtained by dividing the image processing task by the master GPU;
S2, processing the image processing subtasks in a preset manner;
S3, sending the processing result of the image processing task to the master GPU.
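The master-side steps S1 to S5 above can be sketched end to end as follows; here a plain function call stands in for dispatch over the interconnect bus, list slicing stands in for the split, and flattening stands in for the splice, all of which are simplifying assumptions:

```python
# End-to-end sketch of the master-side pipeline: split the task (S2/S3),
# let each "slave" process its share (S4), then merge the results (S5).
def run_master(task, slave_fns):
    n = len(slave_fns)
    parts = [task[i::n] for i in range(n)]                 # S2/S3: split
    results = [fn(part) for fn, part in zip(slave_fns, parts)]  # S4
    return [item for result in results for item in result]      # S5: merge
```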
Optionally, the storage medium is further arranged to store a computer program for performing the steps of:
S31, sending the drawing data packet of the image processing subtask to be distributed to the slave GPU;
S32, receiving an information response packet when the slave GPU has finished receiving the drawing data packet.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing computer programs, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, receiving an image processing task;
S2, dividing the image processing task to obtain a plurality of image processing subtasks;
S3, distributing the image processing subtasks according to the number of the slave GPUs;
S4, receiving the processing result of the image processing task from the slave GPUs;
and S5, merging the processing results and splicing to obtain the target image.
And/or:
S1, receiving a plurality of image processing subtasks obtained by dividing the image processing task by the master GPU;
S2, processing the image processing subtasks in a preset manner;
S3, sending the processing result of the image processing task to the master GPU.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (12)

1. A method for multi-GPU communication, for communicating between a master GPU and a plurality of slave GPUs, the master GPU in one-to-one communication with each of the slave GPUs via a high-speed interconnect bus, the method comprising:
receiving an image processing task;
dividing the image processing task to obtain a plurality of image processing subtasks;
distributing the plurality of image processing subtasks according to the number of the slave GPUs;
receiving a processing result of the slave GPU on the image processing task;
and merging the processing results and splicing to obtain a target image.
2. The method of claim 1, wherein said allocating the plurality of image processing subtasks according to the number of slave GPUs comprises:
sending a drawing data packet in the image processing subtask to be distributed by the slave GPU to the slave GPU;
and receiving an information response packet when the slave GPU finishes receiving the drawing data packet.
3. The method of claim 2, wherein said allocating the plurality of image processing subtasks according to the number of slave GPUs further comprises:
sending an image processing task command to the slave GPU;
receiving a completion response packet of the image processing task under the condition that the slave GPU completes the image processing task according to the image processing task command and the drawing data packet;
sending an image processing task result permission receiving request to the slave GPU;
and receiving a processing result of the image processing task according to the permission receiving request.
4. The method of claim 1, wherein said allocating the plurality of image processing subtasks according to the number of slave GPUs further comprises:
dividing a drawing area according to the coordinates of the target image and the number of the slave GPUs and the master GPU to obtain a target area;
and sending the drawing data packets and the image processing task commands belonging to different drawing areas to the corresponding slave GPU.
5. The method of claim 1, wherein the one-to-one communication between the master GPU and each of the slave GPUs via a high speed interconnect bus comprises:
performing one-to-one communication between each slave GPU through a high-speed interconnection bus;
and/or the master GPU and each slave GPU share data required in the process of executing the image processing task through a shared RAM;
and/or sharing data required in the process of executing the image processing task between the master GPU and each slave GPU through a shared Cache.
6. The method according to claim 1, wherein the merging and splicing the processing results to obtain the target image comprises:
and merging the processing result and the image processing result of the main GPU and splicing to obtain a target image.
7. A method for multi-GPU communication, for communicating between a master GPU and a plurality of slave GPUs, the master GPU in one-to-one communication with each of the slave GPUs via a high-speed interconnect bus, the method comprising:
receiving a plurality of image processing subtasks obtained by dividing the image processing task by the master GPU;
processing the plurality of image processing subtasks according to a preset mode;
and sending the processing result of the image processing task to the master GPU.
8. The method of claim 7, wherein the receiving a plurality of image processing subtasks resulting from the splitting of the image processing task by the master GPU comprises:
receiving a drawing data packet in an image processing subtask to be distributed;
under the condition that the drawing data packet is completely received, sending an information response packet to the master GPU;
executing and completing the image processing task according to the image processing task command and the drawing data packet;
sending a response packet of the image processing task to the master GPU when the image processing task is completed;
and sending the processing result of the image processing task to the master GPU under the condition that the image processing task result permission receiving request is received.
9. An apparatus for multi-GPU communication, wherein the apparatus is configured to communicate between a master GPU and a plurality of slave GPUs, and wherein the master GPU communicates with each of the slave GPUs in a one-to-one manner via a high-speed interconnect bus, the apparatus comprising:
the first receiving module is used for receiving the image processing task;
the segmentation module is used for segmenting the image processing task to obtain a plurality of image processing subtasks;
the distribution module is used for distributing the image processing subtasks according to the number of the secondary GPUs;
the first receiving module is further configured to receive a processing result of the slave GPU on the image processing task;
and the splicing module is used for merging the processing results and splicing to obtain a target image.
10. An apparatus for multi-GPU communication, wherein the apparatus is configured to communicate between a master GPU and a plurality of slave GPUs, and wherein the master GPU communicates with each of the slave GPUs in a one-to-one manner via a high-speed interconnect bus, the apparatus comprising:
the second receiving module is used for receiving a plurality of image processing subtasks obtained by dividing the image processing task by the master GPU;
the processing module is used for processing the image processing subtasks according to a preset mode;
and the sending module is used for sending the processing result of the image processing task to the master GPU.
11. A storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the method of any of claims 1 to 6 when executed, and/or to perform the method of any of claims 7 to 8.
12. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 6 and/or to perform the method of any of claims 7 to 8.
CN202011202092.3A 2020-11-02 2020-11-02 Method and device for multi-GPU communication, storage medium and electronic device Active CN112328532B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011202092.3A CN112328532B (en) 2020-11-02 2020-11-02 Method and device for multi-GPU communication, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN112328532A true CN112328532A (en) 2021-02-05
CN112328532B CN112328532B (en) 2024-02-09

Family

ID=74324011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011202092.3A Active CN112328532B (en) 2020-11-02 2020-11-02 Method and device for multi-GPU communication, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN112328532B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101025821A (en) * 2006-02-21 2007-08-29 辉达公司 Asymmetric multi-GPU processing
US20080068389A1 (en) * 2003-11-19 2008-03-20 Reuven Bakalash Multi-mode parallel graphics rendering system (MMPGRS) embodied within a host computing system and employing the profiling of scenes in graphics-based applications
US20080129747A1 (en) * 2003-11-19 2008-06-05 Reuven Bakalash Multi-mode parallel graphics rendering system employing real-time automatic scene profiling and mode control
US20090201303A1 (en) * 2007-11-23 2009-08-13 Mercury Computer Systems, Inc. Multi-user multi-gpu render server apparatus and methods
US20120001905A1 (en) * 2010-06-30 2012-01-05 Ati Technologies, Ulc Seamless Integration of Multi-GPU Rendering
TW201432566A (en) * 2013-02-04 2014-08-16 Hon Hai Prec Ind Co Ltd Expansion card of graphic processing unit and expanding method
CN107027042A (en) * 2017-04-19 2017-08-08 中国电子科技集团公司电子科学研究院 A kind of panorama live video stream processing method and processing device based on many GPU
CN107122244A (en) * 2017-04-25 2017-09-01 华中科技大学 A kind of diagram data processing system and method based on many GPU
CN108229687A (en) * 2016-12-14 2018-06-29 腾讯科技(深圳)有限公司 Data processing method, data processing equipment and electronic equipment
CN109255439A (en) * 2017-07-12 2019-01-22 北京图森未来科技有限公司 A kind of DNN model training method and device that multiple GPU are parallel
CN109408449A (en) * 2017-08-15 2019-03-01 Arm有限公司 Data processing system
CN110717853A (en) * 2019-12-12 2020-01-21 武汉精立电子技术有限公司 Optical image processing system based on embedded GPU
CN110716805A (en) * 2019-09-27 2020-01-21 上海依图网络科技有限公司 Task allocation method and device of graphic processor, electronic equipment and storage medium
CN110874811A (en) * 2018-08-29 2020-03-10 英特尔公司 Position-based rendering apparatus and method for multi-die/GPU graphics processing
CN111045623A (en) * 2019-11-21 2020-04-21 中国航空工业集团公司西安航空计算技术研究所 Method for processing graphics commands in multi-GPU (graphics processing Unit) splicing environment
CN111289975A (en) * 2020-01-21 2020-06-16 博微太赫兹信息科技有限公司 Rapid imaging processing system for multi-GPU parallel computing

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051212A (en) * 2021-03-02 2021-06-29 长沙景嘉微电子股份有限公司 Graphics processor, data transmission method, data transmission device, electronic device, and storage medium
CN113051212B (en) * 2021-03-02 2023-12-05 长沙景嘉微电子股份有限公司 Graphics processor, data transmission method, data transmission device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112328532B (en) 2024-02-09

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant