CN112328532B - Method and device for multi-GPU communication, storage medium and electronic device - Google Patents

Method and device for multi-GPU communication, storage medium and electronic device

Info

Publication number
CN112328532B
CN112328532B (application CN202011202092.3A)
Authority
CN
China
Prior art keywords
gpu
image processing
slave
gpus
master
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011202092.3A
Other languages
Chinese (zh)
Other versions
CN112328532A (en)
Inventor
龙斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Jingmei Integrated Circuit Design Co ltd
Changsha Jingjia Microelectronics Co ltd
Original Assignee
Changsha Jingmei Integrated Circuit Design Co ltd
Changsha Jingjia Microelectronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Jingmei Integrated Circuit Design Co ltd, Changsha Jingjia Microelectronics Co ltd filed Critical Changsha Jingmei Integrated Circuit Design Co ltd
Priority to CN202011202092.3A priority Critical patent/CN112328532B/en
Publication of CN112328532A publication Critical patent/CN112328532A/en
Application granted granted Critical
Publication of CN112328532B publication Critical patent/CN112328532B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The embodiments of the present application provide a method and device for multi-GPU communication, a storage medium, and an electronic device. The method is used for communication between a master GPU and a plurality of slave GPUs, where the master GPU communicates one-to-one with each slave GPU over a high-speed interconnect bus, and comprises: receiving an image processing task; dividing the image processing task into a plurality of image processing subtasks; distributing the image processing subtasks according to the number of slave GPUs; receiving the processing results of the image processing task from the slave GPUs; and merging and stitching the processing results to obtain a target image. With this scheme, the GPUs cooperatively complete complex drawing and computation tasks.

Description

Method and device for multi-GPU communication, storage medium and electronic device
Technical Field
The present application relates to computer image processing technology, and in particular, to a method and apparatus for multiple GPU communication, a storage medium, and an electronic apparatus.
Background
In the field of image processing, multiple GPUs are commonly used together, each performing a part of the image processing task.
As image processing tasks grow more complex, the overall performance of the GPUs tends to suffer.
No effective solution currently exists in the related art for the problem that overall GPU performance is low when handling complex image processing tasks.
Disclosure of Invention
The embodiments of the present application provide a method and device for multi-GPU communication, a storage medium, and an electronic device, so as to at least solve the problem in the related art that overall GPU performance is low when handling complex image processing tasks.
According to a first aspect of the embodiments of the present application, there is provided a method for multi-GPU communication, used for communication between a master GPU and a plurality of slave GPUs, the master GPU communicating one-to-one with each slave GPU over a high-speed interconnect bus, the method comprising: receiving an image processing task; dividing the image processing task into a plurality of image processing subtasks; distributing the image processing subtasks according to the number of slave GPUs; receiving the processing results of the image processing task from the slave GPUs; and merging and stitching the processing results to obtain a target image.
According to a second aspect of the embodiments of the present application, there is provided a method for multi-GPU communication, used for communication between a master GPU and a plurality of slave GPUs, the master GPU communicating one-to-one with each slave GPU over a high-speed interconnect bus, the method comprising: receiving a plurality of image processing subtasks obtained by the master GPU dividing an image processing task; processing the plurality of image processing subtasks in a preset manner; and sending the processing result of the image processing task to the master GPU.
According to a third aspect of the embodiments of the present application, there is provided a device for multi-GPU communication, used for communication between a master GPU and a plurality of slave GPUs, the master GPU communicating one-to-one with each slave GPU over a high-speed interconnect bus, the device comprising: a first receiving module for receiving an image processing task; a segmentation module for dividing the image processing task into a plurality of image processing subtasks; a distribution module for distributing the image processing subtasks according to the number of slave GPUs; the first receiving module being further configured to receive the slave GPUs' processing results of the image processing task; and a stitching module for merging and stitching the processing results to obtain a target image.
According to a fourth aspect of the embodiments of the present application, there is provided a device for multi-GPU communication, used for communication between a master GPU and a plurality of slave GPUs, the master GPU communicating one-to-one with each slave GPU over a high-speed interconnect bus, the device comprising: a second receiving module for receiving a plurality of image processing subtasks obtained by the master GPU dividing an image processing task; a processing module for processing the image processing subtasks in a preset manner; and a sending module for sending the processing result of the image processing task to the master GPU.
According to a fifth aspect of embodiments of the present application, there is provided a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
According to a sixth aspect of embodiments of the present application, there is provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
With the method and device for multi-GPU communication, the storage medium, and the electronic device provided by the embodiments of the present application, the master GPU receives the drawing tasks sent by the upper-layer system, automatically cuts them up, distributes partial drawing tasks to the plurality of slave GPUs while managing their loads, and, after the slave GPUs finish drawing, receives, merges, and stitches the drawing results into the complete output image. This achieves the aim of having the GPUs cooperatively complete complex drawing and computation tasks, with the technical effect of improving overall GPU performance.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a schematic diagram of a system structure according to an embodiment of the present application;
FIG. 2 is a flow chart of a method of multi-GPU communication in an embodiment of the present application;
FIG. 3 is a schematic diagram of a device structure for multiple GPU communication in an embodiment of the present application;
FIG. 4 is a flow chart of a method of multi-GPU communication in an embodiment of the present application;
FIG. 5 is a schematic diagram of a device structure for multiple GPU communication in an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating an image processing task uniformly distributed according to a drawing area in an embodiment of the present application;
fig. 7 is a schematic diagram of non-uniform distribution of image processing tasks according to a drawing area in an embodiment of the present application.
Detailed Description
In the course of implementing the present application, the inventor found that in image processing, graphics are drawn by multiple GPUs and then displayed; however, because the GPUs cannot cooperate on tasks in parallel, their overall performance is low.
In view of the foregoing, an embodiment of the present application provides a method for communication between a master GPU and a plurality of slave GPUs, the master GPU communicating one-to-one with each slave GPU over a high-speed interconnect bus, the method comprising: receiving an image processing task; dividing the image processing task into a plurality of image processing subtasks; distributing the image processing subtasks according to the number of slave GPUs; receiving the processing results of the image processing task from the slave GPUs; and merging and stitching the processing results to obtain a target image. The present application adopts a scheme in which multiple GPUs are interconnected and work cooperatively, thereby achieving the aim of cooperatively completing complex drawing and computation tasks.
To make the technical solutions and advantages of the embodiments of the present application clearer, exemplary embodiments of the present application are described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. It should be noted that, where no conflict arises, the embodiments and the features in the embodiments may be combined with each other.
Example 1
The method embodiment provided in the first embodiment of the present application may be executed in a multi-GPU system or a system with a similar architecture. Taking a system running on multiple GPUs as an example, fig. 1 is a schematic structural diagram of a GPU system implementing the method for multi-GPU communication according to an embodiment of the present invention. Fig. 1 shows a GPU system architecture with one master GPU and three slave GPUs. The master GPU communicates one-to-one with each slave GPU over a high-speed interconnect bus. The master GPU receives a drawing task sent over the upstream PCIE bus and forwards it to the three slave GPUs over the high-speed interconnect bus; the three slave GPUs also communicate with one another over the high-speed interconnect bus, so the data required during drawing can be shared between the master GPU and the slave GPUs through a shared RAM and a shared Cache.
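As a rough illustration of the fig. 1 topology, the master and three slave GPUs can be modeled as a fully connected graph of point-to-point links, with the master additionally attached to the upstream PCIE bus. This is a hypothetical sketch for orientation only; the names (`gpus`, `links`) are invented and do not come from the patent:

```python
# Hypothetical model of the fig. 1 topology: one master GPU and three
# slave GPUs, every pair joined by a one-to-one interconnect link.
from itertools import combinations

gpus = ["master", "slave0", "slave1", "slave2"]

# Each unordered pair of GPUs gets its own point-to-point link.
links = set(combinations(gpus, 2))

# The master additionally receives drawing tasks over the upstream PCIE bus.
upstream = ("pcie", "master")

# With 4 GPUs, a full interconnect needs C(4, 2) = 6 links.
print(len(links))  # 6
```

Fully interconnecting the GPUs is what lets any pair exchange data directly (or via the shared RAM/Cache) instead of routing everything through the master.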
In this embodiment, a method for communicating between a master GPU and a plurality of slave GPUs is provided, where the master GPU and each of the slave GPUs communicate one-to-one through a high-speed interconnection bus, as shown in fig. 2, and the process includes the following steps:
step S201, receiving an image processing task;
step S202, dividing the image processing task to obtain a plurality of image processing subtasks;
step S203, distributing the plurality of image processing subtasks according to the number of slave GPUs;
step S204, receiving the processing results of the image processing task from the slave GPUs;
and step S205, merging and stitching the processing results to obtain a target image.
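The five steps above can be sketched end-to-end as follows. This is a minimal illustrative sketch: the helpers `split_task` and `process_subtask`, and the row-band splitting strategy, are assumptions for illustration, not the patent's actual interface:

```python
# Minimal sketch of the master-GPU flow of steps S201-S205. The "image
# processing task" is stood in for by a list of rows to be drawn.

def split_task(task_rows, n_slaves):
    """S202: divide the task into contiguous sub-ranges sized by the slave count."""
    chunk = (len(task_rows) + n_slaves - 1) // n_slaves
    return [task_rows[i:i + chunk] for i in range(0, len(task_rows), chunk)]

def process_subtask(rows):
    """Stand-in for a slave GPU drawing its assigned rows."""
    return [f"drawn:{r}" for r in rows]

def master_flow(task_rows, n_slaves):
    subtasks = split_task(task_rows, n_slaves)          # S201 + S202
    results = [process_subtask(s) for s in subtasks]    # S203 + S204
    merged = [row for part in results for row in part]  # S205: merge + stitch
    return merged

target = master_flow(list(range(6)), n_slaves=3)
print(target)  # ['drawn:0', 'drawn:1', ..., 'drawn:5']
```

On real hardware the middle step would dispatch over the interconnect bus and run on the slave GPUs concurrently; the sketch runs the subtasks sequentially only to keep the data flow visible.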
With the above method for multi-GPU communication, the master GPU receives the drawing tasks sent by the upper-layer system, automatically cuts them up, distributes partial drawing tasks to the plurality of slave GPUs while managing their loads, and, after the slave GPUs finish drawing, receives, merges, and stitches the drawing results into the complete output image. This achieves the aim of having the GPUs cooperatively complete complex drawing and computation tasks, with the technical effect of improving overall GPU performance.
In step S201, the master GPU receives the image processing task.
In one embodiment, the master GPU receives an image processing task issued by a CPU acting as the upper-layer system.
In a preferred embodiment, the image processing task comprises: drawing tasks.
In step S202, the main GPU divides the image processing task to obtain a plurality of image processing subtasks.
In a specific embodiment, the master GPU splits the drawing task sent by the upper-layer system and distributes the pieces evenly among the slave GPUs according to their number.
In a preferred embodiment, the evenly distributed drawing tasks are transmitted to the slave GPUs over the high-speed interconnect bus.
In the step S203, the master GPU allocates the plurality of image processing subtasks according to the number of the slave GPUs.
In one embodiment, the master GPU receives the upper-layer system's drawing task, packages the drawing data required by each slave GPU, and sends the data packets to the slave GPUs.
In a preferred embodiment, after receiving its data packet, the slave GPU parses it and stores the data at the designated location; once reception is complete, it sends a response packet to the master GPU.
In step S204, the master GPU receives the processing results of the image processing task from the slave GPUs; the master GPU also processes its own share of the image task.
In a preferred embodiment, the image processing task comprises: drawing tasks.
In one embodiment, the master GPU sends the relevant drawing commands to the slave GPU for processing.
In a preferred embodiment, the slave GPU automatically executes and completes the drawing task according to the drawing command and drawing data, and sends a drawing-completion signal to the master GPU when done.
In step S205, the master GPU merges and stitches the processing results to obtain the target image.
In a specific embodiment, the processing results are combined with the master GPU's own image processing result to obtain the target image.
In a preferred embodiment, after receiving all the drawing results sent by the slave GPUs, the master GPU stitches them together with its own drawing result to form a complete image, which is output to a display port.
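The stitching of step S205 can be sketched as concatenating the master's own band of the image with the bands returned by the slaves, in drawing-area order. The `stitch` helper and the row-band representation are assumptions for illustration; real hardware would copy framebuffer regions instead:

```python
# Sketch of S205: the master stitches its own band together with the
# bands returned by the slave GPUs into one complete image. Each "band"
# is a hypothetical list of pixel rows.

def stitch(master_band, slave_bands):
    """Concatenate bands in drawing-area order to form the target image."""
    image = list(master_band)
    for band in slave_bands:
        image.extend(band)
    return image

master_band = [[1, 1], [1, 1]]               # rows drawn by the master itself
slave_bands = [[[2, 2]], [[3, 3], [3, 3]]]   # rows returned by two slave GPUs
full_image = stitch(master_band, slave_bands)
print(len(full_image))  # 5 rows in the complete image
```

The point of the design is that stitching is cheap for the master: each GPU has already rendered its region, so forming the output image is a merge, not a redraw.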
In a preferred embodiment, the master GPU repeatedly sends different drawing data and drawing commands to a slave GPU, driving it to draw different graphics within the same sub-picture. In terms of display effect, one graphic is drawn in a picture and further graphics are then drawn into it, producing effects such as overlapping and perspective of multiple graphics.
As an optional implementation of the present application, distributing the plurality of image processing subtasks according to the number of slave GPUs comprises: sending, to each slave GPU, the drawing data packet of the image processing subtask assigned to it; and receiving a response packet when the slave GPU has finished receiving the drawing data packet.
In a specific implementation, the master GPU receives the upper-layer system's drawing task, packages the drawing data required by each slave GPU, and sends the data packets to the slave GPUs. After receiving its data packet, a slave GPU parses it, stores the data at the designated location, and, once reception is complete, sends a response packet to the master GPU and executes its work task.
As an optional implementation of the present application, distributing the plurality of image processing subtasks according to the number of slave GPUs further comprises: sending an image processing task command to the slave GPU; receiving a task-completion response packet when the slave GPU has completed the image processing task according to the task command and the drawing data packet; sending the slave GPU a request granting it permission to send the image processing task result; and receiving the processing result of the image processing task in accordance with that request.
In implementation, the master GPU sends a drawing command to the slave GPU, which automatically executes and completes the drawing task according to the drawing command and drawing data, and sends a drawing-completion signal to the master GPU when done.
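The exchange described above (drawing data packet, response packet, drawing command, completion signal, result request) can be sketched as a simple message sequence. The `Slave` class and the message strings are invented for illustration; the patent does not specify a wire format:

```python
# Hypothetical sketch of the master/slave packet handshake: the master
# sends drawing data and a drawing command; the slave acknowledges,
# draws, signals completion, and returns its result on request.

class Slave:
    def __init__(self):
        self.data = None
        self.result = None

    def recv_data_packet(self, packet):
        self.data = packet            # parse + store at the designated location
        return "ack"                  # response packet back to the master

    def recv_draw_command(self, command):
        self.result = f"{command}({self.data})"
        return "draw_done"            # drawing-completion signal

    def recv_result_request(self):
        return self.result            # drawing result back to the master

slave = Slave()
assert slave.recv_data_packet("triangle-verts") == "ack"
assert slave.recv_draw_command("draw") == "draw_done"
print(slave.recv_result_request())  # draw(triangle-verts)
```

Each message is acknowledged before the next phase begins, which is what lets the master track the load and progress of every slave GPU independently.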
As an optional implementation of the present application, distributing the plurality of image processing subtasks according to the number of slave GPUs further comprises: dividing the drawing area according to the coordinates of the target image and the total number of master and slave GPUs to obtain target areas; and sending the drawing data packets and image processing task commands belonging to the different drawing areas to the corresponding slave GPUs.
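The division of the drawing area by target-image coordinates can be sketched as follows. The `divide_area` helper and its weight parameter are assumptions: equal weights give the uniform distribution of fig. 6, unequal weights the non-uniform distribution of fig. 7 (e.g. to give a busier region to a less loaded GPU):

```python
# Sketch of dividing the drawing area among the master GPU and the
# slave GPUs by target-image coordinates. Weights are hypothetical.

def divide_area(height, weights):
    """Return one (y_start, y_end) band per GPU, sized by its weight."""
    total = sum(weights)
    bands, y = [], 0
    for w in weights:
        h = round(height * w / total)
        bands.append((y, y + h))
        y += h
    bands[-1] = (bands[-1][0], height)  # absorb rounding in the last band
    return bands

# Uniform split (fig. 6): master + 3 slaves, equal shares of 1080 rows.
print(divide_area(1080, [1, 1, 1, 1]))  # [(0, 270), (270, 540), ...]
# Non-uniform split (fig. 7): one region gets a larger share.
print(divide_area(1080, [3, 1, 1, 1]))
```

Each band then becomes one subtask: the master sends the drawing data and command for that coordinate range to the GPU that owns it.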
In implementation, the master GPU sends the slave GPU a request granting it permission to send the drawing result, and the slave GPU sends its drawing result to the master GPU.
As an optional implementation of the present application, the one-to-one communication between the master GPU and each slave GPU over the high-speed interconnect bus comprises: one-to-one communication between the slave GPUs themselves over the high-speed interconnect bus; and/or sharing the data required while executing the image processing task between the master GPU and each slave GPU through a shared RAM; and/or sharing that data between the master GPU and each slave GPU through a shared Cache.
From the description of the above embodiments, it will be clear to those skilled in the art that the method of the above embodiments may be implemented by software plus the necessary general-purpose hardware platform, or by hardware alone; in many cases the former is preferred. On this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied as a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) and including instructions that cause a terminal device (a mobile phone, computer, server, network device, or the like) to perform the methods of the embodiments of the present invention.
Example two
In this embodiment, a method for communicating between a master GPU and a plurality of slave GPUs is provided, where the master GPU and each of the slave GPUs communicate one-to-one through a high-speed interconnection bus, as shown in fig. 4, and the process includes the following steps:
step S401, receiving a plurality of image processing subtasks obtained by dividing an image processing task by the main GPU;
step S402, processing the plurality of image processing subtasks in a preset manner;
step S403, sending a processing result of the image processing task to the master GPU.
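The slave-GPU side of the protocol (steps S401 to S403) can be sketched as follows. `render` is a hypothetical stand-in for the "preset manner" of processing, which the patent leaves open:

```python
# Minimal sketch of the slave-GPU flow of steps S401-S403: receive the
# subtasks split off by the master, process them, return the results.

def render(subtask):
    """S402: hypothetical preset processing of one subtask."""
    return subtask * 2

def slave_flow(subtasks):
    results = [render(t) for t in subtasks]   # S401 receives, S402 processes
    return results                            # S403: sent back to the master

print(slave_flow([1, 2, 3]))  # [2, 4, 6]
```

The slave is deliberately passive: it only reacts to data packets and commands from the master, which keeps all scheduling decisions in one place.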
With the above method for multi-GPU communication, the master GPU receives the drawing tasks sent by the upper-layer system, automatically cuts them up, distributes partial drawing tasks to the plurality of slave GPUs while managing their loads, and, after the slave GPUs finish drawing, receives, merges, and stitches the drawing results into the complete output image. This achieves the aim of cooperatively completing complex drawing and computation tasks, with the technical effect of improving overall GPU performance.
In step S401, the plurality of slave GPUs receive the image processing subtasks obtained by the master GPU dividing the image processing task.
In a specific embodiment, the master GPU splits the drawing task sent by the upper-layer system and distributes the pieces evenly among the slave GPUs according to their number.
In a preferred embodiment, the evenly distributed drawing tasks are transmitted to the slave GPUs over the high-speed interconnect bus.
In step S402, the plurality of slave GPUs process the image processing subtasks in the preset manner.
In one embodiment, the master GPU receives the upper-layer system's drawing task, packages the drawing data required by each slave GPU, and sends the data packets to the slave GPUs.
In a preferred embodiment, after receiving its data packet, the slave GPU parses it and stores the data at the designated location; once reception is complete, it sends a response packet to the master GPU.
In step S403, the slave GPU sends the processing result of the image processing task to the master GPU.
In a preferred embodiment, the image processing task comprises: drawing tasks.
In one embodiment, the master GPU sends the relevant drawing commands to the slave GPU for processing.
In a preferred embodiment, the slave GPU automatically executes and completes the drawing task according to the drawing command and drawing data, and sends a drawing-completion signal to the master GPU when done.
In a preferred embodiment, after receiving all the drawing results sent by the slave GPUs, the master GPU stitches them together with its own drawing result to form a complete image, which is output to a display port.
In a preferred embodiment, the master GPU repeatedly sends different drawing data and drawing commands to a slave GPU, driving it to draw different graphics within the same sub-picture. In terms of display effect, one graphic is drawn in a picture and further graphics are then drawn into it, producing effects such as overlapping and perspective of multiple graphics.
From the description of the above embodiments, it will be clear to those skilled in the art that the method of the above embodiments may be implemented by software plus the necessary general-purpose hardware platform, or by hardware alone; in many cases the former is preferred. On this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied as a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) and including instructions that cause a terminal device (a mobile phone, computer, server, network device, or the like) to perform the methods of the embodiments of the present invention.
Example III
This embodiment further provides a device for multi-GPU communication, used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 3 is a block diagram of an apparatus for multi-GPU communication for communication between a master GPU and a plurality of slave GPUs, the master GPU and each of the slave GPUs in one-to-one communication via a high speed interconnect bus, according to an embodiment of the present invention, as shown in FIG. 3, the apparatus comprising:
a first receiving module 30 for receiving an image processing task;
a segmentation module 31, configured to segment the image processing task to obtain a plurality of image processing subtasks;
an allocation module 32, configured to allocate the plurality of image processing subtasks according to the number of the slave GPUs;
the first receiving module 30 is further configured to receive a processing result of the image processing task by the slave GPU;
and the stitching module 33 is configured to stitch the processing results after merging to obtain a target image.
The image processing task is received by the master GPU in the first receiving module 30.
In one embodiment, the master GPU receives an image processing task issued by a CPU that is an upper system.
In a preferred embodiment, the image processing task comprises: drawing tasks.
The main GPU in the segmentation module 31 segments the image processing task to obtain a plurality of image processing subtasks.
In a specific embodiment, the master GPU splits the drawing task sent by the upper-layer system and distributes the pieces evenly among the slave GPUs according to their number.
In a preferred embodiment, the evenly distributed drawing tasks are transmitted to the slave GPUs over the high-speed interconnect bus.
The master GPU in the allocation module 32 allocates the plurality of image processing subtasks according to the number of the slave GPUs.
In one embodiment, the master GPU receives the upper-layer system's drawing task, packages the drawing data required by each slave GPU, and sends the data packets to the slave GPUs.
In a preferred embodiment, after receiving its data packet, the slave GPU parses it and stores the data at the designated location; once reception is complete, it sends a response packet to the master GPU.
Through the first receiving module 30, the master GPU receives the slave GPUs' processing results of the image processing task; the master GPU also processes its own share of the image task.
In a preferred embodiment, the image processing task comprises: drawing tasks.
In one embodiment, the master GPU sends the relevant drawing commands to the slave GPU for processing.
In a preferred embodiment, the slave GPU automatically executes and completes the drawing task according to the drawing command and drawing data, and sends a drawing-completion signal to the master GPU when done.
In the stitching module 33, the master GPU merges and stitches the processing results to obtain the target image.
In a specific embodiment, the processing results are combined with the master GPU's own image processing result to obtain the target image.
In a preferred embodiment, after receiving all the drawing results sent by the slave GPUs, the master GPU stitches them together with its own drawing result to form a complete image, which is output to a display port.
In a preferred embodiment, the master GPU repeatedly sends different drawing data and drawing commands to a slave GPU, driving it to draw different graphics within the same sub-picture. In terms of display effect, one graphic is drawn in a picture and further graphics are then drawn into it, producing effects such as overlapping and perspective of multiple graphics.
Example IV
This embodiment further provides a device for multi-GPU communication, used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 5 is a block diagram of an apparatus for multi-GPU communication for communication between a master GPU and a plurality of slave GPUs, the master GPU and each of the slave GPUs in one-to-one communication via a high speed interconnect bus, according to an embodiment of the present invention, as shown in FIG. 5, the apparatus comprising:
the second receiving module 50 is configured to receive a plurality of image processing subtasks obtained by dividing an image processing task by the master GPU;
a processing module 51, configured to process the plurality of image processing subtasks in a preset manner;
and the sending module 52 is configured to send a processing result of the image processing task to the master GPU.
The second receiving module 50, at each of the slave GPUs, receives the image processing subtasks obtained by the master GPU dividing the image processing task.
In a specific embodiment, the master GPU divides the drawing task sent by the upper-layer system and distributes it uniformly among the slave GPUs according to their number.
In a preferred embodiment, the drawing tasks are transmitted over the high-speed interconnect bus when they are distributed uniformly to the plurality of slave GPUs.
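The uniform distribution described above can be sketched as a simple round-robin split. This is an illustrative Python sketch only, not the patented implementation: the function name and the use of integers standing in for drawing subtasks are assumptions.

```python
# Hypothetical sketch: the master GPU splitting a list of drawing subtasks
# evenly across N slave GPUs. Real subtasks would be drawing data, not ints.

def split_evenly(subtasks, num_slaves):
    """Distribute subtasks round-robin so each slave GPU gets an even share."""
    buckets = [[] for _ in range(num_slaves)]
    for i, task in enumerate(subtasks):
        buckets[i % num_slaves].append(task)
    return buckets

print(split_evenly(list(range(7)), 3))  # [[0, 3, 6], [1, 4], [2, 5]]
```

With seven subtasks and three slave GPUs, no slave receives more than one task beyond any other, which matches the "uniform distribution by number of slave GPUs" described above.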
The processing module 51 processes the image processing subtasks in each slave GPU in a preset manner.
In one embodiment, the master GPU receives the upper-system drawing task, packages the drawing data required by each slave GPU, and sends the data packets to the slave GPUs.
In a preferred embodiment, after receiving a data packet, the slave GPU parses it and stores the data at the designated location; once reception is complete, it sends a response packet to the master GPU.
The sending module 52, at each slave GPU, sends the processing result of the image processing task to the master GPU.
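The three slave-side modules (receive, process, send) can be modeled as a minimal sketch. All names here (`SlaveGPUDevice`, `bus_send`, the doubling `process_fn`) are hypothetical stand-ins, not part of the patent:

```python
# Illustrative sketch of the slave-side apparatus: receiving module 50,
# processing module 51, and sending module 52 as plain Python callables.

class SlaveGPUDevice:
    def __init__(self, process_fn, bus_send):
        self.process_fn = process_fn  # module 51: processes one subtask
        self.bus_send = bus_send      # module 52: returns a result to the master
        self.subtasks = []

    def receive(self, subtasks):
        """Module 50: accept subtasks split off by the master GPU."""
        self.subtasks.extend(subtasks)

    def run(self):
        """Process every queued subtask and send each result back."""
        for task in self.subtasks:
            self.bus_send(self.process_fn(task))
        self.subtasks.clear()

results = []
dev = SlaveGPUDevice(process_fn=lambda t: t * 2, bus_send=results.append)
dev.receive([1, 2, 3])
dev.run()
print(results)  # [2, 4, 6]
```

The master would instantiate one such object per slave GPU, with `bus_send` backed by the high-speed interconnect bus rather than a Python list.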
In a preferred embodiment, the image processing task comprises a drawing task.
In one embodiment, the master GPU sends the relevant drawing commands to the slave GPU for processing.
In a preferred embodiment, the slave GPU automatically executes and completes the drawing task according to the drawing command and drawing data, and sends a drawing-completion signal to the master GPU when finished.
In a preferred embodiment, after receiving all the drawing results sent by the slave GPUs, the master GPU splices them with its own drawing results to form a complete image and outputs the complete image to the display port.
In a preferred embodiment, the master GPU repeatedly sends different drawing data and drawing commands to a slave GPU, driving it to draw different graphics within the same sub-picture. In terms of display effect, one graphic is drawn in the picture and then further graphics are drawn over it, producing effects such as overlapping and perspective of multiple graphics.
To aid understanding of the above multi-GPU communication method flow, a description is given below with reference to preferred embodiments; the technical solution of the embodiments of the present invention is not limited thereto.
In the present application, the master GPU packages the working data to be executed by a slave GPU and sends the data packet directly to that slave GPU through the high-speed inter-chip interconnect bus; the slave GPU parses the received data packet and executes the work task; after the task is executed, the slave GPU packages the task result data and returns it to the master GPU through the high-speed inter-chip interconnect bus.
Taking a drawing task as an example of the image processing task, a GPU system comprising one master GPU and three slave GPUs is described in detail.
The master GPU divides the drawing task sent by the upper layer and distributes it uniformly among the slave GPUs according to their number; the transmission process uses the high-speed interconnect bus. The procedure specifically comprises the following steps:
Step S1, the master GPU receives the upper-system drawing task, packages the drawing data required by each slave GPU, and sends the data packets to the slave GPUs;
Step S2, the slave GPU parses the received data packet and stores the data at the designated location; after reception is complete, it sends a response packet to the master GPU and executes the work task;
Step S3, the master GPU sends a command packet containing the drawing command to the slave GPU;
Step S4, the slave GPU automatically executes and completes the drawing task according to the drawing command and drawing data, and sends a response packet carrying a drawing-completion signal to the master GPU when finished;
Step S5, the master GPU sends the slave GPU a data packet permitting it to send the drawing result;
Step S6, the slave GPU sends a data packet containing the drawing result to the master GPU;
Step S7, after all slave GPUs have sent their drawing results, the master GPU splices them into a complete image and outputs it to the display port.
In a specific implementation, steps S2 to S6 may repeat irregularly over multiple cycles; that is, the master GPU may continuously and repeatedly send different drawing data and drawing commands to a slave GPU, driving it to draw different graphics in the same sub-frame. In other words, one graphic is drawn in the picture and then further graphics are drawn over it, producing effects such as overlapping and perspective of multiple graphics.
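The S1-S7 handshake can be sketched in-process, with the high-speed interconnect bus replaced by two queues (one per direction). The class names, payloads, and the fake "drawing" operation are all illustrative assumptions; the packet kinds follow the command/data/response attribute field described below.

```python
# Sketch of the S1-S7 master/slave handshake. The Bus stands in for the
# high-speed interconnect; "drawing" is faked by incrementing pixel values.

from collections import deque

class Bus:
    """One queue per direction in place of the interconnect bus."""
    def __init__(self):
        self.to_slave, self.to_master = deque(), deque()

def run_protocol(bus, draw_data):
    # S1: master packs the drawing data and sends it to the slave.
    bus.to_slave.append(("data", draw_data))
    # S2: slave stores the data and acknowledges reception.
    _, slave_mem = bus.to_slave.popleft()
    bus.to_master.append(("response", "recv_done"))
    assert bus.to_master.popleft() == ("response", "recv_done")
    # S3: master sends the drawing command packet.
    bus.to_slave.append(("command", "draw"))
    bus.to_slave.popleft()
    # S4: slave executes the command and reports completion.
    result = [px + 1 for px in slave_mem]  # stand-in for drawing
    bus.to_master.append(("response", "draw_done"))
    assert bus.to_master.popleft() == ("response", "draw_done")
    # S5: master permits the slave to send its result.
    bus.to_slave.append(("response", "send_allowed"))
    bus.to_slave.popleft()
    # S6: slave returns the drawing result; S7 (splicing) is master-side.
    bus.to_master.append(("data", result))
    return bus.to_master.popleft()[1]

print(run_protocol(Bus(), [10, 20]))  # [11, 21]
```

Repeating the middle of this function corresponds to the S2-S6 cycles above, by which several graphics accumulate in the same sub-frame before the final S7 splice.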
The drawing data packet format comprises, in order: a header serving as a start delimiter, a packet attribute field, a destination address, a data length, the data, a check code, and a trailer serving as an end delimiter.
The header is a run of consecutive specific characters that never occur consecutively inside the packet body, which avoids parsing errors during packet reception.
The packet attribute field indicates the format of the data in the current packet, which may be command, data, or response.
Command indicates that the current packet carries a drawing command, which the slave device must parse and execute automatically.
Data indicates to the receiving GPU that the current transmission is drawing data or a drawing result; the receiving device only needs to parse the packet and store the data at the designated destination address.
Response means that the data in the current packet is response information from the current receiving device, specifically including reception-complete and verification information indicating that the slave GPU has finished the current reception. When the data received by the slave device fails verification, the sending device must decide whether to retransmit.
Permission to send means that the current device is permitted to receive data.
Status information refers to the status information of the current receiving device.
The destination address is the address at which the data in the current packet is to be stored.
The data length is the amount of actual data in the current packet.
The data field holds the data bits of the packet; their specific meaning is interpreted according to the packet attribute field.
The check code is required because, over an external interconnect bus, transmission errors cannot be ruled out, so the data must be verified.
The trailer, like the header, is a run of consecutive specific characters that marks the end of the packet.
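The packet layout above can be sketched as an encoder/decoder. The concrete field widths, the delimiter byte patterns, and the simple summation check code are assumptions for illustration; the patent fixes none of these values.

```python
# Sketch of the described packet: header | attribute | destination address |
# data length | data | check code | trailer. Widths and checksum are assumed.

import struct

HEADER, TRAILER = b"\xAA\x55\xAA\x55", b"\x55\xAA\x55\xAA"
ATTR = {"command": 0, "data": 1, "response": 2}

def pack(attr, dest_addr, payload):
    """Build a packet: 1-byte attribute, 4-byte address, 2-byte length."""
    body = struct.pack("<BIH", ATTR[attr], dest_addr, len(payload)) + payload
    check = struct.pack("<H", sum(body) & 0xFFFF)  # toy 16-bit checksum
    return HEADER + body + check + TRAILER

def unpack(packet):
    """Verify delimiters and checksum, then return (attr, dest, data)."""
    assert packet.startswith(HEADER) and packet.endswith(TRAILER)
    body, check = packet[4:-6], packet[-6:-4]
    assert struct.unpack("<H", check)[0] == sum(body) & 0xFFFF, "check failed"
    attr, dest, length = struct.unpack("<BIH", body[:7])
    return attr, dest, body[7:7 + length]

attr, dest, data = unpack(pack("data", 0x1000, b"pixels"))
print(attr, hex(dest), data)  # 1 0x1000 b'pixels'
```

A failed checksum assertion here corresponds to the case above where the sending device must decide whether to retransmit.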
As shown in fig. 6, the master GPU divides the drawing area into four regions by image coordinates according to the number of GPUs, and sends the drawing data and drawing commands belonging to each region to the designated slave GPU; when a slave GPU completes its drawing task, it sends the drawing result of its region to the master GPU. The master GPU collects the drawing results of all slave GPUs, splices them with its own drawing result, restores the image required by the upper system, and outputs it for display.
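For the one-master-three-slave system, the coordinate division can be sketched as a quadrant split. The region bookkeeping (dict keys, tuple layout) is hypothetical:

```python
# Illustrative quadrant split for one master plus three slave GPUs: the image
# is divided into four coordinate regions, one per GPU.

def split_into_quadrants(width, height):
    """Return four (x0, y0, x1, y1) regions, one per GPU."""
    hw, hh = width // 2, height // 2
    return {
        "master": (0, 0, hw, hh),
        "slave0": (hw, 0, width, hh),
        "slave1": (0, hh, hw, height),
        "slave2": (hw, hh, width, height),
    }

regions = split_into_quadrants(1920, 1080)
print(regions["slave2"])  # (960, 540, 1920, 1080)
```

Splicing then amounts to copying each returned sub-image back into the full frame at its region's coordinates.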
Fig. 7 shows the case where the drawing task is allocated by image density and complexity. Because actual drawing work is not uniformly distributed across the image area, most of it may in some cases be concentrated in a particular region; the master GPU must then wait for all slave GPUs and itself to finish drawing before collecting and splicing the final display image, so the overall drawing time is determined by the most complex region. In this case, the master GPU arranges the drawing work of each slave GPU according to drawing complexity. As shown in fig. 7, when the drawing work is concentrated in the upper half of the image, the master GPU autonomously assigns three GPUs to draw the upper half simultaneously, balancing the drawing load across the GPUs and thereby improving overall performance.
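One way such complexity-aware balancing could work is a greedy longest-processing-time assignment: always hand the next-heaviest region to the least-loaded GPU. The complexity scores and names below are illustrative inputs; the patent leaves the complexity metric open.

```python
# Sketch of complexity-aware region assignment: regions are handed out
# greedily so the most loaded GPU stays as light as possible.

import heapq

def balance(regions, num_gpus):
    """Greedy assignment of (name, complexity) regions to num_gpus GPUs."""
    heap = [(0.0, gpu) for gpu in range(num_gpus)]  # (current load, gpu id)
    heapq.heapify(heap)
    assignment = {gpu: [] for gpu in range(num_gpus)}
    for name, cost in sorted(regions, key=lambda r: -r[1]):
        load, gpu = heapq.heappop(heap)   # least-loaded GPU so far
        assignment[gpu].append(name)
        heapq.heappush(heap, (load + cost, gpu))
    return assignment

# Work concentrated in the top half: the two heavy regions end up on
# different GPUs instead of one GPU becoming the bottleneck.
regions = [("top-left", 8.0), ("top-right", 7.0),
           ("bottom-left", 1.0), ("bottom-right", 1.0)]
print(balance(regions, 4))
```

Since the splicing step waits for the slowest GPU, lowering the maximum per-GPU load directly shortens the total frame time, which is the stated goal of the complexity-based allocation.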
An embodiment of the invention also provides a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
Alternatively, in the present embodiment, the above-described storage medium may be configured to store a computer program for performing the steps of:
S1, receiving an image processing task;
S2, dividing the image processing task to obtain a plurality of image processing subtasks;
S3, distributing the plurality of image processing subtasks according to the number of the slave GPUs;
S4, receiving processing results of the image processing task from the slave GPUs;
S5, merging and splicing the processing results to obtain a target image.
And/or:
S1, receiving a plurality of image processing subtasks obtained by the master GPU dividing an image processing task;
S2, processing the plurality of image processing subtasks in a preset manner;
S3, sending the processing result of the image processing task to the master GPU.
Optionally, the storage medium is further arranged to store a computer program for performing the steps of:
S31, sending, to the slave GPU, a drawing data packet of the image processing subtask to be distributed to that slave GPU;
S32, receiving an information response packet when the slave GPU finishes receiving the drawing data packet.
Alternatively, in the present embodiment, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing a computer program.
An embodiment of the invention also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the electronic device may further include a transmission device and an input/output device, both connected to the processor.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
S1, receiving an image processing task;
S2, dividing the image processing task to obtain a plurality of image processing subtasks;
S3, distributing the plurality of image processing subtasks according to the number of the slave GPUs;
S4, receiving processing results of the image processing task from the slave GPUs;
S5, merging and splicing the processing results to obtain a target image.
And/or:
S1, receiving a plurality of image processing subtasks obtained by the master GPU dividing an image processing task;
S2, processing the plurality of image processing subtasks in a preset manner;
S3, sending the processing result of the image processing task to the master GPU.
Alternatively, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and optional implementations, which are not repeated here.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (6)

1. A method of multi-GPU communication for communicating between a master GPU and a plurality of slave GPUs, the master GPU and each of the slave GPUs in one-to-one communication via a high speed interconnect bus, the method comprising:
receiving an image processing task;
dividing the image processing task to obtain a plurality of image processing subtasks;
distributing the plurality of image processing subtasks according to the number of the slave GPUs;
receiving processing results of the image processing task from the slave GPUs;
combining the processing results and then splicing to obtain a target image;
the assigning the plurality of image processing subtasks according to the number of the slave GPUs includes:
sending, to the slave GPU, a drawing data packet of the image processing subtask to be distributed to that slave GPU;
receiving an information response packet under the condition that the slave GPU finishes receiving the drawing data packet;
the master GPU continuously and repeatedly sends different drawing data and drawing commands to the slave GPU for a plurality of times so as to drive the slave GPU to draw different graphics in the same picture;
the one-to-one communication between the master GPU and each of the slave GPUs through the high-speed interconnection bus includes:
one-to-one communication is carried out between each slave GPU through a high-speed interconnection bus;
and/or sharing data required in the process of executing the image processing task between the master GPU and each slave GPU through a shared RAM;
and/or, sharing data required in the process of executing the image processing task between the master GPU and each slave GPU through a shared Cache.
2. The method of claim 1, wherein said assigning said plurality of image processing sub-tasks according to said number of slave GPUs further comprises:
sending an image processing task command to the slave GPU;
receiving a completion response packet of the image processing task under the condition that the slave GPU executes the completion of the image processing task according to the image processing task command and the drawing data packet;
sending an image processing task result permission receiving request to the slave GPU;
and receiving a processing result of the image processing task according to the permission receiving request.
3. The method of claim 1, wherein said assigning said plurality of image processing sub-tasks according to said number of slave GPUs further comprises:
dividing a drawing area according to the coordinates of the target image and the number of the secondary GPUs and the main GPU to obtain a target area;
and sending the drawing data packets and the image processing task commands belonging to different drawing areas to the corresponding slave GPU.
4. The method of claim 1, wherein merging and splicing the processing results to obtain the target image comprises:
and merging the processing result with the image processing result of the main GPU to obtain a target image.
5. A storage medium having a computer program stored therein, wherein the computer program is arranged to perform the method of any of claims 1 to 4 when run.
6. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the method of any of the claims 1 to 4.
CN202011202092.3A 2020-11-02 2020-11-02 Method and device for multi-GPU communication, storage medium and electronic device Active CN112328532B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011202092.3A CN112328532B (en) 2020-11-02 2020-11-02 Method and device for multi-GPU communication, storage medium and electronic device


Publications (2)

Publication Number Publication Date
CN112328532A CN112328532A (en) 2021-02-05
CN112328532B true CN112328532B (en) 2024-02-09

Family

ID=74324011


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051212B (en) * 2021-03-02 2023-12-05 长沙景嘉微电子股份有限公司 Graphics processor, data transmission method, data transmission device, electronic equipment and storage medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101025821A (en) * 2006-02-21 2007-08-29 辉达公司 Asymmetric multi-GPU processing
TW201432566A (en) * 2013-02-04 2014-08-16 Hon Hai Prec Ind Co Ltd Expansion card of graphic processing unit and expanding method
CN107027042A (en) * 2017-04-19 2017-08-08 中国电子科技集团公司电子科学研究院 A kind of panorama live video stream processing method and processing device based on many GPU
CN107122244A (en) * 2017-04-25 2017-09-01 华中科技大学 A kind of diagram data processing system and method based on many GPU
CN108229687A (en) * 2016-12-14 2018-06-29 腾讯科技(深圳)有限公司 Data processing method, data processing equipment and electronic equipment
CN109255439A (en) * 2017-07-12 2019-01-22 北京图森未来科技有限公司 A kind of DNN model training method and device that multiple GPU are parallel
CN109408449A (en) * 2017-08-15 2019-03-01 Arm有限公司 Data processing system
CN110716805A (en) * 2019-09-27 2020-01-21 上海依图网络科技有限公司 Task allocation method and device of graphic processor, electronic equipment and storage medium
CN110717853A (en) * 2019-12-12 2020-01-21 武汉精立电子技术有限公司 Optical image processing system based on embedded GPU
CN110874811A (en) * 2018-08-29 2020-03-10 英特尔公司 Position-based rendering apparatus and method for multi-die/GPU graphics processing
CN111045623A (en) * 2019-11-21 2020-04-21 中国航空工业集团公司西安航空计算技术研究所 Method for processing graphics commands in multi-GPU (graphics processing Unit) splicing environment
CN111289975A (en) * 2020-01-21 2020-06-16 博微太赫兹信息科技有限公司 Rapid imaging processing system for multi-GPU parallel computing

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080094402A1 (en) * 2003-11-19 2008-04-24 Reuven Bakalash Computing system having a parallel graphics rendering system employing multiple graphics processing pipelines (GPPLS) dynamically controlled according to time, image and object division modes of parallel operation during the run-time of graphics-based applications running on the computing system
US20080211817A1 (en) * 2003-11-19 2008-09-04 Reuven Bakalash Internet-based application profile database server system for updating graphic application profiles (GAPS) stored within the multi-mode parallel graphics rendering system of client machines running one or more graphic applications
US8319781B2 (en) * 2007-11-23 2012-11-27 Pme Ip Australia Pty Ltd Multi-user multi-GPU render server apparatus and methods
US20120001905A1 (en) * 2010-06-30 2012-01-05 Ati Technologies, Ulc Seamless Integration of Multi-GPU Rendering


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant