CN112328532A - Multi-GPU communication method and device, storage medium and electronic device - Google Patents

Multi-GPU communication method and device, storage medium and electronic device

Info

Publication number
CN112328532A
Authority
CN
China
Prior art keywords
gpu
image processing
slave
gpus
processing task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011202092.3A
Other languages
Chinese (zh)
Other versions
CN112328532B (en)
Inventor
龙斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Jingmei Integrated Circuit Design Co ltd
Changsha Jingjia Microelectronics Co ltd
Original Assignee
Changsha Jingmei Integrated Circuit Design Co ltd
Changsha Jingjia Microelectronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Jingmei Integrated Circuit Design Co ltd, Changsha Jingjia Microelectronics Co ltd filed Critical Changsha Jingmei Integrated Circuit Design Co ltd
Priority to CN202011202092.3A priority Critical patent/CN112328532B/en
Publication of CN112328532A publication Critical patent/CN112328532A/en
Application granted granted Critical
Publication of CN112328532B publication Critical patent/CN112328532B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F 15/163 Interprocessor communication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/20 Processor architectures; Processor configuration, e.g. pipelining
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The embodiments of the present application provide a multi-GPU communication method and device, a storage medium, and an electronic device. The method is used for communication between a master GPU and a plurality of slave GPUs, where the master GPU communicates one-to-one with each slave GPU through a high-speed interconnection bus. The method comprises: receiving an image processing task; dividing the image processing task to obtain a plurality of image processing subtasks; distributing the plurality of image processing subtasks according to the number of slave GPUs; receiving the slave GPUs' processing results for the image processing task; and merging and splicing the processing results to obtain a target image. With this scheme, the GPUs cooperatively complete complex drawing and computing tasks.

Description

Multi-GPU communication method and device, storage medium and electronic device
Technical Field
The present application relates to computer image processing technologies, and in particular, to a method and apparatus for multi-GPU communication, a storage medium, and an electronic apparatus.
Background
In the field of image processing, multiple GPUs are commonly used together, each performing a part of an image processing task.
As image processing tasks become more complex, however, the overall performance of the GPUs becomes insufficient.
For the problem in the related art that overall GPU performance is low when processing complex image processing tasks, no effective solution has yet been proposed.
Disclosure of Invention
The embodiment of the application provides a multi-GPU communication method and device, a storage medium and an electronic device, so as to at least solve the problem that the overall performance of a GPU is not high when complex image processing tasks are processed in the related art.
According to a first aspect of embodiments of the present application, there is provided a multi-GPU communication method for performing communication between a master GPU and a plurality of slave GPUs, the master GPU performing one-to-one communication with each of the slave GPUs through a high-speed interconnection bus, the method comprising: receiving an image processing task; dividing the image processing task to obtain a plurality of image processing subtasks; distributing the plurality of image processing subtasks according to the number of the slave GPUs; receiving a processing result of the slave GPU on the image processing task; and merging the processing results and splicing to obtain a target image.
According to a second aspect of the embodiments of the present application, there is provided a method for multi-GPU communication, for performing communication between a master GPU and a plurality of slave GPUs, the master GPU performing one-to-one communication with each of the slave GPUs through a high-speed interconnection bus, the method comprising: receiving a plurality of image processing subtasks obtained by dividing the image processing task by the main GPU; processing the plurality of image processing subtasks according to a preset mode; and sending the processing result of the image processing task to the main GPU.
According to a third aspect of the embodiments of the present application, there is provided a multi-GPU communication apparatus for performing communication between a master GPU and a plurality of slave GPUs, the master GPU performing one-to-one communication with each of the slave GPUs through a high-speed interconnection bus, the apparatus comprising: the first receiving module is used for receiving the image processing task; the segmentation module is used for segmenting the image processing task to obtain a plurality of image processing subtasks; the distribution module is used for distributing the image processing subtasks according to the number of the secondary GPUs; the first receiving module is further configured to receive a processing result of the slave GPU on the image processing task; and the splicing module is used for merging the processing results and splicing to obtain a target image.
According to a fourth aspect of the embodiments of the present application, there is provided a multi-GPU communication apparatus for performing communication between a master GPU and a plurality of slave GPUs, the master GPU performing one-to-one communication with each of the slave GPUs through a high-speed interconnection bus, the apparatus comprising: the second receiving module is used for receiving a plurality of image processing subtasks obtained by dividing the image processing task by the main GPU; the processing module is used for processing the image processing subtasks according to a preset mode; and the sending module is used for sending the processing result of the image processing task to the main GPU.
According to a fifth aspect of embodiments of the present application, there is provided a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above-mentioned method embodiments when executed.
According to a sixth aspect of embodiments of the present application, there is provided an electronic apparatus, comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the steps of any of the above method embodiments.
By adopting the above multi-GPU communication method and device, storage medium, and electronic device, the master GPU receives the drawing tasks sent by the upper-layer system, automatically divides them, distributes the partial drawing tasks to the multiple slave GPUs while managing those slave GPUs' loads, and, after the slave GPUs finish drawing, receives the drawing results and merges and splices them into a whole image for output. This achieves the purpose of the GPUs cooperatively completing complex drawing and computing tasks, and the technical effect of improving overall GPU performance.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram of a system architecture in an embodiment of the present application;
FIG. 2 is a schematic flowchart of a method for multi-GPU communication according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an apparatus for multi-GPU communication according to an embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a method for multi-GPU communication according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an apparatus for multi-GPU communication according to an embodiment of the present disclosure;
FIG. 6 is a diagram illustrating an example of uniform distribution of image processing tasks according to drawing areas;
FIG. 7 is a schematic diagram illustrating non-uniform distribution of image processing tasks according to drawing areas in an embodiment of the present application.
Detailed Description
In the process of implementing the present application, the inventors found that in image processing, graphics are drawn by multiple GPUs and then displayed; however, the multiple GPUs cannot process tasks cooperatively in parallel, which results in low overall GPU performance.
In view of the foregoing problems, an embodiment of the present application provides a multi-GPU communication method for performing communication between a master GPU and a plurality of slave GPUs, where the master GPU performs one-to-one communication with each of the slave GPUs through a high-speed interconnection bus, and the method includes: receiving an image processing task; dividing the image processing task to obtain a plurality of image processing subtasks; distributing the plurality of image processing subtasks according to the number of the slave GPUs; receiving a processing result of the slave GPU on the image processing task; and merging the processing results and splicing to obtain a target image. In the application, a scheme of interconnection and cooperative work of a plurality of GPUs is adopted, and the purpose of cooperatively finishing complex drawing and computing tasks is achieved.
To make the technical solutions and advantages of the embodiments of the present application clearer, the exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. Clearly, the described embodiments are only a part of the embodiments of the present application, not an exhaustive list. It should be noted that, where there is no conflict, the embodiments in the present application and the features in the embodiments may be combined with each other.
Example one
The method provided in the first embodiment of the present application may be executed in a system with multiple GPUs or a system with a similar architecture. Taking a system running on multiple GPUs as an example, FIG. 1 is a schematic structural diagram of a GPU system implementing the multi-GPU communication method of this embodiment. FIG. 1 shows an architecture of one master GPU and three slave GPUs, where the master GPU communicates one-to-one with each slave GPU through a high-speed interconnection bus. The master GPU receives drawing tasks sent from the upper layer over the PCIE bus and transmits them to the three slave GPUs through the high-speed interconnection bus; the three slave GPUs also communicate with one another through the high-speed interconnection bus, and the data required during drawing can be shared between the master GPU and the slave GPUs through the shared RAM and the shared Cache.
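The interconnect topology described above can be sketched as follows. This is a hypothetical illustration, not part of the patent: the function name `build_topology` and the integer GPU ids are assumptions.

```python
# Hypothetical sketch (names are not from the patent) of the FIG. 1
# topology: one master GPU (id 0) and three slave GPUs, with every
# pair of GPUs joined by a one-to-one high-speed interconnection link.
from itertools import combinations

def build_topology(num_slaves=3):
    """Return the set of point-to-point links among master and slaves."""
    gpus = range(num_slaves + 1)  # 0 = master, 1..num_slaves = slaves
    return {frozenset(pair) for pair in combinations(gpus, 2)}

links = build_topology(3)
# 4 GPUs give C(4, 2) = 6 point-to-point links, so the master reaches
# every slave directly and the slaves also reach one another directly.
```

Fully meshing the four GPUs is what lets the slaves exchange shared data among themselves without routing through the master.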
In this embodiment, a method for multi-GPU communication is provided, which is used for performing communication between a master GPU and a plurality of slave GPUs, where the master GPU performs one-to-one communication with each of the slave GPUs through a high-speed interconnection bus, as shown in fig. 2, and the process includes the following steps:
step S201, receiving an image processing task;
step S202, the image processing task is divided to obtain a plurality of image processing subtasks;
step S203, distributing the image processing subtasks according to the number of the slave GPUs;
step S204, receiving the processing result of the image processing task from the GPU;
and S205, merging the processing results and splicing to obtain a target image.
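The steps S201 to S205 above can be sketched as a minimal master-side model. The helper names (`split_task`, `master_flow`) and the representation of an image as a list of rows are illustrative assumptions, not patent terminology.

```python
# Minimal master-side sketch of steps S201-S205. An "image processing
# task" is modelled as a list of pixel rows; each slave GPU is modelled
# as a callable that processes its subtask.

def split_task(task, num_slaves):
    # S202: divide the task into one subtask per slave GPU.
    chunk = (len(task) + num_slaves - 1) // num_slaves
    return [task[i * chunk:(i + 1) * chunk] for i in range(num_slaves)]

def master_flow(task, slaves):
    subtasks = split_task(task, len(slaves))          # S202
    # S203: distribute subtasks according to the number of slave GPUs;
    # S204: receive each slave's processing result.
    results = [slave(sub) for slave, sub in zip(slaves, subtasks)]
    # S205: merge the results and splice them into the target image.
    return [row for part in results for row in part]

# A trivial "slave" that draws by doubling each row value.
draw = lambda sub: [v * 2 for v in sub]
target = master_flow([1, 2, 3, 4, 5, 6], [draw, draw, draw])  # -> [2, 4, 6, 8, 10, 12]
```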
According to the above multi-GPU communication method, the master GPU receives the drawing tasks sent by the upper-layer system, automatically divides them, distributes the partial drawing tasks to the multiple slave GPUs while managing those slave GPUs' loads, and, after the slave GPUs finish drawing, receives the drawing results and merges and splices them into a whole image for output. This achieves the purpose of the GPUs cooperatively completing complex drawing and computing tasks, and the technical effect of improving overall GPU performance.
In step S201, the image processing task is received by the master GPU.
In a specific embodiment, the master GPU receives an image processing task issued by an upper system, i.e., a CPU.
In a preferred embodiment, the image processing task comprises a drawing task.
In step S202, the master GPU divides the image processing task to obtain a plurality of image processing subtasks.
In a specific embodiment, the master GPU divides the drawing task sent by the upper system and distributes it evenly to the plurality of slave GPUs according to their number.
In a preferred embodiment, the transmission involved in distributing the drawing task uniformly to the plurality of slave GPUs is carried out over the high-speed interconnection bus.
In step S203, the master GPU allocates the plurality of image processing subtasks according to the number of the slave GPUs.
In one embodiment, the master GPU receives the upper system drawing task, packages the drawing data required by the slave GPU, and sends the data package to the slave GPU.
In a preferred embodiment, after receiving the data packet, the slave GPU parses it and stores the data at a specified location, then sends a response information packet to the master GPU.
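The packet exchange just described could look like the following sketch. The packet fields and class names are invented for illustration; the patent does not specify a packet format.

```python
# Illustrative model of the data-packet handshake: the master packages
# drawing data for a slave GPU; the slave parses the packet, stores the
# data at a specified location, and returns a response (ACK) packet.
from dataclasses import dataclass, field

@dataclass
class DrawDataPacket:
    slave_id: int
    drawing_data: bytes

@dataclass
class SlaveGPU:
    slave_id: int
    memory: dict = field(default_factory=dict)

    def on_data_packet(self, pkt):
        # Parse the packet and store the data at the specified location.
        self.memory["drawing_data"] = pkt.drawing_data
        # Send a response information packet back to the master GPU.
        return {"type": "ACK", "slave_id": self.slave_id}

slave = SlaveGPU(slave_id=1)
ack = slave.on_data_packet(DrawDataPacket(slave_id=1, drawing_data=b"\x01\x02"))
```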
In step S204, the master GPU receives the slave GPUs' processing results for the image processing task; the master GPU itself also performs part of the processing.
In a preferred embodiment, the image processing task comprises a drawing task.
In one embodiment, the master GPU sends the associated drawing commands to the slave GPU for processing.
In a preferred embodiment, the slave GPU automatically performs a drawing task according to a drawing command and drawing data, and sends a drawing completion signal to the master GPU after the drawing task is completed.
In step S205, the master GPU merges and splices the processing results to obtain a target image.
In a specific embodiment, the slave GPUs' processing results and the master GPU's own image processing result are merged and then spliced to obtain the target image.
In a preferred embodiment, after receiving all the drawing results sent by the slave GPUs, the master GPU splices them together with its own drawing result into a complete image and outputs it to the display port.
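The merge-and-splice step can be illustrated with a small sketch; the assumption here (not stated in the patent) is that each GPU draws a horizontal band of rows and the bands are concatenated in order.

```python
# Minimal sketch of the merge-and-splice step: the master GPU
# concatenates its own drawn band with the bands returned by the slave
# GPUs into one complete frame. The row-band layout is an assumption.

def splice(master_band, slave_bands):
    frame = list(master_band)
    for band in slave_bands:
        frame.extend(band)          # append each slave's rows in order
    return frame

frame = splice([[0, 0]], [[[1, 1]], [[2, 2]], [[3, 3]]])
# frame is the complete 4-row image, ready for output to the display port
```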
In a preferred embodiment, the master GPU sends different drawing data and drawing commands to a slave GPU several times in succession, driving that slave GPU to draw different graphics within the same sub-screen. In terms of display effect, one graphic is drawn in a sub-screen and then further graphics are drawn in it, achieving effects such as overlapping and perspective of multiple graphics.
As an optional implementation manner of the present application, the allocating the plurality of image processing subtasks according to the number of the slave GPUs includes: sending a drawing data packet in the image processing subtask to be distributed by the slave GPU to the slave GPU; and receiving an information response packet when the slave GPU finishes receiving the drawing data packet.
In specific implementation, the master GPU receives the upper-system drawing task, packages the drawing data required by each slave GPU, and sends the data packet to that slave GPU. After receiving the data packet, the slave GPU parses it, stores the data at a specified location, sends a response information packet to the master GPU, and executes its work task.
As an optional implementation manner of the present application, the allocating the plurality of image processing subtasks according to the number of the slave GPUs further includes: sending an image processing task command to the slave GPU; receiving a completion response packet of the image processing task under the condition that the slave GPU completes the image processing task according to the image processing task command and the drawing data packet; sending an image processing task result permission receiving request to the slave GPU; and receiving a processing result of the image processing task according to the permission receiving request.
In specific implementation, the master GPU sends a drawing command to a slave GPU, and the slave GPU automatically executes and completes the drawing task according to the drawing command and the drawing data, sending a drawing completion signal to the master GPU when the task is done.
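The full handshake of this optional implementation can be sketched as a slave-side state machine: drawing-data packet, then ACK; drawing command, then completion response; receive-permission request, then the drawing result. All names below are assumptions for the sketch, not patent terminology.

```python
# Illustrative slave-side state machine for the handshake: data packet
# -> ACK, drawing command -> completion response, result-receive
# permission -> drawing result.

class SlaveProtocol:
    def __init__(self):
        self.drawing_data = None
        self.result = None

    def on_drawing_data(self, data):
        self.drawing_data = data
        return "ACK"                    # information response packet

    def on_drawing_command(self, command):
        # Execute the drawing task using the stored drawing data.
        self.result = [command(v) for v in self.drawing_data]
        return "DONE"                   # task completion response packet

    def on_receive_permission(self):
        # The master has granted permission to send back the result.
        return self.result

slave = SlaveProtocol()
slave.on_drawing_data([1, 2, 3])
slave.on_drawing_command(lambda v: v + 10)   # "draw": add 10 to each value
result = slave.on_receive_permission()       # -> [11, 12, 13]
```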
As an optional implementation manner of the present application, the allocating the plurality of image processing subtasks according to the number of the slave GPUs further includes: dividing a drawing area according to the coordinates of the target image and the number of the slave GPUs and the master GPU to obtain a target area; and sending the drawing data packets and the image processing task commands belonging to different drawing areas to the corresponding slave GPU.
In specific implementation, the master GPU sends the slave GPU a request granting permission to return the drawing result, and the slave GPU then sends the drawing result to the master GPU.
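Dividing the drawing area by the coordinates of the target image can be sketched as follows, covering both the uniform split of FIG. 6 (equal weights) and a non-uniform split like FIG. 7. The weight-based interface and row-band partitioning are assumptions for illustration.

```python
# Sketch of dividing the target image into drawing areas by coordinates
# for the master GPU and slave GPUs. Equal weights reproduce a uniform
# distribution; unequal weights give a non-uniform one.

def divide_rows(height, weights):
    """Split `height` image rows into bands proportional to `weights`."""
    total = sum(weights)
    bands, start = [], 0
    for i, w in enumerate(weights):
        # The last band absorbs rounding so the areas tile the image.
        end = height if i == len(weights) - 1 else start + height * w // total
        bands.append((start, end))
        start = end
    return bands

uniform = divide_rows(1080, [1, 1, 1, 1])   # FIG. 6 style: equal areas
weighted = divide_rows(1080, [1, 2, 2, 1])  # FIG. 7 style: uneven areas
```

A non-uniform split lets the master account for per-GPU load when some drawing areas are more expensive than others.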
As an optional implementation of the present application, the one-to-one communication between the master GPU and each slave GPU through the high-speed interconnection bus includes: one-to-one communication between the slave GPUs themselves through the high-speed interconnection bus; and/or sharing, between the master GPU and each slave GPU, the data required in executing the image processing task through a shared RAM; and/or sharing that data through a shared Cache.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example two
In this embodiment, a method for multi-GPU communication is provided, which is used for performing communication between a master GPU and a plurality of slave GPUs, where the master GPU performs one-to-one communication with each of the slave GPUs through a high-speed interconnection bus, as shown in fig. 4, the process includes the following steps:
step S401, receiving a plurality of image processing subtasks obtained by the main GPU after the image processing task is divided;
step S402, processing the image processing subtasks according to a preset mode;
step S403, sending the processing result of the image processing task to the master GPU.
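The slave-side steps S401 to S403 can be sketched as follows; `process` and `send_to_master` stand in for the preset processing mode and the high-speed-bus transfer, and both names are illustrative assumptions.

```python
# Minimal slave-side sketch of steps S401-S403: receive a subtask from
# the master GPU, process it in the preset manner, and send the result
# back to the master GPU.

def slave_flow(subtask, process, send_to_master):
    result = process(subtask)    # S402: process in the preset manner
    send_to_master(result)       # S403: return the result to the master
    return result

outbox = []                      # stands in for the master's receive queue
slave_flow([3, 1, 2], sorted, outbox.append)   # outbox now holds [1, 2, 3]
```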
According to the above multi-GPU communication method, the master GPU receives the drawing tasks sent by the upper-layer system, automatically divides them, distributes the partial drawing tasks to the multiple slave GPUs while managing those slave GPUs' loads, and, after the slave GPUs finish drawing, receives the drawing results and merges and splices them into a whole image for output. This achieves the purpose of the GPUs cooperatively completing complex drawing and computing tasks, and the technical effect of improving overall GPU performance.
In step S401, the plurality of slave GPUs receive the image processing subtasks obtained by the master GPU dividing the image processing task.
In a specific embodiment, the master GPU divides the drawing task sent by the upper system and distributes it evenly to the plurality of slave GPUs according to their number.
In a preferred embodiment, the transmission involved in distributing the drawing task uniformly to the plurality of slave GPUs is carried out over the high-speed interconnection bus.
In step S402, the plurality of image processing sub-tasks are processed in the plurality of slave GPUs according to a preset manner.
In one embodiment, the master GPU receives the upper system drawing task, packages the drawing data required by the slave GPU, and sends the data package to the slave GPU.
In a preferred embodiment, after receiving the data packet, the slave GPU parses it and stores the data at a specified location, then sends a response information packet to the master GPU.
In step S403, the slave GPU sends the processing result of the image processing task to the master GPU.
In a preferred embodiment, the image processing task comprises a drawing task.
In one embodiment, the master GPU sends the associated drawing commands to the slave GPU for processing.
In a preferred embodiment, the slave GPU automatically performs a drawing task according to a drawing command and drawing data, and sends a drawing completion signal to the master GPU after the drawing task is completed.
In a preferred embodiment, after receiving all the drawing results sent by the slave GPUs, the master GPU splices them together with its own drawing result into a complete image and outputs it to the display port.
In a preferred embodiment, the master GPU sends different drawing data and drawing commands to a slave GPU several times in succession, driving that slave GPU to draw different graphics within the same sub-screen. In terms of display effect, one graphic is drawn in a sub-screen and then further graphics are drawn in it, achieving effects such as overlapping and perspective of multiple graphics.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
EXAMPLE III
In this embodiment, a multi-GPU communication device is also provided, and the device is used for implementing the above embodiments and preferred embodiments, which have already been described and will not be described again. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 3 is a block diagram of an apparatus for multi-GPU communication according to an embodiment of the present invention, configured to perform communication between a master GPU and a plurality of slave GPUs, the master GPU performing one-to-one communication with each of the slave GPUs through a high-speed interconnection bus, as shown in fig. 3, the apparatus including:
a first receiving module 30, configured to receive an image processing task;
a segmentation module 31, configured to segment the image processing task to obtain a plurality of image processing subtasks;
an allocation module 32, configured to allocate the plurality of image processing subtasks according to the number of the slave GPUs;
the first receiving module 30 is further configured to receive a processing result of the slave GPU on the image processing task;
and the splicing module 33 is configured to merge and splice the processing results to obtain a target image.
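The module decomposition of FIG. 3 can be sketched as a class whose methods correspond to the patent's modules; the module names come from the patent, but the Python interface is an illustrative assumption.

```python
# The FIG. 3 apparatus decomposed into its modules. Each module wraps
# one step of the master-side method (distribution and receiving are
# elided; the segmentation and splicing modules are shown).

class MultiGPUCommApparatus:
    def __init__(self, num_slaves):
        self.num_slaves = num_slaves

    def segmentation_module(self, task):
        # Divide the image processing task into one subtask per slave.
        n = self.num_slaves
        k = (len(task) + n - 1) // n
        return [task[i * k:(i + 1) * k] for i in range(n)]

    def splicing_module(self, results):
        # Merge the processing results and splice the target image.
        return [row for part in results for row in part]

apparatus = MultiGPUCommApparatus(num_slaves=2)
subtasks = apparatus.segmentation_module([1, 2, 3, 4])   # [[1, 2], [3, 4]]
```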
The image processing task is received by the master GPU in the first receiving module 30.
In a specific embodiment, the master GPU receives an image processing task issued by an upper system, i.e., a CPU.
In a preferred embodiment, the image processing task comprises a drawing task.
In the segmentation module 31, the main GPU segments the image processing task to obtain a plurality of image processing subtasks.
In a specific embodiment, the master GPU divides the drawing task sent by the upper system and distributes it evenly to the plurality of slave GPUs according to their number.
In a preferred embodiment, the transmission involved in distributing the drawing task uniformly to the plurality of slave GPUs is carried out over the high-speed interconnection bus.
The master GPU in the allocation module 32 allocates the plurality of image processing subtasks according to the number of the slave GPUs.
In one embodiment, the master GPU receives the upper system drawing task, packages the drawing data required by the slave GPU, and sends the data package to the slave GPU.
In a preferred embodiment, after receiving the data packet, the slave GPU parses it and stores the data at a specified location, then sends a response information packet to the master GPU.
In the first receiving module 30, the master GPU receives the processing result of the image processing task from the slave GPU, and the master GPU also performs the processing of the image processing task.
In a preferred embodiment, the image processing task comprises a drawing task.
In one embodiment, the master GPU sends the associated drawing commands to the slave GPU for processing.
In a preferred embodiment, the slave GPU automatically performs a drawing task according to a drawing command and drawing data, and sends a drawing completion signal to the master GPU after the drawing task is completed.
And the master GPU in the stitching module 33 merges the processing results and then stitches the processing results to obtain a target image.
In a specific embodiment, the slave GPUs' processing results and the master GPU's own image processing result are merged and then spliced to obtain the target image.
In a preferred embodiment, after receiving all the drawing results sent by the slave GPUs, the master GPU splices them together with its own drawing result into a complete image and outputs it to the display port.
In a preferred embodiment, the master GPU sends different drawing data and drawing commands to a slave GPU several times in succession, driving that slave GPU to draw different graphics within the same sub-screen. In terms of display effect, one graphic is drawn in a sub-screen and then further graphics are drawn in it, achieving effects such as overlapping and perspective of multiple graphics.
Example four
In this embodiment, a multi-GPU communication device is also provided, and the device is used for implementing the above embodiments and preferred embodiments, which have already been described and will not be described again. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 5 is a block diagram of an apparatus for multi-GPU communication according to an embodiment of the present invention, configured to perform communication between a master GPU and a plurality of slave GPUs, the master GPU performing one-to-one communication with each of the slave GPUs through a high-speed interconnection bus, as shown in fig. 5, the apparatus including:
a second receiving module 50, configured to receive a plurality of image processing subtasks obtained by splitting the image processing task by the master GPU;
a processing module 51, configured to process the multiple image processing sub-tasks according to a preset manner;
a sending module 52, configured to send the processing result of the image processing task to the master GPU.
In the slave GPU, the second receiving module 50 receives the plurality of image processing subtasks obtained by the master GPU dividing the image processing task.
In a specific embodiment, the master GPU partitions the drawing task sent by the upper-layer system and distributes it evenly among the plurality of slave GPUs according to their number.
In a preferred embodiment, when the drawing task is distributed to the plurality of slave GPUs, transmission is carried out over the high-speed interconnection bus.
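The even distribution described above can be sketched as follows; the function and variable names are illustrative assumptions, not from the patent, and a plain list stands in for the drawing subtasks carried over the bus:

```python
# Hypothetical sketch of the even split: the master GPU deals its list of
# drawing subtasks round-robin across the N slave GPUs so that each slave
# receives an (almost) equal share of the work.
def split_evenly(tasks, num_slaves):
    """Distribute tasks across num_slaves buckets as evenly as possible."""
    buckets = [[] for _ in range(num_slaves)]
    for i, task in enumerate(tasks):
        buckets[i % num_slaves].append(task)  # round-robin assignment
    return buckets
```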
In the slave GPU, the processing module 51 processes the image processing subtasks in a preset manner.
In one embodiment, the master GPU receives the upper-layer system drawing task, packages the drawing data required by the slave GPU, and sends the data packet to the slave GPU.
In a preferred embodiment, the slave GPU parses the data packet after receiving it and stores the data at a specified location; after reception is complete, it sends a response information packet to the master GPU.
In the slave GPU, the sending module 52 sends the processing result of the image processing task to the master GPU.
In a preferred embodiment, the image processing task comprises a drawing task.
In one embodiment, the master GPU sends the associated drawing commands to the slave GPU for processing.
In a preferred embodiment, the slave GPU automatically performs a drawing task according to a drawing command and drawing data, and sends a drawing completion signal to the master GPU after the drawing task is completed.
In a preferred embodiment, after receiving all the drawing results sent by the slave GPUs, the master GPU splices them together with its own drawing result into a complete image and outputs the complete image to the display port.
In a preferred embodiment, the master GPU repeatedly and continuously sends different drawing data and drawing commands to the slave GPU, driving the slave GPU to draw different graphics in the same sub-screen. In terms of display effect, one graphic is drawn first in the sub-screen and further graphics are then drawn over it, achieving effects such as overlapping and perspective of multiple graphics.
To better understand the flow of the multi-GPU communication method, the technical solutions are explained below with reference to preferred embodiments; the embodiments of the present invention are not limited to these technical solutions.
In the application, the master GPU packs the working data to be executed by the slave GPU and sends the data packet directly to the slave GPU through the high-speed inter-chip interconnection bus; the slave GPU parses the data packet after receiving it and executes the work task; after the task is finished, the slave GPU packs the task result data and returns it to the master GPU through the high-speed inter-chip interconnection bus.
In the following, a drawing task is taken as an example of the image processing task, and a GPU system including one master GPU and three slave GPUs is described in detail.
The master GPU divides the drawing task sent by the upper-layer system and distributes it evenly among the plurality of slave GPUs according to their number, with transmission over the high-speed interconnection bus. The method specifically comprises the following steps:
step S1, the master GPU receives the upper-layer system drawing task, packs the drawing data needed by each slave GPU, and sends the data packet to the slave GPU;
step S2, the slave GPU parses the data packet after receiving it and stores the data to a specified location; after reception is complete, it sends a response information packet to the master GPU and executes the work task;
step S3, the master GPU sends a command packet containing the drawing command to the slave GPU;
step S4, the slave GPU automatically executes and completes the drawing task according to the drawing command and drawing data, and sends a drawing-completion response packet to the master GPU after the task is finished;
step S5, the master GPU sends the slave GPU a permission-to-send packet, allowing the slave GPU to transmit the requested drawing-result data packet;
step S6, the slave GPU sends a data packet containing the drawing result to the master GPU;
and step S7, after receiving all the drawing results sent by the slave GPUs, the master GPU splices them together with its own drawing result into a complete image and outputs it to the display port.
In particular, steps S2 to S6 may be repeated irregularly or multiple times; that is, different drawing data and drawing commands may be sent to the slave GPU repeatedly and continuously, driving the slave GPU to draw different graphics in the same sub-frame. In other words, within the same picture one graphic is drawn first and other graphics are then drawn over it, realizing effects such as overlapping and perspective of multiple graphics.
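The exchange in steps S1 to S6 can be sketched as a simple message simulation; the class, method names, and the "ACK"/"DONE" strings below are illustrative assumptions standing in for the real response and completion packets carried over the interconnect bus:

```python
# Illustrative simulation of the S1-S6 exchange for one slave GPU.
class SlaveGPU:
    def __init__(self):
        self.data = None
        self.result = None

    def receive_data(self, packet):
        # S1/S2: store the drawing data and acknowledge reception
        self.data = packet
        return "ACK"

    def execute(self, command):
        # S3/S4: perform the drawing task, then signal completion
        self.result = f"drawn({command}:{self.data})"
        return "DONE"

    def send_result(self):
        # S5/S6: return the drawing result once the master permits it
        return self.result

def master_round(slave, data, command):
    """One master-side round trip; the master then splices results (S7)."""
    assert slave.receive_data(data) == "ACK"
    assert slave.execute(command) == "DONE"
    return slave.send_result()
```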
The above drawing data packet format sequentially includes: a header serving as a start delimiter, a packet attribute field, a destination address, a data length, data, a check code, and a packet tail serving as an end delimiter.
The header is a string of consecutive specific characters that does not otherwise appear consecutively in the data packet, so as to avoid parsing errors when the packet is received.
The packet attribute field indicates the format of the data in the current data packet, and may be command, data, or response.
Command indicates that the current data packet carries a drawing command, which the slave device must automatically parse and execute.
Data indicates to the receiving GPU that the current transmission carries drawing data or a drawing result; the receiving device only needs to parse the data packet and store its contents to the specified destination address.
Response means that the current data packet carries response information from the receiving device, specifically including reception-completion and verification information indicating that the slave GPU has finished receiving. When the check of the received data fails, the transmitting device must decide whether to retransmit.
The permission to send refers to permission of the current device to receive data.
The state information refers to the state information of the current receiving device.
The destination address refers to a destination address where data in the current data packet needs to be stored.
The data length refers to the amount of actual data in the current data packet.
The data refers to the data bits in the data packet; their specific meaning is interpreted according to the packet attribute field.
The check code is required because an external interconnection bus is used and errors are difficult to avoid during data transmission, so the data must be checked.
The packet tail, like the header, is a string of consecutive specific characters and marks the end of the data packet.
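A minimal sketch of this packet layout follows. The field widths (1-byte attribute, 4-byte destination address, 2-byte length), the delimiter bytes, and the XOR check are assumptions for illustration, since the patent fixes the field order but not concrete sizes; a real implementation would also guarantee that the header sequence cannot occur inside the packet body, which this sketch does not enforce:

```python
import struct

# Assumed start/end delimiters (the patent only requires distinctive
# character strings, not these particular bytes).
HEADER, TAIL = b"\xAA\x55\xAA\x55", b"\x55\xAA\x55\xAA"

def checksum(payload: bytes) -> int:
    """Simple XOR over the body; a placeholder for the real check code."""
    c = 0
    for b in payload:
        c ^= b
    return c

def pack_packet(attr: int, dest: int, data: bytes) -> bytes:
    # header | attribute | destination address | length | data | check | tail
    body = struct.pack("<BIH", attr, dest, len(data)) + data
    return HEADER + body + bytes([checksum(body)]) + TAIL

def parse_packet(pkt: bytes):
    assert pkt.startswith(HEADER) and pkt.endswith(TAIL), "bad delimiters"
    body, check = pkt[4:-5], pkt[-5]
    assert checksum(body) == check, "check code mismatch"
    attr, dest, length = struct.unpack("<BIH", body[:7])
    return attr, dest, body[7:7 + length]
```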
As shown in fig. 6, the master GPU divides the drawing area into four regions by image coordinates according to the number of GPUs, and sends the drawing data and drawing commands belonging to each region to the designated slave GPU; after a slave GPU completes its drawing task, it sends the drawing result of the region it manages to the master GPU. The master GPU collects the drawing results of all the slave GPUs and splices them together with its own drawing result, restoring the image required by the upper-layer system for display output.
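The coordinate-based split of fig. 6 can be sketched as below, where one master plus three slave GPUs each take one quadrant; the function name and the (x0, y0, x1, y1) rectangle convention are illustrative assumptions:

```python
# Divide the image into four coordinate regions, one per GPU
# (master + three slaves), as in the four-GPU example of fig. 6.
def split_into_quadrants(width, height):
    hw, hh = width // 2, height // 2
    return [
        (0, 0, hw, hh),           # e.g. region kept by the master GPU
        (hw, 0, width, hh),       # slave GPU 1
        (0, hh, hw, height),      # slave GPU 2
        (hw, hh, width, height),  # slave GPU 3
    ]
```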
As shown in fig. 7, the drawing task is assigned according to image density and complexity. Because real drawing tasks are not uniformly distributed across the image area, most of the work may at times be concentrated in a specific region; the master GPU must wait for all the slave GPUs and itself to finish drawing before collecting and splicing the final display image, so the overall drawing time, and hence drawing performance, is determined by the most complex region. The master GPU therefore arranges the drawing work of each slave GPU according to drawing complexity. As shown in fig. 7, when the drawing tasks are concentrated in the upper half of the image, the master GPU autonomously allocates all three slave GPUs to draw the upper half simultaneously, balancing the drawing load of each GPU overall and thereby improving overall performance.
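One plausible way to realize this complexity-based balancing, sketched under the assumption that each region carries an estimated drawing cost (the patent does not specify the balancing algorithm), is a greedy assignment to the currently least-loaded GPU:

```python
import heapq

# Greedy load balancing: each region goes to whichever GPU currently has
# the least accumulated estimated work, so a complex area ends up spread
# over several GPUs, as in the fig. 7 example.
def balance(regions, num_gpus):
    """regions: list of (name, estimated_cost). Returns per-GPU name lists."""
    heap = [(0, i) for i in range(num_gpus)]  # (accumulated cost, gpu id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_gpus)]
    # Assign the costliest regions first for a tighter balance.
    for name, cost in sorted(regions, key=lambda r: -r[1]):
        load, gpu = heapq.heappop(heap)
        assignment[gpu].append(name)
        heapq.heappush(heap, (load + cost, gpu))
    return assignment
```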
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
S1, receiving an image processing task;
S2, dividing the image processing task to obtain a plurality of image processing subtasks;
S3, distributing the image processing subtasks according to the number of the slave GPUs;
S4, receiving the processing result of the image processing task from the slave GPUs;
and S5, merging the processing results and splicing to obtain the target image.
And/or:
S1, receiving a plurality of image processing subtasks obtained by dividing the image processing task by the master GPU;
S2, processing the image processing subtasks in a preset manner;
S3, sending the processing result of the image processing task to the master GPU.
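The master-side steps S1 to S5 above can be sketched end to end as follows; here a plain function call stands in for dispatch over the interconnect bus, list slicing stands in for the split, and flattening stands in for the splice, all of which are simplifying assumptions:

```python
# End-to-end sketch of the master-side pipeline: split the task (S2/S3),
# let each "slave" process its share (S4), then merge the results (S5).
def run_master(task, slave_fns):
    n = len(slave_fns)
    parts = [task[i::n] for i in range(n)]                 # S2/S3: split
    results = [fn(part) for fn, part in zip(slave_fns, parts)]  # S4
    return [item for result in results for item in result]      # S5: merge
```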
Optionally, the storage medium is further arranged to store a computer program for performing the steps of:
S31, sending the drawing data packet of the image processing subtask to be distributed to the slave GPU;
S32, receiving an information response packet when the slave GPU has finished receiving the drawing data packet.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing computer programs, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, receiving an image processing task;
S2, dividing the image processing task to obtain a plurality of image processing subtasks;
S3, distributing the image processing subtasks according to the number of the slave GPUs;
S4, receiving the processing result of the image processing task from the slave GPUs;
and S5, merging the processing results and splicing to obtain the target image.
And/or:
S1, receiving a plurality of image processing subtasks obtained by dividing the image processing task by the master GPU;
S2, processing the image processing subtasks in a preset manner;
S3, sending the processing result of the image processing task to the master GPU.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (12)

1. A method for multi-GPU communication, for communicating between a master GPU and a plurality of slave GPUs, the master GPU in one-to-one communication with each of the slave GPUs via a high-speed interconnect bus, the method comprising:
receiving an image processing task;
dividing the image processing task to obtain a plurality of image processing subtasks;
distributing the plurality of image processing subtasks according to the number of the slave GPUs;
receiving a processing result of the slave GPU on the image processing task;
and merging the processing results and splicing to obtain a target image.
2. The method of claim 1, wherein said allocating the plurality of image processing subtasks according to the number of slave GPUs comprises:
sending a drawing data packet in the image processing subtask to be distributed by the slave GPU to the slave GPU;
and receiving an information response packet when the slave GPU finishes receiving the drawing data packet.
3. The method of claim 2, wherein said allocating the plurality of image processing subtasks according to the number of slave GPUs further comprises:
sending an image processing task command to the slave GPU;
receiving a completion response packet of the image processing task under the condition that the slave GPU completes the image processing task according to the image processing task command and the drawing data packet;
sending an image processing task result permission receiving request to the slave GPU;
and receiving a processing result of the image processing task according to the permission receiving request.
4. The method of claim 1, wherein said allocating the plurality of image processing subtasks according to the number of slave GPUs further comprises:
dividing a drawing area according to the coordinates of the target image and the number of the slave GPUs and the master GPU to obtain a target area;
and sending the drawing data packets and the image processing task commands belonging to different drawing areas to the corresponding slave GPU.
5. The method of claim 1, wherein the one-to-one communication between the master GPU and each of the slave GPUs via a high speed interconnect bus comprises:
performing one-to-one communication between each slave GPU through a high-speed interconnection bus;
and/or the master GPU and each slave GPU share data required in the process of executing the image processing task through a shared RAM;
and/or sharing data required in the process of executing the image processing task between the master GPU and each slave GPU through a shared Cache.
6. The method according to claim 1, wherein the merging and splicing the processing results to obtain the target image comprises:
and merging the processing result and the image processing result of the main GPU and splicing to obtain a target image.
7. A method for multi-GPU communication, for communicating between a master GPU and a plurality of slave GPUs, the master GPU in one-to-one communication with each of the slave GPUs via a high-speed interconnect bus, the method comprising:
receiving a plurality of image processing subtasks obtained by dividing the image processing task by the master GPU;
processing the plurality of image processing subtasks according to a preset mode;
and sending the processing result of the image processing task to the master GPU.
8. The method of claim 7, wherein the receiving a plurality of image processing subtasks resulting from the splitting of the image processing task by the master GPU comprises:
receiving a drawing data packet in an image processing subtask to be distributed;
under the condition that the drawing data packet is completely received, sending an information response packet to the master GPU;
executing and completing the image processing task according to the image processing task command and the drawing data packet;
sending a response packet of the image processing task to the master GPU when the image processing task is completed;
and sending the processing result of the image processing task to the master GPU under the condition that the image processing task result permission receiving request is received.
9. An apparatus for multi-GPU communication, wherein the apparatus is configured to communicate between a master GPU and a plurality of slave GPUs, and wherein the master GPU communicates with each of the slave GPUs in a one-to-one manner via a high-speed interconnect bus, the apparatus comprising:
the first receiving module is used for receiving the image processing task;
the segmentation module is used for segmenting the image processing task to obtain a plurality of image processing subtasks;
the distribution module is used for distributing the image processing subtasks according to the number of the secondary GPUs;
the first receiving module is further configured to receive a processing result of the slave GPU on the image processing task;
and the splicing module is used for merging the processing results and splicing to obtain a target image.
10. An apparatus for multi-GPU communication, wherein the apparatus is configured to communicate between a master GPU and a plurality of slave GPUs, and wherein the master GPU communicates with each of the slave GPUs in a one-to-one manner via a high-speed interconnect bus, the apparatus comprising:
the second receiving module is used for receiving a plurality of image processing subtasks obtained by dividing the image processing task by the master GPU;
the processing module is used for processing the image processing subtasks according to a preset mode;
and the sending module is used for sending the processing result of the image processing task to the master GPU.
11. A storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the method of any of claims 1 to 6 when executed, and/or to perform the method of any of claims 7 to 8.
12. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 6 and/or to perform the method of any of claims 7 to 8.
CN202011202092.3A 2020-11-02 2020-11-02 Method and device for multi-GPU communication, storage medium and electronic device Active CN112328532B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011202092.3A CN112328532B (en) 2020-11-02 2020-11-02 Method and device for multi-GPU communication, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN112328532A true CN112328532A (en) 2021-02-05
CN112328532B CN112328532B (en) 2024-02-09

Family

ID=74324011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011202092.3A Active CN112328532B (en) 2020-11-02 2020-11-02 Method and device for multi-GPU communication, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN112328532B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101025821A (en) * 2006-02-21 2007-08-29 辉达公司 Asymmetric multi-GPU processing
US20080068389A1 (en) * 2003-11-19 2008-03-20 Reuven Bakalash Multi-mode parallel graphics rendering system (MMPGRS) embodied within a host computing system and employing the profiling of scenes in graphics-based applications
US20080129747A1 (en) * 2003-11-19 2008-06-05 Reuven Bakalash Multi-mode parallel graphics rendering system employing real-time automatic scene profiling and mode control
US20090201303A1 (en) * 2007-11-23 2009-08-13 Mercury Computer Systems, Inc. Multi-user multi-gpu render server apparatus and methods
US20120001905A1 (en) * 2010-06-30 2012-01-05 Ati Technologies, Ulc Seamless Integration of Multi-GPU Rendering
TW201432566A (en) * 2013-02-04 2014-08-16 Hon Hai Prec Ind Co Ltd Expansion card of graphic processing unit and expanding method
CN107027042A (en) * 2017-04-19 2017-08-08 中国电子科技集团公司电子科学研究院 A kind of panorama live video stream processing method and processing device based on many GPU
CN107122244A (en) * 2017-04-25 2017-09-01 华中科技大学 A kind of diagram data processing system and method based on many GPU
CN108229687A (en) * 2016-12-14 2018-06-29 腾讯科技(深圳)有限公司 Data processing method, data processing equipment and electronic equipment
CN109255439A (en) * 2017-07-12 2019-01-22 北京图森未来科技有限公司 A kind of DNN model training method and device that multiple GPU are parallel
CN109408449A (en) * 2017-08-15 2019-03-01 Arm有限公司 Data processing system
CN110717853A (en) * 2019-12-12 2020-01-21 武汉精立电子技术有限公司 Optical image processing system based on embedded GPU
CN110716805A (en) * 2019-09-27 2020-01-21 上海依图网络科技有限公司 Task allocation method and device of graphic processor, electronic equipment and storage medium
CN110874811A (en) * 2018-08-29 2020-03-10 英特尔公司 Position-based rendering apparatus and method for multi-die/GPU graphics processing
CN111045623A (en) * 2019-11-21 2020-04-21 中国航空工业集团公司西安航空计算技术研究所 Method for processing graphics commands in multi-GPU (graphics processing Unit) splicing environment
CN111289975A (en) * 2020-01-21 2020-06-16 博微太赫兹信息科技有限公司 Rapid imaging processing system for multi-GPU parallel computing

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051212A (en) * 2021-03-02 2021-06-29 长沙景嘉微电子股份有限公司 Graphics processor, data transmission method, data transmission device, electronic device, and storage medium
CN113051212B (en) * 2021-03-02 2023-12-05 长沙景嘉微电子股份有限公司 Graphics processor, data transmission method, data transmission device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112328532B (en) 2024-02-09

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant