CN110415160B - GPU (graphics processing Unit) topology partitioning method and device - Google Patents

GPU (graphics processing Unit) topology partitioning method and device

Info

Publication number
CN110415160B
CN110415160B
Authority
CN
China
Prior art keywords
gpus
gpu
partition
point
topology
Prior art date
Legal status
Active
Application number
CN201910580776.8A
Other languages
Chinese (zh)
Other versions
CN110415160A (en)
Inventor
王德奎
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN201910580776.8A priority Critical patent/CN110415160B/en
Publication of CN110415160A publication Critical patent/CN110415160A/en
Application granted granted Critical
Publication of CN110415160B publication Critical patent/CN110415160B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17356 Indirect interconnection networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)

Abstract

The invention discloses a GPU topology partitioning method and device, comprising the following steps: determining the interconnection bandwidth among multiple GPUs according to their physical topology information, and generating a GPU topology graph comprising the multiple GPUs; randomly dividing the GPUs in the topology graph into two partitions; calculating the migration gain of every GPU in the topology graph, migrating the GPU with the highest migration gain from the partition containing more GPUs into the partition containing fewer GPUs, calculating the number of cross-partition connections of the current partitioning scheme, and removing the migrated GPU from the topology graph; and repeating the above steps until all GPUs have been removed from the topology graph, then selecting the partitioning scheme with the fewest cross-partition connections as the partitioning result. The method can optimize the topological partitioning of GPUs from the bottom layer in a targeted manner according to the different connection relationships among the GPUs, reduce the time spent on transmission between GPUs, and improve the speed of artificial intelligence computation.

Description

GPU (graphics processing Unit) topology partitioning method and device
Technical Field
The present invention relates to the field of computers, and more particularly, to a method and an apparatus for GPU topology partitioning.
Background
In the fields of high-performance computing and artificial intelligence, GPUs are often used for computational acceleration. The GPU is deployed at large scale thanks to its powerful computing capability and low power consumption; in particular, in the recently popular field of artificial intelligence, most model training runs on GPUs, which saves a large amount of computing time and thereby accelerates model iteration. Because GPUs are expensive, more and more artificial intelligence developers want to fully improve GPU resource utilization and extract the maximum value from limited GPU resources. However, most artificial intelligence developers lack knowledge of the underlying GPU hardware; in the prior art, communication between GPUs is inefficient for lack of low-level optimization, and GPU partitioning lacks orderliness, which slows down artificial intelligence computation.
For the problem in the prior art that GPU partitioning lacks orderliness and thereby slows down artificial intelligence computation, no effective solution has been proposed so far.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a GPU topology partitioning method and apparatus that can optimize the topological partitioning of GPUs from the bottom layer in a targeted manner according to the different connection relationships among GPUs, reduce the time spent on transmission between GPUs, and improve the speed of artificial intelligence computation.
Based on the above object, a first aspect of the embodiments of the present invention provides a GPU topology partitioning method, comprising the following steps:
determining the interconnection bandwidth among multiple GPUs according to their physical topology information, and generating a GPU topology graph comprising the multiple GPUs;
randomly dividing the multiple GPUs in the GPU topology graph into two partitions;
calculating the migration gain of every GPU in the GPU topology graph, migrating the GPU with the highest migration gain from the partition containing more GPUs into the partition containing fewer GPUs, calculating the number of cross-partition connections of the current partitioning scheme, and removing the migrated GPU from the GPU topology graph;
repeating the above steps until all GPUs have been removed from the GPU topology graph, and selecting the partitioning scheme with the fewest cross-partition connections as the partitioning result.
In some embodiments, the physical topology information includes the connection relationships among the multiple GPUs; the GPUs may be connected through one or more of the following at the same time: NVLink, PCIe bus, PCIe switch, PCIe host bridge, QPI.
In some embodiments, determining the interconnection bandwidth among the multiple GPUs from their physical topology information comprises: determining the rate at which the GPUs transmit information to one another according to the connection relationships among them.
In some embodiments, generating a GPU topology graph comprising the multiple GPUs comprises:
taking the multiple GPUs as multiple points;
taking the connection relationships among the GPUs as multiple edges;
taking the interconnection bandwidth among the GPUs as the weights of the edges;
constructing the GPU topology graph from the points, the edges, and the weights of the edges.
In some embodiments, generating a GPU topology graph comprising the multiple GPUs further comprises: taking the computing power of the multiple GPUs as the weights of the points, and constructing the GPU topology graph from the points, the edges, the weights of the points, and the weights of the edges.
In some embodiments, calculating the migration gains of all GPUs in the GPU topology graph comprises:
for each point, determining its migration tendency FS from the weights of the edges connected to the point that cross the partition boundary;
for each point, determining its retention tendency TE from the weights of the edges connected to the point that lie within the same partition;
obtaining the migration gain of each point by subtracting its retention tendency TE from its migration tendency FS.
In some embodiments, the method further comprises: in response to the two partitions containing the same number of GPUs, randomly determining one of them, or determining one according to a predetermined rule, as the partition containing more GPUs; in response to two or more GPUs being tied for the highest migration gain, randomly determining one of them, or determining one according to a predetermined rule, as the GPU with the highest migration gain; and in response to two or more partitioning schemes being tied for the smallest number of cross-partition connections, randomly determining one of them, or determining one according to a predetermined rule, as the partitioning scheme with the smallest number of cross-partition connections.
In some embodiments, the method further comprises:
after obtaining the partitioning result, generating a partitioned GPU topology graph for one or more partitions in the result, so as to perform topology partitioning again.
A second aspect of the embodiments of the present invention provides a GPU topology partitioning apparatus, comprising:
a modeling module, configured to determine the interconnection bandwidth among multiple GPUs according to their physical topology information and generate a GPU topology graph comprising the multiple GPUs;
an initialization module, configured to randomly divide the multiple GPUs in the GPU topology graph into two partitions;
an iteration module, configured to calculate the migration gain of every GPU in the GPU topology graph, migrate the GPU with the highest migration gain from the partition containing more GPUs into the partition containing fewer GPUs, calculate the number of cross-partition connections of the current partitioning scheme, and remove the migrated GPU from the GPU topology graph;
and a sorting module, configured to repeat the previous step until all GPUs have been removed from the GPU topology graph, and select the partitioning scheme with the fewest cross-partition connections as the partitioning result.
A third aspect of an embodiment of the present invention provides an artificial intelligence computing device, including:
a plurality of GPUs;
a processor; and
a memory storing program code executable by the processor which, when executed, performs the GPU topology partitioning method described above to partition the multiple GPUs and arrange artificial intelligence computing tasks in units of the partitions.
The invention has the following beneficial technical effects: the GPU topology partitioning method and apparatus determine the interconnection bandwidth among multiple GPUs according to their physical topology information and generate a GPU topology graph comprising the multiple GPUs; randomly divide the GPUs in the topology graph into two partitions; calculate the migration gain of every GPU in the topology graph, migrate the GPU with the highest migration gain from the partition containing more GPUs into the partition containing fewer GPUs, calculate the number of cross-partition connections of the current partitioning scheme, and remove the migrated GPU from the topology graph; and repeat the previous step until all GPUs have been removed, selecting the partitioning scheme with the fewest cross-partition connections as the partitioning result. The topological partitioning of the GPUs can thus be optimized from the bottom layer in a targeted manner according to the different connection relationships among the GPUs, reducing transmission time between GPUs and improving the speed of artificial intelligence computation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a GPU topology partitioning method according to the present invention;
fig. 2 is a schematic diagram of a GPU topology connection relationship in an embodiment of the GPU topology partitioning method provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that share the same name; as can be seen, "first" and "second" are merely for convenience of description and should not be construed as limiting the embodiments of the present invention, and the following embodiments do not explain them again.
In view of the above, a first aspect of the embodiments of the present invention proposes an embodiment of a method that can optimize the topological partitioning of GPUs from the bottom layer in a targeted manner according to the different connection relationships among GPUs. Fig. 1 shows a schematic flowchart of the GPU topology partitioning method provided by the present invention.
The GPU topology partitioning method, as shown in fig. 1, includes the following steps:
Step S101: determining the interconnection bandwidth among multiple GPUs according to their physical topology information, and generating a GPU topology graph comprising the multiple GPUs;
Step S103: randomly dividing the multiple GPUs in the GPU topology graph into two partitions;
Step S105: calculating the migration gain of every GPU in the GPU topology graph, migrating the GPU with the highest migration gain from the partition containing more GPUs into the partition containing fewer GPUs, calculating the number of cross-partition connections of the current partitioning scheme, and removing the migrated GPU from the GPU topology graph;
Step S107: repeating the above steps until all GPUs have been removed from the GPU topology graph, and selecting the partitioning scheme with the fewest cross-partition connections as the partitioning result.
The invention provides a GPU topology partitioning method that models the communication among the multiple GPU cards of a server and uses a partitioning algorithm to achieve GPU topology partitioning, so that the communication bandwidth between different partitions is minimized while the communication bandwidth within each partition is maximized. In practical terms, an artificial intelligence computing workload can then be scheduled within a single partition, so that the time spent transmitting data between GPUs is minimal.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like. Embodiments of the computer program may achieve the same or similar effects as any of the preceding method embodiments to which it corresponds.
In some embodiments, the physical topology information includes connection relationships between the multiple GPUs; the connection relationship among the GPUs comprises that the GPUs are connected through at least one of the following simultaneously: NVlink, PCIe bus, PCIe switch, PCIe host bridge, QPI.
Obviously, the bandwidth between GPUs differs with the connection mode. In general, the bandwidth of an NVLink connection is highest, and different NVLink configurations provide different bandwidths; the bandwidth of a connection through the PCIe bus and the PCIe host bridge within a NUMA node is relatively low. QPI here refers to interconnection across NUMA nodes over the PCIe bus using the SMP interconnect, whose bandwidth is lower still. The physical topology information may be provided by the driver.
In some embodiments, determining the interconnection bandwidth among the multiple GPUs from their physical topology information means: determining the rate at which the GPUs transmit information to one another according to the connection relationships among them.
In some embodiments, generating a GPU topology graph comprising the multiple GPUs comprises:
taking the multiple GPUs as multiple points;
taking the connection relationships among the GPUs as multiple edges;
taking the interconnection bandwidth among the GPUs as the weights of the edges;
constructing the GPU topology graph from the points, the edges, and the weights of the edges.
In some embodiments, generating a GPU topology graph comprising the multiple GPUs further comprises: taking the computing power of the multiple GPUs as the weights of the points, and constructing the GPU topology graph from the points, the edges, the weights of the points, and the weights of the edges.
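As an illustration of this modeling step, the following is a minimal Python sketch, assuming the per-pair bandwidths have already been obtained; the function and variable names (build_gpu_graph, bandwidth, compute_power) are hypothetical and not taken from the patent:

    # A minimal sketch of the modeling step described above.
    # bandwidth: {(i, j): bw} giving the bandwidth for each GPU pair i < j.
    def build_gpu_graph(bandwidth, compute_power=1.0):
        graph = {}         # adjacency: edge weight = interconnection bandwidth
        point_weight = {}  # point weight = GPU computing power
        for (i, j), bw in bandwidth.items():
            graph.setdefault(i, {})[j] = bw
            graph.setdefault(j, {})[i] = bw
        for gpu in graph:
            point_weight[gpu] = compute_power  # all cards assumed equal here
        return graph, point_weight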
In some embodiments, calculating the migration gain comprises:
for each point, determining its migration tendency FS from the weights of the edges connected to the point that cross the partition boundary;
for each point, determining its retention tendency TE from the weights of the edges connected to the point that lie within the same partition;
obtaining the migration gain of each point by subtracting its retention tendency TE from its migration tendency FS.
The migration tendency FS represents the partition-internal edges the point would gain by migrating; the retention tendency TE represents the partition-internal edges it would lose by migrating. The weights of these edges are taken into account to improve the accuracy of the model.
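Expressed as code, a sketch of the gain computation under the adjacency representation sketched above (an illustration, not the patent's reference implementation); side maps each point to its current partition:

    # Migration gain of point n: FS (total weight of its edges that cross
    # the partition) minus TE (total weight of its edges inside its partition).
    def migration_gain(graph, side, n):
        fs = sum(w for m, w in graph[n].items() if side[m] != side[n])
        te = sum(w for m, w in graph[n].items() if side[m] == side[n])
        return fs - te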
In some embodiments, in response to the two partitions containing the same number of GPUs, one of them is randomly determined, or determined according to a predetermined rule, as the partition containing more GPUs; in response to two or more GPUs being tied for the highest migration gain, one of them is randomly determined, or determined according to a predetermined rule, as the GPU with the highest migration gain; and in response to two or more partitioning schemes being tied for the smallest number of cross-partition connections, one of them is randomly determined, or determined according to a predetermined rule, as the partitioning scheme with the smallest number of cross-partition connections.
In some embodiments, the method further comprises:
after obtaining the partitioning results, a partitioned GPU topology map is generated for one or more partitions in the partitioning results to perform topology partitioning again.
Re-partitioning is suitable for situations where the total number of GPUs is large and many computing tasks run concurrently.
The method disclosed according to an embodiment of the present invention may also be implemented as a computer program executed by a CPU, which may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method disclosed in the embodiments of the present invention. The above-described method steps and system elements may also be implemented using a controller and a computer-readable storage medium for storing a computer program for causing the controller to implement the functions of the above-described steps or elements.
The detailed embodiments of the present invention are further illustrated below with reference to specific examples.
First, a driver is installed on a physical server configured with multiple GPUs to obtain the physical connection relationships among the GPUs and between the GPUs and the CPUs:
[Connection matrix reported by the driver — presented as an image in the original filing and not reproduced here; its entries use the legend below.]
Here, X represents the GPU or CPU device itself; SYS means the two devices are connected through the PCIe bus plus the SMP interconnect (QPI/UPI) between NUMA nodes; PHB means they are connected through the PCIe bus and a PCIe host bridge; PXB means they are connected through multiple PCIe switches; PIX means they are connected through a single PCIe switch; NODE means they are connected through the PCIe bus and the PCIe host bridges within a NUMA node; NV# means they are connected through NVLink, where # denotes the number of links.
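This legend matches the connection matrix printed by the nvidia-smi topo -m command on NVIDIA systems; a sketch of capturing it programmatically, assuming nvidia-smi is installed and on the PATH (exact labels vary with driver version):

    # A sketch: capture the driver-reported physical topology matrix.
    import subprocess

    def read_physical_topology():
        result = subprocess.run(["nvidia-smi", "topo", "-m"],
                                capture_output=True, text=True, check=True)
        return result.stdout  # matrix of X / SYS / NODE / PHB / PXB / PIX / NV#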
From the obtained physical connections, the topological connection graph shown in fig. 2 can be determined, in which only the higher-bandwidth NVLink connections between GPUs are retained. Since mainly control data of limited size is transmitted between a GPU and a CPU, the embodiment of the present invention focuses only on connections between GPUs and not on GPU-CPU connections. The embodiment further uses the p2pBandwidthLatencyTest tool to measure the bandwidth between each pair of GPUs, shown in the following table (bandwidths as reported by the test, in GB/s):
      GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7
GPU0   743    96    96    48    48    19    18    19
GPU1    96   744    48    48    19    96    18    19
GPU2    96    48   746    96    18    18    48    18
GPU3    48    48    96   747    18    19    18    96
GPU4    48    19    18    19   754    96    96    48
GPU5    18    96    18    18    96   744    48    48
GPU6    18    18    48    18    96    48   749    96
GPU7    19    19    18    96    48    48    96   745
In this embodiment, the computing power of every GPU card is assumed by default to be the same, i.e. the point weights are all 1.
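To connect the measured matrix to the data layout that follows, here is a sketch that emits one record per GPU pair from an 8x8 bandwidth matrix (the function name is illustrative); each record carries the weight, the endpoint count of 2, and the two GPU numbers, matching the 28 connection lines below:

    # Turn the measured bandwidth matrix into per-pair connection records.
    def matrix_to_edges(bw):
        n = len(bw)
        return [(bw[i][j], 2, i, j)           # weight, endpoint count, GPUs
                for i in range(n) for j in range(i + 1, n)]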
A partitioning scheme is determined from the GPU topology graph using the Fiduccia-Mattheyses-Sanchis algorithm. The GPU topology graph may be expressed in a plain-data form as follows:
8
28
96 2 0 1
96 2 0 2
48 2 0 3
48 2 0 4
19 2 0 5
18 2 0 6
19 2 0 7
48 2 1 2
48 2 1 3
19 2 1 4
96 2 1 5
18 2 1 6
19 2 1 7
96 2 2 3
18 2 2 4
18 2 2 5
48 2 2 6
18 2 2 7
18 2 3 4
19 2 3 5
18 2 3 6
48 2 3 7
96 2 4 5
48 2 4 6
48 2 4 7
48 2 5 6
48 2 5 7
96 2 6 7
1
1
1
1
1
1
1
1
The first line indicates that 8 GPUs are used in this embodiment; the second line gives the number of GPU-to-GPU connections, 8 interconnected GPUs yielding 28 connections. Lines 3 to 30 describe each connection: the first column is the weight (i.e., the bandwidth between the GPUs), the second column is the number of endpoints (the number of points per edge, uniformly 2 for the GPU interconnections of this embodiment), the third column is the number of the first GPU of the connection, and the fourth column is the number of the second GPU. Lines 31 to 38 are the weights of each GPU, all 1 in this embodiment.
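A sketch of a reader for this layout (the format is as described above; the parser itself is an illustration, not part of the patent):

    # Parse the plain-data topology: node count, edge count, one line per
    # connection ("weight endpoint-count endpoints..."), then one weight per node.
    def parse_topology(text):
        tokens = iter(text.split())
        n_gpus, n_edges = int(next(tokens)), int(next(tokens))
        graph = {i: {} for i in range(n_gpus)}
        for _ in range(n_edges):
            weight = int(next(tokens))
            n_ends = int(next(tokens))       # uniformly 2 for GPU-to-GPU links
            ends = [int(next(tokens)) for _ in range(n_ends)]
            a, b = ends[0], ends[-1]
            graph[a][b] = weight
            graph[b][a] = weight
        point_weights = [int(next(tokens)) for _ in range(n_gpus)]
        return graph, point_weights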
The Fiduccia-Mattheyses-Sanchis algorithm is an improvement on the Fiduccia-Mattheyses algorithm. The traditional Fiduccia-Mattheyses algorithm supports only two partitions, while the Fiduccia-Mattheyses-Sanchis variant used by the embodiment of the present invention supports multiple partitions, giving the embodiment better extensibility.
The Fiduccia-Mattheyses-Sanchis algorithm proceeds as follows:
firstly, a plurality of GPUs in a GPU topological graph are randomly divided into two partitions, namely 4 in each partition.
The migration gains for all GPUs in the GPU topology are then calculated. The migration gain is the migration tendency FS minus the retention tendency TE; for each point, the migration tendency FS is according to the respective weights of the edges connected to the point and the edges connected to the point across the partition, and the retention tendency TE is according to the respective weights of the edges connected to the point and the edges connected to the point within the same partition.
After calculating the migration gains of all the points, the point with the highest migration gain in the partition including more points is migrated into the partition including fewer points, the number of cross-partition connections of the current partition scheme is calculated, and the migrated point is removed from the GPU topological graph.
And recalculating the migration gain of the rest points according to the situation after the point migration, and repeating the steps to obtain a partition scheme until all the points are migrated. At this time, the smallest number of connections across partitions of the partitioning scheme is counted as the partitioning result.
The original filing presents a pseudo-code implementation of the above algorithm as an image, which is not reproduced here.
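In its place, the following is a minimal runnable sketch of one pass of the two-way partitioning loop described above, with FS and TE as defined earlier; the tie-breaking (arbitrary here, which the embodiments permit either randomly or by rule) and all helper names are assumptions, not the patent's reference code:

    import random

    # One pass of the two-way partitioning loop. graph is a symmetric
    # adjacency dict {gpu: {neighbor: bandwidth}} with integer GPU numbers.
    def partition_pass(graph):
        gpus = list(graph)
        random.shuffle(gpus)                   # random initial bipartition
        side = {g: int(i >= len(gpus) // 2) for i, g in enumerate(gpus)}

        def gain(n):                           # migration tendency FS minus TE
            fs = sum(w for m, w in graph[n].items() if side[m] != side[n])
            te = sum(w for m, w in graph[n].items() if side[m] == side[n])
            return fs - te

        def cross_connections(s):              # edges crossing the partition
            return sum(1 for a in graph for b in graph[a]
                       if a < b and s[a] != s[b])

        remaining, best = set(gpus), None
        while remaining:
            counts = [list(side.values()).count(p) for p in (0, 1)]
            src = 0 if counts[0] >= counts[1] else 1       # larger partition
            movable = [g for g in remaining if side[g] == src] or list(remaining)
            mover = max(movable, key=gain)                 # highest migration gain
            side[mover] = 1 - side[mover]                  # migrate across partitions
            cut = cross_connections(side)
            if best is None or cut < best[0]:
                best = (cut, dict(side))                   # best scheme so far
            remaining.discard(mover)                       # remove migrated GPU
        return best  # (fewest cross-partition connections, partition assignment)

For example, cut, assignment = partition_pass(graph) on the 8-GPU graph parsed above returns the assignment with the fewest cross-partition connections encountered during the pass.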
As can be seen from the foregoing embodiment, the GPU topology partitioning method provided by the embodiments of the present invention determines the interconnection bandwidth among multiple GPUs according to their physical topology information and generates a GPU topology graph comprising the multiple GPUs; randomly divides the GPUs in the topology graph into two partitions; calculates the migration gain of every GPU in the topology graph, migrates the GPU with the highest migration gain from the partition containing more GPUs into the partition containing fewer GPUs, calculates the number of cross-partition connections of the current partitioning scheme, and removes the migrated GPU from the topology graph; and repeats the previous step until all GPUs have been removed, selecting the partitioning scheme with the fewest cross-partition connections as the partitioning result. The topological partitioning of the GPUs can thus be optimized from the bottom layer in a targeted manner according to the different connection relationships among the GPUs, reducing transmission time between GPUs and improving the speed of artificial intelligence computation.
It should be noted that the steps in the embodiments of the GPU topology partitioning method may be interchanged, replaced, added, or deleted; such reasonable permutations, combinations, and transformations also belong to the scope of the present invention, and the scope of protection should not be limited to the described embodiments.
In view of the above, a second aspect of the embodiments of the present invention proposes an embodiment of an apparatus that can optimize the topological partitioning of GPUs from the bottom layer in a targeted manner according to the different connection relationships among GPUs. The GPU topology partitioning apparatus comprises:
a modeling module, configured to determine the interconnection bandwidth among multiple GPUs according to their physical topology information and generate a GPU topology graph comprising the multiple GPUs;
an initialization module, configured to randomly divide the multiple GPUs in the GPU topology graph into two partitions;
an iteration module, configured to calculate the migration gain of every GPU in the GPU topology graph, migrate the GPU with the highest migration gain from the partition containing more GPUs into the partition containing fewer GPUs, calculate the number of cross-partition connections of the current partitioning scheme, and remove the migrated GPU from the GPU topology graph;
and a sorting module, configured to repeat the previous step until all GPUs have been removed from the GPU topology graph, and select the partitioning scheme with the fewest cross-partition connections as the partitioning result.
The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
In view of the above, a third aspect of the embodiments of the present invention proposes an embodiment of an artificial intelligence computing device that can optimize the topological partitioning of GPUs from the bottom layer in a targeted manner according to the different connection relationships among GPUs. The artificial intelligence computing device comprises:
a plurality of GPUs;
a processor; and
a memory storing program code executable by the processor which, when executed, performs the GPU topology partitioning method described above to partition the multiple GPUs and arrange artificial intelligence computing tasks in units of the partitions.
It can be seen from the foregoing embodiments that the GPU topology partitioning apparatus and the artificial intelligence computing device provided by the embodiments of the present invention determine the interconnection bandwidth among multiple GPUs according to their physical topology information and generate a GPU topology graph comprising the multiple GPUs; randomly divide the GPUs in the topology graph into two partitions; calculate the migration gain of every GPU in the topology graph, migrate the GPU with the highest migration gain from the partition containing more GPUs into the partition containing fewer GPUs, calculate the number of cross-partition connections of the current partitioning scheme, and remove the migrated GPU from the topology graph; and repeat the previous step until all GPUs have been removed, selecting the partitioning scheme with the fewest cross-partition connections as the partitioning result. The topological partitioning of the GPUs can thus be optimized from the bottom layer in a targeted manner according to the different connection relationships among the GPUs, reducing transmission time between GPUs and improving the speed of artificial intelligence computation.
It should be particularly noted that the above embodiments of the GPU topology partitioning apparatus and the artificial intelligence computing device use the embodiments of the GPU topology partitioning method to describe the working process of each module, and those skilled in the art can readily apply these modules to other embodiments of the method. Of course, since the steps in the method embodiments can be interchanged, replaced, added, and deleted, such reasonable permutations, combinations, and transformations of the apparatus and the computing device also belong to the scope of the present invention, and the scope of protection should not be limited to the described embodiments.
The foregoing are exemplary embodiments of the present disclosure, but it should be noted that various changes and modifications can be made without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims according to the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items. The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is merely exemplary and is not intended to imply that the scope of the disclosure of the embodiments of the present invention, including the claims, is limited to these examples; within the idea of the embodiments of the invention, technical features of the above embodiments or of different embodiments may also be combined, and many other variations of the different aspects described above exist, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the embodiments of the present invention shall be included in the scope of protection of the embodiments of the present invention.

Claims (8)

1. A GPU topology partitioning method, characterized by comprising the following steps:
determining the interconnection bandwidth among multiple GPUs according to the physical topology information of the multiple GPUs, and generating a GPU topology graph comprising the multiple GPUs, wherein generating the GPU topology graph comprising the multiple GPUs comprises: taking the multiple GPUs as multiple points, taking the connection relationships among the GPUs as multiple edges, taking the interconnection bandwidth among the GPUs as the weights of the edges, and constructing the GPU topology graph from the points, the edges, and the weights of the edges;
randomly dividing the multiple GPUs in the GPU topology graph into two partitions;
calculating the migration gains of all GPUs in the GPU topology graph, migrating the GPU with the highest migration gain from the partition containing more GPUs into the partition containing fewer GPUs, calculating the number of cross-partition connections of the current partitioning scheme, and removing the migrated GPU from the GPU topology graph, wherein calculating the migration gains of all GPUs in the GPU topology graph comprises: for each point, determining its migration tendency FS from the weights of the edges connected to the point that cross the partition boundary; for each point, determining its retention tendency TE from the weights of the edges connected to the point that lie within the same partition; and obtaining the migration gain of each point by subtracting its retention tendency TE from its migration tendency FS;
repeating the previous step until all GPUs have been removed from the GPU topology graph, and selecting the partitioning scheme with the smallest number of cross-partition connections as the partitioning result.
2. The method according to claim 1, wherein the physical topology information includes the connection relationships among the multiple GPUs; the GPUs may be connected through one or more of the following at the same time: NVLink, PCIe bus, PCIe switch, PCIe host bridge, QPI.
3. The method according to claim 2, wherein determining the interconnection bandwidth among the multiple GPUs from the physical topology information of the multiple GPUs comprises: determining the rate at which the GPUs transmit information to one another according to the connection relationships among them.
4. The method according to claim 1, wherein generating the GPU topology graph comprising the multiple GPUs further comprises: taking the computing power of the multiple GPUs as the weights of the points, and constructing the GPU topology graph from the points, the edges, the weights of the points, and the weights of the edges.
5. The method according to claim 1, further comprising:
in response to the two partitions containing the same number of GPUs, randomly determining one of them, or determining one according to a predetermined rule, as the partition containing more GPUs;
in response to two or more GPUs being tied for the highest migration gain, randomly determining one of them, or determining one according to a predetermined rule, as the GPU with the highest migration gain;
in response to two or more partitioning schemes being tied for the smallest number of cross-partition connections, randomly determining one of them, or determining one according to a predetermined rule, as the partitioning scheme with the smallest number of cross-partition connections.
6. The method of claim 1, further comprising:
after obtaining the partitioning result, generating a partitioned GPU topology graph for one or more partitions in the partitioning result, so as to perform topology partitioning again.
7. A GPU topology partitioning apparatus, characterized by comprising:
a modeling module, configured to determine the interconnection bandwidth among multiple GPUs according to the physical topology information of the multiple GPUs and generate a GPU topology graph comprising the multiple GPUs, wherein generating the GPU topology graph comprising the multiple GPUs comprises: taking the multiple GPUs as multiple points, taking the connection relationships among the GPUs as multiple edges, taking the interconnection bandwidth among the GPUs as the weights of the edges, and constructing the GPU topology graph from the points, the edges, and the weights of the edges;
an initialization module, configured to randomly divide the multiple GPUs in the GPU topology graph into two partitions;
an iteration module, configured to calculate the migration gains of all GPUs in the GPU topology graph, migrate the GPU with the highest migration gain from the partition containing more GPUs into the partition containing fewer GPUs, calculate the number of cross-partition connections of the current partitioning scheme, and remove the migrated GPU from the GPU topology graph, wherein calculating the migration gains of all GPUs in the GPU topology graph comprises: for each point, determining its migration tendency FS from the weights of the edges connected to the point that cross the partition boundary; for each point, determining its retention tendency TE from the weights of the edges connected to the point that lie within the same partition; and obtaining the migration gain of each point by subtracting its retention tendency TE from its migration tendency FS;
and a sorting module, configured to repeat the previous step until all GPUs have been removed from the GPU topology graph, and select the partitioning scheme with the smallest number of cross-partition connections as the partitioning result.
8. An artificial intelligence computing device, comprising:
a plurality of GPUs;
a processor; and
a memory storing program code executable by the processor which, when executed, performs the GPU topology partitioning method according to any one of claims 1-6 to partition multiple GPUs and arrange artificial intelligence computing tasks in units of the partitions.
CN201910580776.8A 2019-06-29 2019-06-29 GPU (graphics processing Unit) topology partitioning method and device Active CN110415160B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910580776.8A CN110415160B (en) 2019-06-29 2019-06-29 GPU (graphics processing Unit) topology partitioning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910580776.8A CN110415160B (en) 2019-06-29 2019-06-29 GPU (graphics processing Unit) topology partitioning method and device

Publications (2)

Publication Number Publication Date
CN110415160A CN110415160A (en) 2019-11-05
CN110415160B true CN110415160B (en) 2022-06-07

Family

ID=68358547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910580776.8A Active CN110415160B (en) 2019-06-29 2019-06-29 GPU (graphics processing Unit) topology partitioning method and device

Country Status (1)

Country Link
CN (1) CN110415160B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597139B * 2020-05-13 2023-01-06 Suzhou Inspur Intelligent Technology Co Ltd Communication method, system, equipment and medium of GPU
CN111880911A (en) * 2020-06-19 2020-11-03 Inspur Electronic Information Industry Co Ltd Task load scheduling method, device and equipment and readable storage medium
CN111930498B (en) * 2020-06-29 2022-11-29 Suzhou Inspur Intelligent Technology Co Ltd Efficient GPU resource allocation optimization method and system
CN114356818A (en) * 2022-03-17 2022-04-15 Suzhou Inspur Intelligent Technology Co Ltd Multi-channel data transmission method, device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102932175A (en) * 2012-10-29 2013-02-13 华为技术有限公司 Node partition dividing method, device and server
CN103917958A (en) * 2011-11-11 2014-07-09 阿尔卡特朗讯 Distributed mapping function for large scale media clouds
CN108139887A (en) * 2015-10-22 2018-06-08 国际商业机器公司 Across hardware accelerator parallelization matrix decomposition
CN109844722A (en) * 2016-08-12 2019-06-04 利奇得公司 Breakdown fabric switch computing platform


Also Published As

Publication number Publication date
CN110415160A (en) 2019-11-05

Similar Documents

Publication Publication Date Title
CN110415160B (en) GPU (graphics processing Unit) topology partitioning method and device
CN110618870B (en) Working method and device for deep learning training task
CN107437110B (en) Block convolution optimization method and device of convolutional neural network
CN110362388B (en) Resource scheduling method and device
US20150215379A1 (en) Distributed processing device and distributed processing system as well as distributed processing method
CN110795226B (en) Method for processing task using computer system, electronic device and storage medium
CN114281521A (en) Method, system, device and medium for optimizing communication efficiency of deep learning heterogeneous resources
CN107346350B (en) Distribution method, device and cluster system for integrated circuit layout data processing tasks
JP2021022373A (en) Method, apparatus and device for balancing loads, computer-readable storage medium, and computer program
KR102326586B1 (en) Method and apparatus for processing large-scale distributed matrix product
CN114338506B (en) Neural task on-chip routing method and device of brain-like computer operating system
CN115237580A (en) Intelligent calculation-oriented flow parallel training self-adaptive adjustment system and method
CN109412865B (en) Virtual network resource allocation method, system and electronic equipment
CN110750363B (en) Computer storage management method and device, electronic equipment and storage medium
CN116303219A (en) Grid file acquisition method and device and electronic equipment
CN115879543A (en) Model training method, device, equipment, medium and system
CN113988277A (en) Neural network mapping method, device and equipment for storage and computation integrated chip
CN109408242B (en) Server resource online and offline method and device
CN114615146A (en) Software Defined Network (SDN) controller deployment method, device, equipment and storage medium
CN115965070B (en) Computational graph processing method, apparatus, device, storage medium, and program product
CN116805155B (en) LSTM network processing method, device, equipment and readable storage medium
CN109800076B (en) Storage scheduling method and device
EP4009241A1 (en) Arithmetic processing apparatus, arithmetic processing method, and arithmetic processing program
TWI843934B (en) A method and system for processing unstructured source data
CN114726851B (en) Block operation method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant