CN105045670A - Method and system for balancing loads of central processing units and graphic processing units - Google Patents
Abstract
The invention discloses a method and system for balancing the loads of central processing units (CPUs) and graphics processing units (GPUs). At least one identical CPU device and at least one identical GPU device are numbered, and, according to the memory space each task requires, the CPU devices and GPU devices are grouped in numbering order; each task is then assigned to one CPU group or one GPU group to complete its computation. Because the CPU devices or GPU devices within a group are identical, that is, they have the same memory space and therefore the same computing capability, the partitioned task data can be distributed to every CPU device or GPU device in the group, so that the devices in a group process the same task data simultaneously. In this way all tasks are computed and the load balance between CPU devices and GPU devices is improved.
Description
Technical field
The present invention relates to computer technology, and in particular to a method and system for balancing the loads of central processing units (Central Processing Unit, abbreviated CPU) and graphics processing units (Graphics Processing Unit, abbreviated GPU).
Background art
At present, with the development of high-performance computing application software, applications place ever higher demands on computing performance. More and more high-performance computing applications adopt a computation model of CPU and GPU heterogeneous cooperative computing: on the basis of a traditional multi-core CPU architecture, GPU processors are added to form a mixed CPU and GPU architecture platform, and corresponding software solutions are designed on this platform so that CPUs and GPUs can cooperate effectively. Load balancing between the CPUs and GPUs is the key factor that determines whether the computing performance of the mixed CPU and GPU architecture platform can be pushed to its limit.
In the prior art, a CPU and GPU heterogeneous cooperative computing cluster is divided into multiple computing nodes. A distributed computing method is used between nodes, CPU and GPU heterogeneous computing is used within each computing node, and a shared storage model is used within each device, so that when designing load balancing only the balance between devices needs to be guaranteed.
However, with the prior art, because the computing capabilities of CPUs and GPUs differ greatly in cooperative computing, assigning tasks or data volumes of the same size to CPU devices and GPU devices increases the difficulty of balancing the load between them.
Summary of the invention
To solve the above technical problem, the present invention provides a method and system for balancing the loads of central processing units and graphics processing units, in which CPU devices and GPU devices are grouped to complete task computation, thereby improving the load balance between CPU devices and GPU devices.
In a first aspect, the method for balancing the loads of central processing units and graphics processing units provided by the present invention is applied to a CPU and GPU heterogeneous cooperative computing cluster comprising multiple computing nodes, and comprises:
numbering at least one identical CPU device and at least one identical GPU device respectively, wherein all CPUs of each computing node serve as one CPU device and each GPU of each computing node serves as one GPU device;
grouping the CPU devices and the GPU devices in numbering order according to the memory space each task requires;
assigning each task to one CPU device group or one GPU device group for computation.
In a second aspect, the system for balancing the loads of central processing units and graphics processing units provided by the present invention is applied to a CPU and GPU heterogeneous cooperative computing cluster comprising multiple computing nodes, and comprises a numbering unit, a grouping unit and a computing unit;
the numbering unit is configured to number at least one identical CPU device and at least one identical GPU device respectively, wherein all CPUs of each computing node serve as one CPU device and each GPU of each computing node serves as one GPU device;
the grouping unit is configured to group the CPU devices and the GPU devices in numbering order according to the memory space each task requires;
the computing unit is configured to assign each task to one CPU device group or one GPU device group for computation.
Compared with the prior art, the method and system for balancing the loads of central processing units and graphics processing units provided by the present invention are applied to a CPU and GPU heterogeneous cooperative computing cluster comprising multiple computing nodes. At least one identical CPU device and at least one identical GPU device are numbered, the CPU devices and GPU devices are grouped in numbering order according to the memory space each task requires, and each task is assigned to one CPU device group or one GPU device group to complete its computation. Because the CPU devices or GPU devices within a group are identical, that is, they have the same memory space and therefore the same computing capability, the partitioned task data can be distributed to every CPU device or GPU device in the group, so that the devices process the same task data simultaneously. In this way all tasks are computed and the load balance between CPU devices and GPU devices is improved.
Other features and advantages of the present invention will be set forth in the following description, will in part become apparent from the specification, or may be understood by practicing the invention. The objects and other advantages of the invention may be realized and obtained by the structures particularly pointed out in the specification, claims and accompanying drawings.
Brief description of the drawings
The accompanying drawings provide a further understanding of the technical solution of the present invention and constitute a part of the specification. Together with the embodiments of the application they serve to explain the technical solution of the present invention; they do not limit it.
Fig. 1 is a flow chart of a first embodiment of the method for balancing the loads of central processing units and graphics processing units provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of central processing unit and graphics processing unit cooperative computing task partitioning provided by an embodiment of the present invention;
Fig. 3 is an architecture diagram of the system for balancing the loads of central processing units and graphics processing units provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the embodiments of the invention are described in detail below with reference to the accompanying drawings. It should be noted that, where there is no conflict, the embodiments of the application and the features in the embodiments may be combined with one another in any manner.
The steps shown in the flow charts of the accompanying drawings may be performed in a computer system such as one executing a set of computer-executable instructions. Moreover, although a logical order is shown in the flow charts, in some cases the steps may be performed in an order different from that shown or described herein.
The embodiments of the invention are applicable to a platform of a CPU and GPU heterogeneous cooperative computing cluster comprising multiple computing nodes, in which the CPUs are identical multi-core processors mainly used for program logic control, data I/O operations, network communication and part of the core computation, and the GPUs are identical many-core processors mainly used for computing the core tasks, although the invention is not limited thereto.
The method of the embodiments of the present invention is intended to solve the technical problem in the prior art that, because the computing capabilities of CPUs and GPUs differ greatly, assigning tasks or data volumes of the same size to CPU devices and GPU devices increases the difficulty of balancing the load between them.
Fig. 1 is a flow chart of a first embodiment of the method for balancing the loads of central processing units and graphics processing units provided by an embodiment of the present invention, and Fig. 2 is a schematic diagram of the corresponding cooperative computing task partitioning. As shown in Fig. 1 and Fig. 2, the method comprises:
S101: numbering at least one identical CPU device and at least one identical GPU device respectively, wherein all CPUs of each computing node serve as one CPU device and each GPU card of each computing node serves as one GPU device.
Specifically, the CPU and GPU heterogeneous cooperative computing cluster contains one or more identical CPUs and one or more identical GPUs. All CPUs on a computing node can be regarded as one device, and each GPU card can be regarded as one device; all CPU devices and GPU devices are then numbered. To better illustrate the scheme, suppose there are N nodes with one CPU device each, so the N nodes hold N CPU devices in total, and each node carries M GPU cards, so the N nodes hold M*N GPU devices in total. The CPU devices may then be numbered 1, 2, ..., N and the GPU devices 1, 2, ..., M*N. Letters or other codes may also be used; it is only necessary that every CPU device and GPU device has an identifiable number, and the invention is not limited thereto.
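As a minimal sketch of this numbering step (Python is used here for illustration only; the function and variable names are not from the patent), N nodes with one CPU device each and M GPU cards each yield N CPU device numbers and M*N GPU device numbers:

```python
def number_devices(num_nodes, gpus_per_node):
    """Number the devices of a cluster.

    All CPUs on a node count as one CPU device, and each GPU
    card is its own device, so num_nodes nodes yield num_nodes
    CPU devices and num_nodes * gpus_per_node GPU devices.
    """
    cpu_devices = list(range(1, num_nodes + 1))
    gpu_devices = list(range(1, num_nodes * gpus_per_node + 1))
    return cpu_devices, gpu_devices

# e.g. 4 nodes with 2 GPU cards each
cpus, gpus = number_devices(num_nodes=4, gpus_per_node=2)
```

As the text notes, any unique identifiers (letters or other codes) would serve equally well.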
S102: grouping the CPU devices and the GPU devices in numbering order according to the memory space each task requires.
Specifically, the data structure of each task includes members such as the task size, the total task time, the average time per unit task and a relative performance speed-up ratio. Some tasks cannot be subdivided, and the memory available on a single CPU device or GPU device does not satisfy the memory their computation requests, so several CPU devices or GPU devices must compute such a task together. In that case, according to the size of the memory space each task requires, the CPU devices in the multiple computing nodes can be divided into CPU device groups in numbering order. For example, if four CPU devices form one group, CPU devices No. 1-4 are assigned to one group in numbering order and CPU devices No. 5-8 to the next. Because all CPU devices are identical, that is, they have the same memory space, the computing capability of the CPU devices within a group is consistent, so they can process the same task simultaneously. In the same way, according to the size of the memory space each task requires, the identical GPU devices in the multiple computing nodes can be divided into GPU device groups; because all GPU devices are identical, that is, they have the same memory space, the devices within a group again have consistent computing capability and can process the same task simultaneously. Such grouping makes effective use of the computing capability of all CPU devices and GPU devices, and avoids the low task-completion efficiency caused in the prior art by using CPU and GPU heterogeneous computing within a single computing node of the cluster.
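Grouping in numbering order amounts to chunking the numbered device list into consecutive runs. A minimal sketch of the example above (group size 4; Python and the names are illustrative, not from the patent):

```python
def group_devices(device_numbers, group_size):
    """Split numbered devices into consecutive groups in
    numbering order, e.g. devices 1-4 together, then 5-8."""
    return [device_numbers[i:i + group_size]
            for i in range(0, len(device_numbers), group_size)]

# 8 CPU devices, 4 per group
groups = group_devices(list(range(1, 9)), group_size=4)
```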
S103: assigning each task to one CPU device group or one GPU device group for computation.
Specifically, a Message Passing Interface (MPI) communication module can be used to divide the tasks dynamically. The host process can take the tasks already held by each CPU device group or GPU device group as reference information and, during system operation, adjust the distribution of tasks at any time according to the load state of each group, assigning each task to one CPU device group or one GPU device group so that every computing node keeps its load as balanced as possible.
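The patent does not spell out the host process's distribution policy; one common sketch of dynamic distribution is a greedy assignment of each task to the currently least-loaded group (an illustrative stand-in, not the patented algorithm; in a real cluster this bookkeeping would ride on MPI messages):

```python
import heapq

def assign_tasks(task_sizes, num_groups):
    """Greedily send each task to the least-loaded device group,
    mimicking a host process that adjusts distribution at run time.
    Returns {task index: group index}."""
    heap = [(0, g) for g in range(num_groups)]  # (current load, group)
    heapq.heapify(heap)
    assignment = {}
    for task_id, size in enumerate(task_sizes):
        load, group = heapq.heappop(heap)   # least-loaded group
        assignment[task_id] = group
        heapq.heappush(heap, (load + size, group))
    return assignment
```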
In the method for balancing the loads of central processing units and graphics processing units provided by the embodiment of the present invention, applied to a CPU and GPU heterogeneous cooperative computing cluster comprising multiple computing nodes, at least one identical CPU device and at least one identical GPU device are numbered, the CPU devices and GPU devices are grouped in numbering order according to the memory space each task requires, and each task is assigned to one CPU device group or one GPU device group to complete its computation. Because the CPU devices or GPU devices within each group are identical, that is, they have the same memory space and therefore the same computing capability, the partitioned task data can be distributed to every CPU device or GPU device in the group, so that the devices process the same task data simultaneously. In this way all tasks are computed and the load balance between CPU devices and GPU devices is improved.
Further, on the basis of the above embodiment, step S102 comprises: calculating the number of CPU devices each task needs according to the memory space of the CPU devices and the size of the memory space each task requires;
taking the CPU devices a task needs as one CPU device group, obtaining the number of CPU device groups from the calculated number of CPU devices per task and the total number of CPU devices, and grouping the CPU devices in numbering order according to that group count.
Specifically, given the memory space of a CPU device and the size of the memory space each task requires, let GC be the number of CPU devices each task needs. GC can be computed from the formula GC = (Mcom + MemC - 1) / MemC (integer division), where MemC is the size of the memory space of one computing node and Mcom is the size of the memory space each task requires; the memory space required is assumed to be the same for every task. The number of CPU devices per CPU device group can thus be calculated. If there are N computing nodes with one CPU device each, all CPU devices can be divided into N/GC groups. When a CPU device group computes a task, the task data can be partitioned again among the CPU devices of the group. Because the CPU devices in the group are identical, that is, they have the same memory space, their computing capability is consistent, so a static partition can be used and computation is faster.
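The formula GC = (Mcom + MemC - 1) / MemC is the usual integer ceiling division, i.e. ceil(Mcom / MemC). A minimal sketch (the variable roles follow the patent; the helper names are illustrative):

```python
def devices_per_task(mem_task, mem_device):
    """GC = (Mcom + MemC - 1) / MemC with integer arithmetic:
    the number of devices whose combined memory covers the task."""
    return (mem_task + mem_device - 1) // mem_device

def group_count(total_devices, mem_task, mem_device):
    """N / GC: how many equal-sized device groups the cluster forms."""
    return total_devices // devices_per_task(mem_task, mem_device)
```

For example, a task needing 10 units of memory on devices with 4 units each requires 3 devices per group, so 12 devices form 4 groups.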
Further, on the basis of the above embodiment, step S102 comprises: calculating the number of GPU devices each task needs according to the memory space of the GPU devices and the size of the memory space each task requires;
taking the GPU devices a task needs as one GPU device group, obtaining the number of GPU device groups from the calculated number of GPU devices per task and the total number of GPU devices, and grouping the GPU devices in numbering order according to that group count.
Specifically, given the memory space of a GPU device and the size of the memory space each task requires, let GG be the number of GPU devices each task needs. GG can be computed from the formula GG = (Mcom + MemG - 1) / MemG (integer division), where MemG is the memory size of one GPU device of a computing node and Mcom is the size of the memory space each task requires; the memory space required is assumed to be the same for every task. The number of GPU devices per GPU device group can thus be calculated. If there are N computing nodes with M GPU devices each, all GPU devices can be divided into M*N/GG groups. When a GPU device group computes a task, the task data can be partitioned again among the GPU devices of the group. Because the GPU devices in the group are identical, their computing capability is consistent, so a static partition can be used and computation is faster.
Further, on the basis of the above embodiments, step S103 comprises: informing the leader of each CPU device group or GPU device group of the numbers of the tasks assigned to that group; the leader then broadcasts the task numbers assigned to the group to each member, and the CPU devices or GPU devices that receive the same task number complete the computation of the corresponding task.
Specifically, each task is numbered, either with digits or with letters, as long as the identifier is unique. A leader is then set in each CPU device group or GPU device group; in general the device with the smallest number in the group is chosen as the leader. The host process assigns the number of each task to the leader of the corresponding CPU device group or GPU device group, and the leader broadcasts it to each group member. The broadcast message can include information such as the task number and the memory space the task needs. The CPU devices or GPU devices that receive the same task number then compute the task simultaneously; as soon as a group finishes a task it can immediately request the next task from the host process, until all tasks have been processed.
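A pure-Python simulation of this leader scheme (in a real cluster the host-to-leader and leader-to-member steps would be MPI sends and broadcasts; all names here are illustrative, not from the patent):

```python
def dispatch(groups, task_numbers):
    """Host sends each task number to a group's leader (the
    smallest device number in the group), and the leader
    relays it to every other member.
    Returns {device number: [task numbers received]}."""
    inbox = {dev: [] for grp in groups for dev in grp}
    for i, task in enumerate(task_numbers):
        grp = groups[i % len(groups)]   # round-robin over groups
        leader = min(grp)               # lowest number leads
        inbox[leader].append(task)      # host -> leader
        for member in grp:
            if member != leader:
                inbox[member].append(task)  # leader -> members
    return inbox
```

Every device in a group ends up holding the same task number, matching the requirement that devices receiving the same number compute the task together.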
Fig. 3 is an architecture diagram of the system for balancing the loads of central processing units and graphics processing units provided by an embodiment of the present invention. As shown in Fig. 3, the system is applied to a CPU and GPU heterogeneous cooperative computing cluster (not shown) comprising multiple computing nodes, and comprises a numbering unit 10, a grouping unit 20 and a computing unit 30;
the numbering unit 10 is configured to number at least one identical CPU device and at least one identical GPU device respectively, wherein all CPUs of each computing node serve as one CPU device and each GPU of each computing node serves as one GPU device;
the grouping unit 20 is configured to group the CPU devices and the GPU devices in numbering order according to the memory space each task requires;
the computing unit 30 is configured to assign each task to one CPU device group or one GPU device group.
The system for balancing the loads of central processing units and graphics processing units provided by the embodiment of the present invention is applied to a CPU and GPU heterogeneous cooperative computing cluster comprising multiple computing nodes and comprises a numbering unit, a grouping unit and a computing unit. The numbering unit numbers at least one identical CPU device and at least one identical GPU device respectively, the grouping unit groups the CPU devices and GPU devices in numbering order according to the memory space each task requires, and the computing unit assigns each task to one CPU device group or one GPU device group to complete its computation. Because the CPU devices or GPU devices within each group are identical, that is, they have the same memory space and therefore the same computing capability, the partitioned task data can be distributed to every CPU device or GPU device in the group, so that the devices process the same task data simultaneously. In this way all tasks are computed and the load balance between CPU devices and GPU devices is improved.
Further, the grouping unit 20 further comprises a CPU device grouping unit 210;
the CPU device grouping unit 210 is configured to calculate the number of CPU devices each task needs according to the memory space of the CPU devices and the size of the memory space each task requires;
and to take the CPU devices a task needs as one CPU device group, obtain the number of CPU device groups from the calculated number and the total number of CPU devices, and group the CPU devices in numbering order according to that group count.
The system for balancing the loads of central processing units and graphics processing units provided by the embodiment of the present invention can perform the method of the embodiments described above; its implementation principle and technical effect are similar and are not repeated here.
Further, the grouping unit 20 further comprises a GPU device grouping unit 220;
the GPU device grouping unit 220 is configured to calculate the number of GPU devices each task needs according to the memory space of the GPU devices and the size of the memory space each task requires;
and to take the GPU devices a task needs as one GPU device group, obtain the number of GPU device groups from the calculated number of GPU devices per task and the total number of GPU devices, and group the GPU devices in numbering order according to that group count.
The system for balancing the loads of central processing units and graphics processing units provided by the embodiment of the present invention can perform the method of the embodiments described above; its implementation principle and technical effect are similar and are not repeated here.
Further, the computing unit 30 being configured to assign each task to one CPU device group or one GPU device group for computation specifically comprises:
informing the leader of each CPU device group or GPU device group of the numbers of the tasks assigned to that group; the leader then broadcasts the task numbers assigned to the group to each member, and the CPU devices or GPU devices that receive the same task number complete the computation of the corresponding task.
The system for balancing the loads of central processing units and graphics processing units provided by the embodiment of the present invention can perform the method of the embodiments described above; its implementation principle and technical effect are similar and are not repeated here.
Although the embodiments of the present invention are disclosed above, the content described is adopted only to facilitate understanding of the invention and is not intended to limit it. Any person skilled in the art to which the invention pertains may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the invention, but the patent protection scope of the invention is still subject to the scope defined by the appended claims.
Claims (8)
1. A method for balancing the loads of central processing units and graphics processing units, characterized in that it is applied to a CPU and GPU heterogeneous cooperative computing cluster comprising multiple computing nodes and comprises:
numbering at least one identical CPU device and at least one identical GPU device respectively, wherein all CPUs of each computing node serve as one CPU device and each GPU of each computing node serves as one GPU device;
grouping the CPU devices and the GPU devices in numbering order according to the memory space each task requires;
assigning each task to one CPU device group or one GPU device group for computation.
2. The method for balancing the loads of central processing units and graphics processing units according to claim 1, characterized in that grouping the CPU devices in numbering order according to the memory space each task requires comprises:
calculating the number of CPU devices each task needs according to the memory space of the CPU devices and the size of the memory space each task requires;
taking the CPU devices a task needs as one CPU device group, obtaining the number of CPU device groups from the calculated number of CPU devices per task and the total number of CPU devices, and grouping the CPU devices in numbering order according to that group count.
3. The method for balancing the loads of central processing units and graphics processing units according to claim 1, characterized in that grouping the GPU devices in numbering order according to the memory space each task requires comprises:
calculating the number of GPU devices each task needs according to the memory space of the GPU devices and the size of the memory space each task requires;
taking the GPU devices a task needs as one GPU device group, obtaining the number of GPU device groups from the calculated number of GPU devices per task and the total number of GPU devices, and grouping the GPU devices in numbering order according to that group count.
4. The method for balancing the loads of central processing units and graphics processing units according to claim 1, characterized in that assigning each task to one CPU device group or one GPU device group for computation comprises:
informing the leader of each CPU device group or GPU device group of the numbers of the tasks assigned to that group; the leader then broadcasts the task numbers assigned to the group to each member, and the CPU devices or GPU devices that receive the same task number complete the computation of the corresponding task.
5. A system for balancing the loads of central processing units and graphics processing units, characterized in that it is applied to a CPU and GPU heterogeneous cooperative computing cluster comprising multiple computing nodes and comprises a numbering unit, a grouping unit and a computing unit;
the numbering unit is configured to number at least one identical CPU device and at least one identical GPU device respectively, wherein all CPUs of each computing node serve as one CPU device and each GPU of each computing node serves as one GPU device;
the grouping unit is configured to group the CPU devices and the GPU devices in numbering order according to the memory space each task requires;
the computing unit is configured to assign each task to one CPU device group or one GPU device group for computation.
6. The system for balancing the loads of central processing units and graphics processing units according to claim 5, characterized in that the grouping unit further comprises a CPU device grouping unit;
the CPU device grouping unit is configured to calculate the number of CPU devices each task needs according to the memory space of the CPU devices and the size of the memory space each task requires;
and to take the CPU devices a task needs as one CPU device group, obtain the number of CPU device groups from the calculated number and the total number of CPU devices, and group the CPU devices in numbering order according to that group count.
7. The system for balancing the loads of central processing units and graphics processing units according to claim 5, characterized in that the grouping unit further comprises a GPU device grouping unit;
the GPU device grouping unit is configured to calculate the number of GPU devices each task needs according to the memory space of the GPU devices and the size of the memory space each task requires;
and to take the GPU devices a task needs as one GPU device group, obtain the number of GPU device groups from the calculated number of GPU devices per task and the total number of GPU devices, and group the GPU devices in numbering order according to that group count.
8. The system for balancing the loads of central processing units and graphics processing units according to claim 7, characterized in that the computing unit being configured to assign each task to one CPU device group or one GPU device group for computation specifically comprises:
informing the leader of each CPU device group or GPU device group of the numbers of the tasks assigned to that group; the leader then broadcasts the task numbers assigned to the group to each member, and the CPU devices or GPU devices that receive the same task number complete the computation of the corresponding task.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201510552837.1A | 2015-09-01 | 2015-09-01 | Method and system for balancing loads of central processing units and graphic processing units
Publications (1)
Publication Number | Publication Date
---|---
CN105045670A | 2015-11-11
Family
ID=54452234
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510552837.1A Pending CN105045670A (en) | 2015-09-01 | 2015-09-01 | Method and system for balancing loads of central processing units and graphic processing units |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105045670A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8239524B2 (en) * | 2008-12-16 | 2012-08-07 | International Business Machines Corporation | Techniques for dynamically assigning jobs to processors in a cluster based on processor workload |
CN103870322A (en) * | 2012-12-17 | 2014-06-18 | 联发科技股份有限公司 | Method for controlling task migration, other-than-temporary computer readable medium, heterogeneous multi-core system |
CN104156271A (en) * | 2014-08-01 | 2014-11-19 | 浪潮(北京)电子信息产业有限公司 | Method and system for balancing cooperative computing cluster load |
- 2015-09-01: CN application CN201510552837.1A filed; published as CN105045670A (en); status: active, Pending
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017148246A1 (en) * | 2016-03-01 | 2017-09-08 | 中兴通讯股份有限公司 | Data configuration method and device |
CN109871848A (en) * | 2017-12-01 | 2019-06-11 | 北京搜狗科技发展有限公司 | A kind of character recognition method and device of mobile terminal |
CN110275771A (en) * | 2018-03-15 | 2019-09-24 | 中国移动通信集团有限公司 | A kind of method for processing business, Internet of Things billing infrastructure system and storage medium |
CN109933433A (en) * | 2019-03-19 | 2019-06-25 | 合肥中科类脑智能技术有限公司 | A kind of GPU resource scheduling system and its dispatching method |
CN109933433B (en) * | 2019-03-19 | 2021-06-25 | 合肥中科类脑智能技术有限公司 | GPU resource scheduling system and scheduling method thereof |
CN112733401A (en) * | 2020-12-30 | 2021-04-30 | 杭州电子科技大学 | Finite element tearing and butt joint method and system for numerical simulation of reactor core assembly |
CN112733401B (en) * | 2020-12-30 | 2024-03-12 | 杭州电子科技大学 | Finite element tearing butt joint method and system for numerical simulation of reactor core assembly |
WO2023159568A1 (en) * | 2022-02-28 | 2023-08-31 | 华为技术有限公司 | Task scheduling method, npu, chip, electronic device and readable medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105045670A (en) | Method and system for balancing loads of central processing units and graphic processing units | |
US8631410B2 (en) | Scheduling jobs in a cluster having multiple computing nodes by constructing multiple sub-cluster based on entry and exit rules | |
US10866806B2 (en) | Uniform register file for improved resource utilization | |
CN104714850B (en) | A kind of isomery based on OPENCL calculates equalization methods jointly | |
CN102567080B (en) | Virtual machine position selection system facing load balance in cloud computation environment | |
CN103207774B (en) | For solving the method and system of thread divergence | |
CN103761215B (en) | Matrix transpose optimization method based on graphic process unit | |
US9354826B2 (en) | Capacity expansion method and device | |
CN109191287B (en) | Block chain intelligent contract fragmentation method and device and electronic equipment | |
KR20190025746A (en) | A computer cluster arragement for processing a computation task and method for operation thereof | |
WO2017000645A1 (en) | Method and apparatus for allocating host resource | |
CN105912386A (en) | Thread management method and system | |
CN103425534A (en) | Graphics processing unit sharing between many applications | |
CN104156271B (en) | A kind of method and system of cooperated computing cluster load balance | |
CN111083189B (en) | System and method for processing data skew at runtime | |
Sasi et al. | Straggler mitigation with tiered gradient codes | |
CN104102549A (en) | Method, device and chip for realizing mutual exclusion operation of multiple threads | |
Wu et al. | Using hybrid MPI and OpenMP programming to optimize communications in parallel loop self-scheduling schemes for multicore PC clusters | |
CN110764824A (en) | Graph calculation data partitioning method on GPU | |
CN104615584A (en) | Method for vectorization computing of solution of large-scale trigonometric linear system of equations for GPDSP | |
US10013393B2 (en) | Parallel computer system, parallel computing method, and program storage medium | |
Liu et al. | BSPCloud: A hybrid distributed-memory and shared-memory programming model | |
Khan et al. | Static Approach for Efficient Task Allocation in Distributed Environment | |
Bienz et al. | TAPSpMV: Topology-aware parallel sparse matrix vector multiplication | |
CN112346852A (en) | Distributed physical processing of matrix summation operations |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20151111 |