CN106776455A - Method and device for single-machine multi-GPU communication - Google Patents
Method and device for single-machine multi-GPU communication
- Publication number
- CN106776455A CN106776455A CN201611149576.XA CN201611149576A CN106776455A CN 106776455 A CN106776455 A CN 106776455A CN 201611149576 A CN201611149576 A CN 201611149576A CN 106776455 A CN106776455 A CN 106776455A
- Authority
- CN
- China
- Prior art keywords
- gpu
- direct
- predetermined communication
- communication data
- connected relation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
Abstract
The invention discloses a method and device for single-machine multi-GPU communication. The method includes: determining GPU direct-connection relation data; determining, according to a data broadcast, the predetermined communication data and the GPUs that need to communicate, placing the GPUs that contain the predetermined communication data in a first set and the other GPUs in a second set; having the GPUs in the first set transmit the predetermined communication data, according to the direct-connection relation data, to the GPUs in the second set that are directly connected to them, and moving each GPU that receives the data from the second set into the first set, until the second set is empty or contains only remaining GPUs with no direct connection to any GPU in the first set; and, when such remaining GPUs exist, having the CPU transmit the predetermined communication data to them. This avoids routing all inter-GPU data transfers through the CPU, which would make the CPU a bottleneck.
Description
Technical field
The present invention relates to the technical field of data processing, and more particularly to a method and device for single-machine multi-GPU communication.
Background art
Since NVIDIA released the G80 graphics processor (containing 128 stream processors) in 2006, the graphics processing unit (GPU, Graphics Processing Unit) has improved performance by more than 100x over the CPU in some large-scale parallel computing applications. A GPU devotes more of its transistors to data processing, rather than to data caching and instruction control as a CPU does, which gives it enormous parallel computing power. GPU many-core processors have a higher density of computing resources and higher computational performance, with double-precision performance exceeding 1 TFlops.
With the development of high-performance computing application software, applications demand ever more computational performance. Compared with traditional CPU clusters, CPU+GPU heterogeneous co-computing offers advantages such as higher performance and lower cost, so more and more high-performance computing applications adopt the CPU+GPU heterogeneous co-computing model.
The CPU+GPU heterogeneous co-computing architecture within one compute node is shown in Fig. 1. In some application scenarios with very large computational workloads, such as training deep-learning neural networks, multiple GPUs must work in concert, so the data transmission speed among the GPUs has a large impact on the performance of the whole application. How to achieve efficient data transfer on the basis of the existing hardware architecture is therefore a problem.
Summary of the invention
An object of the present invention is to provide a method and device for single-machine multi-GPU communication that uses GPU Direct technology to avoid routing all inter-GPU data transfers through the CPU, which would make the CPU a bottleneck, and that performs reasonable path planning according to the specific hardware topology to achieve high-speed communication among multiple GPUs.
To solve the above technical problem, the present invention provides a single-machine multi-GPU communication method, the method comprising:
detecting all GPUs and determining GPU direct-connection relation data;
determining, according to a data broadcast, the predetermined communication data and the GPUs that need to communicate, dividing the GPUs that contain the predetermined communication data into a first set and the GPUs that do not contain the predetermined communication data into a second set;
having the GPUs in the first set transmit the predetermined communication data, according to the GPU direct-connection relation data, to the GPUs in the second set that are directly connected to them, and, after each data transfer completes, moving the GPUs in the second set that now hold the predetermined communication data into the first set, until the second set is empty or the second set contains only remaining GPUs with no direct connection to any GPU in the first set;
when such remaining GPUs exist in the second set, having the CPU transmit the predetermined communication data to the remaining GPUs.
Optionally, detecting all GPUs and determining the GPU direct-connection relation data includes:
traversing all pairs of GPUs with a double loop to obtain a data table recording whether a direct connection exists between any two GPUs.
Optionally, data is transferred between two directly connected GPUs via GPU Direct technology.
Optionally, the CPU transmitting the predetermined communication data to the remaining GPUs includes:
in a time step, a predetermined GPU in the first set transferring the predetermined communication data to the CPU's memory, and the predetermined communication data then being transferred from that memory to the remaining GPUs.
The present invention also provides a device for single-machine multi-GPU communication, including:
a direct-connection detection module, for detecting all GPUs and determining GPU direct-connection relation data;
a set division module, for determining, according to a data broadcast, the predetermined communication data and the GPUs that need to communicate, dividing the GPUs that contain the predetermined communication data into a first set and the GPUs that do not contain the predetermined communication data into a second set;
a direct data transmission module, for having the GPUs in the first set transmit the predetermined communication data, according to the GPU direct-connection relation data, to the GPUs in the second set that are directly connected to them, and, after each data transfer completes, moving the GPUs in the second set that hold the predetermined communication data into the first set, until the second set is empty or the second set contains only remaining GPUs with no direct connection to any GPU in the first set;
a CPU data transmission module, for having the CPU transmit the predetermined communication data to the remaining GPUs when such remaining GPUs exist in the second set.
Optionally, the direct-connection detection module is specifically a module that traverses all pairs of GPUs with a double loop to obtain a data table recording whether a direct connection exists between any two GPUs.
Optionally, the direct data transmission module includes:
a direct data transmission unit, for having the GPUs in the first set transmit the predetermined communication data, via GPU Direct technology and according to the GPU direct-connection relation data, to the GPUs in the second set that are directly connected to them.
Optionally, the CPU data transmission module is specifically a module that, when remaining GPUs exist in the second set, has a predetermined GPU in the first set transfer the predetermined communication data to the CPU's memory in a time step and then transfers the predetermined communication data from that memory to the remaining GPUs.
In the single-machine multi-GPU communication method provided by the present invention, the method includes: determining GPU direct-connection relation data; determining, according to a data broadcast, the predetermined communication data and the GPUs that need to communicate, placing the GPUs that contain the predetermined communication data in a first set and the other GPUs in a second set; having the GPUs in the first set transmit the predetermined communication data, according to the direct-connection relation data, to the GPUs in the second set that are directly connected to them, and moving each GPU that receives the data from the second set into the first set, until the second set is empty or contains only remaining GPUs with no direct connection to any GPU in the first set; and, when such remaining GPUs exist, having the CPU transmit the predetermined communication data to them.
It can be seen that the method uses GPU Direct technology to avoid routing all inter-GPU data transfers through the CPU, which would make the CPU a bottleneck, while performing reasonable path planning according to the specific hardware topology to achieve high-speed communication among multiple GPUs. The present invention also provides a device for single-machine multi-GPU communication with the above beneficial effects, which is not described again here.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present invention; those of ordinary skill in the art may obtain other drawings from them without creative effort.
Fig. 1 is a diagram of the CPU+GPU heterogeneous co-computing architecture provided by an embodiment of the present invention;
Fig. 2 is a flowchart of the single-machine multi-GPU communication method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of multi-GPU communication path planning provided by an embodiment of the present invention;
Fig. 4 is a structural block diagram of the single-machine multi-GPU communication device provided by an embodiment of the present invention.
Detailed description of embodiments
The core of the present invention is to provide a method and device for single-machine multi-GPU communication that uses GPU Direct technology to avoid routing all inter-GPU data transfers through the CPU, which would make the CPU a bottleneck, and that performs reasonable path planning according to the specific hardware topology to achieve high-speed communication among multiple GPUs.
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
This embodiment uses direct GPU-to-GPU connections, without going through the CPU, to achieve high-speed communication among multiple GPUs within one compute node, thereby avoiding the CPU becoming a bottleneck for all inter-GPU data transfers; it also selects a suitable communication path according to the particular hardware topology, optimizing computational performance. The control process of the method can be implemented on GPU many-core processors, which possess strong computing power and are mainly used for core computation tasks. Refer to Fig. 2, a flowchart of the single-machine multi-GPU communication method provided by an embodiment of the present invention; the method may include:
S100: detect all GPUs and determine the GPU direct-connection relation data.
Specifically, the hardware connections of all GPUs are detected to determine the GPU direct-connection relation data, i.e., which GPUs are directly connected. The GPU direct-connection relation data may be stored as a table or as a mapping relation. It may record only the GPU pairs that are directly connected, or record both the directly connected pairs and the pairs that are not. As long as, given any GPU, the GPUs directly connected to it can be determined from the data, the data suffices; this embodiment therefore does not restrict the content or form of the GPU direct-connection relation data.
To further improve GPU direct-connection detection, preferably, detecting all GPUs and determining the GPU direct-connection relation data may include:
traversing all pairs of GPUs with a double loop to obtain a data table recording whether a direct connection exists between any two GPUs.
Specifically, for a node with N GPU cards, a double loop traverses all pairs of cards to obtain a data table recording whether a direct connection exists between any two GPUs.
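The double-loop detection step can be sketched as follows. This is a minimal illustration in Python: `can_access_peer` is a stand-in for a real peer-access query such as CUDA's `cudaDeviceCanAccessPeer`, and the link set in the usage example is an assumed topology, not one prescribed by the patent.

```python
def build_direct_link_table(n, can_access_peer):
    """Traverse all pairs of GPUs with a double loop and record whether
    a direct (peer-to-peer) connection exists between any two of them."""
    table = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and can_access_peer(i, j):
                table[i][j] = True
    return table

# Assumed example topology for illustration (links are symmetric):
links = {(0, 1), (0, 2), (1, 4), (1, 5), (2, 6)}
can_access_peer = lambda i, j: (i, j) in links or (j, i) in links
table = build_direct_link_table(8, can_access_peer)
```

On real hardware the predicate would query the driver once per ordered pair, so for N cards the table costs N·(N-1) queries, which is negligible next to the transfers it plans.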
S110: determine, according to the data broadcast, the predetermined communication data and the GPUs that need to communicate; divide the GPUs that contain the predetermined communication data into a first set and the GPUs that do not contain it into a second set.
Specifically, referring to Fig. 3, which shows the initial state of a GPU broadcast, the first set contains only GPU0, and the remaining GPUs, which do not contain the predetermined communication data, are in the second set.
S120: the GPUs in the first set transmit the predetermined communication data, according to the GPU direct-connection relation data, to the GPUs in the second set that are directly connected to them; after each data transfer completes, the GPUs in the second set that now hold the predetermined communication data are moved into the first set, until the second set is empty or the second set contains only remaining GPUs with no direct connection to any GPU in the first set.
Specifically, in this embodiment any two directly connected GPUs can transfer data to each other directly (e.g., via GPU Direct technology). Such direct GPU-to-GPU communication typically arises because multiple GPUs are installed in one compute node to accelerate computation-heavy applications, such as training deep-learning neural networks. GPU Direct technology allows GPUs to transfer data directly, without the CPU as an intermediary, greatly improving data transmission speed.
Specifically, because the physical topology between the GPUs and the CPU is generally tree-shaped, traditional data transmission methods tend to use a tree-shaped communication pattern consistent with the physical topology. But that pattern inherently makes the root node of the tree a communication performance bottleneck. Here the communication path is instead planned as a logical ring independent of the physical topology, which solves the communication-bandwidth bottleneck. This embodiment determines the communication path from the set division and the GPU direct-connection relation data, selecting a suitable communication-path plan for each hardware topology, and thereby optimizes single-machine multi-GPU computing performance. Detecting the GPU direct connections and planning paths accordingly achieves high-speed communication.
The above steps are illustrated with an example:
Suppose N GPU cards, numbered 0, 1, ..., N-1, need to communicate, and the data broadcast requires data D (the predetermined communication data) to be transferred from GPU0 to all the other GPUs.
First build the direct-connection table between every pair of GPUs, then build two sets: the GPUs in one set already contain data D (the first set); the GPUs in the other set do not (the second set). In each time step, the GPUs in the first set transmit data D, according to the direct-connection table, to GPUs in the second set; after a transfer completes, the GPUs that now hold data D move from the second set into the first set. This continues until the second set is an empty set or no direct connection exists between the second set and the first set.
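The set-based broadcast loop just described can be sketched in Python. This is a minimal illustration under the same assumptions as the earlier sketch: `table` is the pairwise direct-connection table, and the topology in the usage example is assumed for demonstration.

```python
def broadcast_direct(n, table, src):
    """GPUs holding the data (first set) send it over direct links to
    GPUs lacking it (second set); receivers join the first set.
    Stops when the second set is empty or no direct link remains."""
    first = {src}
    second = set(range(n)) - first
    while second:
        reached = {j for i in first for j in second if table[i][j]}
        if not reached:        # no direct link left: CPU must relay
            break
        first |= reached       # receivers now hold the data
        second -= reached
    return first, second       # `second` holds the remaining GPUs

# Assumed topology 0-1, 0-2, 1-4, 1-5, 2-6 on 8 GPUs:
links = {(0, 1), (0, 2), (1, 4), (1, 5), (2, 6)}
table = [[(i, j) in links or (j, i) in links for j in range(8)]
         for i in range(8)]
first, remaining = broadcast_direct(8, table, 0)
```

With this assumed topology the loop reaches GPUs 1 and 2 in the first time step and GPUs 4, 5, and 6 in the second, leaving GPUs 3 and 7 as remaining GPUs for the CPU path of S130.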
S130: when remaining GPUs exist in the second set, the CPU transmits the predetermined communication data to the remaining GPUs.
That is, if the second set is not empty, the first set must transmit the predetermined communication data to the second set through the CPU.
Specifically, when remaining GPUs exist in the second set, in a time step a predetermined GPU in the first set transfers the predetermined communication data to the CPU's memory, and the predetermined communication data is then transferred from that memory to the remaining GPUs. The predetermined GPU here can be any GPU chosen from the first set; this embodiment does not restrict which GPU it is. When the CPU transmits the predetermined communication data to the remaining GPUs in the second set, it may transmit to one remaining GPU at a time and, after the transfer completes, move that GPU, which now holds the predetermined communication data, into the first set, until the second set is empty. That is, in each time step, this embodiment picks an arbitrary GPU from the first set, transfers the data to the host CPU's memory, transfers it from host memory to any one GPU in the second set, and updates the first and second sets, repeating the process until the second set is empty.
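The CPU relay fallback can be sketched as a continuation of the loop above. Again a minimal illustration: `copy_via_host` is a stand-in for a real staged transfer (a device-to-host copy followed by a host-to-device copy), and the starting sets come from the assumed example topology.

```python
def cpu_relay(first, second, copy_via_host):
    """Each time step: one GPU in the first set stages the data in host
    memory, one remaining GPU receives it and joins the first set."""
    first, second = set(first), set(second)
    while second:
        src = next(iter(first))   # any GPU already holding the data
        dst = second.pop()
        copy_via_host(src, dst)   # GPU src -> CPU memory -> GPU dst
        first.add(dst)
    return first

transfers = []
done = cpu_relay({0, 1, 2, 4, 5, 6}, {3, 7},
                 lambda s, d: transfers.append((s, d)))
```

Each relayed GPU costs two PCIe copies through host memory, which is why the method exhausts direct links first and uses the CPU only for the remainder.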
The above process can be illustrated with Fig. 3. Suppose the GPUs directly connected to GPU0 are GPU1 and GPU2. In the first time step, GPU0 transfers the predetermined communication data to GPU1 and GPU2; after the transfer completes, GPU1 and GPU2 move into the first set. Now suppose the GPUs directly connected to GPU1 are GPU4 and GPU5, and the GPU directly connected to GPU2 is GPU6. In the second time step, GPU1 transfers the predetermined communication data to GPU4 and GPU5, and GPU2 transfers it to GPU6. At this point, if GPU7 has no direct connection to any GPU in the first set, then GPU7 is a remaining GPU; GPU0 is then chosen to transfer the predetermined communication data to the CPU's memory, from which it is transferred to the remaining GPU7, finally completing the transmission of the predetermined communication data in the multi-GPU communication.
Based on the above technical solution, the single-machine multi-GPU communication method provided by this embodiment of the present invention determines an optimal data transmission path by checking the direct connections between GPUs, avoids routing all inter-GPU data transfers through the CPU, which would make the CPU a bottleneck, makes full use of GPU bandwidth resources, and achieves high-speed data delivery among multiple GPUs.
The device for single-machine multi-GPU communication provided by an embodiment of the present invention is introduced below; the device described below and the single-machine multi-GPU communication method described above may be cross-referenced.
Refer to Fig. 4, a structural block diagram of the single-machine multi-GPU communication device provided by an embodiment of the present invention; the device may include:
a direct-connection detection module 100, for detecting all GPUs and determining the GPU direct-connection relation data;
a set division module 200, for determining, according to a data broadcast, the predetermined communication data and the GPUs that need to communicate, dividing the GPUs that contain the predetermined communication data into a first set and the GPUs that do not contain the predetermined communication data into a second set;
a direct data transmission module 300, for having the GPUs in the first set transmit the predetermined communication data, according to the GPU direct-connection relation data, to the GPUs in the second set that are directly connected to them, and, after each data transfer completes, moving the GPUs in the second set that hold the predetermined communication data into the first set, until the second set is empty or the second set contains only remaining GPUs with no direct connection to any GPU in the first set;
a CPU data transmission module 400, for having the CPU transmit the predetermined communication data to the remaining GPUs when such remaining GPUs exist in the second set.
Based on the above embodiment, the direct-connection detection module 100 is specifically a module that traverses all pairs of GPUs with a double loop to obtain a data table recording whether a direct connection exists between any two GPUs.
Based on the above embodiment, the direct data transmission module 300 may include:
a direct data transmission unit, for having the GPUs in the first set transmit the predetermined communication data, via GPU Direct technology and according to the GPU direct-connection relation data, to the GPUs in the second set that are directly connected to them.
Based on the above embodiment, the CPU data transmission module 400 is specifically a module that, when remaining GPUs exist in the second set, has a predetermined GPU in the first set transfer the predetermined communication data to the CPU's memory in a time step and then transfers the predetermined communication data from that memory to the remaining GPUs.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the identical or similar parts of the embodiments may be cross-referenced. For the device disclosed in an embodiment, since it corresponds to the method disclosed in an embodiment, its description is relatively simple; refer to the method description for the relevant parts.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may implement the described functions in different ways for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the methods or algorithms described in the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The method and device for single-machine multi-GPU communication provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention; the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. It should be pointed out that those of ordinary skill in the art may also make several improvements and modifications to the present invention without departing from the principle of the present invention, and these improvements and modifications also fall within the protection scope of the claims of the present invention.
Claims (8)
1. A single-machine multi-GPU communication method, characterized in that the method comprises:
detecting all GPUs and determining GPU direct-connection relation data;
determining, according to a data broadcast, the predetermined communication data and the GPUs that need to communicate, dividing the GPUs that contain the predetermined communication data into a first set and the GPUs that do not contain the predetermined communication data into a second set;
having the GPUs in the first set transmit the predetermined communication data, according to the GPU direct-connection relation data, to the GPUs in the second set that are directly connected to them, and, after each data transfer completes, moving the GPUs in the second set that hold the predetermined communication data into the first set, until the second set is empty or the second set contains only remaining GPUs with no direct connection to any GPU in the first set;
when such remaining GPUs exist in the second set, having the CPU transmit the predetermined communication data to the remaining GPUs.
2. The method according to claim 1, characterized in that detecting all GPUs and determining the GPU direct-connection relation data includes:
traversing all pairs of GPUs with a double loop to obtain a data table recording whether a direct connection exists between any two GPUs.
3. The method according to claim 2, characterized in that data is transferred between two directly connected GPUs via GPU Direct technology.
4. The method according to claim 3, characterized in that the CPU transmitting the predetermined communication data to the remaining GPUs includes:
in a time step, a predetermined GPU in the first set transferring the predetermined communication data to the CPU's memory, and the predetermined communication data being transferred from that memory to the remaining GPUs.
5. A device for single-machine multi-GPU communication, characterized in that it comprises:
a direct-connection detection module, for detecting all GPUs and determining GPU direct-connection relation data;
a set division module, for determining, according to a data broadcast, the predetermined communication data and the GPUs that need to communicate, dividing the GPUs that contain the predetermined communication data into a first set and the GPUs that do not contain the predetermined communication data into a second set;
a direct data transmission module, for having the GPUs in the first set transmit the predetermined communication data, according to the GPU direct-connection relation data, to the GPUs in the second set that are directly connected to them, and, after each data transfer completes, moving the GPUs in the second set that hold the predetermined communication data into the first set, until the second set is empty or the second set contains only remaining GPUs with no direct connection to any GPU in the first set;
a CPU data transmission module, for having the CPU transmit the predetermined communication data to the remaining GPUs when such remaining GPUs exist in the second set.
6. The device according to claim 5, characterized in that the direct-connection detection module is specifically a module that traverses all pairs of GPUs with a double loop to obtain a data table recording whether a direct connection exists between any two GPUs.
7. The device according to claim 6, characterized in that the direct data transmission module includes:
a direct data transmission unit, for having the GPUs in the first set transmit the predetermined communication data, via GPU Direct technology and according to the GPU direct-connection relation data, to the GPUs in the second set that are directly connected to them.
8. The device according to claim 7, characterized in that the CPU data transmission module is specifically a module that, when the remaining GPUs exist in the second set, has the predetermined GPU in the first set transfer, within a time step, the predetermined communication data to the memory of the CPU, and then transfers the predetermined communication data from the memory to the remaining GPUs.
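The CPU relay path of claims 4 and 8 can be sketched in the same simulated style. The memories are modeled as plain dictionaries; the function name, the `"grad"` key, and the example state are hypothetical, standing in for device-to-host and host-to-device copies.

```python
def cpu_relay(gpu_memory, cpu_memory, predetermined_gpu, remaining_gpus, key):
    """Within one time step: stage the data in CPU memory, then fan it
    out from CPU memory to every remaining GPU (claims 4 and 8)."""
    # Step 1: the predetermined GPU in the first set writes to CPU memory.
    cpu_memory[key] = gpu_memory[predetermined_gpu][key]
    # Step 2: the data is copied from CPU memory to each remaining GPU.
    for gpu in remaining_gpus:
        gpu_memory[gpu][key] = cpu_memory[key]

# Hypothetical state: GPU 0 holds the data; GPUs 2 and 3 lack a direct link.
gpu_memory = {0: {"grad": [1.0, 2.0]}, 2: {}, 3: {}}
cpu_memory = {}
cpu_relay(gpu_memory, cpu_memory, 0, [2, 3], "grad")
```

In a real system the two steps would be device-to-host and host-to-device memory copies (e.g. via the CUDA runtime); only the remaining GPUs without a direct connection take this slower path.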
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611149576.XA CN106776455B (en) | 2016-12-13 | 2016-12-13 | Single-machine multi-GPU communication method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106776455A true CN106776455A (en) | 2017-05-31 |
CN106776455B CN106776455B (en) | 2020-08-21 |
Family
ID=58876868
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611149576.XA Active CN106776455B (en) | 2016-12-13 | 2016-12-13 | Single-machine multi-GPU communication method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106776455B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070103475A1 (en) * | 2005-11-10 | 2007-05-10 | Via Technologies, Inc. | Interruptible GPU and method for processing multiple contexts and runlists |
CN102007479A (en) * | 2008-03-31 | 2011-04-06 | 先进微装置公司 | Peer-to-peer special purpose processor architecture and method |
CN103049421A (en) * | 2012-12-11 | 2013-04-17 | 百度在线网络技术(北京)有限公司 | Method and device for data transmission between central processing unit (CPU) and co-processors |
CN104035751A (en) * | 2014-06-20 | 2014-09-10 | 深圳市腾讯计算机系统有限公司 | Graphics processing unit based parallel data processing method and device |
CN105467443A (en) * | 2015-12-09 | 2016-04-06 | 中国科学院地质与地球物理研究所 | A three-dimensional anisotropy elastic wave numerical simulation method and system |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109933433A (en) * | 2019-03-19 | 2019-06-25 | 合肥中科类脑智能技术有限公司 | A kind of GPU resource scheduling system and its dispatching method |
CN109933433B (en) * | 2019-03-19 | 2021-06-25 | 合肥中科类脑智能技术有限公司 | GPU resource scheduling system and scheduling method thereof |
CN110377537A (en) * | 2019-06-25 | 2019-10-25 | 苏州浪潮智能科技有限公司 | A kind of data transmission method, device and medium based on high speed signal switching chip |
CN110389928A (en) * | 2019-06-25 | 2019-10-29 | 苏州浪潮智能科技有限公司 | A kind of data transmission method, device and medium based on high speed signal switching chip |
CN110569312A (en) * | 2019-11-06 | 2019-12-13 | 创业慧康科技股份有限公司 | big data rapid retrieval system based on GPU and use method thereof |
CN113395216A (en) * | 2020-03-11 | 2021-09-14 | 辉达公司 | Techniques to transfer data between hardware devices |
CN113395216B (en) * | 2020-03-11 | 2024-04-09 | 辉达公司 | Techniques for transferring data between hardware devices |
CN114359015A (en) * | 2021-12-08 | 2022-04-15 | 北京百度网讯科技有限公司 | Data transmission method and device and graphic processing server |
CN114359015B (en) * | 2021-12-08 | 2023-08-04 | 北京百度网讯科技有限公司 | Data transmission method, device and graphic processing server |
Also Published As
Publication number | Publication date |
---|---|
CN106776455B (en) | 2020-08-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106776455A (en) | Single-machine multi-GPU communication method and device | |
CN106415515B (en) | Grouping is sent using the PIO of the optimization without SFENCE write-in sequence | |
CN106056529B (en) | Method and equipment for training convolutional neural network for picture recognition | |
CN103049241B (en) | A kind of method improving CPU+GPU isomery device calculated performance | |
CN108563808A (en) | The design method of heterogeneous reconfigurable figure computation accelerator system based on FPGA | |
CN105159610B (en) | Large-scale data processing system and method | |
CN103559084B (en) | A kind of virtual machine migration method at Energy-saving Data center | |
CN103763173B (en) | Data transmission method and calculate node | |
CN103197979B (en) | Method and device for realizing data interaction access among processes | |
CN103631878B (en) | A kind of massive data of graph structure processing method, device and system | |
CN105956659A (en) | Data processing device, data processing system and server | |
CN107301455A (en) | Mixing cube storage system and speed-up computation method for convolutional neural networks | |
CN107622519A (en) | Threedimensional model hybrid rending system and method based on mobile device | |
CN102761489B (en) | Inter-core communication method realizing data packet zero-copying based on pipelining mode | |
CN107122490A (en) | The data processing method and system of aggregate function in a kind of Querying by group | |
CN108932588A (en) | A kind of the GROUP OF HYDROPOWER STATIONS Optimal Scheduling and method of front and back end separation | |
CN106453618A (en) | Remote sensing image processing service cloud platform system based on G-Cloud cloud computing | |
CN107992572A (en) | A kind of distributed graph coloring algorithm based on Pregel | |
CN107402902A (en) | A kind of heterogeneous computing platforms and the accelerated method based on heterogeneous computing platforms | |
CN104699946A (en) | Game scene management method and device | |
CN107463448A (en) | A kind of deep learning weight renewing method and system | |
CN104125293B (en) | A kind of Cloud Server and its application method | |
CN107391402A (en) | A kind of data operating method, device and a kind of data operation card | |
CN103533090A (en) | Mapping method and device for simulating single physical network port into multiple logical network ports | |
CN106776023A (en) | A kind of self adaptation GPU unifications dyeing array task load equalization methods |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |
Effective date of registration: 2020-07-23
Address after: No. 1 Guanpu Road, Guoxiang Street, Wuzhong Economic Development Zone, Suzhou City, Jiangsu Province, 215100
Applicant after: SUZHOU LANGCHAO INTELLIGENT TECHNOLOGY Co.,Ltd.
Address before: Room 1601, 16th Floor, No. 278 Xinyi Road, Zhengdong New District, Zhengzhou City, Henan Province, 450018
Applicant before: ZHENGZHOU YUNHAI INFORMATION TECHNOLOGY Co.,Ltd.
GR01 | Patent grant | ||