CN106776455A - Method and device for single-machine multi-GPU communication - Google Patents

Method and device for single-machine multi-GPU communication

Info

Publication number
CN106776455A
CN106776455A · CN201611149576.XA
Authority
CN
China
Prior art keywords
gpu
direct
predetermined communication
communication data
connected relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611149576.XA
Other languages
Chinese (zh)
Other versions
CN106776455B (en)
Inventor
张清
龚湛
宋书涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201611149576.XA
Publication of CN106776455A
Application granted
Publication of CN106776455B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a method and device for single-machine multi-GPU communication. The method includes: determining GPU direct-connection relation data; determining, according to a data broadcast, the predetermined communication data and the GPUs that need to communicate, dividing the GPUs that contain the predetermined communication data into a first set and the other GPUs into a second set; having the GPUs in the first set transmit, according to the GPU direct-connection relation data, the predetermined communication data to the GPUs in the second set that are directly connected to them, and moving the GPUs in the second set that now hold the predetermined communication data into the first set, until the second set is empty or only contains remaining GPUs with no direct connection to any GPU in the first set; and, when such remaining GPUs exist in the second set, having the CPU transmit the predetermined communication data to the remaining GPUs. Routing transfers over direct GPU-to-GPU links avoids relaying all inter-GPU data through the CPU, which would make the CPU a bottleneck.

Description

Method and device for single-machine multi-GPU communication
Technical field
The present invention relates to the technical field of data processing, and more particularly to a method and device for single-machine multi-GPU communication.
Background technology
Since NVIDIA released the G80 graphics processor (containing 128 stream processors) in 2006, the graphics processing unit (GPU, Graphic Processing Unit) has delivered performance improvements of more than 100x over the CPU in some large-scale parallel computing applications. A GPU devotes more of its transistors to data processing, rather than to data caching and instruction control as a CPU does, which gives it enormous parallel computing capability. GPU many-core processors pack computing resources more densely and offer higher compute performance, with double-precision performance exceeding 1 TFlops.
With the development of high-performance computing application software, applications demand ever higher compute performance. Compared with traditional CPU clusters, CPU+GPU heterogeneous co-computing brings advantages such as higher performance and lower cost, so more and more high-performance computing applications adopt the CPU+GPU heterogeneous co-computing model.
Fig. 1 shows the CPU+GPU heterogeneous co-computing architecture inside a single compute node. In application scenarios with very large computational loads, such as training deep learning neural networks, multiple GPUs must cooperate, so the data transfer speed among the GPUs has a large impact on the performance of the whole application. How to complete data transfers efficiently on the basis of the existing hardware architecture is a problem.
The content of the invention
It is an object of the present invention to provide a method and device for single-machine multi-GPU communication that uses GPU Direct technology to avoid routing all inter-GPU data transfers through the CPU, which would make the CPU a bottleneck, and that plans reasonable communication paths according to the specific hardware topology to achieve high-speed communication among multiple GPUs.
In order to solve the above technical problem, the present invention provides a method for single-machine multi-GPU communication, the method including:
detecting all GPUs and determining GPU direct-connection relation data;
determining, according to a data broadcast, the predetermined communication data and the GPUs that need to communicate, dividing the GPUs that contain the predetermined communication data into a first set, and dividing the GPUs that do not contain the predetermined communication data into a second set;
having the GPUs in the first set transmit, according to the GPU direct-connection relation data, the predetermined communication data to the GPUs in the second set that are directly connected to them, and, after each transfer completes, moving the GPUs in the second set that now hold the predetermined communication data into the first set, until the second set is empty or the second set only contains remaining GPUs that have no direct connection to any GPU in the first set;
when such remaining GPUs exist in the second set, having the CPU transmit the predetermined communication data to the remaining GPUs.
Optionally, detecting all GPUs and determining the GPU direct-connection relation data includes:
traversing all pairs of GPUs with a double loop to obtain a data table recording whether a direct connection exists between any two GPUs.
Optionally, two GPUs with a direct connection transfer data via GPU Direct technology.
Optionally, the CPU transmitting the predetermined communication data to the remaining GPUs includes:
in one timestep, a predetermined GPU in the first set transferring the predetermined communication data to the CPU's host memory, and transferring the predetermined communication data from the host memory to the remaining GPUs.
The present invention also provides a device for single-machine multi-GPU communication, including:
a direct-connection relation detection module, configured to detect all GPUs and determine GPU direct-connection relation data;
a set division module, configured to determine, according to a data broadcast, the predetermined communication data and the GPUs that need to communicate, divide the GPUs that contain the predetermined communication data into a first set, and divide the GPUs that do not contain the predetermined communication data into a second set;
a direct-connection data transmission module, configured to have the GPUs in the first set transmit, according to the GPU direct-connection relation data, the predetermined communication data to the GPUs in the second set that are directly connected to them, and, after each transfer completes, move the GPUs in the second set that now hold the predetermined communication data into the first set, until the second set is empty or the second set only contains remaining GPUs that have no direct connection to any GPU in the first set;
a CPU data transmission module, configured to have the CPU transmit the predetermined communication data to the remaining GPUs when such remaining GPUs exist in the second set.
Optionally, the direct-connection relation detection module is specifically a module that traverses all pairs of GPUs with a double loop and obtains a data table recording whether a direct connection exists between any two GPUs.
Optionally, the direct-connection data transmission module includes:
a direct-connection data transmission unit, configured to have the GPUs in the first set transmit, according to the GPU direct-connection relation data, the predetermined communication data via GPU Direct technology to the GPUs in the second set that are directly connected to them.
Optionally, the CPU data transmission module is specifically a module that, when remaining GPUs exist in the second set, in one timestep has a predetermined GPU in the first set transfer the predetermined communication data to the CPU's host memory and then transfer the predetermined communication data from the host memory to the remaining GPUs.
The method for single-machine multi-GPU communication provided by the present invention includes: determining GPU direct-connection relation data; determining, according to a data broadcast, the predetermined communication data and the GPUs that need to communicate, dividing the GPUs that contain the predetermined communication data into a first set and the other GPUs into a second set; having the GPUs in the first set transmit, according to the direct-connection relation data, the predetermined communication data to the GPUs in the second set that are directly connected to them, and moving the GPUs in the second set that now hold the predetermined communication data into the first set, until the second set is empty or only contains remaining GPUs with no direct connection to any GPU in the first set; and, when such remaining GPUs exist in the second set, having the CPU transmit the predetermined communication data to them.
It can be seen that the method uses GPU Direct technology to avoid routing all inter-GPU data transfers through the CPU, which would make the CPU a bottleneck, and plans reasonable communication paths according to the specific hardware topology to achieve high-speed communication among multiple GPUs. The present invention also provides a device for single-machine multi-GPU communication with the same beneficial effects, which will not be repeated here.
Brief description of the drawings
In order to describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the CPU+GPU heterogeneous co-computing architecture provided by an embodiment of the present invention;
Fig. 2 is a flow chart of the single-machine multi-GPU communication method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the multi-GPU communication path planning provided by an embodiment of the present invention;
Fig. 4 is a structural block diagram of the single-machine multi-GPU communication device provided by an embodiment of the present invention.
Specific embodiment
The core of the present invention is to provide a method and device for single-machine multi-GPU communication that uses GPU Direct technology to avoid routing all inter-GPU data transfers through the CPU, which would make the CPU a bottleneck, and that plans reasonable communication paths according to the specific hardware topology to achieve high-speed communication among multiple GPUs.
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
This embodiment uses CPU-free direct GPU-to-GPU transfers to achieve high-speed communication among multiple GPUs within a single compute node, which avoids routing all inter-GPU data transfers through the CPU and making the CPU a bottleneck; it also selects a suitable communication path according to the hardware topology to optimize compute performance. The control flow of the method can be implemented on GPU many-core processors, which have very strong computing power and are mainly used for the computation of core tasks. Refer to Fig. 2, a flow chart of the single-machine multi-GPU communication method provided by this embodiment; the method may include:
S100, detect all GPUs and determine the GPU direct-connection relation data;
Specifically, the hardware connections of all GPUs are detected, so as to determine the GPU direct-connection relation data, that is, which GPUs are directly connected. The GPU direct-connection relation data may be stored as a table or as a mapping relation. It may record only the GPU pairs that are directly connected, or it may record both the directly connected pairs and the pairs without a direct connection. Any representation is acceptable as long as, after a GPU is selected, the GPUs directly connected to it can be determined from the data one by one. This embodiment therefore does not limit the content or form of the GPU direct-connection relation data.
To further improve the GPU direct-connection detection, preferably, detecting all GPUs and determining the GPU direct-connection relation data may include:
traversing all pairs of GPUs with a double loop to obtain a data table recording whether a direct connection exists between any two GPUs.
Specifically, for a node with N GPU cards, a double loop traverses all pairs of cards to obtain a data table recording whether a direct connection exists between any two GPUs.
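The following is a minimal sketch, under the assumption that a direct connection corresponds to CUDA peer-to-peer accessibility, of how such a table could be built with the CUDA runtime API; the table name "direct" and the bare main() are illustrative, not part of the patented method, and error checking is omitted for brevity.

#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);                      // N GPU cards in this node
    // direct[i][j] == 1 means GPU i has a direct (peer) link to GPU j
    std::vector<std::vector<int>> direct(n, std::vector<int>(n, 0));

    for (int i = 0; i < n; ++i) {                // double loop over all pairs of cards
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, i, j);
            direct[i][j] = canAccess;
        }
    }

    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            if (i != j)
                printf("GPU%d -> GPU%d : %s\n", i, j,
                       direct[i][j] ? "direct" : "not direct");
    return 0;
}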
S110, determine, according to a data broadcast, the predetermined communication data and the GPUs that need to communicate, divide the GPUs that contain the predetermined communication data into a first set, and divide the GPUs that do not contain the predetermined communication data into a second set;
Specifically, referring to Fig. 3, which shows the initial state of the GPU broadcast communication, the first set contains only GPU0, and the remaining GPUs, which do not contain the predetermined communication data, are in the second set.
S120, the GPUs in the first set transmit, according to the GPU direct-connection relation data, the predetermined communication data to the GPUs in the second set that are directly connected to them; after each transfer completes, the GPUs in the second set that now hold the predetermined communication data are moved into the first set, until the second set is empty or the second set only contains remaining GPUs that have no direct connection to any GPU in the first set;
Specifically, in this embodiment any two directly connected GPUs can transfer data through direct communication (for example GPU Direct technology). Such direct GPU-to-GPU communication typically arises when a compute node is equipped with multiple GPUs to accelerate applications with a very large computational load, such as training deep learning neural networks. With GPU Direct technology, GPUs can transfer data to each other directly without using the CPU as an intermediary, which greatly increases the data transfer speed; a sketch of a single such transfer is given below.
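As an illustration only, the following sketch shows one direct (peer-to-peer) copy between two directly connected GPUs using the CUDA runtime API; the buffer size and device numbers are assumptions, and error checking is omitted.

#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1 << 20;        // 1 MiB of predetermined communication data
    void *src = nullptr, *dst = nullptr;

    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    cudaDeviceEnablePeerAccess(1, 0);    // let GPU0 reach GPU1 directly

    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);
    cudaDeviceEnablePeerAccess(0, 0);    // let GPU1 reach GPU0 directly

    // device-to-device copy that does not stage through CPU (host) memory
    cudaMemcpyPeer(dst, 1, src, 0, bytes);
    cudaDeviceSynchronize();
    return 0;
}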
Specifically, because the physical topology between GPUs and the CPU is generally a tree, traditional data transfer methods tend to use a tree-shaped communication pattern consistent with the physical topology. However, this pattern naturally makes the root node of the tree the communication performance bottleneck. Here the communication path is instead planned as a logical ring that is independent of the physical topology, which solves the communication bandwidth bottleneck. This embodiment determines the communication path from the set division and the GPU direct-connection relation data, and selects a suitable communication path plan for the given hardware topology, thereby optimizing the compute performance of single-machine multi-GPU computing. By detecting the GPU direct-connection relations and planning the paths accordingly, high-speed communication is achieved.
The above steps are illustrated by the following example:
Assume N GPU cards, numbered 0, 1, ..., N-1, need to communicate, and the data broadcast needs to transfer data D (the predetermined communication data) from GPU0 to all the other GPUs.
First, the direct-connection table between any two GPUs is built, and two sets are constructed: the GPUs in one set already contain data D and form the first set; the GPUs in the other set do not contain data D and form the second set. In each timestep, the GPUs in the first set transmit data D, according to the direct-connection table, to the GPUs in the second set; after the transfers complete, the GPUs that have received data D move from the second set into the first set. This repeats until the second set is empty or no direct connection remains between the second set and the first set, as sketched below.
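A minimal sketch of this two-set broadcast loop follows. It assumes the adjacency table "direct" built in the detection step, a device buffer buf[i] of the same size already allocated on every GPU, the data initially resident on GPU0, and peer access enabled as in the earlier sketch; it is an illustration of the scheme, not the patent's reference implementation.

#include <cuda_runtime.h>
#include <set>
#include <vector>

void broadcast(const std::vector<std::vector<int>>& direct,
               std::vector<void*>& buf, size_t bytes, int n) {
    std::set<int> first = {0};              // GPUs that already hold data D
    std::set<int> second;                   // GPUs still waiting for data D
    for (int i = 1; i < n; ++i) second.insert(i);

    bool progress = true;
    while (!second.empty() && progress) {   // one iteration per timestep
        progress = false;
        std::vector<int> received;
        for (int src : first) {
            for (int dst : second) {
                if (direct[src][dst]) {     // transmit only over direct links
                    cudaMemcpyPeer(buf[dst], dst, buf[src], src, bytes);
                    received.push_back(dst);
                    progress = true;
                }
            }
        }
        for (int g : received) {            // move receivers into the first set
            second.erase(g);
            first.insert(g);
        }
    }
    // any GPU left in "second" has no direct link to the first set and is
    // handled by the CPU relay described in step S130
}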
S130, when remaining GPUs exist in the second set, the CPU transmits the predetermined communication data to the remaining GPUs.
That is, if the second set is not empty at this point, the first set must transmit the predetermined communication data to the second set through the CPU.
Specifically, when remaining GPUs exist in the second set, in one timestep a predetermined GPU in the first set transfers the predetermined communication data to the CPU's host memory, and the predetermined communication data is then transferred from the host memory to the remaining GPUs. The predetermined GPU can be any GPU chosen from the first set, so this embodiment does not limit which specific GPU it is. When the CPU transmits the predetermined communication data to the remaining GPUs in the second set, it may transmit to one remaining GPU at a time and, after the transfer completes, move that GPU into the first set, until the second set is empty. In other words, in each timestep this embodiment arbitrarily selects one GPU from the first set, transfers its data to the host CPU's memory, transfers the data from host memory to any one GPU in the second set, and updates the first and second sets; this process is repeated until the second set is empty.
This process can be illustrated with Fig. 3. Suppose the GPUs directly connected to GPU0 are GPU1 and GPU2. In the first timestep, GPU0 transfers the predetermined communication data to GPU1 and GPU2, which are then moved into the first set. Suppose the GPUs directly connected to GPU1 are GPU4 and GPU5, and the GPU directly connected to GPU2 is GPU6; then, in the second timestep, GPU1 transfers the predetermined communication data to GPU4 and GPU5, and GPU2 transfers it to GPU6. Now, if GPU7 has no direct connection to GPU4, GPU5, GPU6 or any other GPU in the first set, GPU7 is a remaining GPU. GPU0 is then chosen to transfer the predetermined communication data to the CPU's host memory, from which the data is transferred to the remaining GPU7, finally completing the broadcast of the predetermined communication data in the multi-GPU communication; a sketch of this CPU relay follows.
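The sketch below, under the same assumptions as the earlier listings, stages the data from a chosen GPU in the first set (here GPU0) into host memory and then pushes it to the remaining GPU (here GPU7). The function name, device indices and buffer arguments are illustrative.

#include <cuda_runtime.h>
#include <vector>

void cpu_relay(void* src_on_gpu0, void* dst_on_gpu7, size_t bytes) {
    std::vector<char> host(bytes);           // staging buffer in CPU (host) memory

    cudaSetDevice(0);                        // predetermined GPU in the first set
    cudaMemcpy(host.data(), src_on_gpu0, bytes, cudaMemcpyDeviceToHost);

    cudaSetDevice(7);                        // remaining GPU with no direct link
    cudaMemcpy(dst_on_gpu7, host.data(), bytes, cudaMemcpyHostToDevice);
}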
Based on the above technical solution, the single-machine multi-GPU communication method provided by the embodiment of the present invention determines the optimal data transfer paths by checking the direct-connection relations between GPUs, avoids making the CPU a bottleneck by routing all inter-GPU data transfers through it, makes full use of the GPU bandwidth resources, and achieves high-speed data transfer among multiple GPUs.
The device for single-machine multi-GPU communication provided by the embodiment of the present invention is introduced below; the device described below and the method described above may be cross-referenced.
Refer to Fig. 4, a structural block diagram of the single-machine multi-GPU communication device provided by the embodiment of the present invention; the device may include:
a direct-connection relation detection module 100, configured to detect all GPUs and determine the GPU direct-connection relation data;
a set division module 200, configured to determine, according to a data broadcast, the predetermined communication data and the GPUs that need to communicate, divide the GPUs that contain the predetermined communication data into a first set, and divide the GPUs that do not contain the predetermined communication data into a second set;
a direct-connection data transmission module 300, configured to have the GPUs in the first set transmit, according to the GPU direct-connection relation data, the predetermined communication data to the GPUs in the second set that are directly connected to them, and, after each transfer completes, move the GPUs in the second set that now hold the predetermined communication data into the first set, until the second set is empty or the second set only contains remaining GPUs that have no direct connection to any GPU in the first set;
a CPU data transmission module 400, configured to have the CPU transmit the predetermined communication data to the remaining GPUs when such remaining GPUs exist in the second set.
Based on the above embodiment, the direct-connection relation detection module 100 is specifically a module that traverses all pairs of GPUs with a double loop and obtains a data table recording whether a direct connection exists between any two GPUs.
Based on the above embodiment, the direct-connection data transmission module 300 may include:
a direct-connection data transmission unit, configured to have the GPUs in the first set transmit, according to the GPU direct-connection relation data, the predetermined communication data via GPU Direct technology to the GPUs in the second set that are directly connected to them.
Based on the above embodiment, the CPU data transmission module 400 is specifically a module that, when remaining GPUs exist in the second set, in one timestep has a predetermined GPU in the first set transfer the predetermined communication data to the CPU's host memory and then transfer the predetermined communication data from the host memory to the remaining GPUs.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the identical or similar parts of the embodiments may be cross-referenced. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively simple, and the relevant parts can be found in the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of their functions. Whether these functions are performed by hardware or by software depends on the specific application and design constraints of the technical solution. Skilled persons may implement the described functions in different ways for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.
The steps of the methods or algorithms described in the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The method and device for single-machine multi-GPU communication provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. It should be noted that those of ordinary skill in the art can make several improvements and modifications to the present invention without departing from the principle of the present invention, and such improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims (8)

1. A method for single-machine multi-GPU communication, characterized in that the method includes:
detecting all GPUs and determining GPU direct-connection relation data;
determining, according to a data broadcast, the predetermined communication data and the GPUs that need to communicate, dividing the GPUs that contain the predetermined communication data into a first set, and dividing the GPUs that do not contain the predetermined communication data into a second set;
having the GPUs in the first set transmit, according to the GPU direct-connection relation data, the predetermined communication data to the GPUs in the second set that are directly connected to them, and, after each transfer completes, moving the GPUs in the second set that now hold the predetermined communication data into the first set, until the second set is empty or the second set only contains remaining GPUs that have no direct connection to any GPU in the first set;
when such remaining GPUs exist in the second set, having the CPU transmit the predetermined communication data to the remaining GPUs.
2. The method according to claim 1, characterized in that detecting all GPUs and determining the GPU direct-connection relation data includes:
traversing all pairs of GPUs with a double loop to obtain a data table recording whether a direct connection exists between any two GPUs.
3. The method according to claim 2, characterized in that two GPUs with a direct connection transfer data via GPU Direct technology.
4. The method according to claim 3, characterized in that the CPU transmitting the predetermined communication data to the remaining GPUs includes:
in one timestep, a predetermined GPU in the first set transferring the predetermined communication data to the CPU's host memory, and transferring the predetermined communication data from the host memory to the remaining GPUs.
5. A device for single-machine multi-GPU communication, characterized in that it includes:
a direct-connection relation detection module, configured to detect all GPUs and determine GPU direct-connection relation data;
a set division module, configured to determine, according to a data broadcast, the predetermined communication data and the GPUs that need to communicate, divide the GPUs that contain the predetermined communication data into a first set, and divide the GPUs that do not contain the predetermined communication data into a second set;
a direct-connection data transmission module, configured to have the GPUs in the first set transmit, according to the GPU direct-connection relation data, the predetermined communication data to the GPUs in the second set that are directly connected to them, and, after each transfer completes, move the GPUs in the second set that now hold the predetermined communication data into the first set, until the second set is empty or the second set only contains remaining GPUs that have no direct connection to any GPU in the first set;
a CPU data transmission module, configured to have the CPU transmit the predetermined communication data to the remaining GPUs when such remaining GPUs exist in the second set.
6. The device according to claim 5, characterized in that the direct-connection relation detection module is specifically a module that traverses all pairs of GPUs with a double loop and obtains a data table recording whether a direct connection exists between any two GPUs.
7. The device according to claim 6, characterized in that the direct-connection data transmission module includes:
a direct-connection data transmission unit, configured to have the GPUs in the first set transmit, according to the GPU direct-connection relation data, the predetermined communication data via GPU Direct technology to the GPUs in the second set that are directly connected to them.
8. The device according to claim 7, characterized in that the CPU data transmission module is specifically a module that, when remaining GPUs exist in the second set, in one timestep has a predetermined GPU in the first set transfer the predetermined communication data to the CPU's host memory and then transfer the predetermined communication data from the host memory to the remaining GPUs.
CN201611149576.XA 2016-12-13 2016-12-13 Single-machine multi-GPU communication method and device Active CN106776455B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611149576.XA CN106776455B (en) 2016-12-13 2016-12-13 Single-machine multi-GPU communication method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611149576.XA CN106776455B (en) 2016-12-13 2016-12-13 Single-machine multi-GPU communication method and device

Publications (2)

Publication Number Publication Date
CN106776455A (en) 2017-05-31
CN106776455B CN106776455B (en) 2020-08-21

Family

ID=58876868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611149576.XA Active CN106776455B (en) 2016-12-13 2016-12-13 Single-machine multi-GPU communication method and device

Country Status (1)

Country Link
CN (1) CN106776455B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070103475A1 (en) * 2005-11-10 2007-05-10 Via Technologies, Inc. Interruptible GPU and method for processing multiple contexts and runlists
CN102007479A (en) * 2008-03-31 2011-04-06 先进微装置公司 Peer-to-peer special purpose processor architecture and method
CN103049421A (en) * 2012-12-11 2013-04-17 百度在线网络技术(北京)有限公司 Method and device for data transmission between central processing unit (CPU) and co-processors
CN104035751A (en) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 Graphics processing unit based parallel data processing method and device
CN105467443A (en) * 2015-12-09 2016-04-06 中国科学院地质与地球物理研究所 A three-dimensional anisotropy elastic wave numerical simulation method and system

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109933433A (en) * 2019-03-19 2019-06-25 合肥中科类脑智能技术有限公司 A kind of GPU resource scheduling system and its dispatching method
CN109933433B (en) * 2019-03-19 2021-06-25 合肥中科类脑智能技术有限公司 GPU resource scheduling system and scheduling method thereof
CN110377537A (en) * 2019-06-25 2019-10-25 苏州浪潮智能科技有限公司 A kind of data transmission method, device and medium based on high speed signal switching chip
CN110389928A (en) * 2019-06-25 2019-10-29 苏州浪潮智能科技有限公司 A kind of data transmission method, device and medium based on high speed signal switching chip
CN110569312A (en) * 2019-11-06 2019-12-13 创业慧康科技股份有限公司 big data rapid retrieval system based on GPU and use method thereof
CN113395216A (en) * 2020-03-11 2021-09-14 辉达公司 Techniques to transfer data between hardware devices
CN113395216B (en) * 2020-03-11 2024-04-09 辉达公司 Techniques for transferring data between hardware devices
CN114359015A (en) * 2021-12-08 2022-04-15 北京百度网讯科技有限公司 Data transmission method and device and graphic processing server
CN114359015B (en) * 2021-12-08 2023-08-04 北京百度网讯科技有限公司 Data transmission method, device and graphic processing server

Also Published As

Publication number Publication date
CN106776455B (en) 2020-08-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20200723
Address after: 215100 No. 1 Guanpu Road, Guoxiang Street, Wuzhong Economic Development Zone, Suzhou City, Jiangsu Province
Applicant after: SUZHOU LANGCHAO INTELLIGENT TECHNOLOGY Co.,Ltd.
Address before: 450018 Room 1601, 16th Floor, No. 278 Xinyi Road, Zhengdong New District, Zhengzhou City, Henan Province
Applicant before: ZHENGZHOU YUNHAI INFORMATION TECHNOLOGY Co.,Ltd.
GR01 Patent grant