CN105224502A - GPU-based deep learning method and system - Google Patents

GPU-based deep learning method and system

Info

Publication number
CN105224502A
Authority
CN
China
Prior art keywords
gpu
cpu
neural network
network model
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510628858.7A
Other languages
Chinese (zh)
Inventor
张清
王娅娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd filed Critical Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201510628858.7A priority Critical patent/CN105224502A/en
Publication of CN105224502A publication Critical patent/CN105224502A/en
Pending legal-status Critical Current


Abstract

The invention discloses a GPU-based deep learning method and system. The system is a single-machine (standalone) system comprising a CPU and at least one GPU. The method comprises: the CPU transmits the data to be trained to each GPU; each GPU uses the data to be trained to perform forward-backward computation of the weight information of a neural network model and feeds the weight information back to the CPU; the CPU updates the neural network model according to the weight information and transmits the updated neural network model to each GPU; the above steps are performed in a loop until the deep learning process of the neural network model is completed. In this scheme, the time-consuming forward-backward computation is performed by GPUs with powerful parallel computing capability, and a cooperative deployment of a CPU and multiple GPU cards is adopted, which effectively solves the prior-art problems of long computation time, low efficiency, complicated system deployment and high cost.

Description

GPU-based deep learning method and system
Technical field
The present invention relates to the fields of high-performance computing, deep learning and the Internet, and in particular to a GPU-based deep learning method and system.
Background art
Deep learning is a new field in machine learning research. Its motivation is to build neural networks that simulate the human brain for analysis and learning; it imitates the mechanism of the human brain to interpret data such as images, sound and text.
In 2006, Geoffrey Hinton, a professor at the University of Toronto and an authority in the field of machine learning, published an article with his students in the top academic journal Science, which started a wave of deep learning in academia and industry. Since 2006, deep learning has kept heating up in academia; Stanford University, New York University and the University of Montreal in Canada have become important centers of deep learning research. In 2010, DARPA of the U.S. Department of Defense funded deep learning projects for the first time, with participants including Stanford University, New York University and NEC Laboratories America. An important piece of evidence supporting deep learning is that the cerebral nervous system indeed has a rich hierarchical structure; the most famous example is the Hubel-Wiesel model, which won the Nobel Prize in Physiology or Medicine for revealing the mechanism of the visual nervous system.
Nowadays, well-known high-tech companies with large amounts of data, such as Google, Microsoft and Baidu, are competing to invest resources and seize the technological high ground of deep learning, precisely because they all see that in the era of big data, more complex and more powerful deep models can deeply reveal the complex and rich information carried in massive data and make more accurate predictions about future or unknown events.
At present, deep learning applications include speech recognition, image recognition, natural language processing and click-through-rate (CTR) estimation for search advertising. The amount of computation in these applications is enormous and requires large-scale deep learning computation. However, in the prior art, usually only the CPU is used to perform the computation in the deep learning process, which is time-consuming and inefficient. Moreover, existing deep learning systems usually need to deploy network devices for networking, and deploying network devices is complicated and makes the system costly.
Summary of the invention
In view of this, the present invention provides a GPU-based deep learning method and system, to solve the prior-art problems of long computation time, low efficiency, complicated system deployment and high cost.
To solve the above technical problems, the present invention provides a GPU-based deep learning method, applied to a GPU-based deep learning system, wherein the system is a single-machine system comprising a CPU and at least one GPU, and the method comprises:
the CPU transmits the data to be trained to each GPU;
each GPU uses the data to be trained to perform forward-backward computation of the weight information of a neural network model, and feeds the weight information back to the CPU;
the CPU updates the neural network model according to the weight information and transmits the updated neural network model to each GPU, and the above steps are performed in a loop until the deep learning process of the neural network model is completed.
In the above method, preferably, the CPU transmitting the data to be trained to each GPU comprises:
reading the data to be trained in parallel from an SSD hard disk into memory;
transmitting the data to be trained in the memory to each GPU.
In the above method, preferably, data transmission between the CPU and each GPU is performed through a PCIE interface.
The present invention also provides a GPU-based deep learning system, which is a single-machine system and comprises:
a CPU and at least one GPU;
wherein
the CPU is configured to transmit the data to be trained to each GPU, update a neural network model according to the weight information fed back by the GPUs, and transmit the updated neural network model to each GPU;
each GPU is configured to use the data to be trained to perform forward-backward computation of the weight information of the neural network model and feed the weight information back to the CPU; the above steps are performed in a loop until the deep learning process of the neural network model is completed.
In the above system, preferably, the system further comprises:
an SSD hard disk and memory;
the CPU reads the data to be trained in parallel from the SSD hard disk into the memory, and transmits the data to be trained in the memory to each GPU.
In the above system, preferably, the system further comprises:
a PCIE interface;
data transmission between the CPU and each GPU is performed through the PCIE interface.
In the above system, preferably, the number of CPUs is two, and the at least one GPU comprises eight GPUs.
In the above system, preferably, the eight GPUs are specifically four GPU cards, and each GPU card comprises two GPU chips.
The GPU-based deep learning method provided by the present invention adopts a high-density computing mode in which a CPU cooperates with multiple GPU cards. Specifically, according to the algorithmic characteristics of deep learning applications, the time-consuming forward-backward computation is performed by GPUs with powerful parallel computing capability, while the remaining parameter-update computation, data reading and distribution, and neural network model updating are completed by the CPU; this shortens the processing time of deep learning applications and improves computational efficiency.
The GPU-based deep learning system provided by the present invention is a single-machine system and does not need network devices for networking. Specifically, it adopts a cooperative deployment of a CPU and multiple GPU cards, with multiple GPU cards inserted into a single machine, so the hardware is easy to deploy and the cost is low.
In summary, the present invention provides a single-machine multi-GPU parallel deep learning method and system based on GPUs, which effectively solves the prior-art problems of long computation time, low efficiency, complicated system deployment and high cost.
Description of the drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from the provided drawings without creative work.
Fig. 1 is a flow chart of a GPU-based deep learning method provided by an embodiment of the present invention;
Fig. 2 is a data interaction diagram based on Fig. 1, provided by an embodiment of the present invention;
Fig. 3 is a hardware design architecture diagram provided by an embodiment of the present invention;
Fig. 4 is a software design architecture diagram provided by an embodiment of the present invention;
Fig. 5 is a schematic structural block diagram of a GPU-based deep learning system provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative work fall within the protection scope of the present invention.
The core of the present invention is to provide a GPU-based deep learning method and system, to solve the prior-art problems of long computation time, low efficiency, complicated system deployment and high cost.
In order to enable those skilled in the art to better understand the solutions of the present invention, the present invention is described in further detail below with reference to the drawings and specific embodiments.
The following technical solutions of the present invention are described by taking deep learning on image data as an example; of course, this is only an example, and the data is not limited to image data and may also be other data such as speech data or advertising data.
Referring to Fig. 1, Fig. 1 shows a flow chart of a GPU-based deep learning method provided by an embodiment of the present invention. The method is carried by a GPU-based deep learning software system, which runs on a GPU-based deep learning system (i.e. the hardware system); the system is a single-machine system comprising a CPU and at least one GPU. The method may specifically comprise the following steps:
Step S100: the CPU transmits the data to be trained to each GPU.
In the present invention, the GPU-based deep learning system may further comprise an SSD hard disk and memory. Referring to Fig. 2, ReadData: the CPU reads the data to be trained in parallel from the SSD hard disk into memory; SendData: the data to be trained in the memory is transmitted to each GPU.
Step S101: each GPU uses the data to be trained to perform forward-backward computation of the weight information of the neural network model, and feeds the weight information back to the CPU.
In Fig. 2, ForwardBackward: the forward-backward computation is performed in parallel;
TransferWeight: the computed weight information is fed back to the CPU.
Step S102: the CPU updates the neural network model according to the weight information and transmits the updated neural network model to each GPU, and the above steps are performed in a loop until the deep learning process of the neural network model is completed.
In Fig. 2, ReciveNewWeightandSendNewNet: the weight information fed back by the GPUs is received, and the updated neural network model is transmitted to each GPU;
ComputeUpdateValueandNetUpdate: the neural network model is updated according to the weight information;
Send/ReciveNewNet: the updated neural network model is sent/received.
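To make the above data flow concrete, the following is a minimal sketch (not part of the original patent text) of the Fig. 2 master-slave loop, assuming an MPI toolchain. Rank 0 plays the CPU-side master that updates and broadcasts the neural network model, the other ranks stand in for the GPU-bound slave processes, and forward_backward is a placeholder for the CUDA/Caffe kernels that the embodiment runs on each GPU; the model size, iteration count and learning rate are illustrative assumptions only.

```cpp
// Sketch of the Fig. 2 master-slave data flow (compile with mpic++).
#include <mpi.h>
#include <cstdio>
#include <vector>

static const int kNumWeights = 1024;  // illustrative model size
static const int kIterations = 10;    // illustrative number of training loops

// Placeholder for the per-GPU ForwardBackward step: in the embodiment this would
// launch CUDA kernels on the GPU bound to this slave process and return the
// computed weight information (gradients).
std::vector<float> forward_backward(const std::vector<float>& weights) {
    return std::vector<float>(weights.size(), 0.001f);  // dummy gradient
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::vector<float> weights(kNumWeights, 0.0f);   // neural network model
    std::vector<float> grad(kNumWeights, 0.0f);      // local weight information
    std::vector<float> grad_sum(kNumWeights, 0.0f);  // aggregated at the master

    for (int iter = 0; iter < kIterations; ++iter) {
        // Send/ReciveNewNet: the master broadcasts the current model to all slaves.
        MPI_Bcast(weights.data(), kNumWeights, MPI_FLOAT, 0, MPI_COMM_WORLD);

        if (rank != 0) {
            // ForwardBackward on each slave (one slave per GPU in the embodiment).
            grad = forward_backward(weights);
        }
        // TransferWeight: the slaves' weight information is summed onto the master.
        MPI_Reduce(grad.data(), grad_sum.data(), kNumWeights, MPI_FLOAT,
                   MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            // ComputeUpdateValue and NetUpdate on the CPU side.
            const float lr = 0.01f;
            const int slaves = (size > 1) ? size - 1 : 1;
            for (int i = 0; i < kNumWeights; ++i)
                weights[i] -= lr * grad_sum[i] / slaves;
        }
    }
    if (rank == 0) std::printf("training loop finished\n");
    MPI_Finalize();
    return 0;
}
```

Launched as, for example, mpirun -np 9, this mirrors the 1 + 8 process layout described later (one master process plus eight slave processes).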
The GPU-based deep learning method provided by the present invention adopts a high-density computing mode in which a CPU cooperates with multiple GPU cards. Specifically, according to the algorithmic characteristics of deep learning applications, the time-consuming forward-backward computation is performed by GPUs with powerful parallel computing capability, while the remaining parameter-update computation, data reading and distribution, and neural network model updating are completed by the CPU; this shortens the processing time of deep learning applications and improves computational efficiency.
Based on the technical solution disclosed in the above embodiment, in another embodiment of the present invention, the CPU in step S100 uses a hard disk and memory to transmit the data to be trained to each GPU. In practical application, specifically, the CPU side of the hardware design adopts a two-level storage mode: the first level is the hard disk, which may be a fast SSD hard disk with a capacity of, for example, 1 TB, used to store the original image data to be trained; the second level is memory, which may be configured as 256 GB of large memory, used to store the parameter data of the training model and to cache the image data.
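As an illustration of this two-level storage, the following sketch (assuming only standard C++11) reads a raw training file from the SSD into a memory buffer with several parallel reader threads; the file name train_images.bin, the chunk size and the thread count are hypothetical, since the embodiment only specifies a 1 TB SSD, 256 GB of memory and parallel reading by the CPU.

```cpp
// Sketch of parallel SSD-to-memory reading (first level: SSD file, second level: RAM cache).
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <thread>
#include <vector>

int main() {
    const char* path = "train_images.bin";           // hypothetical raw image file on the SSD
    const std::size_t kChunk = 64 * 1024 * 1024;      // 64 MB per read request (illustrative)
    const int kThreads = 8;                           // parallel reader threads (illustrative)

    std::ifstream probe(path, std::ios::binary | std::ios::ate);
    if (!probe) { std::perror("open"); return 1; }
    const std::size_t file_size = static_cast<std::size_t>(probe.tellg());

    std::vector<char> host_buffer(file_size);         // second level: main-memory cache

    // Each thread reads an interleaved set of chunks from the SSD into the buffer.
    auto reader = [&](int tid) {
        std::ifstream in(path, std::ios::binary);
        for (std::size_t off = static_cast<std::size_t>(tid) * kChunk;
             off < file_size; off += static_cast<std::size_t>(kThreads) * kChunk) {
            const std::size_t len = std::min(kChunk, file_size - off);
            in.seekg(static_cast<std::streamoff>(off));
            in.read(host_buffer.data() + off, static_cast<std::streamsize>(len));
        }
    };

    std::vector<std::thread> pool;
    for (int t = 0; t < kThreads; ++t) pool.emplace_back(reader, t);
    for (auto& th : pool) th.join();

    std::printf("cached %zu bytes in memory, ready to send to the GPUs\n", file_size);
    return 0;
}
```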
In addition, in this embodiment, the hardware system is a high-I/O-throughput system: data transmission between the CPU and each GPU is performed through PCIE interfaces. Based on the 256 GB memory and 1 TB SSD hard disk configured above, the CPU can access the data quickly. The CPU communicates with each GPU card over PCIE 3.0, the two GPU chips in each GPU card also communicate over PCIE 3.0, and the GPUs communicate with each other directly using RDMA, so that CPU-to-GPU and GPU-to-GPU communication is maximized and the system achieves high I/O throughput.
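Direct GPU-to-GPU communication over PCIE can be illustrated with the CUDA runtime API as in the following sketch, which merely enumerates the GPU chips and enables peer access where the hardware allows it; it is an illustration of the communication path, not the patent's complete RDMA/driver configuration.

```cpp
// Sketch of enabling direct peer-to-peer access between GPUs (compile with nvcc).
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);  // e.g. 8 GPU chips on 4 K80 cards in the embodiment
    std::printf("found %d CUDA devices\n", n);

    for (int src = 0; src < n; ++src) {
        cudaSetDevice(src);
        for (int dst = 0; dst < n; ++dst) {
            if (src == dst) continue;
            int can = 0;
            cudaDeviceCanAccessPeer(&can, src, dst);
            if (can) {
                // Allow device 'src' to read/write device 'dst' memory directly
                // over PCIE, without staging the data through host memory.
                cudaError_t err = cudaDeviceEnablePeerAccess(dst, 0);
                std::printf("peer access %d -> %d: %s\n", src, dst,
                            err == cudaSuccess ? "enabled" : cudaGetErrorString(err));
            }
        }
    }
    return 0;
}
```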
Finally, in the present invention, the above-mentioned CPU may comprise one or more CPUs, and multiple CPUs can share the data-processing load; in this embodiment, the number of CPUs is preferably two. The software system is a single-machine multi-GPU-card parallel version of the Caffe application, tested with the Cifar-10 dataset. The software framework adopts an MPI + Pthread + CUDA hybrid parallel computing mode to realize single-machine multi-GPU-card parallel computation. The CPUs host multiple MPI processes, the number of processes being 1 plus the number of GPUs: the first MPI process controls the two CPUs, and each of the remaining MPI processes (as many as there are GPUs) controls one GPU. The first MPI process launches multiple PThread parallel threads, one PThread thread per CPU core. The processing of the software system adopts a master-slave mode: one master process performs control on the CPU side, and multiple slave processes control the GPUs.
Based on the technical solutions disclosed in the above embodiments, in yet another embodiment of the present invention, for the hardware system design a single-machine, dual-socket, eight-GPU design mode based on a CPU+GPU heterogeneous architecture is proposed. Specifically, on the basis of two CPUs, the at least one GPU comprises eight GPUs; further, the eight GPUs are specifically four GPU cards, and each GPU card comprises two GPU chips. In practical application, referring to Fig. 3, the system uses only one node, configured with four Nvidia K80 GPU cards, each K80 having two GPU chips, i.e. eight GPU chips in total, and with two CPUs, namely two Haswell-architecture E5-2670 v3 CPUs; the two CPUs and the eight GPUs work cooperatively, thereby further realizing high-density computing.
In fact, in practical applications the number of the at least one GPU is not limited to eight; it may also be three, five, six and so on, and the present invention does not strictly limit this. The preference that the at least one GPU comprises eight GPUs is stated because in practice the at least one GPU comprises at most eight GPUs, and this embodiment adopts eight GPUs to improve computational efficiency to the greatest extent.
For the settings of the hard disk, the memory and the communication mode, please refer to the description above.
In addition, based on the above hardware design and on practical application, the software system design is described in further detail below:
Referring to Fig. 4, the software system architecture is designed as follows: the software system adopts an MPI + Pthread + CUDA hybrid parallel computing mode to realize single-machine multi-GPU-card parallel computation. The CPUs host 9 MPI processes, the number of processes being 1 plus the number of GPUs: the first MPI process controls the two CPUs, each of the remaining MPI processes controls one GPU chip, and the first MPI process launches 24 PThread parallel threads, one PThread thread per CPU core.
Based on the above software architecture, the processing design adopts a master-slave mode: one master process performs control on the CPU side, and eight slave processes control the eight GPU chips respectively; a sketch of this layout is given below. For further details, please refer to the description above.
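The following is a minimal sketch of this 1 + 8 master-slave layout, assuming MPI, POSIX threads and the CUDA runtime; the 24-thread count follows the embodiment (one PThread per CPU core), and the worker bodies are placeholders for the data reading and distribution, model updating and ForwardBackward work described above.

```cpp
// Sketch of the 1 + 8 MPI process layout (compile with mpic++, link CUDA runtime and -lpthread).
#include <mpi.h>
#include <cuda_runtime.h>
#include <pthread.h>
#include <cstdio>

static void* cpu_worker(void* arg) {
    long id = reinterpret_cast<long>(arg);
    // Placeholder for CPU-side work: data reading, distribution, model update.
    std::printf("master thread %ld running on the CPU side\n", id);
    return nullptr;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);  // expected: 1 + number of GPU chips

    if (rank == 0) {
        // Master process: one PThread per CPU core (24 in the embodiment).
        const int kThreads = 24;
        pthread_t tid[kThreads];
        for (long t = 0; t < kThreads; ++t)
            pthread_create(&tid[t], nullptr, cpu_worker, reinterpret_cast<void*>(t));
        for (int t = 0; t < kThreads; ++t)
            pthread_join(tid[t], nullptr);
    } else {
        // Slave process: bind this MPI rank to one GPU chip.
        int gpus = 0;
        cudaGetDeviceCount(&gpus);
        int device = (rank - 1) % (gpus > 0 ? gpus : 1);
        cudaSetDevice(device);
        std::printf("slave rank %d bound to GPU %d of %d\n", rank, device, gpus);
        // The ForwardBackward kernels for this GPU would be launched from here.
    }

    MPI_Finalize();
    return 0;
}
```

Launched as mpirun -np 9 ./a.out, rank 0 corresponds to the master process and ranks 1 to 8 to the eight GPU-chip slave processes.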
Corresponding to the GPU-based deep learning method provided by the above embodiments of the present invention, an embodiment of the present invention further provides a GPU-based deep learning system. Referring to Fig. 5, the system is a single-machine system, and the system 500 may comprise the following:
a CPU 501 and at least one GPU 502;
wherein
the CPU 501 is configured to transmit the data to be trained to each GPU 502, update a neural network model according to the weight information fed back by the GPUs 502, and transmit the updated neural network model to each GPU 502;
each GPU 502 is configured to use the data to be trained to perform forward-backward computation of the weight information of the neural network model and feed the weight information back to the CPU 501; the above steps are performed in a loop until the deep learning process of the neural network model is completed.
The above system 500 may further comprise an SSD hard disk and memory;
the CPU 501 reads the data to be trained in parallel from the SSD hard disk into the memory, and transmits the data to be trained in the memory to each GPU 502.
The above system 500 may further comprise a PCIE interface;
data transmission between the CPU 501 and each GPU 502 is performed through the PCIE interface.
In the above system 500, the number of CPUs 501 is two, and the at least one GPU 502 comprises eight GPUs 502.
In the above system 500, the eight GPUs 502 are specifically four GPU cards, and each GPU card comprises two GPU chips.
In summary, the present invention realizes a GPU-based, high-density, desktop-type integrated software and hardware system for image deep learning, which has the features of high-density computing, high I/O throughput, low cost and easy deployment. The system is custom co-designed according to the computational characteristics of image deep learning; the whole system is a single-machine system that adopts high-density computing in which a CPU cooperates with multiple GPU cards, thereby shortening the processing time of image deep learning applications and improving computational efficiency.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the identical or similar parts between the embodiments, reference may be made to one another. Since the system embodiments are basically similar to the method embodiments, their description is relatively simple, and for relevant parts reference may be made to the description of the method embodiments.
The GPU-based deep learning method and system provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention; the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications to the present invention without departing from the principles of the present invention, and such improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims (8)

1. A GPU-based deep learning method, characterized in that it is applied to a GPU-based deep learning system, the system being a single-machine system comprising a CPU and at least one GPU, the method comprising:
the CPU transmitting the data to be trained to each GPU;
each GPU using the data to be trained to perform forward-backward computation of the weight information of a neural network model, and feeding the weight information back to the CPU;
the CPU updating the neural network model according to the weight information and transmitting the updated neural network model to each GPU, the above steps being performed in a loop until the deep learning process of the neural network model is completed.
2. The method according to claim 1, characterized in that the CPU transmitting the data to be trained to each GPU comprises:
reading the data to be trained in parallel from an SSD hard disk into memory;
transmitting the data to be trained in the memory to each GPU.
3. The method according to claim 1 or 2, characterized in that data transmission between the CPU and each GPU is performed through a PCIE interface.
4. A GPU-based deep learning system, characterized in that the system is a single-machine system comprising:
a CPU and at least one GPU;
wherein
the CPU is configured to transmit the data to be trained to each GPU, update a neural network model according to the weight information fed back by the GPUs, and transmit the updated neural network model to each GPU;
each GPU is configured to use the data to be trained to perform forward-backward computation of the weight information of the neural network model and feed the weight information back to the CPU; the above steps are performed in a loop until the deep learning process of the neural network model is completed.
5. The system according to claim 4, characterized in that it further comprises:
an SSD hard disk and memory;
the CPU reads the data to be trained in parallel from the SSD hard disk into the memory, and transmits the data to be trained in the memory to each GPU.
6. The system according to claim 4 or 5, characterized in that it further comprises:
a PCIE interface;
data transmission between the CPU and each GPU is performed through the PCIE interface.
7. The system according to claim 6, characterized in that the number of CPUs is two, and the at least one GPU comprises eight GPUs.
8. The system according to claim 7, characterized in that the eight GPUs are specifically four GPU cards, and each GPU card comprises two GPU chips.
CN201510628858.7A 2015-09-28 2015-09-28 GPU-based deep learning method and system Pending CN105224502A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510628858.7A CN105224502A (en) 2015-09-28 2015-09-28 A kind of degree of depth learning method based on GPU and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510628858.7A CN105224502A (en) 2015-09-28 2015-09-28 A kind of degree of depth learning method based on GPU and system

Publications (1)

Publication Number Publication Date
CN105224502A true CN105224502A (en) 2016-01-06

Family

ID=54993481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510628858.7A Pending CN105224502A (en) 2015-09-28 2015-09-28 A kind of degree of depth learning method based on GPU and system

Country Status (1)

Country Link
CN (1) CN105224502A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120242672A1 (en) * 2011-03-21 2012-09-27 Apple Inc. Fast queries in a multithreaded queue of a graphics system
CN103488662A (en) * 2013-04-01 2014-01-01 哈尔滨工业大学深圳研究生院 Clustering method and system of parallelized self-organizing mapping neural network based on graphic processing unit
CN104036451A (en) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 Parallel model processing method and device based on multiple graphics processing units

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Shanshan Zhang et al.: "Asynchronous Stochastic Gradient Descent for DNN Training", IEEE International Conference on Acoustics *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017148292A1 (en) * 2016-03-01 2017-09-08 华为技术有限公司 Cascade plate, and system and method for ssd remote sharing access
US10901638B2 (en) 2016-03-01 2021-01-26 Huawei Technologies Co., Ltd. Cascading board and SSD shared remote access system and method
CN109074514A (en) * 2016-05-13 2018-12-21 微软技术许可有限责任公司 Pass through the deep learning of the robot of example and experience
CN106201870A (en) * 2016-07-01 2016-12-07 浪潮电子信息产业股份有限公司 A kind of method and device testing GPU
WO2018107934A1 (en) * 2016-12-14 2018-06-21 腾讯科技(深圳)有限公司 Data processing method and apparatus, and electronic device
US10943324B2 (en) 2016-12-14 2021-03-09 Tencent Technology (Shenzhen) Company Limited Data processing method, apparatus, and electronic device
US11010681B2 (en) 2017-08-31 2021-05-18 Huawei Technologies Co., Ltd. Distributed computing system, and data transmission method and apparatus in distributed computing system
WO2019079994A1 (en) * 2017-10-25 2019-05-02 华为技术有限公司 Core scheduling method and terminal
CN109213649A (en) * 2018-09-18 2019-01-15 郑州云海信息技术有限公司 GTX video card deep learning optimal inspection method, apparatus, terminal and storage medium
US11687763B2 (en) 2018-10-19 2023-06-27 Fujitsu Limited Method, apparatus and computer program to carry out a training procedure in a convolutional neural network
US11526759B2 (en) 2018-11-05 2022-12-13 International Business Machines Corporation Large model support in deep learning
CN113168396A (en) * 2018-11-05 2021-07-23 国际商业机器公司 Large model support in deep learning
US11915147B2 (en) 2018-11-05 2024-02-27 International Business Machines Corporation Large model support in deep learning
CN109919310A (en) * 2019-01-15 2019-06-21 中国科学院信息工程研究所 A kind of GPU Memory Optimize Method and system towards deep learning training mission
CN109919310B (en) * 2019-01-15 2021-05-18 中国科学院信息工程研究所 GPU memory optimization method and system for deep learning training task
CN111722937A (en) * 2019-03-21 2020-09-29 阿里巴巴集团控股有限公司 Deep learning weight updating method and device
CN110414668A (en) * 2019-06-29 2019-11-05 苏州浪潮智能科技有限公司 A kind of GPU deep learning method based on AEP memory, system and electronic equipment
CN110503194B (en) * 2019-08-09 2022-05-24 苏州浪潮智能科技有限公司 Distributed parallel training method and system
CN110503194A (en) * 2019-08-09 2019-11-26 苏州浪潮智能科技有限公司 A kind of method and system of distributed parallel training
CN110430444A (en) * 2019-08-12 2019-11-08 北京中科寒武纪科技有限公司 A kind of video stream processing method and system
WO2021208558A1 (en) * 2020-04-16 2021-10-21 苏州浪潮智能科技有限公司 Large deep learning model training method and system, device, and medium
CN113033784A (en) * 2021-04-18 2021-06-25 沈阳雅译网络技术有限公司 Method for searching neural network structure for CPU and GPU equipment

Similar Documents

Publication Publication Date Title
CN105224502A (en) GPU-based deep learning method and system
Ding et al. Application of Internet of Things and virtual reality technology in college physical education
You et al. Scaling deep learning on GPU and knights landing clusters
US10614356B2 (en) Local multicast in single-host multi-GPU machine for distributed deep learning systems
CN106951926A (en) The deep learning systems approach and device of a kind of mixed architecture
Hoang et al. A novel CPU/GPU simulation environment for large-scale biologically realistic neural modeling
CN108460457A (en) A kind of more asynchronous training methods of card hybrid parallel of multimachine towards convolutional neural networks
CN103853618A (en) Resource allocation method with minimized cloud system cost based on expiration date drive
CN113469355B (en) Multi-model training pipeline in distributed system
EP4242844A3 (en) Distributing tensor computations across computing devices
Talbi et al. Metaheuristics on gpus
Yilmaz et al. Panel: The future of research in modeling & simulation
Plessl Bringing FPGAs to HPC production systems and codes
Freniere et al. The feasibility of Amazon's cloud computing platform for parallel, GPU-accelerated, multiphase-flow simulations
Liu et al. Analysis of the Relation between Artificial Intelligence and the Internet from the Perspective of Brain Science
Vlag et al. Exploring complex brain-simulation workloads on multi-GPU deployments
CN100531070C (en) Network resource scheduling simulation system
Zhang et al. A parallel strategy for convolutional neural network based on heterogeneous cluster for mobile information system
CN111695701B (en) System for realizing data set construction processing based on federal learning and construction generation method thereof
Shu et al. Design of deep learning accelerated algorithm for online recognition of industrial products defects
Cui et al. Cloud computing resource scheduling method research based on improved genetic algorithm
Nichols et al. MagmaDNN: accelerated deep learning using MAGMA
Ji et al. Optimized mapping spiking neural networks onto network-on-chip
CN103678888A (en) Cardiac blood flowing indicating and displaying method based on Euler fluid simulation algorithm
CN104090813A (en) Analysis modeling method for CPU (central processing unit) usage of virtual machines in cloud data center

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20160106