CN105224502A - GPU-based deep learning method and system - Google Patents

GPU-based deep learning method and system

Info

Publication number
CN105224502A
Authority
CN
China
Prior art keywords
gpu
cpu
neural network
network model
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510628858.7A
Other languages
Chinese (zh)
Inventor
张清
王娅娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd filed Critical Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201510628858.7A priority Critical patent/CN105224502A/en
Publication of CN105224502A publication Critical patent/CN105224502A/en
Pending legal-status Critical Current


Abstract

The invention discloses a GPU-based deep learning method and system. The system is a single-machine (standalone) system comprising a CPU and at least one GPU. The method comprises: the CPU transmits the data to be trained to each GPU; each GPU uses the data to be trained to perform forward-backward computation of the weight information of a neural network model and feeds the weight information back to the CPU; the CPU updates the neural network model according to the weight information and transmits the updated neural network model to each GPU; the above steps are performed in a loop until the deep learning process of the neural network model is completed. In this scheme, the time-consuming forward-backward computation is performed by GPUs with powerful parallel computing capability, and a cooperative deployment of a CPU and multiple GPU cards is adopted, which effectively solves the prior-art problems of long computation time, low efficiency, complicated system deployment and high cost.

Description

GPU-based deep learning method and system
Technical field
The present invention relates to the fields of high-performance computing, deep learning and the Internet, and in particular to a GPU-based deep learning method and system.
Background art
Deep learning is a new field in machine learning research. Its motivation is to build neural networks that simulate the human brain for analysis and learning; it imitates the mechanism of the human brain to interpret data such as images, sound and text.
In 2006, Geoffrey Hinton, a professor at the University of Toronto and an authority in the field of machine learning, published an article with his students in the top academic journal Science, which started a wave of deep learning in academia and industry. Since 2006, deep learning has kept heating up in academia; Stanford University, New York University and the University of Montreal in Canada have become important centers of deep learning research. In 2010, DARPA of the U.S. Department of Defense funded deep learning projects for the first time, with participants including Stanford University, New York University and NEC Laboratories America. An important piece of evidence supporting deep learning is that the cerebral nervous system indeed has a rich hierarchical structure; the most famous example is the Hubel-Wiesel model, which won the Nobel Prize in Physiology or Medicine for revealing the mechanism of the visual nervous system.
Nowadays, well-known high-tech companies with large amounts of data, such as Google, Microsoft and Baidu, are competing to invest resources and seize the technological high ground of deep learning, precisely because they all see that in the era of big data, more complex and more powerful deep models can deeply reveal the complex and rich information carried in massive data and make more accurate predictions about future or unknown events.
At present, deep learning applications include speech recognition, image recognition, natural language processing and click-through-rate (CTR) estimation for search advertising. The amount of computation in these applications is enormous and requires large-scale deep learning computation. However, in the prior art, usually only the CPU is used to perform the computation in the deep learning process, which is time-consuming and inefficient. Moreover, existing deep learning systems usually need to deploy network devices for networking, and deploying network devices is complicated and makes the system costly.
Summary of the invention
In view of this, the present invention provides a GPU-based deep learning method and system, to solve the prior-art problems of long computation time, low efficiency, complicated system deployment and high cost.
To solve the above technical problems, the present invention provides a GPU-based deep learning method, applied to a GPU-based deep learning system, wherein the system is a single-machine system comprising a CPU and at least one GPU, and the method comprises:
the CPU transmits the data to be trained to each GPU;
each GPU uses the data to be trained to perform forward-backward computation of the weight information of a neural network model, and feeds the weight information back to the CPU;
the CPU updates the neural network model according to the weight information and transmits the updated neural network model to each GPU, and the above steps are performed in a loop until the deep learning process of the neural network model is completed.
In the above method, preferably, the CPU transmitting the data to be trained to each GPU comprises:
reading the data to be trained in parallel from an SSD hard disk into memory;
transmitting the data to be trained in the memory to each GPU.
In the above method, preferably, data transmission between the CPU and each GPU is performed through a PCIE interface.
The present invention also provides a GPU-based deep learning system, which is a single-machine system and comprises:
a CPU and at least one GPU;
wherein
the CPU is configured to transmit the data to be trained to each GPU, update a neural network model according to the weight information fed back by the GPUs, and transmit the updated neural network model to each GPU;
each GPU is configured to use the data to be trained to perform forward-backward computation of the weight information of the neural network model and feed the weight information back to the CPU; the above steps are performed in a loop until the deep learning process of the neural network model is completed.
In the above system, preferably, the system further comprises:
an SSD hard disk and memory;
the CPU reads the data to be trained in parallel from the SSD hard disk into the memory, and transmits the data to be trained in the memory to each GPU.
In the above system, preferably, the system further comprises:
a PCIE interface;
data transmission between the CPU and each GPU is performed through the PCIE interface.
In the above system, preferably, the number of CPUs is two, and the at least one GPU comprises eight GPUs.
In the above system, preferably, the eight GPUs are specifically four GPU cards, and each GPU card comprises two GPU chips.
The GPU-based deep learning method provided by the present invention adopts a high-density computing mode in which a CPU cooperates with multiple GPU cards. Specifically, according to the algorithmic characteristics of deep learning applications, the time-consuming forward-backward computation is performed by GPUs with powerful parallel computing capability, while the remaining parameter-update computation, data reading and distribution, and neural network model updating are completed by the CPU; this shortens the processing time of deep learning applications and improves computational efficiency.
The GPU-based deep learning system provided by the present invention is a single-machine system and does not need network devices for networking. Specifically, it adopts a cooperative deployment of a CPU and multiple GPU cards, with multiple GPU cards inserted into a single machine, so the hardware is easy to deploy and the cost is low.
In summary, the present invention provides a single-machine multi-GPU parallel deep learning method and system based on GPUs, which effectively solves the prior-art problems of long computation time, low efficiency, complicated system deployment and high cost.
Description of the drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from the provided drawings without creative work.
Fig. 1 is a flow chart of a GPU-based deep learning method provided by an embodiment of the present invention;
Fig. 2 is a data interaction diagram based on Fig. 1, provided by an embodiment of the present invention;
Fig. 3 is a hardware design architecture diagram provided by an embodiment of the present invention;
Fig. 4 is a software design architecture diagram provided by an embodiment of the present invention;
Fig. 5 is a schematic structural block diagram of a GPU-based deep learning system provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative work fall within the protection scope of the present invention.
The core of the present invention is to provide a GPU-based deep learning method and system, to solve the prior-art problems of long computation time, low efficiency, complicated system deployment and high cost.
In order to enable those skilled in the art to better understand the solutions of the present invention, the present invention is described in further detail below with reference to the drawings and specific embodiments.
The following technical solutions of the present invention are described by taking deep learning on image data as an example; of course, this is only an example, and the data is not limited to image data and may also be other data such as speech data or advertising data.
Referring to Fig. 1, Fig. 1 shows a flow chart of a GPU-based deep learning method provided by an embodiment of the present invention. The method is carried by a GPU-based deep learning software system, which runs on a GPU-based deep learning system (i.e. the hardware system); the system is a single-machine system comprising a CPU and at least one GPU. The method may specifically comprise the following steps:
Step S100: the CPU transmits the data to be trained to each GPU.
In the present invention, the GPU-based deep learning system may further comprise an SSD hard disk and memory. Referring to Fig. 2, ReadData: the CPU reads the data to be trained in parallel from the SSD hard disk into memory; SendData: the data to be trained in the memory is transmitted to each GPU.
Step S101: each GPU uses the data to be trained to perform forward-backward computation of the weight information of the neural network model, and feeds the weight information back to the CPU.
In Fig. 2, ForwardBackward: the forward-backward computation is performed in parallel;
TransferWeight: the computed weight information is fed back to the CPU.
Step S102: the CPU updates the neural network model according to the weight information and transmits the updated neural network model to each GPU, and the above steps are performed in a loop until the deep learning process of the neural network model is completed.
In Fig. 2, ReciveNewWeightandSendNewNet: the weight information fed back by the GPUs is received, and the updated neural network model is transmitted to each GPU;
ComputeUpdateValueandNetUpdate: the neural network model is updated according to the weight information;
Send/ReciveNewNet: the updated neural network model is sent/received.
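To make the above data flow concrete, the following is a minimal sketch (not part of the original patent text) of the Fig. 2 master-slave loop, assuming an MPI toolchain. Rank 0 plays the CPU-side master that updates and broadcasts the neural network model, the other ranks stand in for the GPU-bound slave processes, and forward_backward is a placeholder for the CUDA/Caffe kernels that the embodiment runs on each GPU; the model size, iteration count and learning rate are illustrative assumptions only.

```cpp
// Sketch of the Fig. 2 master-slave data flow (compile with mpic++).
#include <mpi.h>
#include <cstdio>
#include <vector>

static const int kNumWeights = 1024;  // illustrative model size
static const int kIterations = 10;    // illustrative number of training loops

// Placeholder for the per-GPU ForwardBackward step: in the embodiment this would
// launch CUDA kernels on the GPU bound to this slave process and return the
// computed weight information (gradients).
std::vector<float> forward_backward(const std::vector<float>& weights) {
    return std::vector<float>(weights.size(), 0.001f);  // dummy gradient
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::vector<float> weights(kNumWeights, 0.0f);   // neural network model
    std::vector<float> grad(kNumWeights, 0.0f);      // local weight information
    std::vector<float> grad_sum(kNumWeights, 0.0f);  // aggregated at the master

    for (int iter = 0; iter < kIterations; ++iter) {
        // Send/ReciveNewNet: the master broadcasts the current model to all slaves.
        MPI_Bcast(weights.data(), kNumWeights, MPI_FLOAT, 0, MPI_COMM_WORLD);

        if (rank != 0) {
            // ForwardBackward on each slave (one slave per GPU in the embodiment).
            grad = forward_backward(weights);
        }
        // TransferWeight: the slaves' weight information is summed onto the master.
        MPI_Reduce(grad.data(), grad_sum.data(), kNumWeights, MPI_FLOAT,
                   MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            // ComputeUpdateValue and NetUpdate on the CPU side.
            const float lr = 0.01f;
            const int slaves = (size > 1) ? size - 1 : 1;
            for (int i = 0; i < kNumWeights; ++i)
                weights[i] -= lr * grad_sum[i] / slaves;
        }
    }
    if (rank == 0) std::printf("training loop finished\n");
    MPI_Finalize();
    return 0;
}
```

Launched as, for example, mpirun -np 9, this mirrors the 1 + 8 process layout described later (one master process plus eight slave processes).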
The GPU-based deep learning method provided by the present invention adopts a high-density computing mode in which a CPU cooperates with multiple GPU cards. Specifically, according to the algorithmic characteristics of deep learning applications, the time-consuming forward-backward computation is performed by GPUs with powerful parallel computing capability, while the remaining parameter-update computation, data reading and distribution, and neural network model updating are completed by the CPU; this shortens the processing time of deep learning applications and improves computational efficiency.
Based on the technical solution disclosed in the above embodiment, in another embodiment of the present invention, the CPU in step S100 uses a hard disk and memory to transmit the data to be trained to each GPU. In practical application, specifically, the CPU side of the hardware design adopts a two-level storage mode: the first level is the hard disk, which may be a fast SSD hard disk with a capacity of, for example, 1 TB, used to store the original image data to be trained; the second level is memory, which may be configured as 256 GB of large memory, used to store the parameter data of the training model and to cache the image data.
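As an illustration of this two-level storage, the following sketch (assuming only standard C++11) reads a raw training file from the SSD into a memory buffer with several parallel reader threads; the file name train_images.bin, the chunk size and the thread count are hypothetical, since the embodiment only specifies a 1 TB SSD, 256 GB of memory and parallel reading by the CPU.

```cpp
// Sketch of parallel SSD-to-memory reading (first level: SSD file, second level: RAM cache).
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <thread>
#include <vector>

int main() {
    const char* path = "train_images.bin";           // hypothetical raw image file on the SSD
    const std::size_t kChunk = 64 * 1024 * 1024;      // 64 MB per read request (illustrative)
    const int kThreads = 8;                           // parallel reader threads (illustrative)

    std::ifstream probe(path, std::ios::binary | std::ios::ate);
    if (!probe) { std::perror("open"); return 1; }
    const std::size_t file_size = static_cast<std::size_t>(probe.tellg());

    std::vector<char> host_buffer(file_size);         // second level: main-memory cache

    // Each thread reads an interleaved set of chunks from the SSD into the buffer.
    auto reader = [&](int tid) {
        std::ifstream in(path, std::ios::binary);
        for (std::size_t off = static_cast<std::size_t>(tid) * kChunk;
             off < file_size; off += static_cast<std::size_t>(kThreads) * kChunk) {
            const std::size_t len = std::min(kChunk, file_size - off);
            in.seekg(static_cast<std::streamoff>(off));
            in.read(host_buffer.data() + off, static_cast<std::streamsize>(len));
        }
    };

    std::vector<std::thread> pool;
    for (int t = 0; t < kThreads; ++t) pool.emplace_back(reader, t);
    for (auto& th : pool) th.join();

    std::printf("cached %zu bytes in memory, ready to send to the GPUs\n", file_size);
    return 0;
}
```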
In addition, in this embodiment, the hardware system is a high-I/O-throughput system: data transmission between the CPU and each GPU is performed through PCIE interfaces. Based on the 256 GB memory and 1 TB SSD hard disk configured above, the CPU can access the data quickly. The CPU communicates with each GPU card over PCIE 3.0, the two GPU chips in each GPU card also communicate over PCIE 3.0, and the GPUs communicate with each other directly using RDMA, so that CPU-to-GPU and GPU-to-GPU communication is maximized and the system achieves high I/O throughput.
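Direct GPU-to-GPU communication over PCIE can be illustrated with the CUDA runtime API as in the following sketch, which merely enumerates the GPU chips and enables peer access where the hardware allows it; it is an illustration of the communication path, not the patent's complete RDMA/driver configuration.

```cpp
// Sketch of enabling direct peer-to-peer access between GPUs (compile with nvcc).
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);  // e.g. 8 GPU chips on 4 K80 cards in the embodiment
    std::printf("found %d CUDA devices\n", n);

    for (int src = 0; src < n; ++src) {
        cudaSetDevice(src);
        for (int dst = 0; dst < n; ++dst) {
            if (src == dst) continue;
            int can = 0;
            cudaDeviceCanAccessPeer(&can, src, dst);
            if (can) {
                // Allow device 'src' to read/write device 'dst' memory directly
                // over PCIE, without staging the data through host memory.
                cudaError_t err = cudaDeviceEnablePeerAccess(dst, 0);
                std::printf("peer access %d -> %d: %s\n", src, dst,
                            err == cudaSuccess ? "enabled" : cudaGetErrorString(err));
            }
        }
    }
    return 0;
}
```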
Finally, in the present invention, the above-mentioned CPU may comprise one or more CPUs, and multiple CPUs can share the data-processing load; in this embodiment, the number of CPUs is preferably two. The software system is a single-machine multi-GPU-card parallel version of the Caffe application, tested with the Cifar-10 dataset. The software framework adopts an MPI + Pthread + CUDA hybrid parallel computing mode to realize single-machine multi-GPU-card parallel computation. The CPUs host multiple MPI processes, the number of processes being 1 plus the number of GPUs: the first MPI process controls the two CPUs, and each of the remaining MPI processes (as many as there are GPUs) controls one GPU. The first MPI process launches multiple PThread parallel threads, one PThread thread per CPU core. The processing of the software system adopts a master-slave mode: one master process performs control on the CPU side, and multiple slave processes control the GPUs.
Based on the technical solutions disclosed in the above embodiments, in yet another embodiment of the present invention, for the hardware system design a single-machine, dual-socket, eight-GPU design mode based on a CPU+GPU heterogeneous architecture is proposed. Specifically, on the basis of two CPUs, the at least one GPU comprises eight GPUs; further, the eight GPUs are specifically four GPU cards, and each GPU card comprises two GPU chips. In practical application, referring to Fig. 3, the system uses only one node, configured with four Nvidia K80 GPU cards, each K80 having two GPU chips, i.e. eight GPU chips in total, and with two CPUs, namely two Haswell-architecture E5-2670 v3 CPUs; the two CPUs and the eight GPUs work cooperatively, thereby further realizing high-density computing.
In fact, in practical applications the number of the at least one GPU is not limited to eight; it may also be three, five, six and so on, and the present invention does not strictly limit this. The preference that the at least one GPU comprises eight GPUs is stated because in practice the at least one GPU comprises at most eight GPUs, and this embodiment adopts eight GPUs to improve computational efficiency to the greatest extent.
For the settings of the hard disk, the memory and the communication mode, please refer to the description above.
In addition, based on the above hardware design and on practical application, the software system design is described in further detail below:
Referring to Fig. 4, the software system architecture is designed as follows: the software system adopts an MPI + Pthread + CUDA hybrid parallel computing mode to realize single-machine multi-GPU-card parallel computation. The CPUs host 9 MPI processes, the number of processes being 1 plus the number of GPUs: the first MPI process controls the two CPUs, each of the remaining MPI processes controls one GPU chip, and the first MPI process launches 24 PThread parallel threads, one PThread thread per CPU core.
Based on the above software architecture, the processing design adopts a master-slave mode: one master process performs control on the CPU side, and eight slave processes control the eight GPU chips respectively; a sketch of this layout is given below. For further details, please refer to the description above.
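The following is a minimal sketch of this 1 + 8 master-slave layout, assuming MPI, POSIX threads and the CUDA runtime; the 24-thread count follows the embodiment (one PThread per CPU core), and the worker bodies are placeholders for the data reading and distribution, model updating and ForwardBackward work described above.

```cpp
// Sketch of the 1 + 8 MPI process layout (compile with mpic++, link CUDA runtime and -lpthread).
#include <mpi.h>
#include <cuda_runtime.h>
#include <pthread.h>
#include <cstdio>

static void* cpu_worker(void* arg) {
    long id = reinterpret_cast<long>(arg);
    // Placeholder for CPU-side work: data reading, distribution, model update.
    std::printf("master thread %ld running on the CPU side\n", id);
    return nullptr;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);  // expected: 1 + number of GPU chips

    if (rank == 0) {
        // Master process: one PThread per CPU core (24 in the embodiment).
        const int kThreads = 24;
        pthread_t tid[kThreads];
        for (long t = 0; t < kThreads; ++t)
            pthread_create(&tid[t], nullptr, cpu_worker, reinterpret_cast<void*>(t));
        for (int t = 0; t < kThreads; ++t)
            pthread_join(tid[t], nullptr);
    } else {
        // Slave process: bind this MPI rank to one GPU chip.
        int gpus = 0;
        cudaGetDeviceCount(&gpus);
        int device = (rank - 1) % (gpus > 0 ? gpus : 1);
        cudaSetDevice(device);
        std::printf("slave rank %d bound to GPU %d of %d\n", rank, device, gpus);
        // The ForwardBackward kernels for this GPU would be launched from here.
    }

    MPI_Finalize();
    return 0;
}
```

Launched as mpirun -np 9 ./a.out, rank 0 corresponds to the master process and ranks 1 to 8 to the eight GPU-chip slave processes.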
Corresponding to the GPU-based deep learning method provided by the above embodiments of the present invention, an embodiment of the present invention further provides a GPU-based deep learning system. Referring to Fig. 5, the system is a single-machine system, and the system 500 may comprise the following:
a CPU 501 and at least one GPU 502;
wherein
the CPU 501 is configured to transmit the data to be trained to each GPU 502, update a neural network model according to the weight information fed back by the GPUs 502, and transmit the updated neural network model to each GPU 502;
each GPU 502 is configured to use the data to be trained to perform forward-backward computation of the weight information of the neural network model and feed the weight information back to the CPU 501; the above steps are performed in a loop until the deep learning process of the neural network model is completed.
The above system 500 may further comprise an SSD hard disk and memory;
the CPU 501 reads the data to be trained in parallel from the SSD hard disk into the memory, and transmits the data to be trained in the memory to each GPU 502.
The above system 500 may further comprise a PCIE interface;
data transmission between the CPU 501 and each GPU 502 is performed through the PCIE interface.
In the above system 500, the number of CPUs 501 is two, and the at least one GPU 502 comprises eight GPUs 502.
In the above system 500, the eight GPUs 502 are specifically four GPU cards, and each GPU card comprises two GPU chips.
In summary, the present invention realizes a GPU-based, high-density, desktop-type integrated software and hardware system for image deep learning, which has the features of high-density computing, high I/O throughput, low cost and easy deployment. The system is custom co-designed according to the computational characteristics of image deep learning; the whole system is a single-machine system that adopts high-density computing in which a CPU cooperates with multiple GPU cards, thereby shortening the processing time of image deep learning applications and improving computational efficiency.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the identical or similar parts between the embodiments, reference may be made to one another. Since the system embodiments are basically similar to the method embodiments, their description is relatively simple, and for relevant parts reference may be made to the description of the method embodiments.
The GPU-based deep learning method and system provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention; the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications to the present invention without departing from the principles of the present invention, and such improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims (8)

1. A GPU-based deep learning method, characterized in that it is applied to a GPU-based deep learning system, the system being a single-machine system comprising a CPU and at least one GPU, the method comprising:
the CPU transmitting the data to be trained to each GPU;
each GPU using the data to be trained to perform forward-backward computation of the weight information of a neural network model, and feeding the weight information back to the CPU;
the CPU updating the neural network model according to the weight information and transmitting the updated neural network model to each GPU, the above steps being performed in a loop until the deep learning process of the neural network model is completed.
2. The method according to claim 1, characterized in that the CPU transmitting the data to be trained to each GPU comprises:
reading the data to be trained in parallel from an SSD hard disk into memory;
transmitting the data to be trained in the memory to each GPU.
3. The method according to claim 1 or 2, characterized in that data transmission between the CPU and each GPU is performed through a PCIE interface.
4. A GPU-based deep learning system, characterized in that the system is a single-machine system comprising:
a CPU and at least one GPU;
wherein
the CPU is configured to transmit the data to be trained to each GPU, update a neural network model according to the weight information fed back by the GPUs, and transmit the updated neural network model to each GPU;
each GPU is configured to use the data to be trained to perform forward-backward computation of the weight information of the neural network model and feed the weight information back to the CPU; the above steps are performed in a loop until the deep learning process of the neural network model is completed.
5. The system according to claim 4, characterized in that it further comprises:
an SSD hard disk and memory;
the CPU reads the data to be trained in parallel from the SSD hard disk into the memory, and transmits the data to be trained in the memory to each GPU.
6. The system according to claim 4 or 5, characterized in that it further comprises:
a PCIE interface;
data transmission between the CPU and each GPU is performed through the PCIE interface.
7. The system according to claim 6, characterized in that the number of CPUs is two, and the at least one GPU comprises eight GPUs.
8. The system according to claim 7, characterized in that the eight GPUs are specifically four GPU cards, and each GPU card comprises two GPU chips.
CN201510628858.7A 2015-09-28 2015-09-28 GPU-based deep learning method and system Pending CN105224502A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510628858.7A CN105224502A (en) 2015-09-28 2015-09-28 A kind of degree of depth learning method based on GPU and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510628858.7A CN105224502A (en) 2015-09-28 2015-09-28 A kind of degree of depth learning method based on GPU and system

Publications (1)

Publication Number Publication Date
CN105224502A true CN105224502A (en) 2016-01-06

Family

ID=54993481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510628858.7A Pending CN105224502A (en) 2015-09-28 2015-09-28 A kind of degree of depth learning method based on GPU and system

Country Status (1)

Country Link
CN (1) CN105224502A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120242672A1 (en) * 2011-03-21 2012-09-27 Apple Inc. Fast queries in a multithreaded queue of a graphics system
CN103488662A (en) * 2013-04-01 2014-01-01 哈尔滨工业大学深圳研究生院 Clustering method and system of parallelized self-organizing mapping neural network based on graphic processing unit
CN104036451A (en) * 2014-06-20 2014-09-10 深圳市腾讯计算机系统有限公司 Parallel model processing method and device based on multiple graphics processing units

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Shanshan Zhang et al.: "Asynchronous Stochastic Gradient Descent for DNN Training", IEEE International Conference on Acoustics *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017148292A1 (en) * 2016-03-01 2017-09-08 华为技术有限公司 Cascade plate, and system and method for ssd remote sharing access
US10901638B2 (en) 2016-03-01 2021-01-26 Huawei Technologies Co., Ltd. Cascading board and SSD shared remote access system and method
CN109074514A (en) * 2016-05-13 2018-12-21 微软技术许可有限责任公司 Pass through the deep learning of the robot of example and experience
CN106201870A (en) * 2016-07-01 2016-12-07 浪潮电子信息产业股份有限公司 A kind of method and device testing GPU
WO2018107934A1 (en) * 2016-12-14 2018-06-21 腾讯科技(深圳)有限公司 Data processing method and apparatus, and electronic device
US10943324B2 (en) 2016-12-14 2021-03-09 Tencent Technology (Shenzhen) Company Limited Data processing method, apparatus, and electronic device
US11010681B2 (en) 2017-08-31 2021-05-18 Huawei Technologies Co., Ltd. Distributed computing system, and data transmission method and apparatus in distributed computing system
WO2019079994A1 (en) * 2017-10-25 2019-05-02 华为技术有限公司 Core scheduling method and terminal
CN109213649A (en) * 2018-09-18 2019-01-15 郑州云海信息技术有限公司 GTX video card deep learning optimal inspection method, apparatus, terminal and storage medium
US11687763B2 (en) 2018-10-19 2023-06-27 Fujitsu Limited Method, apparatus and computer program to carry out a training procedure in a convolutional neural network
US11526759B2 (en) 2018-11-05 2022-12-13 International Business Machines Corporation Large model support in deep learning
CN113168396A (en) * 2018-11-05 2021-07-23 国际商业机器公司 Large model support in deep learning
US11915147B2 (en) 2018-11-05 2024-02-27 International Business Machines Corporation Large model support in deep learning
CN109919310A (en) * 2019-01-15 2019-06-21 中国科学院信息工程研究所 A kind of GPU Memory Optimize Method and system towards deep learning training mission
CN109919310B (en) * 2019-01-15 2021-05-18 中国科学院信息工程研究所 GPU memory optimization method and system for deep learning training task
CN111722937A (en) * 2019-03-21 2020-09-29 阿里巴巴集团控股有限公司 Deep learning weight updating method and device
CN110414668A (en) * 2019-06-29 2019-11-05 苏州浪潮智能科技有限公司 A kind of GPU deep learning method based on AEP memory, system and electronic equipment
CN110503194B (en) * 2019-08-09 2022-05-24 苏州浪潮智能科技有限公司 Distributed parallel training method and system
CN110503194A (en) * 2019-08-09 2019-11-26 苏州浪潮智能科技有限公司 A kind of method and system of distributed parallel training
CN110430444A (en) * 2019-08-12 2019-11-08 北京中科寒武纪科技有限公司 A kind of video stream processing method and system
WO2021208558A1 (en) * 2020-04-16 2021-10-21 苏州浪潮智能科技有限公司 Large deep learning model training method and system, device, and medium
CN113033784A (en) * 2021-04-18 2021-06-25 沈阳雅译网络技术有限公司 Method for searching neural network structure for CPU and GPU equipment

Similar Documents

Publication Publication Date Title
CN105224502A (en) GPU-based deep learning method and system
Ding et al. Application of Internet of Things and virtual reality technology in college physical education
You et al. Scaling deep learning on GPU and knights landing clusters
US10614356B2 (en) Local multicast in single-host multi-GPU machine for distributed deep learning systems
CN106951926A (en) The deep learning systems approach and device of a kind of mixed architecture
Hoang et al. A novel CPU/GPU simulation environment for large-scale biologically realistic neural modeling
CN108460457A (en) A kind of more asynchronous training methods of card hybrid parallel of multimachine towards convolutional neural networks
CN103853618A (en) Resource allocation method with minimized cloud system cost based on expiration date drive
CN113469355B (en) Multi-model training pipeline in distributed system
EP4242844A3 (en) Distributing tensor computations across computing devices
Talbi et al. Metaheuristics on gpus
Yilmaz et al. Panel: The future of research in modeling & simulation
Plessl Bringing FPGAs to HPC production systems and codes
Freniere et al. The feasibility of Amazon's cloud computing platform for parallel, GPU-accelerated, multiphase-flow simulations
Liu et al. Analysis of the Relation between Artificial Intelligence and the Internet from the Perspective of Brain Science
Vlag et al. Exploring complex brain-simulation workloads on multi-GPU deployments
CN100531070C (en) Network resource scheduling simulation system
Zhang et al. A parallel strategy for convolutional neural network based on heterogeneous cluster for mobile information system
CN111695701B (en) System for realizing data set construction processing based on federal learning and construction generation method thereof
Shu et al. Design of deep learning accelerated algorithm for online recognition of industrial products defects
Cui et al. Cloud computing resource scheduling method research based on improved genetic algorithm
Nichols et al. MagmaDNN: accelerated deep learning using MAGMA
Ji et al. Optimized mapping spiking neural networks onto network-on-chip
CN103678888A (en) Cardiac blood flowing indicating and displaying method based on Euler fluid simulation algorithm
CN104090813A (en) Analysis modeling method for CPU (central processing unit) usage of virtual machines in cloud data center

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20160106