CN106650931A - Hybrid precision deep learning algorithm

Info

Publication number: CN106650931A
Application number: CN201611132257.8A
Authority: CN
Inventors: 许建卫; 刘立; 窦晓光
Original assignee: Dawning Information Industry Beijing Co Ltd
Current assignee: Dawning Information Industry Beijing Co Ltd
Priority date: 2016-12-09
Filing date: 2016-12-09
Publication date: 2017-05-10

Abstract

The invention relates to the technical field of deep learning algorithms, and specifically to a mixed-precision deep learning algorithm. The technical problem to be solved is to preserve calculation precision while improving calculation efficiency. The technical scheme comprises the following steps: S101, using a single-precision many-core processor to perform forward propagation and calculate the value of each neuron in each network layer; S102, using the single-precision many-core processor to perform backward propagation and calculate the error residual of each network layer; S103, using the single-precision many-core processor to calculate the weight increments; S104, updating the weights calculated by a high-precision many-core processor with the weight increments calculated by the single-precision many-core processor, thereby completing one iteration. The invention is applicable to the field of deep learning.

Description

Mixed-precision deep learning algorithm

Technical field

The present invention relates to the technical field of deep learning algorithms, and in particular to a mixed-precision deep learning algorithm.

Background technology

Deep learning is a new field of machine learning research. Its motivation is to build neural networks that simulate the human brain for analysis and learning; such networks imitate the mechanisms of the human brain to interpret data such as images, sound, and text. The concept was proposed by Hinton et al. in 2006. An unsupervised greedy layer-wise training algorithm based on the deep belief network (DBN) brought hope for solving the optimization problems associated with deep structures, and the multilayer autoencoder deep structure was proposed subsequently. In addition, the convolutional neural network proposed by LeCun et al. was the first true multilayer structure learning algorithm; it exploits spatial relationships to reduce the number of parameters and thereby improve training performance.

A deep neural network is a neural network with at least one hidden layer. Fig. 1 is a schematic diagram of a neuron, and Fig. 2 is a schematic diagram of a neural network structure. Like shallow neural networks, deep neural networks can model complex nonlinear systems, but the additional layers give the model higher levels of abstraction and thus increase its capability.

Deep learning is a typical computation-intensive problem with a very large amount of calculation. At present there are both double-precision and single-precision deep learning implementations; although both converge, the precision of their results differs. For such computation-intensive applications, a good choice is acceleration with many-core processors, for example GPUs (graphics processing units). On such accelerators, single-precision arithmetic is often twice as fast as double-precision arithmetic, so how to obtain double-precision results from single-precision calculation is a problem worth studying.

On modern architectures, 32-bit floating-point operations are typically at least twice as fast as 64-bit floating-point operations. It was found long ago that, when solving systems of linear equations, low-precision calculation can be used to accelerate the computation, and that only a small amount of high-precision calculation is needed to reach a result consistent with a fully high-precision calculation. In 1948, while designing and building Turing's computer, Wilkinson proposed iterative refinement for solving systems of linear equations. The idea of iterative refinement for solving the linear system Ax = b can be described as:

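$$
\begin{aligned}
x^{(0)} &= 0 \\
d^{(s)} &= b - A x^{(s)} && \text{(compute residual in high precision)} \\
A c^{(s)} &= d^{(s)} && \text{(solve equation system in low precision)} \\
x^{(s+1)} &= x^{(s)} + c^{(s)} && \text{(accumulate solution in high precision)}
\end{aligned}
$$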

Wilkinson and his collaborators proved that, if the matrix A is not highly ill-conditioned, repeatedly solving for x in mixed precision converges to the result of a fully high-precision calculation. When solving linear systems, more than 90% of the floating-point operations can be performed in low precision without affecting the precision of the solution. The main idea of mixed-precision calculation is a strategy for accelerating computation: use low precision for the computation-intensive part and high precision for a small number of key steps.
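The iterative refinement scheme can be illustrated with a small sketch. It is not taken from the patent: it assumes that double precision plays the role of "high precision", that single precision is emulated by rounding every operation through `struct`, and that the low-precision solver is plain Gaussian elimination with partial pivoting. The names `f32`, `solve_low`, and `refine`, as well as the example matrix, are hypothetical.

```python
import struct

def f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def solve_low(A, b):
    """Solve A c = b by Gaussian elimination, rounding every operation to single precision."""
    n = len(b)
    M = [[f32(v) for v in row] + [f32(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = f32(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = f32(M[i][j] - f32(m * M[k][j]))
    c = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = M[i][n]
        for j in range(i + 1, n):
            s = f32(s - f32(M[i][j] * c[j]))
        c[i] = f32(s / M[i][i])
    return c

def refine(A, b, steps=5):
    """Iterative refinement: residual and accumulation in double, solve in single."""
    n = len(b)
    x = [0.0] * n                                         # x^(0) = 0
    for _ in range(steps):
        d = [b[i] - sum(A[i][j] * x[j] for j in range(n))
             for i in range(n)]                           # residual, high precision
        c = solve_low(A, d)                               # correction, low precision
        x = [x[i] + c[i] for i in range(n)]               # accumulate, high precision
    return x

# Hypothetical well-conditioned example with a known solution.
A = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
x_true = [1.0, 1.0 / 3.0, -0.7]
b = [sum(A[i][j] * x_true[j] for j in range(3)) for i in range(3)]

err_low = max(abs(u - v) for u, v in zip(solve_low(A, b), x_true))
err_ref = max(abs(u - v) for u, v in zip(refine(A, b), x_true))
print("single-precision solve error:", err_low)
print("refined solution error:     ", err_ref)
```

In this sketch the elimination, which dominates the operation count, runs entirely in emulated single precision; only the residual and the accumulation of the solution use double precision.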

For mixed-precision calculation to be effective, the final calculation precision must not be lost. Langou et al. (2006) comprehensively tested the effectiveness of mixed-precision algorithms for dense matrix solvers on the Cell processor and mainstream CPUs. Göddeke et al. (2008) carried out tests on multigrid solvers for large-scale matrices. These tests confirmed that the precision of mixed-precision results is fully consistent with that of results calculated entirely in high precision.

Summary of the invention

In view of the deficiencies in the related art, the technical problem to be solved by the present invention is to provide a mixed-precision deep learning algorithm that keeps the calculation precision unchanged while improving calculation efficiency.

To solve the above technical problem, the technical solution adopted by the present invention is a mixed-precision deep learning algorithm comprising the following steps: S101, performing forward propagation using a single-precision many-core processor and calculating the value of each neuron in each network layer; S102, performing backward propagation using the single-precision many-core processor and calculating the error residual of each network layer; S103, calculating the weight increments using the single-precision many-core processor; S104, updating the weights calculated by a high-precision many-core processor with the weight increments calculated by the single-precision many-core processor, thereby completing one iteration.

Preferably, the mixed-precision deep learning algorithm is applied in the late iteration stage of deep learning.

Preferably, in step S102, performing backward propagation and calculating the error residual of each network layer specifically includes: multiplying the residual of the previous layer by the weights of the current layer and then by the gradient of the current layer, to obtain the error residuals of all layers.

Preferably, in step S103, calculating the weight increments specifically includes: using the residuals of each network layer calculated in backward propagation to calculate the weight changes by means such as gradients.

The advantageous technical effects of the present invention are as follows. The mixed-precision deep learning algorithm of the present invention first performs forward propagation, backward propagation, and the calculation of the weight increments in single precision, obtaining the value of each neuron, the error residuals, and the weight increments for each network layer; the single-precision weight increments calculated in these steps are then applied to the high-precision weights, completing one iteration. Unlike some optimization algorithms that change the precision while improving efficiency, the present invention uses low precision for the computation-intensive part and high precision for a small number of key steps, and the precision of its result is fully consistent with that of a result calculated entirely in high precision. Deep learning in the present invention is accelerated with many-core processors, on which single precision is often twice as fast as double precision, so mixed-precision calculation can greatly increase the calculation speed. In addition, because single precision is used extensively in the algorithm, the corresponding communication volume is halved; in deep learning the layers are often fully connected, so reducing the communication volume can greatly improve the scalability of the algorithm.

Description of the drawings

Fig. 1 is a schematic diagram of a neuron in a deep learning algorithm;

Fig. 2 is a schematic diagram of a neural network structure in a deep learning algorithm;

Fig. 3 is a flowchart of the mixed-precision deep learning algorithm provided by Embodiment 1 of the present invention.

Detailed description of the embodiments

To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Fig. 3 is a flowchart of the mixed-precision deep learning algorithm provided by Embodiment 1 of the present invention. As shown in Fig. 3, the mixed-precision deep learning algorithm comprises the following steps:

S101, performing forward propagation using a single-precision many-core processor and calculating the value of each neuron in each network layer.

S102, performing backward propagation using the single-precision many-core processor and calculating the error residual of each network layer.

S103, calculating the weight increments using the single-precision many-core processor.

S104, updating the weights calculated by a high-precision many-core processor with the weight increments calculated by the single-precision many-core processor, thereby completing one iteration.

In implementation, a deep learning algorithm is broadly divided into three stages: forward propagation, backward propagation, and parameter update. Forward propagation can be understood simply as calculating the output from a given input by composing the calculations of the individual layers (which can be viewed as a composite function of the form f(x) = h(g(x))). The direction of propagation is bottom-up (bottom to top, which is why it is called forward propagation), as shown in Fig. 2. Its main calculation process is:

The main idea of the backpropagation algorithm is as follows. Given a sample (x, y), a forward pass is performed first to calculate all activation values in the network, including the output value h_{W,b}(x). Then, for each node i in layer l, its "residual" δ_i^l is calculated, which indicates how much that node contributes to the residual of the final output value. For a final output node, the difference between the activation value produced by the network and the actual value can be calculated directly, and this difference is defined as δ_i^n (layer n denotes the output layer). For a hidden unit, δ_i^l is calculated from a weighted average of the residuals of the nodes in layer l+1 that take a_i^l as input. By calculating with the weights and the derivative of the current layer, the residual of each layer can be obtained layer by layer.
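The layer-by-layer residual calculation can be written compactly. The following is a sketch under assumptions not stated in the text: a squared-error cost, an activation function f with derivative f', a weighted input z_i^l to node i of layer l, and a weight W_{ji}^l connecting node i of layer l to node j of layer l+1. The symbols δ, a, y, and n are used as in the paragraph above.

$$
\begin{aligned}
\delta_i^{n} &= -\left(y_i - a_i^{n}\right) f'\!\left(z_i^{n}\right) \\
\delta_i^{l} &= \left(\sum_{j} W_{ji}^{l}\,\delta_j^{l+1}\right) f'\!\left(z_i^{l}\right), \qquad l = n-1,\, n-2,\, \dots
\end{aligned}
$$

Under these assumptions, the first line is the output-layer gap scaled by the activation derivative, and the second line combines the residuals of layer l+1 through the weights and multiplies the result by the derivative of the current layer.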

In short, backward propagation multiplies the residual of the previous layer by the weights of the current layer and then by the gradient of the current layer, and in this way obtains the residuals of all layers.

The final step is the update of the weight parameters: the residuals of each layer calculated in backward propagation are used to calculate the weight changes by means such as gradients, and the weights are then updated.
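The three stages, combined with steps S101 to S104, can be sketched for one training sample as follows. This is an illustration under stated assumptions rather than the patented implementation: a tiny two-layer sigmoid network without biases, a squared-error cost, a fixed learning rate `lr`, double precision standing in for "high precision", and single precision emulated by rounding through `struct`. The names `f32`, `dot32`, `sigmoid32`, and `train_step` are hypothetical.

```python
import math
import struct

def f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def dot32(a, b):
    """Dot product with every multiply and add rounded to single precision."""
    acc = 0.0
    for u, v in zip(a, b):
        acc = f32(acc + f32(u * v))
    return acc

def sigmoid32(z):
    """Sigmoid evaluated in double, then rounded to single (approximates a float32 kernel)."""
    return f32(1.0 / (1.0 + math.exp(-z)))

def train_step(W1, W2, x, y, lr=0.5):
    """One iteration; W1 (hidden x input) and W2 (one row) are double-precision weights."""
    # Single-precision working copies of the weights and the input.
    w1 = [[f32(w) for w in row] for row in W1]
    w2 = [f32(w) for w in W2]
    xs = [f32(v) for v in x]

    # S101: forward propagation in single precision.
    h = [sigmoid32(dot32(row, xs)) for row in w1]
    out = sigmoid32(dot32(w2, h))

    # S102: backward propagation, one residual per layer (squared-error cost assumed).
    d_out = f32(f32(out - f32(y)) * f32(out * f32(1.0 - out)))
    d_hid = [f32(f32(w * d_out) * f32(hj * f32(1.0 - hj))) for w, hj in zip(w2, h)]

    # S103: weight increments in single precision.
    dW2 = [f32(-lr * f32(d_out * hj)) for hj in h]
    dW1 = [[f32(-lr * f32(dj * v)) for v in xs] for dj in d_hid]

    # S104: apply the single-precision increments to the double-precision weights.
    W2 = [w + dw for w, dw in zip(W2, dW2)]
    W1 = [[w + dw for w, dw in zip(row, drow)] for row, drow in zip(W1, dW1)]
    return W1, W2, out

# Hypothetical starting weights and a single training sample.
W1 = [[0.1, -0.2], [0.3, 0.4]]
W2 = [0.2, -0.1]
x, y = [0.5, -1.0], 1.0
outputs = []
for _ in range(200):
    W1, W2, out = train_step(W1, W2, x, y)
    outputs.append(out)
print("first output:", outputs[0], "last output:", outputs[-1])
```

Only the final additions in S104 touch the double-precision weights; every multiply-heavy stage works on single-precision copies.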

Specifically, the mixed-precision deep learning algorithm is applied in the late iteration stage of deep learning.

The reason this mixed-precision method can guarantee that no precision is lost is that there is a precondition: the increment applied in each update must be much smaller in absolute value than the full value being updated, so that an increment with only a few significant digits is enough to meet the precision requirement. In the early iterations of deep learning, however, the increments and the full values do not differ much in magnitude; only in the late iterations, when the iterated values are relatively close to the true values, do the two satisfy this precondition. Therefore, this mixed-precision scheme can currently be used only in the late iteration stage. This is still very significant, because most of the iteration time, generally around 80 percent, is spent in the late iterations.
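The precondition can be illustrated numerically. The sketch below is an assumption-laden toy, not part of the patent: the weight value 1.0, the increment 1e-8, and the step count are made up, double precision stands in for "high precision", and `f32` is a hypothetical helper that emulates single precision by rounding through `struct`.

```python
import struct

def f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

weight = 1.0                 # hypothetical late-stage weight (the "full value")
increment = f32(1e-8)        # hypothetical increment, computed in single precision
steps = 1000

w_single = f32(weight)       # weight kept in single precision
w_master = weight            # weight kept in double precision
for _ in range(steps):
    # All-single update: the sum is rounded back to single precision each time.
    w_single = f32(w_single + increment)
    # Mixed update: single-precision increment added to the double-precision weight.
    w_master = w_master + increment

print("single-precision weight:", w_single)
print("double-precision weight:", w_master)
```

Here the increment is far smaller than the spacing between neighbouring single-precision values near 1.0, so the all-single update never changes the weight, while the double-precision weight accumulates every increment even though each increment carries only single-precision accuracy.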

Specifically, in step S102, performing backward propagation and calculating the error residual of each network layer specifically includes:

multiplying the residual of the previous layer by the weights of the current layer and then by the gradient of the current layer, to obtain the error residuals of all layers.

Specifically, in step S103, calculating the weight increments specifically includes:

using the residuals of each network layer calculated in backward propagation to calculate the weight changes by means such as gradients.

Compared with the prior art, the present invention has the following advantages: 1. The calculation precision is kept unchanged. This differs from some optimization algorithms, which change the precision while improving efficiency and are therefore hard to accept in applications with high precision requirements. 2. Calculation efficiency is improved. Deep learning is currently widely accelerated with many-core processors, on which single precision is often twice as fast as double precision, so mixed-precision calculation can greatly increase the calculation speed. 3. Scalability is strong. Because single precision is used extensively in the algorithm, the corresponding communication volume is also halved; in deep learning the layers are often fully connected, so reducing the communication volume can greatly improve the scalability of the algorithm.
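The halving of the communication volume follows directly from the storage sizes of the two formats. A minimal sketch, assuming a hypothetical fully connected layer of 1000 x 1000 weights whose values are exchanged once per iteration (the layer size and variable names are illustrative only):

```python
from array import array

n_weights = 1000 * 1000                      # hypothetical fully connected layer

bytes_per_single = array("f").itemsize       # storage size of a C float
bytes_per_double = array("d").itemsize       # storage size of a C double

volume_single = n_weights * bytes_per_single
volume_double = n_weights * bytes_per_double
print("single precision bytes per exchange:", volume_single)
print("double precision bytes per exchange:", volume_double)
```

On platforms where a C float occupies 4 bytes and a C double 8 bytes, the single-precision exchange moves half as many bytes as the double-precision one.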

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A mixed-precision deep learning algorithm, characterized by comprising the following steps:

S101, performing forward propagation using a single-precision many-core processor and calculating the value of each neuron in each network layer;

S102, performing backward propagation using the single-precision many-core processor and calculating the error residual of each network layer;

S103, calculating the weight increments using the single-precision many-core processor;

S104, updating the weights calculated by a high-precision many-core processor with the weight increments calculated by the single-precision many-core processor, thereby completing one iteration.

2. The mixed-precision deep learning algorithm according to claim 1, characterized in that the mixed-precision deep learning algorithm is applied in the late iteration stage of deep learning.

3. The mixed-precision deep learning algorithm according to claim 1, characterized in that, in step S102, performing backward propagation and calculating the error residual of each network layer specifically includes:

multiplying the residual of the previous layer by the weights of the current layer and then by the gradient of the current layer, to obtain the error residuals of all layers.

4. The mixed-precision deep learning algorithm according to claim 1, characterized in that, in step S103, calculating the weight increments specifically includes:

using the residuals of each network layer calculated in backward propagation to calculate the weight changes by means such as gradients.