CN106650931A - Hybrid precision deep learning algorithm - Google Patents
Hybrid precision deep learning algorithm
- Publication number
- CN106650931A
- Authority
- CN
- China
- Prior art keywords
- precision
- deep learning
- core processor
- calculated
- calculating
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to the technical field of deep learning algorithms, and specifically to a hybrid precision deep learning algorithm. The technical problem to be solved is to preserve calculation precision while improving calculation efficiency. The technical scheme includes the following steps: S101, performing forward propagation on a single-precision many-core processor and calculating the value of each neuron in each network layer; S102, performing backward propagation on the single-precision many-core processor and calculating the error residual of each network layer; S103, calculating the weight increments on the single-precision many-core processor; S104, updating the weight increments calculated by the single-precision many-core processor onto the weights maintained by a high-precision many-core processor, which completes one iteration. The invention is suitable for the deep learning field.
Description
Technical field
The present invention relates to the technical field of deep learning algorithms, and in particular to a mixed-precision deep learning algorithm.
Background
Deep learning is a new field of machine learning research. Its motivation is to build neural networks that simulate the way the human brain analyzes and learns, imitating the mechanisms of the brain to interpret data such as images, sound and text. The concept was proposed by Hinton et al. in 2006. Their greedy layer-wise unsupervised training algorithm, based on the deep belief network (DBN), brought hope for solving the optimization problems associated with deep structures, and multilayer autoencoder deep structures were proposed subsequently. In addition, the convolutional neural network proposed by LeCun et al. was the first true multilayer structure learning algorithm; it exploits spatial relationships to reduce the number of parameters and thereby improve training performance. A deep neural network is a neural network with at least one hidden layer. Fig. 1 is a schematic diagram of a neuron, and Fig. 2 is a schematic diagram of a neural network structure. Like shallow neural networks, deep neural networks can model complex nonlinear systems, but the additional layers give the model higher levels of abstraction and thus increase its capability.
Deep learning is a typical compute-intensive problem with a very large amount of computation. At present, deep learning is carried out both in double precision and in single precision; both converge, but the precision of the results differs. For such compute-intensive applications, a good choice is many-core acceleration, for example with a GPU (graphics processing unit). On accelerators of this kind, single precision is often twice as fast as double precision, so how to obtain double-precision results from single-precision computation is a problem worth studying.
On modern architectures, 32-bit floating-point operations are typically at least twice as fast as 64-bit floating-point operations. It was discovered long ago that, when solving systems of linear equations, low-precision computation can be used for speed while a small amount of high-precision computation is enough to reach the same result as an all-high-precision computation. In 1948, while working on the design and construction of Turing's computer, Wilkinson proposed iterative refinement for solving systems of linear equations. The idea of iterative refinement for solving the linear system Ax = b can be described as:
$x^{(0)} = 0$
$d^{(s)} = b - A x^{(s)}$ (compute residual in high precision)
$A c^{(s)} = d^{(s)}$ (solve equation system in low precision)
$x^{(s+1)} = x^{(s)} + c^{(s)}$ (accumulate solution in high precision)
Wilkinson and his collaborators proved that, if the matrix A is not highly ill-conditioned, repeatedly solving for x in mixed precision converges to the result of an all-high-precision computation. When solving linear systems, more than 90% of the floating-point operations can be done in low precision without affecting the precision of the solution. The main idea of mixed-precision computation is to use low precision for the compute-intensive part and high precision for a small number of critical steps, which yields a strategy for accelerating the computation.
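The iterative refinement scheme described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `mixed_precision_solve`, the tolerance, the iteration cap and the test matrix are assumptions introduced here, with float32 standing in for "low precision" and float64 for "high precision".

```python
import numpy as np

def mixed_precision_solve(A, b, max_iters=20, tol=1e-12):
    """Solve Ax = b with float32 solves and float64 residuals/accumulation."""
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)
    A32 = A64.astype(np.float32)           # low-precision copy used for the solves
    x = np.zeros_like(b64)                 # x(0) = 0, kept in high precision
    for _ in range(max_iters):
        d = b64 - A64 @ x                  # residual in high precision
        if np.linalg.norm(d) <= tol * np.linalg.norm(b64):
            break
        c = np.linalg.solve(A32, d.astype(np.float32))  # correction in low precision
        x = x + c.astype(np.float64)       # accumulate solution in high precision
    return x

# Hypothetical test problem: a well-conditioned matrix (strong diagonal).
rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n)) + n * np.eye(n)
x_true = rng.standard_normal(n)
b = A @ x_true

x = mixed_precision_solve(A, b)
rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(rel_err)
```

For brevity the sketch calls `np.linalg.solve` on every pass; a practical version would presumably factorize the low-precision matrix once and reuse the factors, so that the expensive step is done only once and entirely in low precision.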
For mixed-precision computation to be effective, it must guarantee that the final precision is not lost. Langou et al. (2006) comprehensively tested the effectiveness of mixed-precision algorithms for dense matrix solvers on the Cell processor and on mainstream CPUs. Göddeke et al. (2008) carried out tests on multigrid solvers for large-scale matrices. These tests confirmed that the precision of the mixed-precision results is fully consistent with that of all-high-precision results.
Summary of the invention
In view of the deficiencies in the related art, the technical problem to be solved by the present invention is to provide a mixed-precision deep learning algorithm that keeps the calculation precision unchanged while improving calculation efficiency.
To solve the above technical problem, the technical scheme adopted by the present invention is a mixed-precision deep learning algorithm comprising the following steps: S101, performing forward propagation on a single-precision many-core processor and calculating the value of each neuron in each network layer; S102, performing backward propagation on the single-precision many-core processor and calculating the error residual of each network layer; S103, calculating the weight increments on the single-precision many-core processor; S104, updating the weight increments calculated by the single-precision many-core processor onto the weights maintained by the high-precision many-core processor, completing one iteration.
Preferably, the mixed-precision deep learning algorithm is applied in the later iterations of deep learning.
Preferably, in step S102, performing backward propagation and calculating the error residual of each network layer specifically includes: multiplying the residual of the previous layer by the weights of the current layer, and multiplying the result by the gradient of the current layer, to obtain the error residuals of all layers.
Preferably, in step S103, calculating the weight increments specifically includes: using the residuals of each network layer calculated in backward propagation to calculate the weight changes by means of the gradient and the like.
The advantageous technical effects of the present invention are as follows. The mixed-precision deep learning algorithm of the present invention first performs forward propagation, backward propagation and the weight increment calculation in single precision, obtaining the value of each neuron, the error residual and the weight increments for each network layer, and then updates the single-precision weight increments calculated in these steps onto the high-precision weights, completing one iteration. Unlike some optimization algorithms that change the precision while improving efficiency, the present invention uses low precision for the compute-intensive part and high precision for a small number of critical steps, and the precision of its result is fully consistent with that of a result computed entirely in high precision. Deep learning in the present invention is accelerated with many-core processors, on which single precision is often twice as fast as double precision, so mixed-precision computation can greatly increase calculation speed. In addition, because the algorithm makes heavy use of single precision, the corresponding communication volume is halved; in deep learning the layers are often fully connected, so reducing the communication volume can greatly improve the scalability of the algorithm.
Brief description of the drawings
Fig. 1 is a schematic diagram of a neuron in a deep learning algorithm;
Fig. 2 is a schematic diagram of a neural network structure in a deep learning algorithm;
Fig. 3 is a schematic flow chart of the mixed-precision deep learning algorithm provided by Embodiment 1 of the present invention.
Detailed description
To make the purpose, technical scheme and advantages of the embodiments of the present invention clearer, the technical scheme in the embodiments is described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative work fall within the scope of protection of the present invention.
Fig. 3 is a schematic flow chart of the mixed-precision deep learning algorithm provided by Embodiment 1 of the present invention. As shown in Fig. 3, the mixed-precision deep learning algorithm comprises the following steps:
S101, performing forward propagation on a single-precision many-core processor and calculating the value of each neuron in each network layer.
S102, performing backward propagation on the single-precision many-core processor and calculating the error residual of each network layer.
S103, calculating the weight increments on the single-precision many-core processor.
S104, updating the weight increments calculated by the single-precision many-core processor onto the weights maintained by the high-precision many-core processor, completing one iteration.
In implementation, a deep learning algorithm is mainly divided into the following three stages: forward propagation, backward propagation and parameter update.
Forward propagation can be understood simply as computing the output from a given input. It composes the computations of the individual layers (which can be understood as a composite function of the form f(x) = h(g(x))), and the direction of propagation is bottom-up (bottom to top, which is why it is called forward propagation), as shown in Fig. 2. Its main calculation process is:
The main idea of the back-propagation algorithm is as follows. Given a sample (x, y), a forward pass is performed first to compute all activations in the network, including the output value $h_{W,b}(x)$. Then, for each node i of layer l, its residual $\delta_i^{(l)}$ is computed, which indicates how much that node contributes to the residual of the final output. For the final output nodes, the gap between the activation produced by the network and the actual value can be computed directly, and this gap is defined as $\delta_i^{(n)}$ (layer n denotes the output layer). For hidden units, $\delta_i^{(l)}$ is computed from a weighted average of the residuals of the nodes in layer l+1 that take $a_i^{(l)}$ as input. By combining the weights with the derivative of the current layer, the residual of each layer can be obtained layer by layer.
In short, backward propagation multiplies the residual of the previous layer by the weights of the current layer and by the gradient of the current layer, which yields the residuals of all layers.
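Under common assumptions that the text does not state explicitly (a squared-error loss $J$, an activation function $f$, and pre-activations $z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}$, all of which are notation introduced here for illustration only), the residual recursion summarized above could be written as:

```latex
\begin{aligned}
\delta^{(n)} &= -\left(y - a^{(n)}\right) \odot f'\!\left(z^{(n)}\right) \\
\delta^{(l)} &= \left(\left(W^{(l)}\right)^{T} \delta^{(l+1)}\right) \odot f'\!\left(z^{(l)}\right),
  \qquad l = n-1, \dots, 2 \\
\frac{\partial J}{\partial W^{(l)}} &= \delta^{(l+1)} \left(a^{(l)}\right)^{T}
\end{aligned}
```

Here $\odot$ denotes the element-wise product. In this reading, the "previous layer" of the backward pass is layer $l+1$, whose residual is multiplied by the weights $W^{(l)}$ and then by the derivative $f'(z^{(l)})$ of the current layer.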
The final step is the update of the weight parameters: the residuals of each layer calculated in backward propagation are used to calculate the weight changes by means of the gradient and the like, and the weights are then updated.
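Putting the three stages together with steps S101 to S104, one iteration might look like the following NumPy sketch. It is not the patented implementation: the sigmoid activation, squared-error loss, plain gradient-descent rule, learning rate `lr`, the function name `mixed_precision_step` and the toy data are all assumptions made for illustration, and float32/float64 arrays on one machine stand in for the single-precision and high-precision many-core processors.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mixed_precision_step(master_W, x, y, lr=0.1):
    """One iteration: S101-S103 in float32, S104 on the float64 weights (in place)."""
    W = [w.astype(np.float32) for w in master_W]   # single-precision working copy
    x32, y32 = x.astype(np.float32), y.astype(np.float32)

    # S101: forward propagation, value of every neuron in every layer
    acts = [x32]
    for w in W:
        acts.append(sigmoid(acts[-1] @ w))

    # S102: backward propagation, error residual of every layer
    out = acts[-1]
    deltas = [(out - y32) * out * (1 - out)]        # output-layer residual
    for l in range(len(W) - 1, 0, -1):
        a = acts[l]
        deltas.insert(0, (deltas[0] @ W[l].T) * a * (1 - a))

    # S103: weight increments, still in single precision
    incs = [np.float32(-lr) * (acts[l].T @ deltas[l]) for l in range(len(W))]

    # S104: apply the float32 increments to the float64 weights
    for w64, inc in zip(master_W, incs):
        w64 += inc.astype(np.float64)

    return 0.5 * float(np.sum((out - y32) ** 2))    # loss before this update

# Hypothetical toy problem: 3 inputs, 4 hidden units, 1 output.
rng = np.random.default_rng(0)
master_W = [rng.standard_normal((3, 4)) * 0.5, rng.standard_normal((4, 1)) * 0.5]
x = rng.standard_normal((8, 3))
y = (x.sum(axis=1, keepdims=True) > 0).astype(np.float64)

losses = [mixed_precision_step(master_W, x, y) for _ in range(200)]
print(losses[0], losses[-1])
```

The sketch applies the mixed-precision step from the first iteration for brevity; the text restricts the scheme to the later iterations, so a fuller version would presumably run the early iterations in high precision and switch afterwards.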
Specifically, the mixed-precision deep learning algorithm is applied in the later iterations of deep learning.
The reason this mixed-precision method can guarantee that precision is not lost is that it relies on a precondition: the increment applied in each update is much smaller in absolute value than the full value, so that a few significant digits in the increment are enough to meet the precision requirement. In the early iterations of deep learning, however, the increment and the full value do not differ much in magnitude; only in the later iterations, when the iterate is close to the true value, do the two satisfy this precondition. Therefore, this mixed-precision scheme can currently be used only in the later iterations. This is still very significant, because most of the iteration time, generally around 80 percent, is spent in the later iterations.
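A small numerical experiment can make the precondition concrete. The values below are hypothetical: a weight of 1.0 and an increment of 1e-8, so that the increment is far smaller than the full value. Accumulating into a single-precision weight loses the increment entirely, while accumulating the same single-precision increment into a double-precision weight preserves it.

```python
import numpy as np

w32 = np.float32(1.0)    # weight stored in single precision
w64 = np.float64(1.0)    # weight stored in double precision
inc = np.float32(1e-8)   # single-precision increment, much smaller than the weight

for _ in range(1000):
    w32 = np.float32(w32 + inc)    # below half a float32 ulp of 1.0, so rounded away
    w64 = w64 + np.float64(inc)    # the same increment accumulates in float64

print(w32)   # stays exactly 1.0
print(w64)   # close to 1.00001
```

The increment itself carries only about seven significant digits, but because it is roughly eight orders of magnitude smaller than the weight, its rounding error is negligible at the scale of the weight. That is the sense in which a few significant digits in the increment are enough once the increment is small relative to the full value.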
Specifically, in step S102, performing backward propagation and calculating the error residual of each network layer specifically includes:
multiplying the residual of the previous layer by the weights of the current layer, and multiplying the result by the gradient of the current layer, to obtain the error residuals of all layers.
Specifically, in step S103, calculating the weight increments specifically includes:
using the residuals of each network layer calculated in backward propagation to calculate the weight changes by means of the gradient and the like.
Compared with the prior art, the present invention has the following advantages. 1. The calculation precision is kept unchanged. Some optimization algorithms change the precision while improving efficiency, which is hard to accept where precision requirements are high. 2. Computational efficiency is improved. Deep learning today widely uses many-core processors for acceleration, and on such processors single precision is often twice as fast as double precision, so mixed-precision computation can greatly increase calculation speed. 3. Scalability is strong. Because the algorithm makes heavy use of single precision, the corresponding communication volume is also halved; in deep learning the layers are often fully connected, so reducing the communication volume can greatly improve the scalability of the algorithm.
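The halving of the communication volume follows directly from the storage sizes of the two formats. As a rough illustration, assuming a hypothetical fully connected layer with a 4096 x 4096 weight matrix whose increments must be exchanged between processors:

```python
import numpy as np

# Hypothetical layer size, chosen only for illustration.
inc64 = np.zeros((4096, 4096), dtype=np.float64)   # increments in double precision
inc32 = inc64.astype(np.float32)                   # same increments in single precision

mib64 = inc64.nbytes / 2**20
mib32 = inc32.nbytes / 2**20
ratio = inc32.nbytes / inc64.nbytes

print(mib64, mib32, ratio)   # 128.0 64.0 0.5
```

Each element shrinks from 8 bytes to 4 bytes, so every exchanged array is half the size regardless of the layer dimensions.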
Finally, it should be noted that the above embodiments are only intended to illustrate the technical scheme of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical schemes described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical schemes to depart from the scope of the technical schemes of the embodiments of the present invention.
Claims (4)
1. A mixed-precision deep learning algorithm, characterized by comprising the following steps:
S101, performing forward propagation on a single-precision many-core processor and calculating the value of each neuron in each network layer;
S102, performing backward propagation on the single-precision many-core processor and calculating the error residual of each network layer;
S103, calculating the weight increments on the single-precision many-core processor;
S104, updating the weight increments calculated by the single-precision many-core processor onto the weights maintained by the high-precision many-core processor, completing one iteration.
2. The mixed-precision deep learning algorithm according to claim 1, characterized in that the mixed-precision deep learning algorithm is applied in the later iterations of deep learning.
3. The mixed-precision deep learning algorithm according to claim 1, characterized in that, in step S102, performing backward propagation and calculating the error residual of each network layer specifically includes:
multiplying the residual of the previous layer by the weights of the current layer, and multiplying the result by the gradient of the current layer, to obtain the error residuals of all layers.
4. The mixed-precision deep learning algorithm according to claim 1, characterized in that, in step S103, calculating the weight increments specifically includes:
using the residuals of each network layer calculated in backward propagation to calculate the weight changes by means of the gradient and the like.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611132257.8A CN106650931A (en) | 2016-12-09 | 2016-12-09 | Hybrid precision deep learning algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611132257.8A CN106650931A (en) | 2016-12-09 | 2016-12-09 | Hybrid precision deep learning algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106650931A true CN106650931A (en) | 2017-05-10 |
Family
ID=58825861
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611132257.8A Pending CN106650931A (en) | 2016-12-09 | 2016-12-09 | Hybrid precision deep learning algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106650931A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107480782A (en) * | 2017-08-14 | 2017-12-15 | 电子科技大学 | Learn neural network processor on a kind of piece |
CN109598344A (en) * | 2018-12-14 | 2019-04-09 | 北京字节跳动网络技术有限公司 | Model generating method and device |
CN109670579A (en) * | 2018-12-14 | 2019-04-23 | 北京字节跳动网络技术有限公司 | Model generating method and device |
JP2019079535A (en) * | 2017-10-23 | 2019-05-23 | 三星電子株式会社Samsung Electronics Co.,Ltd. | Method and apparatus for processing parameters |
CN110097188A (en) * | 2019-04-30 | 2019-08-06 | 科大讯飞股份有限公司 | A kind of model training method, working node and parameter update server |
CN111667061A (en) * | 2020-04-17 | 2020-09-15 | 中国海洋大学 | Deep learning framework transplanting and optimizing method and system based on target many-core |
US11385875B2 (en) | 2019-01-31 | 2022-07-12 | Google Llc | Propagating reduced-precision on computation graphs |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107480782A (en) * | 2017-08-14 | 2017-12-15 | 电子科技大学 | Learn neural network processor on a kind of piece |
JP2019079535A (en) * | 2017-10-23 | 2019-05-23 | 三星電子株式会社Samsung Electronics Co.,Ltd. | Method and apparatus for processing parameters |
JP7072484B2 (en) | 2017-10-23 | 2022-05-20 | 三星電子株式会社 | Parameter processing method and equipment |
US11544549B2 (en) | 2017-10-23 | 2023-01-03 | Samsung Electronics Co., Ltd. | Method and apparatus with neural network |
CN109598344A (en) * | 2018-12-14 | 2019-04-09 | 北京字节跳动网络技术有限公司 | Model generating method and device |
CN109670579A (en) * | 2018-12-14 | 2019-04-23 | 北京字节跳动网络技术有限公司 | Model generating method and device |
CN109598344B (en) * | 2018-12-14 | 2020-10-02 | 北京字节跳动网络技术有限公司 | Model generation method and device |
US11385875B2 (en) | 2019-01-31 | 2022-07-12 | Google Llc | Propagating reduced-precision on computation graphs |
US11972238B2 (en) | 2019-01-31 | 2024-04-30 | Google Llc | Propagating reduced-precision on computation graphs |
CN110097188A (en) * | 2019-04-30 | 2019-08-06 | 科大讯飞股份有限公司 | A kind of model training method, working node and parameter update server |
CN111667061A (en) * | 2020-04-17 | 2020-09-15 | 中国海洋大学 | Deep learning framework transplanting and optimizing method and system based on target many-core |
CN111667061B (en) * | 2020-04-17 | 2023-04-18 | 中国海洋大学 | Deep learning framework transplanting and optimizing method and system based on target many-core |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106650931A (en) | Hybrid precision deep learning algorithm | |
CN103902785B (en) | One kind is based on polynary probabilistic structural finite element model updating method | |
CN106875013A (en) | The system and method for optimizing Recognition with Recurrent Neural Network for multinuclear | |
CN107316099A (en) | Ammunition Storage Reliability Forecasting Methodology based on particle group optimizing BP neural network | |
CN106203625B (en) | A kind of deep-neural-network training method based on multiple pre-training | |
CN106920008A (en) | A kind of wind power forecasting method based on Modified particle swarm optimization BP neural network | |
CN105978732B (en) | A kind of method and system based on the most simple echo state network parameter of particle group optimizing | |
Cintra et al. | Data assimilation by artificial neural networks for an atmospheric general circulation model | |
CN110298443A (en) | Neural network computing device and method | |
CN109376852A (en) | Arithmetic unit and operation method | |
CN114218875A (en) | Acceleration method and device for flow field prediction | |
CN105427241B (en) | Distortion correction method for large-view-field display equipment | |
CN108038507A (en) | Local receptor field extreme learning machine image classification method based on particle group optimizing | |
CN104899431A (en) | LSSVM fluctuating wind speed prediction method based on integration of ant colony and particle swarm | |
CN102495932A (en) | Finite element model updating method based on response surface modeling and improved particle swarm algorithm | |
CN108171319A (en) | The construction method of the adaptive depth convolution model of network connection | |
CN106951960A (en) | A kind of learning method of neutral net and the neutral net | |
CN104360896A (en) | Parallel fluid simulation acceleration method based on GPU (Graphics Processing Unit) cluster | |
CN115437795B (en) | Video memory recalculation optimization method and system for heterogeneous GPU cluster load perception | |
CN104101344A (en) | MEMS (micro electro mechanical system) gyroscope random error compensation method based on particle swarm wavelet network | |
CN114626499A (en) | Embedded multi-agent reinforcement learning method using sparse attention to assist decision making | |
CN112949835A (en) | Inference method and device for knowledge graph based on convolution cyclic neural network | |
CN108985449A (en) | A kind of control method and device of pair of convolutional neural networks processor | |
CN104731709B (en) | A kind of Software Defects Predict Methods based on JCUDASA_BP algorithms | |
CN103646188A (en) | Non-invasive diagnostic method of coronary heart disease based on hybrid intelligent algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170510 |