CN106650931A - Hybrid precision deep learning algorithm - Google Patents

Info

Publication number
CN106650931A
Authority
CN
China
Prior art keywords
precision
deep learning
core processor
calculated
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611132257.8A
Other languages
Chinese (zh)
Inventor
许建卫
刘立
窦晓光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dawning Information Industry Beijing Co Ltd
Original Assignee
Dawning Information Industry Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of deep learning algorithms, and specifically to a hybrid precision deep learning algorithm. The technical problem to be solved is to preserve computational precision while improving computational efficiency. The technical scheme comprises the following steps: S101, using a single-precision many-core processor to perform forward propagation and compute the value of each neuron in each network layer; S102, using the single-precision many-core processor to perform backward propagation and compute the error residual of each network layer; S103, using the single-precision many-core processor to compute the weight increments; S104, applying the weight increments computed by the single-precision many-core processor to the weights held by a high-precision many-core processor, which completes one iteration. The invention is applicable to the field of deep learning.

Description

Mixed-precision deep learning algorithm
Technical field
The present invention relates to the technical field of deep learning algorithms, and in particular to a mixed-precision deep learning algorithm.
Background art
Deep learning is a new field of machine learning research. Its motivation is to build neural networks that simulate the way the human brain analyzes and learns, imitating the mechanisms of the brain to interpret data such as images, sound, and text. The concept was proposed by Hinton et al. in 2006. A greedy layer-wise unsupervised training algorithm, proposed on the basis of the deep belief network (DBN), brought hope for solving the optimization difficulties associated with deep structures, and the multilayer autoencoder deep structure was proposed subsequently. In addition, the convolutional neural network proposed by LeCun et al. was the first true multilayer structure learning algorithm; it exploits spatial relationships to reduce the number of parameters and thereby improve training performance.
A deep neural network is a neural network with at least one hidden layer. Fig. 1 is a schematic diagram of a neuron, and Fig. 2 is a schematic diagram of a neural network structure. Like a shallow neural network, a deep neural network can model complex nonlinear systems, but its additional layers provide higher levels of abstraction and thus increase the capability of the model.
Deep learning is a typical compute-intensive problem with a very large computational load. At present there is deep learning in double precision and deep learning in single precision; both converge, but to different precision. For compute-intensive applications of this kind, a good choice is many-core acceleration, for example using a GPU (graphics processing unit). On such accelerators, single-precision computation is often twice as fast as double-precision computation. How to reach double-precision results using single-precision computation is therefore a problem worth studying.
On modern architectures, 32-bit floating-point operations are typically at least twice as fast as 64-bit floating-point operations. It was discovered long ago that, when solving systems of linear equations, low-precision computation can be used to obtain a speedup, with only a small amount of high-precision computation needed to reach a result consistent with one computed entirely in high precision. In 1948, while taking part in the design and construction of Turing's machine, Wilkinson proposed iterative refinement for solving systems of linear equations. The idea of iterative refinement for solving the linear system Ax = b can be described as:
$x^{(0)} = 0$
$d^{(s)} = b - A x^{(s)}$ (compute the residual in high precision)
$A c^{(s)} = d^{(s)}$ (solve the equation system in low precision)
$x^{(s+1)} = x^{(s)} + c^{(s)}$ (accumulate the solution in high precision)
Wilkinson and his collaborators proved that, if the matrix A is not highly ill-conditioned, repeatedly solving for x in mixed precision converges to the result obtained entirely in high precision. For solving linear systems, more than 90% of the floating-point operations can be done in low precision without affecting the precision of the solution. The main idea of mixed-precision computation is a strategy that uses low precision for the compute-intensive part and high precision for a small number of key steps, thereby accelerating the computation.
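The refinement loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: single precision is emulated by rounding every intermediate result through the hypothetical helper `f32`, the names `solve_low` and `refine` are invented for this sketch, and the 3x3 matrix is an arbitrary well-conditioned example rather than anything taken from the patent.

```python
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def solve_low(A, b):
    """Gaussian elimination with partial pivoting, every result rounded to single precision."""
    n = len(b)
    M = [[f32(v) for v in row] + [f32(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = f32(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = f32(M[i][j] - f32(m * M[k][j]))
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = M[i][n]
        for j in range(i + 1, n):
            s = f32(s - f32(M[i][j] * x[j]))
        x[i] = f32(s / M[i][i])
    return x

def refine(A, b, steps=5):
    """Iterative refinement: low-precision solves, high-precision residual and accumulation."""
    n = len(b)
    x = [0.0] * n                                   # x(0) = 0
    for _ in range(steps):
        d = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]  # double precision
        c = solve_low(A, d)                         # single precision
        x = [x[i] + c[i] for i in range(n)]         # double precision
    return x

A = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
x_true = [1.0 / 3.0, -2.0 / 7.0, 0.1]
b = [sum(A[i][j] * x_true[j] for j in range(3)) for i in range(3)]
x_refined = refine(A, b)
```

Only the residual and the accumulation run in double precision here; the elimination, which dominates the operation count, stays in emulated single precision.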
For mixed-precision computation to be effective, it must guarantee that the final computational precision is not lost. Langou et al. (2006) comprehensively tested the effectiveness of mixed-precision algorithms for dense matrix solves on the Cell processor and on mainstream CPUs. Göddeke et al. (2008) tested them on multigrid solvers for large-scale matrices. These tests confirmed that the precision of the mixed-precision results is fully consistent with that of results computed entirely in high precision.
Summary of the invention
In view of the deficiencies of the related art, the technical problem to be solved by the present invention is to provide a mixed-precision deep learning algorithm that keeps the computational precision unchanged while improving computational efficiency.
To solve the above technical problem, the technical solution adopted by the present invention is a mixed-precision deep learning algorithm comprising the following steps: S101, performing forward propagation with a single-precision many-core processor and computing the value of each neuron in each network layer; S102, performing backward propagation with the single-precision many-core processor and computing the error residual of each network layer; S103, computing the weight increments with the single-precision many-core processor; S104, applying the weight increments computed by the single-precision many-core processor to the weights held by the high-precision many-core processor, which completes one iteration.
Preferably, the mixed-precision deep learning algorithm is applied in the later iterations of deep learning.
Preferably, in step S102, performing backward propagation and computing the error residual of each network layer specifically comprises: multiplying the residual of the previous layer by the weights of the current layer and by the gradient of the current layer, to obtain the error residuals of all layers.
Preferably, in step S103, computing the weight increments specifically comprises: using the residual of each network layer computed during backward propagation to compute the weight changes from the gradient and related quantities.
The beneficial technical effects of the present invention are as follows. The mixed-precision deep learning algorithm first performs forward propagation, backward propagation, and the weight-increment computation in single precision, obtaining the value of each neuron, the error residual, and the weight increment for each network layer, and then applies the single-precision weight increments to the high-precision weights, which completes one iteration. Unlike some optimization algorithms that change the precision while improving efficiency, the present invention is an acceleration method that uses low precision for the compute-intensive part and high precision for a small number of key steps, and its result has exactly the same precision as a result computed entirely in high precision. Deep learning in the present invention is accelerated with many-core processors, on which single precision is often twice as fast as double precision, so mixed-precision computation can greatly increase computation speed. In addition, because the algorithm uses single precision extensively, the corresponding communication volume is halved; in deep learning the layers are often fully connected, so reducing the communication volume can greatly improve the scalability of the algorithm.
Brief description of the drawings
Fig. 1 is a schematic diagram of a neuron in a deep learning algorithm;
Fig. 2 is a schematic diagram of a neural network structure in a deep learning algorithm;
Fig. 3 is a flow chart of the mixed-precision deep learning algorithm provided by Embodiment 1 of the present invention.
Detailed description of the embodiments
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Fig. 3 is a flow chart of the mixed-precision deep learning algorithm provided by Embodiment 1 of the present invention. As shown in Fig. 3, the mixed-precision deep learning algorithm comprises the following steps:
S101, performing forward propagation with a single-precision many-core processor and computing the value of each neuron in each network layer.
S102, performing backward propagation with the single-precision many-core processor and computing the error residual of each network layer.
S103, computing the weight increments with the single-precision many-core processor.
S104, applying the weight increments computed by the single-precision many-core processor to the weights held by the high-precision many-core processor, which completes one iteration.
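As a rough illustration of how steps S101 to S104 fit together, the following Python sketch runs one iteration for a single linear neuron. Everything in it is an assumption made for the example: the neuron model, the squared-error loss, the learning rate `lr`, the function name `train_step`, and the helper `f32`, which emulates single precision by rounding each intermediate value. The patent itself targets many-core processors and does not prescribe this code.

```python
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def train_step(master_w, x, target, lr=0.1):
    """One S101-S104 iteration for a single linear neuron y = sum(w_i * x_i)."""
    # Working copies of the weights and inputs, rounded to single precision.
    w = [f32(v) for v in master_w]
    xs = [f32(v) for v in x]
    # S101: forward propagation in single precision.
    y = 0.0
    for wi, xi in zip(w, xs):
        y = f32(y + f32(wi * xi))
    # S102: backward propagation in single precision.
    # Residual of the squared error 0.5 * (y - target)**2 with respect to y.
    delta = f32(y - f32(target))
    # S103: weight increments in single precision (plain gradient descent).
    dw = [f32(-f32(lr) * f32(delta * xi)) for xi in xs]
    # S104: apply the single-precision increments to the double-precision weights.
    return [mw + d for mw, d in zip(master_w, dw)]

weights = [0.0, 0.0]          # master weights, kept as Python floats (binary64)
for _ in range(60):
    weights = train_step(weights, [1.0, 2.0], 1.0)
```

The master weights never leave double precision; only the rounded working copy takes part in the forward and backward passes.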
In implementation, a deep learning algorithm is mainly divided into three stages: forward propagation, backward propagation, and parameter update. Forward propagation can be understood simply as computing the output from a given input by composing the computations of the individual layers (which can be understood as a composite function of the form f(x) = h(g(x))). The propagation runs bottom-up (bottom to top, which is why it is called forward propagation), as shown in Fig. 2, which illustrates its main computation flow.
The main idea of the backpropagation algorithm is as follows. Given a sample (x, y), first perform a "forward pass" to compute all activation values in the network, including the output value $h_{W,b}(x)$. Then, for each node i in layer l, compute its "residual" $\delta_i^{(l)}$, which indicates how much that node contributed to the residual of the final output value. For a final output node, the gap between the activation produced by the network and the actual value can be computed directly, and this gap is defined as $\delta_i^{(n)}$ (layer n denotes the output layer). For a hidden unit, $\delta_i^{(l)}$ is computed from a weighted average of the residuals of the nodes in layer l+1 that take $a_i^{(l)}$ as input. Using the weights and the derivative of the current layer, the residual of every layer can be obtained layer by layer.
In short, backward propagation multiplies the residual of the previous layer (layer l+1 in the notation above) by the weights of the current layer and by the gradient of the current layer, and in this way obtains the residuals of all layers.
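The layer-by-layer residual computation can be sketched as follows. This is a minimal example under several assumptions not stated in the patent: sigmoid activations, no bias terms, a squared-error loss, and the hypothetical function names `forward` and `residuals`. Precision handling is left out so that the residual recursion itself stays visible.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, x):
    """Return the activations of every layer.

    weights[l][j][i] links unit i of layer l to unit j of layer l+1.
    """
    activations = [x]
    for W in weights:
        prev = activations[-1]
        activations.append(
            [sigmoid(sum(w * a for w, a in zip(row, prev))) for row in W]
        )
    return activations

def residuals(weights, activations, target):
    """Return one residual list per non-input layer, output layer last."""
    out = activations[-1]
    # Output layer: gap between activation and target, times the
    # sigmoid derivative a * (1 - a).
    deltas = [[(a - t) * a * (1.0 - a) for a, t in zip(out, target)]]
    # Hidden layers, starting with the one nearest the output.
    for l in range(len(weights) - 1, 0, -1):
        W, nxt, act = weights[l], deltas[0], activations[l]
        layer = [
            # Weighted sum of the residuals of layer l+1, times the local derivative.
            sum(W[j][i] * nxt[j] for j in range(len(nxt))) * act[i] * (1.0 - act[i])
            for i in range(len(act))
        ]
        deltas.insert(0, layer)
    return deltas

net = [[[0.1, -0.2], [0.4, 0.3]], [[0.5, -0.6]]]   # 2 inputs, 2 hidden units, 1 output
acts = forward(net, [1.0, 0.5])
deltas = residuals(net, acts, [1.0])
```

With these definitions, the gradient of the loss with respect to `weights[l][j][i]` is `deltas[l][j] * activations[l][i]`, which is the quantity the weight-update step consumes.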
The final step is the update of the weight parameters: using the residual of each layer computed during backward propagation, compute the weight changes from the gradient and related quantities, and then update the weights.
Specifically, the mixed-precision deep learning algorithm is applied in the later iterations of deep learning.
The reason this mixed-precision method can guarantee that no precision is lost is that it rests on a precondition: the increment applied in each update is much smaller in absolute value than the full value of the weight, so an increment with only a few significant digits is enough to meet the precision requirement. In the early iterations of deep learning, however, the increment and the full value do not differ much in magnitude; only in the later iterations, when the iterate is close to the true value, do the two satisfy this precondition. Therefore, at present, this mixed-precision scheme can be used only in the later iterations. This is still very significant, because most of the iteration time, generally about 80 percent, is spent in the later iterations.
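A small numerical experiment makes this precondition concrete. In the Python sketch below, which is an illustration only, the weight value 1.0, the increment 1e-8, the step count, and the names `f32` and `accumulate` are all assumptions chosen for the example. It compares accumulating a tiny single-precision increment into a single-precision weight with accumulating the same increment into a double-precision weight.

```python
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate(increment, steps, single_precision_master):
    """Add the same small increment to a weight of 1.0 for a number of steps."""
    w = 1.0
    dw = f32(increment)          # the increment itself is computed in single precision
    for _ in range(steps):
        if single_precision_master:
            w = f32(w + dw)      # weight stored in single precision
        else:
            w = w + dw           # weight stored in double precision
    return w

w_single = accumulate(1e-8, 1000, True)
w_double = accumulate(1e-8, 1000, False)
```

Near 1.0 the spacing of single-precision values is about 1.2e-7, so an increment of 1e-8 is rounded away on every step and `w_single` stays at 1.0, whereas `w_double` picks up all 1000 increments. The increment carries only single-precision digits in both cases; what matters is the precision of the weight it is added to.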
Specifically, in step S102, performing backward propagation and computing the error residual of each network layer comprises:
multiplying the residual of the previous layer by the weights of the current layer and by the gradient of the current layer, to obtain the error residuals of all layers.
Specifically, in step S103, computing the weight increments comprises:
using the residual of each network layer computed during backward propagation to compute the weight changes from the gradient and related quantities.
Compared with the prior art, the present invention has the following advantages:
1. It keeps the computational precision unchanged. This differs from some optimization algorithms that change the precision while improving efficiency, which is hard to accept in applications with high precision requirements.
2. It improves computational efficiency. Deep learning is currently widely accelerated with many-core processors, on which single precision is often twice as fast as double precision, so mixed-precision computation can greatly increase computation speed.
3. It scales well. Because the algorithm uses single precision extensively, the corresponding communication volume is also halved; in deep learning the layers are often fully connected, so reducing the communication volume can greatly improve the scalability of the algorithm.
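The halving of the communication volume follows directly from the storage sizes of the two formats. The short Python sketch below illustrates this by serializing the same list of weight increments in single and in double precision; the function name `payload_bytes` and the sample data are assumptions made for the example.

```python
import struct

def payload_bytes(values, fmt):
    """Size in bytes of the values packed as IEEE 754 single ("f") or double ("d")."""
    return len(struct.pack("<%d%s" % (len(values), fmt), *values))

increments = [0.001 * i for i in range(1000)]
single_size = payload_bytes(increments, "f")   # 4 bytes per value
double_size = payload_bytes(increments, "d")   # 8 bytes per value
```

For a fully connected layer the number of exchanged values grows with the product of the layer widths, so the factor of two applies to the bulk of the traffic.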
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (4)

1. A mixed-precision deep learning algorithm, characterized in that it comprises the following steps:
S101, performing forward propagation with a single-precision many-core processor and computing the value of each neuron in each network layer;
S102, performing backward propagation with the single-precision many-core processor and computing the error residual of each network layer;
S103, computing the weight increments with the single-precision many-core processor;
S104, applying the weight increments computed by the single-precision many-core processor to the weights held by the high-precision many-core processor, which completes one iteration.
2. The mixed-precision deep learning algorithm according to claim 1, characterized in that the mixed-precision deep learning algorithm is applied in the later iterations of deep learning.
3. The mixed-precision deep learning algorithm according to claim 1, characterized in that, in step S102, performing backward propagation and computing the error residual of each network layer specifically comprises:
multiplying the residual of the previous layer by the weights of the current layer and by the gradient of the current layer, to obtain the error residuals of all layers.
4. The mixed-precision deep learning algorithm according to claim 1, characterized in that, in step S103, computing the weight increments specifically comprises:
using the residual of each network layer computed during backward propagation to compute the weight changes from the gradient and related quantities.
CN201611132257.8A 2016-12-09 2016-12-09 Hybrid precision deep learning algorithm Pending CN106650931A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611132257.8A CN106650931A (en) 2016-12-09 2016-12-09 Hybrid precision deep learning algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611132257.8A CN106650931A (en) 2016-12-09 2016-12-09 Hybrid precision deep learning algorithm

Publications (1)

Publication Number Publication Date
CN106650931A true CN106650931A (en) 2017-05-10

Family

ID=58825861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611132257.8A Pending CN106650931A (en) 2016-12-09 2016-12-09 Hybrid precision deep learning algorithm

Country Status (1)

Country Link
CN (1) CN106650931A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480782A (en) * 2017-08-14 2017-12-15 电子科技大学 Learn neural network processor on a kind of piece
CN109598344A (en) * 2018-12-14 2019-04-09 北京字节跳动网络技术有限公司 Model generating method and device
CN109670579A (en) * 2018-12-14 2019-04-23 北京字节跳动网络技术有限公司 Model generating method and device
JP2019079535A (en) * 2017-10-23 2019-05-23 三星電子株式会社Samsung Electronics Co.,Ltd. Method and apparatus for processing parameters
CN110097188A (en) * 2019-04-30 2019-08-06 科大讯飞股份有限公司 A kind of model training method, working node and parameter update server
CN111667061A (en) * 2020-04-17 2020-09-15 中国海洋大学 Deep learning framework transplanting and optimizing method and system based on target many-core
US11385875B2 (en) 2019-01-31 2022-07-12 Google Llc Propagating reduced-precision on computation graphs

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480782A (en) * 2017-08-14 2017-12-15 电子科技大学 Learn neural network processor on a kind of piece
JP2019079535A (en) * 2017-10-23 2019-05-23 三星電子株式会社Samsung Electronics Co.,Ltd. Method and apparatus for processing parameters
JP7072484B2 (en) 2017-10-23 2022-05-20 三星電子株式会社 Parameter processing method and equipment
US11544549B2 (en) 2017-10-23 2023-01-03 Samsung Electronics Co., Ltd. Method and apparatus with neural network
CN109598344A (en) * 2018-12-14 2019-04-09 北京字节跳动网络技术有限公司 Model generating method and device
CN109670579A (en) * 2018-12-14 2019-04-23 北京字节跳动网络技术有限公司 Model generating method and device
CN109598344B (en) * 2018-12-14 2020-10-02 北京字节跳动网络技术有限公司 Model generation method and device
US11385875B2 (en) 2019-01-31 2022-07-12 Google Llc Propagating reduced-precision on computation graphs
US11972238B2 (en) 2019-01-31 2024-04-30 Google Llc Propagating reduced-precision on computation graphs
CN110097188A (en) * 2019-04-30 2019-08-06 科大讯飞股份有限公司 A kind of model training method, working node and parameter update server
CN111667061A (en) * 2020-04-17 2020-09-15 中国海洋大学 Deep learning framework transplanting and optimizing method and system based on target many-core
CN111667061B (en) * 2020-04-17 2023-04-18 中国海洋大学 Deep learning framework transplanting and optimizing method and system based on target many-core

Similar Documents

Publication Publication Date Title
CN106650931A (en) Hybrid precision deep learning algorithm
CN103902785B (en) One kind is based on polynary probabilistic structural finite element model updating method
CN106875013A (en) The system and method for optimizing Recognition with Recurrent Neural Network for multinuclear
CN107316099A (en) Ammunition Storage Reliability Forecasting Methodology based on particle group optimizing BP neural network
CN106203625B (en) A kind of deep-neural-network training method based on multiple pre-training
CN106920008A (en) A kind of wind power forecasting method based on Modified particle swarm optimization BP neural network
CN105978732B (en) A kind of method and system based on the most simple echo state network parameter of particle group optimizing
Cintra et al. Data assimilation by artificial neural networks for an atmospheric general circulation model
CN110298443A (en) Neural network computing device and method
CN109376852A (en) Arithmetic unit and operation method
CN114218875A (en) Acceleration method and device for flow field prediction
CN105427241B (en) Distortion correction method for large-view-field display equipment
CN108038507A (en) Local receptor field extreme learning machine image classification method based on particle group optimizing
CN104899431A (en) LSSVM fluctuating wind speed prediction method based on integration of ant colony and particle swarm
CN102495932A (en) Finite element model updating method based on response surface modeling and improved particle swarm algorithm
CN108171319A (en) The construction method of the adaptive depth convolution model of network connection
CN106951960A (en) A kind of learning method of neutral net and the neutral net
CN104360896A (en) Parallel fluid simulation acceleration method based on GPU (Graphics Processing Unit) cluster
CN115437795B (en) Video memory recalculation optimization method and system for heterogeneous GPU cluster load perception
CN104101344A (en) MEMS (micro electro mechanical system) gyroscope random error compensation method based on particle swarm wavelet network
CN114626499A (en) Embedded multi-agent reinforcement learning method using sparse attention to assist decision making
CN112949835A (en) Inference method and device for knowledge graph based on convolution cyclic neural network
CN108985449A (en) A kind of control method and device of pair of convolutional neural networks processor
CN104731709B (en) A kind of Software Defects Predict Methods based on JCUDASA_BP algorithms
CN103646188A (en) Non-invasive diagnostic method of coronary heart disease based on hybrid intelligent algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170510