CN107341540A - Apparatus and method for performing Hessian-Free training algorithms - Google Patents

Apparatus and method for performing Hessian-Free training algorithms

Info

Publication number
CN107341540A
CN107341540A (application CN201610283885.XA; granted publication CN107341540B)
Authority
CN
China
Prior art keywords
submodule
unit
updated
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610283885.XA
Other languages
Chinese (zh)
Other versions
CN107341540B (en)
Inventor
张士锦
郭崎
陈天石
陈云霁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambricon Technologies Corp Ltd
Beijing Zhongke Cambrian Technology Co Ltd
Original Assignee
Beijing Zhongke Cambrian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Cambrian Technology Co Ltd filed Critical Beijing Zhongke Cambrian Technology Co Ltd
Priority to CN201610283885.XA priority Critical patent/CN107341540B/en
Priority to PCT/CN2016/081842 priority patent/WO2017185413A1/en
Publication of CN107341540A publication Critical patent/CN107341540A/en
Application granted granted Critical
Publication of CN107341540B publication Critical patent/CN107341540B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Complex Calculations (AREA)
  • Advance Control (AREA)

Abstract

An apparatus and method for performing Hessian-Free training algorithms. The apparatus comprises a direct memory access unit, a controller unit, a data processing module and a data cache module. The apparatus can be used to implement Hessian-Free training algorithms and to complete the training of a variety of neural networks, such as auto-encoders and recurrent neural networks (RNNs). In each iteration, a second-order Taylor expansion of the error function (the objective function) is taken and a damping term is added, yielding an estimate of the objective function; then, from the current gradient, Gauss-Newton matrix, damping function and damping coefficient, an update vector is found with a preconditioned conjugate gradient method, and the parameters to be updated are updated. Iteration continues until the parameter vector to be updated converges.

Description

Apparatus and method for performing Hessian-Free training algorithms
Technical field
The present invention relates to the technical field of neural network computation, and more specifically to an apparatus and method for performing Hessian-Free training algorithms.
Background art
Gradient descent is widely used in fields such as function approximation, optimization, pattern recognition and image processing. It is currently the mainstream method for training neural networks (cf. the back-propagation algorithm), but it ignores the curvature information of the error function: parameter changes easily become too gentle, so the method may fail to converge to a local optimum, and it copes poorly with error functions exhibiting pathological curvature (such as the Rosenbrock function). Hessian-Free training algorithms solve this problem well and, through a number of refinements, avoid the situation where the amount of computation grows with the square of the number of parameters (for gradient descent the growth is linear).
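To make the pathological-curvature point concrete, here is a minimal NumPy sketch (purely illustrative, not part of the claimed apparatus) of plain gradient descent creeping along the Rosenbrock valley; the learning rate and iteration count are arbitrary choices:

```python
import numpy as np

def rosenbrock(theta):
    """Classic ill-conditioned test function; minimum at (1, 1)."""
    x, y = theta
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def rosenbrock_grad(theta):
    x, y = theta
    return np.array([-2 * (1 - x) - 400 * x * (y - x ** 2),
                     200 * (y - x ** 2)])

theta = np.array([-1.5, 1.5])
lr = 1e-4  # larger steps overshoot the steep valley walls and diverge
for _ in range(20000):
    theta = theta - lr * rosenbrock_grad(theta)

# Even after 20,000 steps the iterate crawls along the flat valley floor
# and is still far from (1, 1): the "too gentle" behaviour described above.
print(theta, rosenbrock(theta))
```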
At present, one known way of executing Hessian-Free training algorithms is to use a general-purpose processor, which supports the algorithm by executing general-purpose instructions through general-purpose register files and functional units. One drawback of this approach is the low arithmetic performance of a single general-purpose processor; and when several general-purpose processors execute in parallel, the communication between them becomes the performance bottleneck instead. In addition, a general-purpose processor must decode the operations of the Hessian-Free training algorithm into a long sequence of compute and memory-access instructions, and this front-end decoding incurs a considerable power overhead.
Another known way of executing Hessian-Free training algorithms is to use a graphics processor (GPU), which supports the algorithm by executing generic SIMD instructions through general-purpose register files and stream processing units. But because the GPU is a device specialised for graphics and scientific computing, with no dedicated support for the operations of Hessian-Free training algorithms, it still needs a large amount of front-end decoding work to execute those operations, which introduces substantial overhead. Moreover, the GPU has only a small on-chip cache, so the data needed by the computation (for instance the Gauss-Newton matrix) must be carried on and off chip repeatedly; off-chip bandwidth becomes the main performance bottleneck, while also bringing a huge power cost.
Summary of the invention
In view of this, an object of the present invention is to provide an apparatus and method for performing Hessian-Free training algorithms that solves at least one of the above technical problems.
To achieve this goal, as one aspect of the present invention, the invention provides an apparatus for performing Hessian-Free training algorithms, comprising:
a controller unit, for decoding a read instruction into microinstructions that control the corresponding modules, and sending them to those modules;
a data cache unit, for storing the intermediate variables of the computation and performing initialization and update operations on those intermediate variables; and
a data processing module, for performing arithmetic operations under the control of the controller unit and storing intermediate variables in the data cache unit.
Wherein, the data processing module comprises an operation control submodule, a gradient operation submodule, a damping-term operation submodule, a Gauss-Newton matrix operation submodule, a conjugate gradient operation submodule and a basic operation submodule; the basic operation submodule carries out the basic additions and multiplications between matrices and vectors.
Preferably, the gradient operation submodule, the damping-term operation submodule, the Gauss-Newton matrix operation submodule and the conjugate gradient operation submodule can all call the basic operation submodule and, as circumstances require, are also allowed to call one another.
Wherein, the data cache unit initializes, at device initialization, the second-order estimate f̃(δ_n) of f(θ). Before the update of the n-th parameter vector θ_n to be updated begins, f̃(δ_n) is read into the data processing module, and after the update vector has been obtained in the data processing module, f̃(δ_{n+1}) is written back. Here θ is the parameter vector to be updated, θ_n is the n-th parameter vector to be updated, f(θ) is the error function, i.e. a function measuring the deviation of the actual result from the predicted value; δ_n is the update vector, and θ_{n+1} = θ_n + δ_n.
Wherein, in the step of initializing f̃(δ_n), the data cache unit initializes the gradient ∇f(θ_n), the Gauss-Newton matrix G_f, the damping coefficient λ and the damping function R_{θ_n}(θ) therein. The gradient ∇f(θ_n) is the gradient of f at θ_n, and G_f is the Gauss-Newton matrix of f at θ_n; the damping function R_{θ_n}(θ) is the value, at θ_n, of a function predefined according to the training model; the damping coefficient λ is obtained by a Levenberg-Marquardt (LM) style heuristic.
The data processing module reads f̃(δ_n) from the data cache unit and reads the parameter vector θ_n to be updated from the externally designated space; the update vector δ_n is obtained inside the module, θ_n is updated to θ_{n+1}, and correspondingly f̃(δ_n) is updated to f̃(δ_{n+1}); f̃(δ_{n+1}) is then written back to the data cache unit, and θ_{n+1} is written into the externally designated space. Here θ_{n+1} is the (n+1)-th parameter vector to be updated and f̃(δ_{n+1}) is the corresponding second-order estimate of f.
As another aspect of the present invention, the invention also provides a method of performing Hessian-Free training algorithms, comprising the following steps:
Step (1): complete, by instruction, the initialization of the data cache unit, i.e. initialize the second-order estimate f̃(δ_n) of f(θ). Here θ is the parameter vector to be updated, θ_n is the n-th parameter vector to be updated, f(θ) is the error function, i.e. a function measuring the deviation of the actual result from the predicted value; δ_n is the update vector, and θ_{n+1} = θ_n + δ_n.
Step (2): complete, by an I/O instruction, the operation in which the direct memory access unit reads the parameter vector to be updated from the external space.
Step (3): according to the corresponding instruction, the data processing module performs a second-order Taylor expansion of the error function f(θ) at θ_n, adds the damping term λR_{θ_n}(θ), and obtains the estimate f̃(δ_n) of f(θ) near θ_n, i.e.

f̃(δ_n) = M_{θ_n}(θ) + λR_{θ_n}(θ) = f(θ_n) + ∇f(θ_n)^T δ_n + δ_n^T G_f δ_n / 2 + λR_{θ_n}(θ).
Here G_f is the Gauss-Newton matrix of f at θ_n; the damping coefficient λ is obtained by an LM-style heuristic; and the damping function R_{θ_n}(θ) is the value, at θ_n, of a function predefined according to the training model.
Step (4): according to the corresponding instruction, the data processing module runs the preconditioned conjugate gradient method to find the δ_n that minimises f̃(δ_n), and updates θ_n to θ_{n+1}; the update operation is specifically:

θ_{n+1} = θ_n + δ_n.
Step (5): the data processing unit judges whether the updated parameter vector has converged; if it has converged, the computation ends, otherwise execution returns to step (2).
Wherein, the initialization of the data cache unit in step (1) comprises: zeroing the gradient ∇f(θ_n), the Gauss-Newton matrix G_f, the damping coefficient λ and the damping function R_{θ_n}(θ).
Wherein, in step (3), when training an RNN, the damping function takes the structural-damping form

R_{θ_n}(θ) = δ_n^T δ_n / 2 + μ δ_n^T G_S δ_n / 2,

where S, like f, is a distance function, G_S is the Gauss-Newton matrix of S at θ_n, and μ is a predefined positive number.
Wherein, in the step in which step (4) runs the preconditioned conjugate gradient method to find the δ_n minimising f̃(δ_n), only a mini-batch rather than all samples is used, and every product of the Gauss-Newton matrix with a vector is computed implicitly (via Pearlmutter's R{·}-method) rather than by forming the matrix explicitly.
As yet another aspect of the invention, the invention also provides a method of performing Hessian-Free training algorithms, characterised by comprising the following steps:
Step S1: an I/O instruction is pre-stored at the first address of the instruction cache unit.
Step S2: the computation starts; the controller unit reads this I/O instruction from the first address of the instruction cache unit and, according to the decoded microinstruction, the direct memory access unit reads from the external address space all instructions related to the Hessian-Free computation and buffers them into the instruction cache unit.
Step S3: the controller unit reads an I/O instruction from the instruction cache unit and, according to the decoded microinstruction, the direct memory access unit reads the initial parameter vector θ_0 to be updated from the external space into the data processing module.
Step S4: the controller unit reads an assignment instruction from the instruction cache unit; according to the decoded microinstruction, f̃(δ_n) in the data cache unit is initialized and the iteration count n in the data processing unit is set to 0. Here θ is the parameter vector to be updated, θ_n is the n-th parameter vector to be updated, f(θ) is the error function, i.e. a function measuring the deviation of the actual result from the predicted value; δ_n is the update vector, θ_{n+1} = θ_n + δ_n, and f̃(δ_n) is the second-order estimate of f(θ).
Step S5: the controller unit reads an I/O instruction from the instruction cache unit and, according to the decoded microinstruction, the direct memory access unit reads the parameter vector θ_n to be updated from the external space and passes it into the data processing module.
Step S6: the controller unit reads, from the instruction cache unit, an instruction for performing a second-order estimate of the error function near the current parameter vector value and, according to the decoded microinstruction, performs the operation of finding the second-order estimate f̃(δ_n) of f(θ) near θ_n. In this operation the instruction is sent to the operation control submodule, which issues the corresponding instructions to carry out the following: compute ∇f(θ_n) using the gradient operation submodule; obtain the Gauss-Newton matrix G_f of f at θ_n using the matrix multiplication in the Gauss-Newton operation submodule and the basic operation submodule; run the LM heuristic with the damping-term operation submodule and the basic operation submodule to obtain the damping coefficient λ, and hence the damping term λR_{θ_n}(θ); finally, obtain the expression of f̃(δ_n) and store it in the data cache unit. Here the damping function R_{θ_n}(θ) is the value, at θ_n, of a function predefined according to the training model.
Step S7: the controller unit reads a data transfer instruction from the instruction cache unit and, according to the decoded microinstruction, f̃(δ_n) is sent from the data cache unit into the data processing unit.
Step S8: the controller unit reads a parameter-update instruction from the instruction cache unit and, according to the decoded microinstruction, runs the preconditioned conjugate gradient method to find the δ_n minimising f̃(δ_n) and update θ_n to θ_{n+1}. The direct memory access unit reads the parameter vector θ_n to be updated from the external space and passes it into the data processing module; the operation control submodule directs the relevant operation submodules as follows: obtain the update vector δ_n using the conjugate gradient operation submodule and the basic operation submodule; finally, update θ_n to θ_{n+1} using the vector addition in the basic operation submodule.
Step S9: the controller unit reads an I/O instruction from the instruction cache unit and, according to the decoded microinstruction, the updated parameter vector θ_{n+1} is sent from the data processing unit to the externally designated space through the direct memory access unit.
Step S10: the controller unit reads a convergence-judgment instruction from the instruction cache unit and, according to the decoded microinstruction, the data processing unit judges whether the updated parameter vector θ_{n+1} has converged: if it has, the computation ends; otherwise the value of the iteration count n is increased by 1 and execution returns to step S5.
As still another aspect of the present invention, the invention also provides an apparatus for performing Hessian-Free training algorithms, wherein a program for performing the above-described method of performing Hessian-Free training algorithms is solidified in the controller of the apparatus.
From the above technical solutions it can be seen that the apparatus and method of the present invention have the following advantages: the apparatus can implement Hessian-Free training algorithms and complete the training of a variety of neural networks, such as auto-encoders and recurrent neural networks (RNNs); by using equipment dedicated to performing Hessian-Free training algorithms, the problems of insufficient arithmetic performance of general-purpose processors and high front-end decoding overhead can be solved, accelerating the execution of the related applications; meanwhile, the use of the data cache unit avoids reading data from memory repeatedly and reduces the memory-access bandwidth.
Brief description of the drawings
Fig. 1 is an example block diagram of the overall structure of an apparatus for implementing Hessian-Free training algorithm related applications according to an embodiment of the invention;
Fig. 2 is an example block diagram of the data processing module in an apparatus for implementing Hessian-Free training algorithm related applications according to an embodiment of the invention;
Fig. 3 is a flow chart of the computation for implementing Hessian-Free training algorithm related applications according to an embodiment of the invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawings. From the following detailed description, other aspects, advantages and salient features of the invention will become apparent to those skilled in the art.
In this specification, the following embodiments used to describe the principle of the invention are merely illustrative and should not be construed in any way as limiting the scope of the invention. The following description with reference to the drawings is intended to aid a comprehensive understanding of the exemplary embodiments of the invention defined by the claims and their equivalents. The description includes a variety of details to aid understanding, but these details should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will appreciate that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and structures are omitted for clarity and brevity. Throughout the drawings, the same reference numerals are used for identical functions and operations.
The invention discloses an apparatus for performing Hessian-Free training algorithms, comprising an instruction cache unit, an instruction decoding unit, a direct memory access unit, a data processing module and a data cache module. The apparatus can be used to implement Hessian-Free training algorithms and complete the training of a variety of neural networks, such as auto-encoders and recurrent neural networks (RNNs). In each iteration, a second-order Taylor expansion of the error function (the objective function) is taken and a damping term is added, as an estimate of the objective function; then, from the current gradient, Gauss-Newton matrix, damping function and damping coefficient (damping constant), an update vector is found with a preconditioned conjugate gradient method (preconditioned CG minimisation), and the parameters to be updated are updated. Iteration continues until the parameter vector to be updated converges.
More specifically, the apparatus of the invention comprises a direct memory access unit, an instruction cache unit, a controller unit, a data cache unit and a data processing module. The direct memory access unit can access the external address space and read and write data to each cache unit inside the apparatus, completing the loading and storing of data; specifically, it reads instructions to the instruction cache unit, reads the parameters to be updated and the corresponding gradient values from the designated storage into the data processing unit, and writes the updated parameter vector from the data processing module directly into the externally designated space. The instruction cache unit reads instructions through the direct memory access unit and caches them. The controller unit reads instructions from the instruction cache unit, decodes them into microinstructions controlling the behaviour of the other modules, and sends them to the other modules, such as the direct memory access unit, the data cache unit and the data processing module. The data cache unit stores the intermediate variables needed during operation of the apparatus and initializes and updates these variables. The data processing module performs the corresponding arithmetic operations according to the instructions.
In addition, the invention also discloses a method of performing Hessian-Free training algorithms, comprising the following steps:
Step (1): complete, by instruction, the initialization of the data cache unit, i.e. initialize the second-order estimate f̃(δ_n) of f(θ); specifically, zero the gradient ∇f(θ_n), the Gauss-Newton matrix G_f, the damping coefficient λ and the damping function R_{θ_n}(θ) therein.
Step (2): complete, by an I/O instruction, the operation in which the direct memory access unit reads the parameter vector to be updated from the external space.
Step (3): according to the corresponding instruction, the data processing module performs a second-order Taylor expansion of the error function f(θ) at θ_n, adds the damping term λR_{θ_n}(θ), and obtains the estimate f̃(δ_n) of f(θ) near θ_n, i.e.

f̃(δ_n) = M_{θ_n}(θ) + λR_{θ_n}(θ) = f(θ_n) + ∇f(θ_n)^T δ_n + δ_n^T G_f δ_n / 2 + λR_{θ_n}(θ).
Here G_f is the Gauss-Newton matrix of f at θ_n; δ_n is the update vector; the damping coefficient λ is found with a Levenberg-Marquardt style heuristic (Levenberg-Marquardt style heuristics); and the damping function R_{θ_n}(θ) is the value, at θ_n, of a function predefined according to the training model. For example, when training an RNN one may take the structural-damping form R_{θ_n}(θ) = δ_n^T δ_n / 2 + μ δ_n^T G_S δ_n / 2, where S, like f, is a distance function, G_S is the Gauss-Newton matrix of S at θ_n, and μ (a weighting constant) is a predefined positive number.
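The text does not spell the LM-style heuristic out; the sketch below shows one common form of it, following Martens' Hessian-Free work (the 1/4 and 3/4 thresholds and the 3/2 and 2/3 factors are assumptions, not taken from this patent):

```python
def lm_update(lmbda, f_old, f_new, model_reduction,
              boost=3.0 / 2.0, drop=2.0 / 3.0):
    """Levenberg-Marquardt style update of the damping coefficient lambda.

    model_reduction is M(delta) - M(0), the decrease promised by the damped
    quadratic model (negative for a descent step); rho compares it with the
    actual decrease of the error function f.
    """
    rho = (f_new - f_old) / model_reduction
    if rho < 0.25:      # model over-promised: trust it less, damp harder
        lmbda *= boost
    elif rho > 0.75:    # model was accurate: allow bolder steps
        lmbda *= drop
    return lmbda
```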
Step (4): according to the corresponding instruction, the data processing module runs the preconditioned conjugate gradient method to find the δ_n that minimises f̃(δ_n), and updates θ_n to θ_{n+1}. The update operation is:

θ_{n+1} = θ_n + δ_n.
It is worth noting that, while running the preconditioned conjugate gradient method, only a mini-batch rather than all samples is used, and every Gauss-Newton matrix-vector product that occurs is evaluated implicitly (Pearlmutter's R{·}-method) rather than by forming the matrix. This both improves the efficiency of learning on big data, i.e. the operating efficiency of the data operation module, and avoids the amount of computation growing with the square of the number of parameters.
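As a software illustration of such an implicit product, the sketch below uses a linear least-squares model, where the Jacobian J is simply a data matrix X, so the damped Gauss-Newton product reduces to nested matrix-vector products without ever materialising G_f; for a deep network the two inner products would instead be computed by an R{·} forward pass and an ordinary backward pass:

```python
import numpy as np

def make_gv_operator(X, lmbda):
    """Return v -> (G_f + lambda*I) v for the least-squares error
    f(theta) = 0.5 * ||X @ theta - y||^2, where G_f = J^T H_L J with
    J = X and H_L = I.

    The Gauss-Newton matrix is never formed: only two matrix-vector
    products are needed. For a deep network, J @ v would be computed by
    a forward R{.}-pass and J.T @ u by a backward pass (Pearlmutter's
    method), with the same matrix-free structure.
    """
    def gv(v):
        return X.T @ (X @ v) + lmbda * v
    return gv

# The conjugate-gradient solver only ever needs to call gv(v):
X = np.random.randn(100, 10)   # one mini-batch of 100 samples
gv = make_gv_operator(X, lmbda=0.1)
print(gv(np.ones(10)).shape)   # -> (10,)
```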
Step (5): the data processing unit judges whether the updated parameter vector has converged; if it has converged, the computation ends, otherwise execution returns to step (2).
The apparatus for Hessian-Free training algorithms realised according to embodiments of the present invention can support applications that use Hessian-Free training algorithms. In each generation, a space is opened in the data cache unit to store the second-order estimate of the error function near the parameters to be updated; each time the preconditioned conjugate gradient method is run, an update vector is computed using this second-order estimate, and the parameters to be updated are then updated. The above steps are repeated until the vector to be updated converges.
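One outer iteration of the flow just described might look like the following sketch; the error function f, its gradient grad_f, the matrix-free operator factory make_gv and the solver pcg_solve are assumed to be supplied, and the convergence test is reduced to a simple norm check:

```python
import numpy as np

def hessian_free_train(theta, f, grad_f, make_gv, pcg_solve,
                       lmbda=1.0, tol=1e-6, max_iters=100):
    """One possible outer Hessian-Free loop: damped quadratic model,
    preconditioned CG for the update vector, parameter update, repeat."""
    for n in range(max_iters):
        g = grad_f(theta)              # gradient of f at theta_n
        gv = make_gv(theta, lmbda)     # v -> (G_f + damping) v, matrix-free
        delta = pcg_solve(gv, -g)      # minimise the damped quadratic model
        theta = theta + delta          # theta_{n+1} = theta_n + delta_n
        if np.linalg.norm(delta) < tol:
            break                      # parameter vector has converged
    return theta
```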
The technical solution of the invention is further explained below in conjunction with the accompanying drawings.
Fig. 1 shows an example block diagram of the overall structure of an apparatus for implementing Hessian-Free training algorithms according to an embodiment of the invention. As shown in Fig. 1, the apparatus comprises a direct memory access unit 1, an instruction cache unit 2, a controller unit 3, a data cache unit 4 and a data processing module 5, all of which can be realised by hardware circuits.
The direct memory access unit 1 can access the external address space and read and write data to each cache unit inside the apparatus, completing the loading and storing of data. Specifically, it reads instructions to the instruction cache unit 2, reads the parameters to be updated from the designated storage into the data processing unit 5, and writes the updated parameter vector from the data processing module 5 directly into the externally designated space.
The instruction cache unit 2 reads instructions through the direct memory access unit 1 and caches them.
The controller unit 3 reads instructions from the instruction cache unit 2, decodes them into microinstructions controlling the behaviour of the other modules, and sends them to the other modules, such as the direct memory access unit 1, the data cache unit 4 and the data processing module 5.
The data cache unit 4 initializes f̃(δ_n) at device initialization; specifically, it initializes the gradient ∇f(θ_n), the Gauss-Newton matrix G_f, the damping coefficient λ and the damping function R_{θ_n}(θ) therein. Before the update of the n-th parameter vector θ_n to be updated begins, f̃(δ_n) is read into the data processing module 5. The update vector δ_n is obtained in the data processing module 5, θ_n is updated to θ_{n+1}, and correspondingly f̃(δ_n) is updated to f̃(δ_{n+1}); f̃(δ_{n+1}) is then written back to the data cache unit 4 (the new data overwrite the previous corresponding data) for use in the next iteration.
The data processing module 5 reads f̃(δ_n) from the data cache unit 4 and reads the parameter vector θ_n to be updated from the externally designated space through the direct memory access unit 1. The update vector δ_n is obtained in the module, θ_n is updated to θ_{n+1}, and correspondingly f̃(δ_n) is updated to f̃(δ_{n+1}); then f̃(δ_{n+1}) is written back to the data cache unit 4, and θ_{n+1} is written into the externally designated space through the direct memory access unit 1.
Fig. 2 shows an example block diagram of the data processing module in an apparatus for implementing Hessian-Free training algorithm related applications according to an embodiment of the invention. As shown in Fig. 2, the data processing module comprises an operation control submodule 51, a gradient operation submodule 52, a damping-term operation submodule 53, a Gauss-Newton matrix operation submodule 54, a conjugate gradient operation submodule 55 and a basic operation submodule 56. The basic operation submodule 56 carries out basic operations such as additions and multiplications between matrices and vectors; submodules 52, 53, 54 and 55 can all call submodule 56 and, as circumstances require, are also allowed to call one another.
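As a software analogy of this call structure (the patent itself describes hardware blocks; the class names below are illustrative), the specialised submodules are built on one shared basic-operation submodule:

```python
class BasicOps:
    """Analogue of submodule 56: shared matrix/vector primitives."""
    @staticmethod
    def matvec(A, x):
        return A @ x

    @staticmethod
    def axpy(a, x, y):
        # a*x + y, the workhorse of conjugate-gradient updates
        return a * x + y

class GaussNewtonSub:
    """Analogue of submodule 54: builds G_f products on top of BasicOps."""
    def __init__(self, ops):
        self.ops = ops

    def gv(self, J, H_L, v):
        # G_f v = J^T H_L (J v), expressed purely in BasicOps calls
        return self.ops.matvec(J.T,
                               self.ops.matvec(H_L, self.ops.matvec(J, v)))
```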
Fig. 3 shows the overall flow chart of the apparatus carrying out the operations of the Hessian-Free training algorithm.
In step S1, an I/O instruction is pre-stored at the first address of the instruction cache unit 2.
In step S2, the computation starts: the controller unit 3 reads this I/O instruction from the first address of the instruction cache unit 2 and, according to the decoded microinstruction, the direct memory access unit 1 reads from the external address space all instructions related to the Hessian-Free computation and buffers them into the instruction cache unit 2.
In step S3, the controller unit 3 reads an I/O instruction from the instruction cache unit 2 and, according to the decoded microinstruction, the direct memory access unit 1 reads the initial parameter vector θ_0 to be updated from the external space into the data processing module 5.
In step S4, the controller unit 3 reads an assignment instruction from the instruction cache unit 2; according to the decoded microinstruction, f̃(δ_n) in the data cache unit 4 is initialized, and the iteration count n in the data processing unit 5 is set to 0.
In step S5, the controller unit 3 reads an I/O instruction from the instruction cache unit 2 and, according to the decoded microinstruction, the direct memory access unit 1 reads the parameter vector θ_n to be updated from the external space and passes it into the data processing module 5.
In step S6, the controller unit 3 reads from the instruction cache unit 2 an instruction for performing a second-order estimate of the error function near the current parameter vector value and, according to the decoded microinstruction, performs the operation of finding the second-order estimate f̃(δ_n) of f(θ) near θ_n. In this operation, the instruction is sent to the operation control submodule 51, which issues the corresponding instructions to carry out the following: compute ∇f(θ_n) using the gradient operation submodule 52; obtain the Gauss-Newton matrix G_f of f at θ_n using the matrix multiplication in the Gauss-Newton operation submodule 54 and the basic operation submodule 56; run the LM heuristic with the damping-term operation submodule 53 and the basic operation submodule 56 to obtain the damping coefficient λ, and hence the damping term λR_{θ_n}(θ); finally, the resulting expression of f̃(δ_n) is stored in the data cache unit 4.
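In software terms, step S6 amounts to assembling the damped quadratic model as a function of the candidate update δ; a hedged sketch, assuming the gradient g, a matrix-free product gv applying the (undamped) Gauss-Newton matrix, a damping function R and the coefficient lmbda have been computed as described:

```python
def make_quadratic_model(f_theta, g, gv, R, lmbda):
    """Return the damped second-order estimate
    m(delta) = f(theta_n) + g^T delta + 0.5*delta^T G_f delta + lambda*R(delta),
    i.e. the expression stored in the data cache unit in step S6."""
    def m_tilde(delta):
        return (f_theta
                + g @ delta
                + 0.5 * (delta @ gv(delta))
                + lmbda * R(delta))
    return m_tilde
```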
In step S7, the controller unit 3 reads a data transfer instruction from the instruction cache unit 2 and, according to the decoded microinstruction, f̃(δ_n) is sent from the data cache unit 4 into the data processing unit 5.
In step S8, the controller unit 3 reads a parameter-update instruction from the instruction cache unit 2 and, according to the decoded microinstruction, performs the operation of finding δ_n with the preconditioned conjugate gradient method so that f̃(δ_n) reaches its minimum, and updating θ_n to θ_{n+1}. The direct memory access unit 1 reads the parameter vector θ_n to be updated from the external space and passes it into the data processing module 5. The operation control submodule 51 directs the relevant operation submodules as follows: the update vector δ_n is obtained using the conjugate gradient operation submodule 55 and the basic operation submodule 56 (depending on the expression of the damping function R_{θ_n}(θ), the Gauss-Newton operation module may also need to be called, as in the RNN example mentioned earlier); finally, θ_n is updated to θ_{n+1} using the vector addition in the basic operation submodule 56.
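A minimal preconditioned conjugate-gradient solver for the quadratic subproblem of step S8 could look as follows; the identity preconditioner, the tolerance and the iteration cap are assumptions, since the text only names "preconditioned CG":

```python
import numpy as np

def pcg(gv, b, M_inv=None, tol=1e-10, max_iter=250):
    """Preconditioned conjugate gradient for the quadratic subproblem:
    solves (G_f + lambda*I) delta = b, with b = -grad f(theta_n).

    gv    : callable v -> (G_f + lambda*I) v (matrix-free, as sketched above)
    M_inv : callable applying the inverse preconditioner (identity if None)
    """
    if M_inv is None:
        M_inv = lambda r: r
    x = np.zeros_like(b)
    r = b - gv(x)               # residual of the current iterate
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = gv(p)
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```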
In step S9, the controller unit 3 reads an I/O instruction from the instruction cache unit 2 and, according to the decoded microinstruction, the updated parameter vector θ_{n+1} is sent from the data processing unit 5 to the externally designated space through the direct memory access unit 1.
In step S10, the controller unit reads a convergence-judgment instruction from the instruction cache unit 2 and, according to the decoded microinstruction, the data processing unit judges whether the updated parameter vector θ_{n+1} has converged: if it has, the computation ends; otherwise the value of the iteration count n is increased by 1 and execution returns to step S5.
The processes or methods depicted in the preceding figures can be performed by processing logic comprising hardware (e.g. circuits, dedicated logic, etc.), firmware, software (e.g. software carried on a non-transitory computer-readable medium), or a combination of the two. Although the processes or methods are described above in terms of certain ordered operations, it should be understood that some of the described operations can be performed in a different order, and some operations can be performed in parallel rather than sequentially.
In the foregoing specification, various embodiments of the present invention have been described with reference to certain exemplary embodiments thereof. Obviously, various modifications may be made to each embodiment without departing from the broader spirit and scope of the invention as set forth in the appended claims. Accordingly, the specification and drawings are to be regarded as illustrative rather than restrictive.

Claims (10)

  1. A device for performing Hessian-Free training algorithms, characterised by comprising:
    a controller unit, for decoding a read instruction into microinstructions that control the corresponding modules, and sending them to those modules;
    a data cache unit, for storing the intermediate variables of the computation and performing initialization and update operations on the intermediate variables; and
    a data processing module, for performing arithmetic operations under the control of the controller unit and storing intermediate variables in the data cache unit.
  2. The device for performing Hessian-Free training algorithms of claim 1, characterised in that the data processing module comprises an operation control submodule, a gradient operation submodule, a damping-term operation submodule, a Gauss-Newton matrix operation submodule, a conjugate gradient operation submodule and a basic operation submodule, wherein the basic operation submodule carries out the basic additions and multiplications between matrices and vectors;
    preferably, the gradient operation submodule, the damping-term operation submodule, the Gauss-Newton matrix operation submodule and the conjugate gradient operation submodule can all call the basic operation submodule and, as circumstances require, are allowed to call one another.
  3. The device for performing Hessian-Free training algorithms of claim 1, characterised in that the data cache unit initializes, at device initialization, the second-order estimate f̃(δ_n) of f(θ); before the update of the n-th parameter vector θ_n to be updated begins, f̃(δ_n) is read into the data processing module, and after the update vector has been obtained in the data processing module, f̃(δ_{n+1}) is written back; wherein θ is the parameter vector to be updated, θ_n is the n-th parameter vector to be updated, f(θ) is the error function, i.e. a function measuring the deviation of the actual result from the predicted value; δ_n is the update vector, and θ_{n+1} = θ_n + δ_n.
  4. The device for performing Hessian-Free training algorithms of claim 3, characterised in that, in the step of initializing f̃(δ_n), the data cache unit initializes the gradient ∇f(θ_n), the Gauss-Newton matrix G_f, the damping coefficient λ and the damping function R_{θ_n}(θ) therein, wherein the gradient ∇f(θ_n) is the gradient of f at θ_n, G_f is the Gauss-Newton matrix of f at θ_n, the damping function R_{θ_n}(θ) is the value, at θ_n, of a function predefined according to the training model, and the damping coefficient λ is obtained by an LM-style heuristic;
    the data processing module reads f̃(δ_n) from the data cache unit and reads the parameter vector θ_n to be updated from the externally designated space; the update vector δ_n is obtained in the module, θ_n is updated to θ_{n+1}, and correspondingly f̃(δ_n) is updated to f̃(δ_{n+1}); f̃(δ_{n+1}) is then written back to the data cache unit and θ_{n+1} is written into the externally designated space; wherein θ_{n+1} is the (n+1)-th parameter vector to be updated and f̃(δ_{n+1}) is the corresponding second-order estimate of f.
  5. A method of performing Hessian-Free training algorithms, characterised by comprising the following steps:
    step (1): complete, by instruction, the initialization of the data cache unit, i.e. initialize the second-order estimate f̃(δ_n) of f(θ); wherein θ is the parameter vector to be updated, θ_n is the n-th parameter vector to be updated, f(θ) is the error function, i.e. a function measuring the deviation of the actual result from the predicted value; δ_n is the update vector, and θ_{n+1} = θ_n + δ_n;
    step (2): complete, by an I/O instruction, the operation in which the direct memory access unit reads the parameter vector to be updated from the external space;
    step (3): according to the corresponding instruction, the data processing module performs a second-order Taylor expansion of the error function f(θ) at θ_n, adds the damping term λR_{θ_n}(θ), and obtains the estimate f̃(δ_n) of f(θ) near θ_n, i.e.
    \tilde f(\delta_n) = M_{\theta_n}(\theta) + \lambda_n R_{\theta_n}(\theta) = f(\theta_n) + \nabla f(\theta_n)^T \delta_n + \frac{\delta_n^T G_f \delta_n}{2} + \lambda R_{\theta_n}(\theta);
    wherein G_f is the Gauss-Newton matrix of f at θ_n, the damping coefficient λ is obtained by an LM-style heuristic, and the damping function R_{θ_n}(θ) is the value, at θ_n, of a function predefined according to the training model;
    step (4): according to the corresponding instruction, the data processing module runs the preconditioned conjugate gradient method to find the δ_n that minimises f̃(δ_n), and updates θ_n to θ_{n+1}; the update operation is specifically:
    θ_{n+1} = θ_n + δ_n;
    step (5): the data processing unit judges whether the updated parameter vector has converged; if it has converged, the computation ends, otherwise execution returns to step (2).
  6. The method of performing Hessian-Free training algorithms of claim 5, characterised in that the initialization of the data cache unit in step (1) comprises: zeroing the gradient ∇f(θ_n), the Gauss-Newton matrix G_f, the damping coefficient λ and the damping function R_{θ_n}(θ).
  7. The method of performing Hessian-Free training algorithms of claim 5, characterised in that, in step (3), when training an RNN,
    the damping function is R_{θ_n}(θ) = δ_n^T δ_n / 2 + μ δ_n^T G_S δ_n / 2,
    where S, like f, is a distance function, G_S is the Gauss-Newton matrix of S at θ_n, and μ is a predefined positive number.
  8. The method of performing Hessian-Free training algorithms of claim 5, characterised in that, in the step in which step (4) runs the preconditioned conjugate gradient method to find the δ_n minimising f̃(δ_n), only a mini-batch rather than all samples is used, and every product of the Gauss-Newton matrix with a vector is computed implicitly rather than by forming the matrix explicitly.
  9. A method of performing Hessian-Free training algorithms, characterised by comprising the following steps:
    step S1: an I/O instruction is pre-stored at the first address of the instruction cache unit;
    step S2: the computation starts; the controller unit reads this I/O instruction from the first address of the instruction cache unit and, according to the decoded microinstruction, the direct memory access unit reads from the external address space all instructions related to the Hessian-Free computation and buffers them into the instruction cache unit;
    step S3: the controller unit reads an I/O instruction from the instruction cache unit and, according to the decoded microinstruction, the direct memory access unit reads the initial parameter vector θ_0 to be updated from the external space into the data processing module;
    step S4: the controller unit reads an assignment instruction from the instruction cache unit and, according to the decoded microinstruction, f̃(δ_n) in the data cache unit is initialized and the iteration count n in the data processing unit is set to 0; wherein θ is the parameter vector to be updated, θ_n is the n-th parameter vector to be updated, f(θ) is the error function, i.e. a function measuring the deviation of the actual result from the predicted value, δ_n is the update vector, θ_{n+1} = θ_n + δ_n, and f̃(δ_n) is the second-order estimate of f(θ);
    step S5: the controller unit reads an I/O instruction from the instruction cache unit and, according to the decoded microinstruction, the direct memory access unit reads the parameter vector θ_n to be updated from the external space and passes it into the data processing module;
    step S6: the controller unit reads from the instruction cache unit an instruction for performing a second-order estimate of the error function near the current parameter vector value and, according to the decoded microinstruction, performs the operation of finding the second-order estimate f̃(δ_n) of f(θ) near θ_n; in this operation the instruction is sent to the operation control submodule, which issues the corresponding instructions to carry out the following: compute ∇f(θ_n) using the gradient operation submodule; obtain the Gauss-Newton matrix G_f of f at θ_n using the matrix multiplication in the Gauss-Newton operation submodule and the basic operation submodule; run the LM heuristic with the damping-term operation submodule and the basic operation submodule to obtain the damping coefficient λ, and hence the damping term λR_{θ_n}(θ); finally, obtain the expression of f̃(δ_n) and store it in the data cache unit; wherein the damping function R_{θ_n}(θ) is the value, at θ_n, of a function predefined according to the training model;
    step S7: the controller unit reads a data transfer instruction from the instruction cache unit and, according to the decoded microinstruction, f̃(δ_n) is sent from the data cache unit into the data processing unit;
    step S8: the controller unit reads a parameter-update instruction from the instruction cache unit and, according to the decoded microinstruction, runs the preconditioned conjugate gradient method to find the δ_n minimising f̃(δ_n) and update θ_n to θ_{n+1}; the direct memory access unit reads the parameter vector θ_n to be updated from the external space and passes it into the data processing module; the operation control submodule directs the relevant operation submodules as follows: obtain the update vector δ_n using the conjugate gradient operation submodule and the basic operation submodule; finally, update θ_n to θ_{n+1} using the vector addition in the basic operation submodule;
    step S9: the controller unit reads an I/O instruction from the instruction cache unit and, according to the decoded microinstruction, the updated parameter vector θ_{n+1} is sent from the data processing unit to the externally designated space through the direct memory access unit;
    step S10: the controller unit reads a convergence-judgment instruction from the instruction cache unit and, according to the decoded microinstruction, the data processing unit judges whether the updated parameter vector θ_{n+1} has converged: if it has, the computation ends; otherwise the value of the iteration count n is increased by 1 and execution returns to step S5.
  10. A device for performing Hessian-Free training algorithms, wherein a program for performing the method of performing Hessian-Free training algorithms of any one of claims 5 to 9 is solidified in the controller of the device.
CN201610283885.XA 2016-04-29 2016-04-29 Device and method for executing Hessian-Free training algorithm Active CN107341540B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610283885.XA CN107341540B (en) 2016-04-29 2016-04-29 Device and method for executing Hessian-Free training algorithm
PCT/CN2016/081842 WO2017185413A1 (en) 2016-04-29 2016-05-12 Device and method for executing hessian-free training algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610283885.XA CN107341540B (en) 2016-04-29 2016-04-29 Device and method for executing Hessian-Free training algorithm

Publications (2)

Publication Number Publication Date
CN107341540A true CN107341540A (en) 2017-11-10
CN107341540B CN107341540B (en) 2021-07-20

Family

ID=60160584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610283885.XA Active CN107341540B (en) 2016-04-29 2016-04-29 Device and method for executing Hessian-Free training algorithm

Country Status (2)

Country Link
CN (1) CN107341540B (en)
WO (1) WO2017185413A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626434A (en) * 2020-05-15 2020-09-04 浪潮电子信息产业股份有限公司 Distributed training parameter updating method, device, equipment and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934208A (en) * 2019-04-22 2019-06-25 江苏邦融微电子有限公司 A kind of hardware-accelerated system and method for fingerprint recognition

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1658550A (en) * 2004-04-16 2005-08-24 威盛电子股份有限公司 Apparatus and method for performing cipher operation
CN1834898A (en) * 2005-05-16 2006-09-20 威盛电子股份有限公司 Microprocessor apparatus and method for modular exponentiation
CN102156637A (en) * 2011-05-04 2011-08-17 中国人民解放军国防科学技术大学 Vector crossing multithread processing method and vector crossing multithread microprocessor
US20140067738A1 (en) * 2012-08-28 2014-03-06 International Business Machines Corporation Training Deep Neural Network Acoustic Models Using Distributed Hessian-Free Optimization
US20150161987A1 (en) * 2013-12-06 2015-06-11 International Business Machines Corporation Systems and methods for accelerating hessian-free optimization for deep neural networks by implicit preconditioning and sampling
WO2016037351A1 (en) * 2014-09-12 2016-03-17 Microsoft Corporation Computing system for training neural networks

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1658550A (en) * 2004-04-16 2005-08-24 威盛电子股份有限公司 Apparatus and method for performing cipher operation
CN1834898A (en) * 2005-05-16 2006-09-20 威盛电子股份有限公司 Microprocessor apparatus and method for modular exponentiation
CN102156637A (en) * 2011-05-04 2011-08-17 中国人民解放军国防科学技术大学 Vector crossing multithread processing method and vector crossing multithread microprocessor
US20140067738A1 (en) * 2012-08-28 2014-03-06 International Business Machines Corporation Training Deep Neural Network Acoustic Models Using Distributed Hessian-Free Optimization
US20150161987A1 (en) * 2013-12-06 2015-06-11 International Business Machines Corporation Systems and methods for accelerating hessian-free optimization for deep neural networks by implicit preconditioning and sampling
WO2016037351A1 (en) * 2014-09-12 2016-03-17 Microsoft Corporation Computing system for training neural networks

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JAMES MARTENS: "Deep learning via Hessian-free optimization", Proceedings of the 27th International Conference on Machine Learning *
JAMES MARTENS et al.: "Training Deep and Recurrent Networks with Hessian-Free Optimization", Neural Networks: Tricks of the Trade *
RYAN KIROS: "Training Neural Networks with Stochastic Hessian-Free Optimization", arXiv:1301.3641v3 [cs.LG] *
YUNJI CHEN et al.: "DaDianNao: A Machine-Learning Supercomputer", 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626434A (en) * 2020-05-15 2020-09-04 浪潮电子信息产业股份有限公司 Distributed training parameter updating method, device, equipment and storage medium
CN111626434B (en) * 2020-05-15 2022-06-07 浪潮电子信息产业股份有限公司 Distributed training parameter updating method, device, equipment and storage medium

Also Published As

Publication number Publication date
WO2017185413A1 (en) 2017-11-02
CN107341540B (en) 2021-07-20

Similar Documents

Publication Publication Date Title
CN108197705A (en) Convolutional neural networks hardware accelerator and convolutional calculation method and storage medium
CN108241890B (en) Reconfigurable neural network acceleration method and architecture
CN109886588B (en) Method for solving flexible job shop scheduling based on improved whale algorithm
CN108205702B (en) Parallel processing method for multi-input multi-output matrix convolution
CN107341542B (en) Apparatus and method for performing recurrent neural networks and LSTM operations
CN107578098A (en) Neural network processor based on systolic arrays
CN112101530B (en) Neural network training method, device, equipment and storage medium
CN107832844A Information processing method and related product
CN110326003A Hardware node with position-dependent memory for neural network processing
CN106991478A Apparatus and method for performing artificial neural network reverse training
Meng et al. Accelerating proximal policy optimization on cpu-fpga heterogeneous platforms
CN107688854A Arithmetic unit, method and device supporting operation data of different bit widths
CN106991476A Apparatus and method for performing artificial neural network forward operation
CN107818367A Processing system and processing method for neural networks
CN108763159A FPGA-based LSTM forward operation accelerator
CN107341132A Apparatus and method for performing the AdaGrad gradient descent training algorithm
CN110135584A Large-scale symbolic regression method and system based on an adaptive parallel genetic algorithm
CN108334944B (en) Artificial neural network operation device and method
CN105808309A High-performance implementation method of the BLAS level-3 function GEMM on the SW platform
CN107341540A Apparatus and method for performing Hessian-Free training algorithms
CN107957977A Computing method and related product
CN113806261A (en) Pooling vectorization implementation method for vector processor
CN113469354A (en) Memory-constrained neural network training
CN111860814B (en) Apparatus and method for performing batch normalization operations
Li et al. An experimental study on deep learning based on different hardware configurations

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant after: Zhongke Cambrian Technology Co., Ltd

Address before: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant