WO2017185257A1 - Device and method for executing an Adam gradient descent training algorithm - Google Patents
- Publication number
- WO2017185257A1 WO2017185257A1 PCT/CN2016/080357 CN2016080357W WO2017185257A1 WO 2017185257 A1 WO2017185257 A1 WO 2017185257A1 CN 2016080357 W CN2016080357 W CN 2016080357W WO 2017185257 A1 WO2017185257 A1 WO 2017185257A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- vector
- instruction
- module
- moment
- sub
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Definitions
- the present invention relates to the field of Adam algorithm applications, and in particular to an apparatus and method for executing an Adam gradient descent training algorithm, i.e. a hardware implementation of the Adam gradient descent optimization algorithm.
- gradient descent optimization algorithms are widely used in function approximation, optimization, pattern recognition, and image processing.
- the Adam algorithm is one such gradient descent optimization algorithm. It is easy to implement, computationally light, requires little storage, and is invariant to rescaling of the gradient, so it is widely used; implementing it on a dedicated device can significantly increase its execution speed.
- one known method of executing the Adam gradient descent algorithm is to use a general-purpose processor.
- the method supports the algorithm by executing general-purpose instructions through a general-purpose register file and general-purpose functional units.
- one disadvantage of this approach is that a single general-purpose processor has low computational performance.
- when multiple general-purpose processors execute in parallel, communication between them becomes a performance bottleneck.
- in addition, the general-purpose processor must decode the operations of the Adam gradient descent algorithm into a long sequence of arithmetic and memory-access instructions, and the processor's front-end decoding incurs a large power overhead.
- another known method of executing the Adam gradient descent algorithm is to use a graphics processing unit (GPU).
- the method supports the algorithm by executing general single-instruction multiple-data (SIMD) instructions through a general-purpose register file and general-purpose stream processing units.
- however, the GPU is a device dedicated to graphics operations and scientific computing, with no special support for the operations of the Adam gradient descent algorithm, so a large amount of front-end decoding work is still required, which brings significant extra overhead.
- moreover, the GPU has only a small on-chip cache, so data required by the computation (such as the first moment vector and the second moment vector) must be repeatedly transferred from off-chip; the off-chip bandwidth becomes the main performance bottleneck and adds a large power cost.
- the main object of the present invention is to provide an apparatus and method for executing an Adam gradient descent training algorithm, solving the problems that general-purpose processors have insufficient performance and high front-end decoding cost, while avoiding repeated reads of data from memory and reducing the memory-access bandwidth required.
- the present invention provides an apparatus for executing an Adam gradient descent training algorithm, the apparatus comprising a direct memory access unit 1, an instruction cache unit 2, a controller unit 3, a data buffer unit 4, and a data processing module 5, wherein:
- the direct memory access unit 1 is configured to access the external designated space, read and write data to the instruction cache unit 2 and the data processing module 5, and complete the loading and storing of data;
- the instruction cache unit 2 is configured to read instructions through the direct memory access unit 1 and cache them;
- the controller unit 3 is configured to read instructions from the instruction cache unit 2 and decode each into micro-instructions that control the behavior of the direct memory access unit 1, the data buffer unit 4, or the data processing module 5;
- the data buffer unit 4 is configured to cache the first moment vector and the second moment vector during initialization and during each data update;
- the data processing module 5 is configured to update the moment vectors, compute the moment estimate vectors, update the vector to be updated, write the updated moment vectors back to the data buffer unit 4, and write the updated vector to be updated to the external designated space through the direct memory access unit 1.
- the direct memory access unit 1 writes instructions from the external designated space to the instruction cache unit 2, reads the parameter to be updated and the corresponding gradient values from the external designated space into the data processing module 5, and writes the updated parameter vector from the data processing module 5 directly to the external designated space.
- the controller unit 3 decodes each instruction it reads into micro-instructions that control the behavior of the direct memory access unit 1, the data buffer unit 4, or the data processing module 5: it controls the direct memory access unit 1 to read data from and write data to externally designated addresses, controls the instruction cache unit 2 to fetch the instructions required for the operation from the external designated address through the direct memory access unit 1, controls the data processing module 5 to perform the update of the parameter to be updated, and controls the data transfer between the data buffer unit 4 and the data processing module 5.
- the data buffer unit 4 initializes the first moment vector m_0 and the second moment vector v_0 at initialization. During each data update, the first moment vector m_{t-1} and the second moment vector v_{t-1} are read out and sent to the data processing module 5, updated there to the first moment vector m_t and the second moment vector v_t, and then written back into the data buffer unit 4.
- throughout operation, the data buffer unit 4 always keeps a copy of the first moment vector m_t and the second moment vector v_t.
- the data processing module 5 reads the moment vectors m_{t-1} and v_{t-1} from the data buffer unit 4, and reads the vector to be updated θ_{t-1} from the external designated space through the direct memory access unit 1;
- the vector to be updated θ_{t-1} is updated to θ_t;
- m_t and v_t are written into the data buffer unit 4;
- θ_t is written to the external designated space through the direct memory access unit 1.
- the data processing module 5 updates the moment vectors m_{t-1}, v_{t-1} to m_t, v_t according to the formulas m_t = β1·m_{t-1} + (1−β1)·g_t and v_t = β2·v_{t-1} + (1−β2)·g_t², where g_t is the gradient vector and the squaring is element-wise.
- the data processing module 5 computes the moment estimate vectors m̂_t and v̂_t from m_t and v_t according to the formulas m̂_t = m_t / (1−β1^t) and v̂_t = v_t / (1−β2^t), and updates the vector to be updated θ_{t-1} to θ_t according to the formula θ_t = θ_{t-1} − α·m̂_t / √v̂_t.
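As a software point of reference for the Adam update rules described above, the following is a minimal NumPy sketch of one update step (the function name and the small `eps` constant are illustrative additions for numerical safety, not part of the device's formulas):

```python
import numpy as np

def adam_step(theta, g, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update, mirroring the formulas in the text.

    eps is an illustrative safeguard against division by zero; it does not
    appear in the device's formulas."""
    m = beta1 * m + (1 - beta1) * g        # first moment vector update
    v = beta2 * v + (1 - beta2) * g * g    # second moment vector update
    m_hat = m / (1 - beta1 ** t)           # bias-corrected first moment estimate
    v_hat = v / (1 - beta2 ** t)           # bias-corrected second moment estimate
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Iterating this step with t = 1, 2, … on the gradient of a loss function reproduces the update sequence that the apparatus carries out in hardware.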
- the data processing module 5 includes an operation control sub-module 51, a vector addition parallel operation sub-module 52, a vector multiplication parallel operation sub-module 53, a vector division parallel operation sub-module 54, a vector square root parallel operation sub-module 55, and a basic operation sub-module 56. The sub-modules 52 to 56 are connected in parallel with one another, and each of them is connected in series with the operation control sub-module 51.
- the vector operations are element-wise: when a vector undergoes one of these operations, the operations on different elements of the same vector are carried out in parallel.
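The element-wise character of these operations is what makes the parallel sub-modules possible: each output element depends only on the corresponding input element(s), so all lanes can run concurrently. A small Python model of this idea (the function and worker count are hypothetical illustrations, not part of the device):

```python
import math
from concurrent.futures import ThreadPoolExecutor

def parallel_elementwise(op, a, b=None, workers=4):
    """Apply op to each element (or element pair) independently, as the
    parallel sub-modules do lane by lane. Names are illustrative only."""
    with ThreadPoolExecutor(max_workers=workers) as ex:
        if b is None:
            return list(ex.map(op, a))
        return list(ex.map(op, a, b))

# the vector square-root sub-module, modeled lane by lane
roots = parallel_elementwise(math.sqrt, [1.0, 4.0, 9.0])
# the vector addition sub-module, modeled lane by lane
sums = parallel_elementwise(lambda x, y: x + y, [1, 2], [10, 20])
```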
- the present invention also provides a method for performing an Adam gradient descent training algorithm, the method comprising:
- the first moment vector m_0, the second moment vector v_0, the exponential decay rates β1 and β2, and the learning step size α are initialized, and the vector to be updated θ_0 is obtained from the external designated space, including:
- step S1: an INSTRUCTION_IO instruction is pre-stored at the first address of the instruction cache unit 2; the INSTRUCTION_IO instruction is used to drive the direct memory access unit 1 to read all instructions related to the Adam gradient descent computation from the external address space;
- step S2: the operation starts; the controller unit 3 reads the INSTRUCTION_IO instruction from the first address of the instruction cache unit 2 and, according to the translated micro-instruction, drives the direct memory access unit 1 to read all instructions related to the Adam gradient descent computation from the external address space and cache them in the instruction cache unit 2;
- step S3: the controller unit 3 reads a HYPERPARAMETER_IO instruction from the instruction cache unit 2 and, according to the translated micro-instruction, drives the direct memory access unit 1 to read the global update step size α, the exponential decay rates β1 and β2, and the convergence threshold ct from the external designated space, which are then sent to the data processing module 5;
- step S4: the controller unit 3 reads an assignment instruction from the instruction cache unit 2 and, according to the translated micro-instruction, initializes the first moment vector m_{t-1} and the second moment vector v_{t-1} in the data buffer unit 4 and sets the iteration count t in the data processing unit 5 to 1;
- step S5: the controller unit 3 reads a DATA_IO instruction from the instruction cache unit 2 and, according to the translated micro-instruction, drives the direct memory access unit 1 to read the parameter vector to be updated θ_{t-1} and the corresponding gradient vector g_t from the external designated space, which are then sent to the data processing module 5;
- step S6: the controller unit 3 reads a data transfer instruction from the instruction cache unit 2 and, according to the translated micro-instruction, transfers the first moment vector m_{t-1} and the second moment vector v_{t-1} from the data buffer unit 4 to the data processing unit 5.
- the implementation specifically includes: the controller unit 3 reads a moment vector update instruction from the instruction cache unit 2 and, according to the translated micro-instruction, drives the update of the first moment vector m_{t-1} and the second moment vector v_{t-1}. The moment vector update instruction is sent to the operation control sub-module 51, which issues the corresponding instructions to perform the following operations: it sends the INS_1 instruction to the basic operation sub-module 56, driving it to compute (1−β1) and (1−β2); it sends the INS_2 instruction to the vector multiplication parallel operation sub-module 53, driving it to compute the element-wise square g_t²; it then sends the INS_3 instruction to the vector multiplication parallel operation sub-module 53, driving it to simultaneously compute β1·m_{t-1}, β2·v_{t-1}, (1−β1)·g_t and (1−β2)·g_t².
- the controller unit 3 reads a data transfer instruction from the instruction cache unit 2 and, according to the translated micro-instruction, transfers the updated first moment vector m_t and second moment vector v_t from the data processing unit 5 to the data buffer unit 4.
- the moment estimate vectors m̂_t and v̂_t are obtained from the moment vectors according to the formulas m̂_t = m_t / (1−β1^t) and v̂_t = v_t / (1−β2^t).
- the implementation includes: the controller unit 3 reads a moment estimate vector operation instruction from the instruction cache unit 2 and, according to the translated micro-instruction, drives the operation control sub-module 51 to compute the moment estimate vectors. The operation control sub-module 51 issues the corresponding instructions to perform the following operations: it sends the instruction INS_4 to the basic operation sub-module 56, driving it to compute the scalars 1/(1−β1^t) and 1/(1−β2^t) and to increment the iteration count t by 1; it sends the instruction INS_5 to the vector multiplication parallel operation sub-module 53, driving it to multiply the first moment vector m_t and the second moment vector v_t by these scalars to obtain the bias-corrected estimate vectors m̂_t and v̂_t.
- the vector to be updated θ_{t-1} is updated to θ_t according to the formula θ_t = θ_{t-1} − α·m̂_t / √v̂_t.
- the implementation specifically includes: the controller unit 3 reads a parameter vector update instruction from the instruction cache unit 2 and, according to the translated micro-instruction, drives the operation control sub-module 51 to perform the following operations: it sends the instruction INS_6 to the basic operation sub-module 56, driving it to compute −α; it sends the instruction INS_7 to the vector square root parallel operation sub-module 55, driving it to compute √v̂_t; it sends the instruction INS_7 to the vector division parallel operation sub-module 54, driving it to compute m̂_t / √v̂_t.
- the operation control sub-module 51 sends the instruction INS_8 to the vector multiplication parallel operation sub-module 53, driving it to compute −α·m̂_t / √v̂_t.
- the operation control sub-module 51 sends the instruction INS_9 to the vector addition parallel operation sub-module 52, driving it to compute θ_t = θ_{t-1} + (−α·m̂_t / √v̂_t).
- the controller unit 3 further reads a DATABACK_IO instruction from the instruction cache unit 2 and, according to the translated micro-instruction, transfers the updated parameter vector θ_t from the data processing unit 5 to the external designated space through the direct memory access unit 1.
- the step of repeating the process until the vector to be updated converges includes determining whether the vector to be updated has converged.
- the specific determination process is as follows: the controller unit 3 reads a convergence judgment instruction from the instruction cache unit 2 and, according to the translated micro-instruction, the data processing module 5 determines whether the updated parameter vector has converged; if temp2 < ct, it has converged and the operation ends.
- the apparatus and method for executing the Adam gradient descent training algorithm provided by the present invention, by using a device dedicated to this algorithm, solve the problems that general-purpose processors have insufficient performance and high front-end decoding cost, and accelerate the execution of related applications.
- by keeping the moment vectors required by the intermediate steps in the data buffer unit, the apparatus and method avoid repeatedly reading data from memory, reducing the IO operations between the device and the external address space and the memory-access bandwidth required.
- because the data processing module performs vector operations through the parallel operation sub-modules, the degree of parallelism is high, allowing the device to run at a low clock frequency, so the power cost is small.
- FIG. 1 shows an example block diagram of the overall structure of an apparatus for performing an Adam gradient descent training algorithm in accordance with an embodiment of the present invention.
- FIG. 2 shows an example block diagram of a data processing module in an apparatus for performing an Adam gradient descent training algorithm in accordance with an embodiment of the present invention.
- FIG. 3 shows a flow chart of a method for performing an Adam gradient descent training algorithm in accordance with an embodiment of the present invention.
- the first moment vector m_0, the second moment vector v_0, the exponential decay rates β1 and β2, and the learning step size α are initialized, and the vector to be updated θ_0 is obtained from the external designated space;
- the gradient vector g_t and the exponential decay rates are used to update the first moment vector m_{t-1} and the second moment vector v_{t-1}, i.e. m_t = β1·m_{t-1} + (1−β1)·g_t and v_t = β2·v_{t-1} + (1−β2)·g_t²;
- θ_{t-1} denotes the value of the parameter vector before the t-th update; the t-th iteration updates θ_{t-1} to θ_t. This process is repeated until the vector to be updated converges.
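The iterate-until-convergence procedure just described can be sketched end to end in NumPy as follows (a sketch only: the stopping test on the update magnitude is an assumption modeled on the temp2 < ct judgment described later, and the function names are illustrative):

```python
import numpy as np

def adam_train(grad_fn, theta0, alpha=0.01, beta1=0.9, beta2=0.999,
               ct=1e-4, max_iter=10000):
    """Repeat the Adam update until the parameter vector converges."""
    theta = theta0.astype(float)
    m = np.zeros_like(theta)   # first moment vector m_0
    v = np.zeros_like(theta)   # second moment vector v_0
    for t in range(1, max_iter + 1):
        g = grad_fn(theta)                       # gradient vector g_t
        m = beta1 * m + (1 - beta1) * g          # update first moment
        v = beta2 * v + (1 - beta2) * g * g      # update second moment
        m_hat = m / (1 - beta1 ** t)             # bias-corrected estimates
        v_hat = v / (1 - beta2 ** t)
        step = alpha * m_hat / (np.sqrt(v_hat) + 1e-8)
        theta = theta - step                     # update parameter vector
        if np.linalg.norm(step) < ct:            # convergence judgment
            break
    return theta

# minimize f(theta) = theta^2, whose gradient is 2*theta
result = adam_train(lambda th: 2 * th, np.array([2.0]))
```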
- the device includes a direct memory access unit 1, an instruction cache unit 2, a controller unit 3, a data buffer unit 4, and a data processing module 5, all of which can be implemented by hardware circuits.
- the direct memory access unit 1 is configured to access the external designated space, read and write data to the instruction cache unit 2 and the data processing module 5, and complete the loading and storing of data. Specifically, it writes instructions from the external designated space to the instruction cache unit 2, reads the parameter to be updated and the corresponding gradient values from the external designated space into the data processing module 5, and writes the updated parameter vector from the data processing module 5 directly to the external designated space.
- the instruction cache unit 2 is configured to read an instruction through the direct memory access unit 1 and cache the read instruction.
- the controller unit 3 is configured to read instructions from the instruction cache unit 2 and decode each into micro-instructions that control the behavior of the direct memory access unit 1, the data buffer unit 4, or the data processing module 5. Each micro-instruction is sent to the direct memory access unit 1, the data buffer unit 4, or the data processing module 5: the controller unit controls the direct memory access unit 1 to read data from and write data to externally designated addresses, controls the instruction cache unit 2 to acquire the instructions required for the operation from the external designated address through the direct memory access unit 1, controls the data processing module 5 to perform the update of the parameter to be updated, and controls the data transfer between the data buffer unit 4 and the data processing module 5.
- the data buffer unit 4 is configured to cache the first moment vector and the second moment vector during initialization and during each data update. Specifically, the data buffer unit 4 initializes the first moment vector m_0 and the second moment vector v_0 at initialization; during each data update, it reads out the first moment vector m_{t-1} and the second moment vector v_{t-1} and sends them to the data processing module 5, where they are updated to the first moment vector m_t and the second moment vector v_t and then written back to the data buffer unit 4. During the operation of the device, the data buffer unit 4 always keeps a copy of the first moment vector m_t and the second moment vector v_t. In the present invention, because the moment vectors required by the intermediate steps are kept in the data buffer unit, data need not be repeatedly read from memory, reducing the IO operations between the device and the external address space and the memory-access bandwidth required.
- the data processing module 5 is configured to update the moment vectors, compute the moment estimate vectors, update the vector to be updated, write the updated moment vectors to the data buffer unit 4, and write the updated vector to be updated to the external designated space through the direct memory access unit 1. Specifically, the data processing module 5 reads the moment vectors m_{t-1}, v_{t-1} from the data buffer unit 4, and reads the vector to be updated θ_{t-1}, the gradient vector g_t, the update step size α, and the exponential decay rates β1 and β2 from the external designated space through the direct memory access unit 1. It then updates the moment vectors m_{t-1}, v_{t-1} to m_t, v_t, i.e. m_t = β1·m_{t-1} + (1−β1)·g_t and v_t = β2·v_{t-1} + (1−β2)·g_t²; computes the moment estimate vectors from m_t, v_t, i.e. m̂_t = m_t / (1−β1^t) and v̂_t = v_t / (1−β2^t); and finally updates the vector to be updated θ_{t-1} to θ_t, i.e. θ_t = θ_{t-1} − α·m̂_t / √v̂_t. The vectors m_t and v_t are written into the data buffer unit 4, and θ_t is written to the external designated space through the direct memory access unit 1.
- the data processing module 5 includes an operation control sub-module 51, a vector addition parallel operation sub-module 52, a vector multiplication parallel operation sub-module 53, a vector division parallel operation sub-module 54, a vector square root parallel operation sub-module 55, and a basic operation sub-module 56. The sub-modules 52 to 56 are connected in parallel with one another, and each of them is connected in series with the operation control sub-module 51.
- the vector operations are element-wise: when a vector undergoes one of these operations, the operations on different elements of the same vector are carried out in parallel.
- FIG. 3 illustrates a flow chart of a method for performing Adam gradient descent training according to an embodiment of the present invention. The method specifically includes the following steps:
- step S1: an instruction prefetch instruction (INSTRUCTION_IO) is pre-stored at the first address of the instruction cache unit 2; the INSTRUCTION_IO instruction is used to drive the direct memory access unit 1 to read all instructions related to the Adam gradient descent computation from the external address space;
- step S2: the operation starts; the controller unit 3 reads the INSTRUCTION_IO instruction from the first address of the instruction cache unit 2 and, according to the translated micro-instruction, drives the direct memory access unit 1 to read all instructions related to the Adam gradient descent computation from the external address space and cache them in the instruction cache unit 2;
- step S3: the controller unit 3 reads a hyperparameter read instruction (HYPERPARAMETER_IO) from the instruction cache unit 2 and, according to the translated micro-instruction, drives the direct memory access unit 1 to read the global update step size α, the exponential decay rates β1 and β2, and the convergence threshold ct from the external designated space, which are then sent to the data processing module 5;
- step S4: the controller unit 3 reads an assignment instruction from the instruction cache unit 2 and, according to the translated micro-instruction, initializes the first moment vector m_{t-1} and the second moment vector v_{t-1} in the data buffer unit 4 and sets the iteration count t in the data processing unit 5 to 1;
- step S5: the controller unit 3 reads a parameter read instruction (DATA_IO) from the instruction cache unit 2 and, according to the translated micro-instruction, drives the direct memory access unit 1 to read the parameter vector to be updated θ_{t-1} and the corresponding gradient vector g_t from the external designated space, which are then sent to the data processing module 5;
- step S6: the controller unit 3 reads a data transfer instruction from the instruction cache unit 2 and, according to the translated micro-instruction, transfers the first moment vector m_{t-1} and the second moment vector v_{t-1} from the data buffer unit 4 to the data processing unit 5.
- step S7: the controller unit 3 reads a moment vector update instruction from the instruction cache unit 2 and, according to the translated micro-instruction, drives the update of the first moment vector m_{t-1} and the second moment vector v_{t-1}.
- the moment vector update instruction is sent to the operation control sub-module 51, which issues the corresponding instructions to perform the following operations: it sends operation instruction 1 (INS_1) to the basic operation sub-module 56, driving it to compute (1−β1) and (1−β2); it sends operation instruction 2 (INS_2) to the vector multiplication parallel operation sub-module 53, driving it to compute the element-wise square g_t²; it then sends operation instruction 3 (INS_3) to the vector multiplication parallel operation sub-module 53, driving it to simultaneously compute β1·m_{t-1}, β2·v_{t-1}, (1−β1)·g_t and (1−β2)·g_t², the results being denoted a1, a2, b1 and b2 respectively; then a1 and b1, and a2 and b2, are taken as the two inputs of the vector addition parallel operation sub-module 52 to obtain the updated first moment vector m_t = a1 + b1 and second moment vector v_t = a2 + b2.
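The sequence of micro-operations in step S7 can be mirrored in plain Python (a sketch only: the INS_* mapping to code lines follows the text, while pure-Python lists stand in for the hardware lanes):

```python
def moment_update(m_prev, v_prev, g, beta1, beta2):
    """Step S7 as the text decomposes it across the sub-modules."""
    c1, c2 = 1 - beta1, 1 - beta2            # INS_1: basic operation sub-module
    g2 = [x * x for x in g]                  # INS_2: vector multiply, g_t squared
    a1 = [beta1 * x for x in m_prev]         # INS_3: four products, computed
    a2 = [beta2 * x for x in v_prev]         #        in parallel on the device
    b1 = [c1 * x for x in g]
    b2 = [c2 * x for x in g2]
    m = [x + y for x, y in zip(a1, b1)]      # vector addition sub-module: m_t
    v = [x + y for x, y in zip(a2, b2)]      # vector addition sub-module: v_t
    return m, v
```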
- step S8: the controller unit 3 reads a data transfer instruction from the instruction cache unit 2 and, according to the translated micro-instruction, transfers the updated first moment vector m_t and second moment vector v_t from the data processing unit 5 to the data buffer unit 4.
- step S9: the controller unit 3 reads a moment estimate vector operation instruction from the instruction cache unit 2 and, according to the translated micro-instruction, drives the operation control sub-module 51 to compute the moment estimate vectors. The operation control sub-module 51 issues the corresponding instructions to perform the following operations: it sends operation instruction 4 (INS_4) to the basic operation sub-module 56, driving it to compute the scalars 1/(1−β1^t) and 1/(1−β2^t) and to increment the iteration count t by 1; it sends operation instruction 5 (INS_5) to the vector multiplication parallel operation sub-module 53, driving it to multiply the first moment vector m_t and the second moment vector v_t by these scalars to obtain the bias-corrected estimate vectors m̂_t = m_t / (1−β1^t) and v̂_t = v_t / (1−β2^t).
- step S10: the controller unit 3 reads a parameter vector update instruction from the instruction cache unit 2 and, according to the translated micro-instruction, drives the operation control sub-module 51 to perform the following operations: it sends operation instruction 6 (INS_6) to the basic operation sub-module 56, driving it to compute −α; it sends operation instruction 7 (INS_7) to the vector square root parallel operation sub-module 55, driving it to compute √v̂_t; it sends operation instruction 7 (INS_7) to the vector division parallel operation sub-module 54, driving it to compute m̂_t / √v̂_t.
- the operation control sub-module 51 sends operation instruction 8 (INS_8) to the vector multiplication parallel operation sub-module 53, driving it to compute −α·m̂_t / √v̂_t.
- the operation control sub-module 51 sends operation instruction 9 (INS_9) to the vector addition parallel operation sub-module 52, driving it to compute θ_t = θ_{t-1} + (−α·m̂_t / √v̂_t).
- the updated parameter vector θ_t is thus obtained, where θ_{t-1} denotes the value of the parameter vector before the t-th update and the t-th iteration updates θ_{t-1} to θ_t. The operation control sub-module 51 then sends operation instruction 10 (INS_10) to the vector division parallel operation sub-module 54, driving it to obtain the vector used in the convergence judgment.
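Steps S9 and S10 can likewise be mirrored as scalar and lane-wise operations (again a sketch: the INS_* labels follow the text, and the epsilon-free division assumes v̂_t has no zero elements, as in the formulas above):

```python
import math

def estimate_and_update(theta_prev, m, v, t, alpha, beta1, beta2):
    """Steps S9-S10: bias correction followed by the parameter update."""
    k1 = 1.0 / (1.0 - beta1 ** t)            # INS_4: bias-correction scalars
    k2 = 1.0 / (1.0 - beta2 ** t)
    m_hat = [k1 * x for x in m]              # INS_5: corrected moment estimates
    v_hat = [k2 * x for x in v]
    neg_alpha = -alpha                       # INS_6: basic operation sub-module
    root = [math.sqrt(x) for x in v_hat]     # INS_7: vector square root
    ratio = [a / b for a, b in zip(m_hat, root)]  # INS_7: vector division
    scaled = [neg_alpha * x for x in ratio]  # INS_8: vector multiplication
    return [p + s for p, s in zip(theta_prev, scaled)]  # INS_9: vector addition
```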
- step S11: the controller unit 3 reads a write-back instruction (DATABACK_IO) from the instruction cache unit 2 and, according to the translated micro-instruction, transfers the updated parameter vector θ_t from the data processing unit 5 to the external designated space through the direct memory access unit 1.
- step S12: the controller unit 3 reads a convergence judgment instruction from the instruction cache unit 2 and, according to the translated micro-instruction, the data processing module 5 determines whether the updated parameter vector has converged; if temp2 < ct, it has converged and the operation ends; otherwise, the flow returns to step S5 to continue execution.
- the invention solves the problems that general-purpose processors have insufficient performance and high front-end decoding cost, and accelerates the execution of related applications.
- the use of the data buffer unit avoids repeatedly reading data from memory and reduces the memory-access bandwidth required.
Abstract
The invention provides a device and method for executing an Adam gradient descent training algorithm, the device comprising a direct memory access unit (1), an instruction cache unit (2), a controller unit (3), a data cache unit (4), and a data processing module (5). The method consists of: first, reading the gradient vector and the vector of values to be updated, and initializing the first moment vector, the second moment vector, and the corresponding exponential decay rates; at each iteration, updating the first moment vector and the second moment vector using the gradient vector, and computing the first-order biased estimate vector and the second-order biased estimate vector; updating the parameters to be updated using the first-order biased estimate vector and the second-order biased estimate vector; and continuing training until the vector of parameters to be updated has converged. The present invention makes it possible to implement the Adam gradient descent algorithm and considerably improve data-processing efficiency.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2016/080357 WO2017185257A1 (fr) | 2016-04-27 | 2016-04-27 | Apparatus and method for executing an Adam gradient descent training algorithm
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2016/080357 WO2017185257A1 (fr) | 2016-04-27 | 2016-04-27 | Apparatus and method for executing an Adam gradient descent training algorithm
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017185257A1 true WO2017185257A1 (fr) | 2017-11-02 |
Family
ID=60161795
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/080357 WO2017185257A1 (fr) | 2016-04-27 | 2016-04-27 | Apparatus and method for executing an Adam gradient descent training algorithm
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2017185257A1 (fr) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111931937A (zh) * | 2020-09-30 | 2020-11-13 | 深圳云天励飞技术股份有限公司 | Gradient update method, apparatus and system for an image processing model
CN112329941A (zh) * | 2020-11-04 | 2021-02-05 | 支付宝(杭州)信息技术有限公司 | Method and apparatus for updating a deep learning model
CN112580507A (zh) * | 2020-12-18 | 2021-03-30 | 合肥高维数据技术有限公司 | Deep learning text character detection method based on image moment correction
CN113238975A (zh) * | 2021-06-08 | 2021-08-10 | 中科寒武纪科技股份有限公司 | Memory, integrated circuit and board card for optimizing parameters of a deep neural network
CN116863492A (zh) * | 2023-09-04 | 2023-10-10 | 山东正禾大教育科技有限公司 | Mobile digital publishing system
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130325401A1 (en) * | 2012-05-29 | 2013-12-05 | Xerox Corporation | Adaptive weighted stochastic gradient descent |
CN103956992A (zh) * | 2014-03-26 | 2014-07-30 | 复旦大学 | Adaptive signal processing method based on multi-step gradient descent
CN105184369A (zh) * | 2015-09-08 | 2015-12-23 | 杭州朗和科技有限公司 | Matrix compression method and apparatus for a deep learning model
CN105184366A (zh) * | 2015-09-15 | 2015-12-23 | 中国科学院计算技术研究所 | Time-division-multiplexed general-purpose neural network processor
2016
- 2016-04-27: WO application PCT/CN2016/080357 filed (WO2017185257A1); status: active, Application Filing
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111931937A (zh) * | 2020-09-30 | 2020-11-13 | 深圳云天励飞技术股份有限公司 | Gradient update method, apparatus and system for an image processing model
CN111931937B (zh) * | 2020-09-30 | 2021-01-01 | 深圳云天励飞技术股份有限公司 | Gradient update method, apparatus and system for an image processing model
CN112329941A (zh) * | 2020-11-04 | 2021-02-05 | 支付宝(杭州)信息技术有限公司 | Method and apparatus for updating a deep learning model
CN112580507A (zh) * | 2020-12-18 | 2021-03-30 | 合肥高维数据技术有限公司 | Deep learning text character detection method based on image moment correction
CN112580507B (zh) * | 2020-12-18 | 2024-05-31 | 合肥高维数据技术有限公司 | Deep learning text character detection method based on image moment correction
CN113238975A (zh) * | 2021-06-08 | 2021-08-10 | 中科寒武纪科技股份有限公司 | Memory, integrated circuit and board card for optimizing parameters of a deep neural network
CN116863492A (zh) * | 2023-09-04 | 2023-10-10 | 山东正禾大教育科技有限公司 | Mobile digital publishing system
CN116863492B (zh) * | 2023-09-04 | 2023-11-21 | 山东正禾大教育科技有限公司 | Mobile digital publishing system
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2017185257A1 (fr) | Apparatus and method for executing an Adam gradient descent training algorithm | |
CN111260025B (zh) | Apparatus and operation method for executing LSTM neural network operations | |
WO2017185389A1 (fr) | Apparatus and method for executing matrix multiplication operations | |
WO2017124644A1 (fr) | Apparatus and method for compression coding of an artificial neural network | |
WO2017124641A1 (fr) | Apparatus and method for executing reverse training of an artificial neural network | |
CN111860812B (zh) | Apparatus and method for executing convolutional neural network training | |
CN111353589B (zh) | Apparatus and method for executing forward computation of an artificial neural network | |
WO2018120016A1 (fr) | Apparatus for executing LSTM neural network operation, and operating method | |
CN110929863B (zh) | Apparatus and method for executing LSTM operations | |
WO2017185396A1 (fr) | Apparatus and method for use in executing matrix addition/subtraction operations | |
WO2017185411A1 (fr) | Apparatus and method for executing an AdaGrad gradient descent training algorithm | |
WO2019127838A1 (fr) | Method and apparatus for implementing a convolutional neural network, terminal, and storage medium | |
WO2017185347A1 (fr) | Apparatus and method for executing recurrent neural network and LSTM computations | |
WO2017185393A1 (fr) | Apparatus and method for executing a vector inner product operation | |
US20170185888A1 (en) | Interconnection Scheme for Reconfigurable Neuromorphic Hardware | |
US11436301B2 (en) | Apparatus and methods for vector operations | |
US10831861B2 (en) | Apparatus and methods for vector operations | |
WO2017185336A1 (fr) | Apparatus and method for executing a pooling operation | |
CN109754062B (zh) | Execution method of a convolution extension instruction and related products | |
EP3561732A1 (fr) | Apparatus and operating method for an artificial neural network | |
WO2017185413A1 (fr) | Apparatus and method for executing a Hessian-free training algorithm | |
WO2017172174A1 (fr) | Event-based learning and reward modulation with spike-timing-dependent plasticity in neuromorphic computers | |
WO2017185248A1 (fr) | Apparatus and method for performing an artificial neural network auto-learning operation | |
WO2017185256A1 (fr) | Apparatus and method for executing an RMSprop gradient descent algorithm | |
CN107315570B (zh) | Apparatus and method for executing an Adam gradient descent training algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 16899770 Country of ref document: EP Kind code of ref document: A1 |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 16899770 Country of ref document: EP Kind code of ref document: A1 |