WO2017185394A1 - Apparatus and method for performing reverse training of a full connection layer neural network - Google Patents
Apparatus and method for performing reverse training of a full connection layer neural network
- Publication number
- WO2017185394A1 (PCT/CN2016/081114)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- module
- storage unit
- gradient
- weight
- instruction
- Prior art date
Classifications
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models › G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons › G06N3/063—Physical realisation using electronic means
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/048—Activation functions
- G06N5/00—Computing arrangements using knowledge-based models › G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
Definitions
- The present invention relates generally to artificial neural networks, and more particularly to an apparatus and method for performing reverse training of an artificial neural network full connection layer.
- The artificial neural network full connection layer is a common type of artificial neural network. Like the neural network in the brain, it is composed of interconnected nodes. As shown in Figure 1, each circle represents a neuron, and each arrow represents a connection between two neurons, also called a weight; every input is connected to every output.
- The output of a neuron is computed as out = f(w·x + b), where:
- x represents all input neurons connected to the output neuron;
- w represents the corresponding weights between x and the output neuron;
- b is a constant (the bias);
- f(x) is a nonlinear function, usually called an activation function; common examples include the sigmoid, tanh, ReLU, and softmax functions.
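- For illustration only (not part of the patent text), the full connection layer computation above can be sketched in NumPy; the function and variable names are ours, and tanh is used as an example activation function:

```python
import numpy as np

# Minimal sketch of a full connection layer: out = f(w*x + b).
# Names and the choice of tanh are illustrative assumptions.
def fc_forward(x, w, b, f=np.tanh):
    return f(w @ x + b)

x = np.array([0.5, -1.0, 2.0])  # input neurons
w = np.random.randn(4, 3)       # weights: 4 output neurons x 3 inputs
b = np.zeros(4)                 # bias constants
out = fc_forward(x, w, b)       # 4 output neurons
```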
- One known method of supporting artificial neural network full connection layer reverse training is to use a general purpose processor.
- The method supports the above algorithm by executing general-purpose instructions using a general-purpose register file and general-purpose functional units.
- One of the disadvantages of this method is that the performance of a single general-purpose processor is low and cannot meet the performance requirements of typical artificial neural network full connection layer reverse training.
- When multiple general-purpose processors execute in parallel, communication between the processors becomes a performance bottleneck.
- In addition, the general-purpose processor needs to decode the artificial neural network full connection layer operation into a long sequence of arithmetic and memory-access instructions, and the front-end decoding of the processor incurs a large power consumption overhead.
- An aspect of the present invention provides an apparatus for performing artificial neural network full connection layer reverse training, comprising an instruction storage unit, a controller unit, a data access unit, an interconnection module, a main operation module, and a plurality of slave operation modules, wherein:
- the instruction storage unit is configured to store instructions;
- the data access unit performs data or instruction read and write operations between the external address space and the apparatus;
- in the calculation process of each layer, the main operation module uses the output gradient vector of this layer to complete subsequent calculations.
- Another aspect of the present invention provides a method of performing a single layer artificial neural network full connection layer reverse training using the above apparatus.
- Another aspect of the present invention provides a method of performing multi-layer artificial neural network full connection layer reverse training using the above apparatus.
- The invention can be applied to the following scenarios (including but not limited to): data processing; robots, computers, printers, scanners, telephones, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, webcams, cloud servers, cameras, camcorders, projectors, watches, headphones, mobile storage, wearable devices, and other electronic products; aircraft, ships, vehicles, and other types of transportation; televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, range hoods, and other household appliances; and nuclear magnetic resonance instruments, B-mode ultrasound scanners, electrocardiographs, and other medical equipment.
- Figure 1 shows a block diagram of an artificial neural network fully connected layer.
- FIG. 2 illustrates an example block diagram of the overall structure of an apparatus for performing artificial neural network full connection layer reverse training in accordance with an embodiment of the present invention.
- FIG. 4 is a block diagram showing an example of a main operation module structure in an apparatus for performing artificial neural network full connection layer reverse training according to an embodiment of the present invention.
- FIG. 5 illustrates an example block diagram of the slave operation module structure in an apparatus for performing artificial neural network full connection layer reverse training in accordance with an embodiment of the present invention.
- FIG. 7 shows a flow chart of a single layer artificial neural network full connection layer reverse training in accordance with an embodiment of the present invention.
- The input gradient vector is first weighted and summed (multiplied by the weight matrix) to calculate the output gradient vector of this layer.
- The output gradient vector is multiplied by the derivative values of the activation function of the next layer in the forward operation to obtain the input gradient vector of the next layer.
- The input gradient vector is multiplied by the input neurons in the forward operation to obtain the gradient of this layer's weights, and the weights of this layer can then be updated according to the obtained weight gradient.
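- For illustration only, the per-layer reverse-training mathematics described above can be sketched in NumPy as follows; all names are ours, and the patent's hardware distributes these steps across the main and slave operation modules:

```python
import numpy as np

def fc_backward(in_gradient, w, x, act_derivative):
    # Weighted sum: each slave module i contributes in_gradient[i] * w[i, :];
    # the interconnection module accumulates the partial results.
    out_gradient = w.T @ in_gradient
    # Multiply element-wise by the activation derivatives saved during the
    # forward operation to form the input gradient of the next layer trained.
    next_in_gradient = out_gradient * act_derivative
    # Weight gradient: input gradient times the forward-operation inputs.
    dw = np.outer(in_gradient, x)
    return next_in_gradient, dw
```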
- the apparatus includes an instruction storage unit 1, a controller unit 2, a data access unit 3, an interconnection module 4, a main operation module 5, and a plurality of slave operation modules 6.
- The instruction storage unit 1, the controller unit 2, the data access unit 3, the interconnection module 4, the main operation module 5, and the slave operation modules 6 may all be implemented by hardware circuits (including but not limited to FPGAs, CGRAs, application-specific integrated circuits (ASICs), analog circuits, memristors, etc.).
- the instruction storage unit 1 reads in an instruction through the data access unit 3 and stores the read instruction.
- the controller unit 2 reads an instruction from the instruction storage unit 1, translates the instruction into a control signal that controls the behavior of other modules, and transmits it to other modules such as the data access unit 3, the main operation module 5, and the slave operation module 6.
- the data access unit 3 can access the external address space, directly read and write data to various storage units inside the device, and complete data loading and storage.
- the interconnect module 4 is used to connect the main operation module and the slave operation module, and can be implemented into different interconnection topologies (such as a tree structure, a ring structure, a grid structure, a hierarchical interconnection, a bus structure, etc.).
- FIG. 3 schematically shows an embodiment of an interconnection module 4: an H-tree structure.
- the interconnection module 4 constitutes a data path between the main operation module 5 and the plurality of slave operation modules 6, and has an H-tree structure.
- The H-tree is a binary tree path composed of multiple nodes. Each node sends upstream data to its two downstream nodes in the same way, merges the data returned by the two downstream nodes, and returns the result to the upstream node. For example, in the reverse operation of the artificial neural network full connection layer, the vectors returned by the two downstream nodes are added into one vector at the current node and returned to the upstream node.
- At the stage where the calculation of each artificial neural network full connection layer starts, the input gradient in the main operation module 5 is sent to each slave operation module 6 through the interconnection module 4; after the calculation process of the slave operation modules 6 is completed, the output gradient vector partial sums output by each slave operation module 6 are added stage by stage in the interconnection module 4, that is, all the output gradient vector partial sums are summed to form the final output gradient vector.
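- As a software analogy (ours, not the patent's), the stage-by-stage combination in the H-tree can be sketched as follows, assuming a power-of-two number of slave operation modules:

```python
import numpy as np

def h_tree_combine(partials):
    # Each tree level adds the vectors returned by pairs of children,
    # halving the number of vectors until one reaches the main module.
    level = list(partials)
    while len(level) > 1:
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]

parts = [np.ones(3) * i for i in range(4)]  # partial sums from 4 slaves
total = h_tree_combine(parts)               # final output gradient vector
```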
- the main operation module 5 includes a first operation unit 51, a first data dependency determination unit 52, and a first neuron storage unit 53.
- The first neuron storage unit 53 is configured to store the input data and output data used by the main operation module 5 in the calculation process.
- the first operation unit 51 performs various arithmetic functions of the main operation module, including weight update gradient summation and derivative calculation in the artificial neural network full connection layer reverse training.
- The first data dependency determination unit 52 is the port through which the first operation unit 51 reads and writes the first neuron storage unit 53, and it ensures that there is no consistency conflict in the reading and writing of data in the first neuron storage unit 53.
- The first data dependency determination unit 52 is also responsible for sending the input gradient vector read from the first neuron storage unit 53 to the slave operation modules 6 through the interconnection module 4, while the output data of the slave operation modules 6 is sent directly to the first operation unit 51 through the interconnection module 4.
- The commands output by the controller unit 2 are sent to the first operation unit 51 and the first data dependency determination unit 52 to control their behavior.
- FIG. 5 shows an example block diagram of the structure of the slave operation module 6 in an apparatus for performing artificial neural network full connection layer reverse training in accordance with an embodiment of the present invention.
- each slave operation module 6 includes a second operation unit 61, a second data dependency determination unit 62, a second neuron storage unit 63, a weight storage unit 64, and a weight update gradient storage unit 65.
- The second operation unit 61 receives control signals issued by the controller unit 2 and performs arithmetic logic operations.
- The second data dependency determination unit 62 is responsible for the read and write operations on the second neuron storage unit 63 in the calculation process and ensures that there is no consistency conflict in the reading and writing of the second neuron storage unit. Specifically, it determines whether there is a dependency between a control signal that has not yet been executed and the data of a control signal that is currently being executed; if there is none, the control signal is allowed to be issued immediately; otherwise, the control signal is allowed to be issued only after all control signals on which it depends have been completely executed.
- All control signals sent to the second data dependency determination unit 62 are stored in a control signal queue inside the unit. In this queue, if the range of data read by a read control signal conflicts with the range of data written by a write control signal positioned earlier in the queue, the read signal can be executed only after the write control signal on which it depends has been executed.
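- A hypothetical sketch of this read-after-write check (field names are ours; the patent does not specify an encoding):

```python
from dataclasses import dataclass

@dataclass
class ControlSignal:
    op: str      # "read" or "write"
    start: int   # first address touched
    length: int  # number of addresses touched

def must_wait(read_sig, pending_queue):
    # A read must wait if its address range overlaps the range of any
    # write control signal still pending earlier in the queue.
    r_lo, r_hi = read_sig.start, read_sig.start + read_sig.length
    for sig in pending_queue:
        if sig.op == "write":
            w_lo, w_hi = sig.start, sig.start + sig.length
            if r_lo < w_hi and w_lo < r_hi:
                return True
    return False
```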
- The second neuron storage unit 63 stores the scalar data in the input gradient vector corresponding to this slave operation module 6, as well as the output gradient vector partial sums calculated by this slave operation module 6.
- The weight storage unit 64 stores the weight data required by this slave operation module 6 in the calculation process.
- Each slave operation module stores the column in the weight matrix corresponding to the scalar data stored by that slave operation module 6.
- The weight update gradient storage unit 65 stores the weight update gradient data required by this slave operation module in the weight update process. The weight update gradient data stored by each slave operation module 6 corresponds to its stored weight data.
- Each slave operation module 6 calculates a partial sum of the output gradient vector, and all partial sums are spliced in the interconnection module 4 to obtain the final output gradient vector.
- Each slave operation module 6 also calculates the weight gradient based on the output neurons and output gradients of each layer, in order to update the weights stored by that slave operation module 6.
- The forward operation and reverse training are the two main processes of artificial neural network algorithms. To train (update) the weights in the network, the artificial neural network first needs to calculate the forward output of the input vector in the network composed of the current weights.
- The output values of each layer in the forward operation are data that already exist at the start of the reverse operation; they can be stored in the main operation module through the data access unit 3 and sent to the slave operation modules through the interconnection module.
- The main operation module 5 performs subsequent calculations based on the output gradient vector, for example, multiplying the output gradient vector by the derivative of the activation function in the forward operation to obtain the input gradient value of the next layer.
- The derivative of the activation function in the forward operation is data that already exists at the start of the reverse operation and can be stored in the main operation module through the data access unit.
- an instruction set for performing artificial neural network full connection layer reverse training on the aforementioned apparatus includes the CONFIG instruction, the COMPUTE instruction, the IO instruction, the NOP instruction, the JUMP instruction, and the MOVE instruction, where:
- the CONFIG instruction configures the various constants required for the current layer's calculation before the calculation of each artificial neural network full connection layer starts;
- the COMPUTE instruction completes the arithmetic and logic calculations of each artificial neural network full connection layer;
- the IO instruction reads the input data required for calculation from the external address space and stores the data back to the external space after the calculation is completed;
- the NOP instruction is responsible for clearing the control signals in all control signal queues of the current apparatus and ensuring that all control signals before the NOP instruction have been executed; the NOP instruction itself does not contain any operations;
- the JUMP instruction is responsible for the jump of the next instruction address that the controller will read from the instruction storage unit, and is used to implement jumps in the control flow;
- the MOVE instruction is responsible for copying data at one address in the internal address space of the apparatus to another address in the internal address space of the apparatus; this process is independent of the operation unit and does not occupy the resources of the operation unit during execution.
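- For exposition only, the instruction set above can be summarized as follows; the patent names the instructions but does not specify an encoding, so the enumeration and the sample instruction stream below are assumptions:

```python
from enum import Enum, auto

class Opcode(Enum):
    CONFIG = auto()   # set per-layer constants (precision, learning rate, ...)
    COMPUTE = auto()  # arithmetic/logic work of one full connection layer
    IO = auto()       # load from / store to the external address space
    NOP = auto()      # drain control-signal queues; performs no operation
    JUMP = auto()     # redirect the controller's next-instruction address
    MOVE = auto()     # copy data between internal addresses, without the ALU

# A plausible stream for one layer of reverse training, mirroring steps
# S1-S11 described later in this text.
program = [Opcode.IO, Opcode.IO, Opcode.IO, Opcode.CONFIG,
           Opcode.COMPUTE, Opcode.COMPUTE, Opcode.IO]
```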
- The output gradient vector of the previous layer in Figure 6, input gradient = [input gradient0, input gradient1, input gradient2, input gradient3], is first multiplied element-wise by the corresponding activation function derivatives [f'(out0), f'(out1), f'(out2), f'(out3)] to obtain the input gradient vector in_gradient of this layer, which is then multiplied by the weight matrix w ([w00, w10, w20, w30] being the column held by one slave operation module) to obtain the output gradient vector [output gradient0, output gradient1, output gradient2, output gradient3].
- The i-th slave operation module calculates the product of the i-th scalar in the input gradient vector and the column vector [w_i0, ..., w_iN] of the weight matrix, and the resulting output vector partial sums are spliced in the interconnection module 4 to obtain the final output gradient vector output gradient ([output gradient0, ..., output gradient3] in Fig. 6).
- The process of calculating the weight update gradient is as follows. Let x be the output neuron vector of the n-th layer. The main operation module 5 transmits this output neuron vector to each slave operation module through the interconnection module 4, and also transmits the output gradient vector in_gradient of the (n+1)-th layer to each slave operation module 6 through the interconnection module 4. Each slave operation module 6 multiplies the scalar data in the output gradient vector in_gradient corresponding to that slave operation module by the output neuron vector x to obtain the original weight update gradient vector dw_original of the n-th layer for that slave operation module.
- Each slave operation module stores the column vectors of w, dw, and dw' corresponding to that slave operation module.
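- In NumPy terms (an illustrative sketch with our names), the weight update gradient computation above is an outer product in which each slave operation module produces one slice:

```python
import numpy as np

def weight_update_gradients(in_gradient, x):
    # dw_original[i] = in_gradient[i] * x: the contribution of slave module i.
    return np.outer(in_gradient, x)

in_gradient = np.array([0.1, -0.2, 0.3])  # layer n+1 output gradient scalars
x = np.array([1.0, 0.5, 0.0, 2.0])        # layer n output neuron vector
dw_original = weight_update_gradients(in_gradient, x)  # shape (3, 4)
```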
- FIG. 7 is a flow chart showing a single layer artificial neural network full connection layer reverse training in accordance with one embodiment.
- the flowchart depicts the process of implementing a single layer artificial neural network full connection layer reverse training as shown in FIG. 6 using the apparatus and instruction set of the present invention.
- In step S1, an IO instruction is pre-stored at the first address of the instruction storage unit 1.
- In step S2, the operation starts. The controller unit 2 reads the IO instruction from the first address of the instruction storage unit 1, and according to the decoded control signal, the data access unit 3 reads from the external address space all instructions related to the single-layer artificial neural network full connection layer reverse training and stores them in the instruction storage unit 1.
- In step S3, the controller unit 2 then reads in the next IO instruction from the instruction storage unit, and according to the decoded control signal, the data access unit 3 reads all data required by the main operation module 5 from the external address space into the first neuron storage unit 53 of the main operation module 5; this data includes the input neurons and activation function derivative values from the previous forward operation, as well as the input gradient vector.
- In step S4, the controller unit 2 then reads in the next IO instruction from the instruction storage unit, and according to the decoded control signal, the data access unit 3 reads from the external address space all weight data and weight update gradient data required by the slave operation modules 6, and stores them in the weight storage unit 64 and the weight update gradient storage unit 65 of the corresponding slave operation module 6, respectively.
- In step S5, the controller unit 2 then reads the next CONFIG instruction from the instruction storage unit, and according to the decoded control signal, the first and second operation units configure the values of their internal registers, including the various constants required for this layer's artificial neural network calculation, the precision setting of this layer's calculation, and the learning rate used when updating the weights.
- In step S6, the controller unit 2 then reads the next COMPUTE instruction from the instruction storage unit, and according to the decoded control signal, the main operation module 5 sends the input gradient vector and the input neurons from the forward operation to each slave operation module 6 through the interconnection module 4; the input gradient vector and the input neurons from the forward operation are stored in the second neuron storage unit 63 of the slave operation module 6.
- In step S7, according to the control signal decoded from the COMPUTE instruction, the second operation unit 61 of the slave operation module 6 reads the weight vector (i.e., the partial column of the weight matrix stored by this slave operation module) from the weight storage unit 64, completes the vector-multiply-scalar operation of the weight vector and the input gradient vector, and returns the output vector partial sum through the interconnection module 4; at the same time, the slave operation module 6 multiplies the input gradient vector by the input neurons to obtain the weight update gradient, which is stored in the weight update gradient storage unit 65.
- In step S8, in the interconnection module 4, the output gradient partial sums returned by each slave operation module 6 are spliced through the interconnection module to obtain the complete output gradient vector.
- In step S9, the main operation module 5 obtains the return value of the interconnection module 4, and according to the control signal decoded from the COMPUTE instruction, reads the activation function derivative values from the forward operation from the first neuron storage unit 53, multiplies the derivative values by the returned output vector to obtain the input gradient vector for the next layer of reverse training, and writes it back to the first neuron storage unit 53.
- In step S10, the controller unit 2 then reads the next COMPUTE instruction from the instruction storage unit, and according to the decoded control signal, the slave operation module 6 reads the weight w from the weight storage unit 64, reads the current weight update gradient dw and the weight update gradient dw' used in the previous weight update from the weight update gradient storage unit 65, and updates the weight w according to the learning rate α, momentum m, and weight decay coefficient η set by the instruction.
- In step S11, the controller unit then reads the next IO instruction from the instruction storage unit, and according to the decoded control signal, the data access unit 3 stores the output gradient vector in the first neuron storage unit 53 to the address specified in the external address space, and the operation ends.
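- The step S10 update rule, as recited in claim 8 below, can be written out as follows (a sketch with our variable names):

```python
import numpy as np

def update_weights(w, dw, dw_prev, alpha, m, eta):
    # w = eta*w + alpha*(dw + m*dw'); alpha is the learning rate, m the
    # momentum, and eta the weight decay coefficient set by the instruction.
    return eta * w + alpha * (dw + m * dw_prev)

w = np.array([[0.5, -0.3], [0.2, 0.8]])
dw = np.array([[0.01, 0.02], [-0.01, 0.00]])
dw_prev = np.zeros_like(dw)
w = update_weights(w, dw, dw_prev, alpha=0.1, m=0.9, eta=0.999)
```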
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Neurology (AREA)
- Feedback Control In General (AREA)
- Image Analysis (AREA)
Claims (11)
- An apparatus for performing artificial neural network full connection layer reverse training, comprising an instruction storage unit, a controller unit, a data access unit, an interconnection module, a main operation module, and a plurality of slave operation modules, wherein: the instruction storage unit is configured to store instructions; the controller unit is configured to read an instruction from the instruction storage unit and decode it into control signals that control the behavior of the interconnection module, the main operation module, and the plurality of slave operation modules; the data access unit performs data or instruction read and write operations between an external address space and the apparatus; at the stage where the reverse training calculation of each artificial neural network full connection layer starts, the main operation module transmits the input gradient vector of this layer to all slave operation modules through the interconnection module; each slave operation module calculates the product of the corresponding partial scalar elements of the input gradient vector and the corresponding column of the weight matrix to obtain an output gradient vector partial sum; after the calculation process of the slave operation modules is completed, the interconnection module splices the output gradient vector partial sums of the slave operation modules stage by stage to obtain the output gradient vector of this layer; and in the calculation process of each layer, the main operation module uses the output gradient vector of this layer to complete subsequent calculations.
- The apparatus according to claim 1, wherein the main operation module multiplies the output gradient vector of each layer element-wise by the activation function derivative values of the next layer to obtain the input gradient vector of the next layer.
- The apparatus according to claim 1, wherein the main operation module comprises a first operation unit, a first data dependency determination unit, and a first neuron storage unit, wherein: the first neuron storage unit is configured to store the input data and output data used by the main operation module in the calculation process; the first operation unit completes the various operation functions of the main operation module; the first data dependency determination unit is the port through which the first operation unit reads and writes the first neuron storage unit, ensures that there is no consistency conflict in the reading and writing of discrete or continuous data in the first neuron storage unit, and is responsible for reading the input gradient vector from the first neuron storage unit and sending it to each slave operation module through the interconnection module; and the output gradient vector coming from the interconnection module is sent to the first operation unit.
- The apparatus according to claim 1, wherein each slave operation module comprises a second operation unit, a second data dependency determination unit, a second neuron storage unit, a weight storage unit, and a weight update gradient storage unit, wherein: the second operation unit receives control signals issued by the controller unit and performs arithmetic logic operations; the second data dependency determination unit is responsible for the read and write operations on the second neuron storage unit, the weight storage unit, and the weight update gradient storage unit in the calculation process, and ensures that there is no consistency conflict in the reading and writing of these three units; the second neuron storage unit stores the scalar data in the input gradient vector corresponding to this slave operation module as well as the output gradient vector partial sums calculated by this slave operation module; the weight storage unit stores the columns of the weight matrix corresponding to the scalar data stored by this slave operation module; and the weight update gradient storage unit stores the weight update gradient data needed by this slave operation module in the weight update process, the weight update gradient data stored by each slave operation module corresponding to its stored weight data.
- The apparatus according to claim 4, wherein the process of calculating the weight update gradient is as follows: let x be the output neuron vector of the n-th layer; the main operation module transmits this output neuron vector to each slave operation module through the interconnection module, and also transmits the output gradient vector in_gradient of the (n+1)-th layer to each slave operation module through the interconnection module; each slave operation module multiplies the scalar data in the output gradient vector in_gradient corresponding to this slave operation module by the output neuron vector x to obtain the original weight update gradient vector dw_original of the n-th layer for this slave operation module.
- The apparatus according to claim 5, wherein after the original weight update gradient vectors of all layers have been calculated, the main operation module processes the original weight update gradients: first a positive constant clip_gradient is set; then the sum of squares sumsq_diff of the original weight update gradients of all layers is calculated, and the square root of sumsq_diff is taken to obtain l2norm_diff; if l2norm_diff is greater than clip_gradient, a scaling factor scale_factor = clip_gradient / l2norm_diff is obtained, and each original weight update gradient dw_original is multiplied by this scaling factor scale_factor to obtain the weight update gradient dw.
- The apparatus according to claim 6, wherein after the scaled gradient scale_dw is obtained, the weight w is updated using w, dw, and the weight update gradient dw' used in the previous weight update, according to the learning rate α, momentum m, and weight decay coefficient η set by the instruction.
- The apparatus according to claim 7, wherein the weight is updated as: w = η*w + α*(dw + m*dw'), or w = w + α*dw.
- The apparatus according to claim 7, wherein each slave operation module stores the column vectors of w, dw, and dw' corresponding to this slave operation module.
- A method for performing single-layer artificial neural network full connection layer reverse training, comprising: in step S1, pre-storing an IO instruction at the first address of the instruction storage unit; in step S2, the operation starts: the controller unit reads this IO instruction from the first address of the instruction storage unit, and according to the decoded control signal, the data access unit reads from the external address space all instructions related to the single-layer artificial neural network full connection layer reverse training and stores them in the instruction storage unit; in step S3, the controller unit then reads in the next IO instruction from the instruction storage unit, and according to the decoded control signal, the data access unit reads all data required by the main operation module from the external address space into the first neuron storage unit of the main operation module, the data including the input neurons and activation function derivative values from the previous forward operation as well as the input gradient vector; in step S4, the controller unit then reads in the next IO instruction from the instruction storage unit, and according to the decoded control signal, the data access unit reads from the external address space all weight data and weight update gradient data required by the slave operation modules and stores them in the weight storage unit and the weight update gradient storage unit of the corresponding slave operation module, respectively; in step S5, the controller unit then reads in the next CONFIG instruction from the instruction storage unit, and according to the decoded control signal, the first and second operation units configure the values of the internal registers of the operation units, including the various constants required for the calculation of this layer of the artificial neural network, the precision setting of this layer's calculation, and the learning rate used when updating the weights; in step S6, the controller unit then reads in the next COMPUTE instruction from the instruction storage unit, and according to the decoded control signal, the main operation module sends the input gradient vector and the input neurons from the forward operation to each slave operation module through the interconnection module, and the input gradient vector and the input neurons from the forward operation are stored in the second neuron storage unit of the slave operation module; in step S7, according to the control signal decoded from the COMPUTE instruction, the second operation unit of the slave operation module reads the weight vector from the weight storage unit 64, completes the vector-multiply-scalar operation of the weight vector and the input gradient vector, and returns the output vector partial sum through the interconnection module; at the same time, the slave operation module multiplies the input gradient vector by the input neurons to obtain the weight update gradient, which is stored in the weight update gradient storage unit 65; in step S8, in the interconnection module, the output gradient partial sums returned by each slave operation module are spliced through the interconnection module to obtain the complete output gradient vector; in step S9, the main operation module obtains the return value of the interconnection module, and according to the control signal decoded from the COMPUTE instruction, reads the activation function derivative values from the forward operation from the neuron storage unit, multiplies the derivative values by the returned output vector to obtain the input gradient vector of the next layer of reverse training, and writes it back to the first neuron storage unit; in step S10, the controller unit then reads in the next COMPUTE instruction from the instruction storage unit, and according to the decoded control signal, the slave operation module reads the weight w from the weight storage unit, reads the current weight update gradient dw and the weight update gradient dw' used in the previous weight update from the weight update gradient storage unit, and updates the weight w according to the learning rate α, momentum m, and weight decay coefficient η set by the instruction; and in step S11, the controller unit then reads in the next IO instruction from the instruction storage unit, and according to the decoded control signal, the data access unit stores the output gradient vector in the first neuron storage unit to the address specified in the external address space, and the operation ends.
- A method for performing multi-layer artificial neural network full connection layer reverse training, comprising: for each layer, performing the method according to claim 10; and after the execution of the previous artificial neural network full connection layer is completed, executing the method according to claim 10 for the next layer with the output gradient vector calculated in the main operation module as the input gradient vector for the training of the next layer, the weight address and the weight update gradient address in the instructions being changed accordingly to the addresses corresponding to this layer.
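- For exposition, the gradient clipping recited in claim 6 above can be sketched as follows; the names clip_gradient, sumsq_diff, l2norm_diff, and scale_factor follow the claim, while the code itself is ours:

```python
import numpy as np

def clip_gradients(dw_original_all, clip_gradient):
    # Sum of squares over the original weight update gradients of all
    # layers; its square root is the global L2 norm.
    sumsq_diff = sum(np.sum(dw ** 2) for dw in dw_original_all)
    l2norm_diff = np.sqrt(sumsq_diff)
    if l2norm_diff > clip_gradient:
        scale_factor = clip_gradient / l2norm_diff
        return [dw * scale_factor for dw in dw_original_all]
    return dw_original_all
```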
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP16899905.0A EP3451242A4 (en) | 2016-04-29 | 2016-05-05 | DEVICE AND METHOD FOR REVERSE LEARNING OF COMPLETELY CONNECTED LAYERS OF A NEURONAL NETWORK |
KR1020187033950A KR102470264B1 (ko) | 2016-04-29 | 2016-05-05 | Apparatus and method for performing reverse training of a fully connected layer neural network |
US16/174,050 US20190065958A1 (en) | 2016-04-29 | 2018-10-29 | Apparatus and Methods for Training in Fully Connected Layers of Convolutional Networks |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610285062.0 | 2016-04-29 | ||
CN201610285062.0A CN107341541B (zh) | 2016-04-29 | 2016-04-29 | Apparatus and method for performing full connection layer neural network training |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/174,050 Continuation-In-Part US20190065958A1 (en) | 2016-04-29 | 2018-10-29 | Apparatus and Methods for Training in Fully Connected Layers of Convolutional Networks |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017185394A1 true WO2017185394A1 (zh) | 2017-11-02 |
Family
ID=60160559
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/081114 WO2017185394A1 (zh) | Apparatus and method for performing reverse training of a full connection layer neural network | 2016-04-29 | 2016-05-05 |
Country Status (5)
Country | Link |
---|---|
US (1) | US20190065958A1 (zh) |
EP (1) | EP3451242A4 (zh) |
KR (1) | KR102470264B1 (zh) |
CN (2) | CN107341541B (zh) |
WO (1) | WO2017185394A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110147873A (zh) * | 2018-05-18 | 2019-08-20 | 北京中科寒武纪科技有限公司 | Processor and training method of convolutional neural network |
CN111126581A (zh) * | 2018-12-18 | 2020-05-08 | 中科寒武纪科技股份有限公司 | Data processing method and apparatus, and related products |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180053086A1 (en) * | 2016-08-22 | 2018-02-22 | Kneron Inc. | Artificial neuron and controlling method thereof |
CN108108662B (zh) * | 2017-11-24 | 2021-05-25 | 深圳市华尊科技股份有限公司 | Deep neural network recognition model and recognition method |
CN111242294B (zh) * | 2017-12-14 | 2023-08-25 | 中科寒武纪科技股份有限公司 | Integrated circuit chip apparatus and related products |
CN108108189B (zh) * | 2017-12-15 | 2020-10-30 | 安徽寒武纪信息科技有限公司 | Computation method and related products |
CN108053029B (zh) * | 2017-12-27 | 2021-08-27 | 上海闪易半导体有限公司 | Training method for a neural network based on a storage array |
CN110163334B (zh) * | 2018-02-11 | 2020-10-09 | 上海寒武纪信息科技有限公司 | Integrated circuit chip apparatus and related products |
CN110163349B (zh) * | 2018-02-12 | 2021-03-23 | 上海寒武纪信息科技有限公司 | Computation method and apparatus for a network model |
US12073215B2 (en) | 2018-02-13 | 2024-08-27 | Shanghai Cambricon Information Technology Co., Ltd | Computing device with a conversion unit to convert data values between various sizes of fixed-point and floating-point data |
EP3651079B1 (en) * | 2018-02-13 | 2021-10-27 | Shanghai Cambricon Information Technology Co., Ltd | Computation device and method |
CN111368987B (zh) * | 2018-12-25 | 2023-03-24 | 上海寒武纪信息科技有限公司 | Neural network computing apparatus and method |
CN110728364A (zh) * | 2018-07-17 | 2020-01-24 | 上海寒武纪信息科技有限公司 | Operation apparatus and operation method |
US11423284B2 (en) * | 2018-09-07 | 2022-08-23 | Black Sesame Technologies, Inc | Subgraph tile fusion in a convolutional neural network |
CN109447258B (zh) * | 2018-09-19 | 2021-09-14 | 北京市商汤科技开发有限公司 | Optimization method and apparatus for a neural network model, electronic device, and storage medium |
CN109165738B (zh) * | 2018-09-19 | 2021-09-14 | 北京市商汤科技开发有限公司 | Optimization method and apparatus for a neural network model, electronic device, and storage medium |
CN110968404B (zh) * | 2018-09-30 | 2023-04-28 | 阿里巴巴集团控股有限公司 | Device data processing method and apparatus |
CN110059797B (zh) * | 2018-10-10 | 2020-03-10 | 中科寒武纪科技股份有限公司 | Computing apparatus and related products |
CN111368986B (zh) * | 2018-12-25 | 2023-03-10 | 上海寒武纪信息科技有限公司 | Neural network computing apparatus and method |
CN109740739B (zh) * | 2018-12-29 | 2020-04-24 | 中科寒武纪科技股份有限公司 | Neural network computing apparatus, neural network computing method, and related products |
CN111722937B (zh) * | 2019-03-21 | 2024-05-10 | 阿里巴巴集团控股有限公司 | Deep learning weight update method and apparatus |
WO2020192582A1 (zh) * | 2019-03-26 | 2020-10-01 | 上海寒武纪信息科技有限公司 | Neural network operation module and method |
JP7370158B2 (ja) * | 2019-04-03 | 2023-10-27 | 株式会社Preferred Networks | Information processing apparatus and information processing method |
CN110032450B (zh) * | 2019-04-17 | 2021-04-20 | 中山大学 | Large-scale deep learning method and system based on memory extended by solid-state drives |
CN113222101A (zh) * | 2020-02-05 | 2021-08-06 | 北京百度网讯科技有限公司 | Deep learning processing apparatus, method, device, and storage medium |
CN112329941B (zh) * | 2020-11-04 | 2022-04-12 | 支付宝(杭州)信息技术有限公司 | Update method and apparatus for a deep learning model |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5517596A (en) * | 1991-05-17 | 1996-05-14 | International Business Machines Corporation | Learning machine synapse processor system apparatus |
CN103150596A (zh) * | 2013-02-22 | 2013-06-12 | 百度在线网络技术(北京)有限公司 | Training system for a back-propagation neural network DNN |
CN103996069A (zh) * | 2013-02-20 | 2014-08-20 | 百度在线网络技术(北京)有限公司 | Multi-GPU-based BPNN training method and apparatus |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105488565A (zh) * | 2015-11-17 | 2016-04-13 | 中国科学院计算技术研究所 | Operation apparatus and method of an acceleration chip for accelerating deep neural network algorithms |
CN105512723B (zh) * | 2016-01-20 | 2018-02-16 | 南京艾溪信息科技有限公司 | Artificial neural network computing apparatus and method for sparse connections |
CN106991478B (zh) * | 2016-01-20 | 2020-05-08 | 中科寒武纪科技股份有限公司 | Apparatus and method for performing artificial neural network reverse training |
WO2017185248A1 (zh) * | 2016-04-27 | 2017-11-02 | 北京中科寒武纪科技有限公司 | Apparatus and method for performing artificial neural network self-learning operation |
2016
- 2016-04-29 CN CN201610285062.0A patent/CN107341541B/zh active Active
- 2016-04-29 CN CN201811221576.5A patent/CN109376861B/zh active Active
- 2016-05-05 WO PCT/CN2016/081114 patent/WO2017185394A1/zh active Application Filing
- 2016-05-05 KR KR1020187033950A patent/KR102470264B1/ko active IP Right Grant
- 2016-05-05 EP EP16899905.0A patent/EP3451242A4/en not_active Ceased
2018
- 2018-10-29 US US16/174,050 patent/US20190065958A1/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5517596A (en) * | 1991-05-17 | 1996-05-14 | International Business Machines Corporation | Learning machine synapse processor system apparatus |
CN103996069A (zh) * | 2013-02-20 | 2014-08-20 | 百度在线网络技术(北京)有限公司 | Multi-GPU-based BPNN training method and apparatus |
CN103150596A (zh) * | 2013-02-22 | 2013-06-12 | 百度在线网络技术(北京)有限公司 | Training system for a back-propagation neural network DNN |
Non-Patent Citations (1)
Title |
---|
See also references of EP3451242A4 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110147873A (zh) * | 2018-05-18 | 2019-08-20 | 北京中科寒武纪科技有限公司 | Processor and training method of convolutional neural network |
CN110147873B (zh) * | 2018-05-18 | 2020-02-18 | 中科寒武纪科技股份有限公司 | Processor and training method of convolutional neural network |
CN111126581A (zh) * | 2018-12-18 | 2020-05-08 | 中科寒武纪科技股份有限公司 | Data processing method and apparatus, and related products |
Also Published As
Publication number | Publication date |
---|---|
KR20190003612A (ko) | 2019-01-09 |
CN107341541A (zh) | 2017-11-10 |
CN107341541B (zh) | 2021-01-29 |
CN109376861A (zh) | 2019-02-22 |
KR102470264B1 (ko) | 2022-11-23 |
EP3451242A4 (en) | 2019-12-25 |
CN109376861B (zh) | 2020-04-24 |
EP3451242A1 (en) | 2019-03-06 |
US20190065958A1 (en) | 2019-02-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2017185394A1 (zh) | Apparatus and method for performing reverse training of a full connection layer neural network | |
US10713568B2 (en) | Apparatus and method for executing reversal training of artificial neural network | |
KR102544275B1 (ko) | Apparatus and method for performing convolutional neural network training | |
WO2017185387A1 (zh) | Apparatus and method for performing the forward operation of a full connection layer neural network | |
CN107316078B (zh) | Apparatus and method for performing artificial neural network self-learning operation | |
CN111860813B (zh) | Apparatus and method for performing the forward operation of a convolutional neural network | |
CN109284825B (zh) | Apparatus and method for performing LSTM operations | |
CN111260025B (zh) | Apparatus and operation method for performing LSTM neural network operations | |
WO2017185347A1 (zh) | Apparatus and method for performing recurrent neural network and LSTM operations | |
EP3564863B1 (en) | Apparatus for executing lstm neural network operation, and operational method | |
CN107886166B (zh) | Apparatus and method for performing artificial neural network operations | |
WO2017185248A1 (zh) | Apparatus and method for performing artificial neural network self-learning operation | |
WO2018058452A1 (zh) | Apparatus and method for performing artificial neural network operations | |
WO2017177446A1 (zh) | Artificial neural network reverse training apparatus and method supporting discrete data representation | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 20187033950 Country of ref document: KR Kind code of ref document: A |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 16899905 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2016899905 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 2016899905 Country of ref document: EP Effective date: 20181129 |