WO2018112699A1 - Artificial neural network reverse training device and method - Google Patents

Artificial neural network reverse training device and method

Info

Publication number
WO2018112699A1
Authority
WO
WIPO (PCT)
Prior art keywords
learning rate
unit
layer
training
gradient vector
Prior art date
Application number
PCT/CN2016/110751
Other languages
English (en)
French (fr)
Inventor
陈云霁
郝一帆
刘少礼
陈天石
Original Assignee
上海寒武纪信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海寒武纪信息科技有限公司
Priority to PCT/CN2016/110751
Publication of WO2018112699A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons

Definitions

  • The invention relates to artificial neural networks, and in particular to an artificial neural network reverse training device and an artificial neural network reverse training method.
  • Artificial Neural Networks (ANNs), referred to simply as Neural Networks (NNs), are algorithmic mathematical models that mimic the behavioral characteristics of animal neural networks and perform distributed, parallel information processing. Such a network depends on the complexity of the system and processes information by adjusting the interconnections among a large number of internal nodes.
  • The algorithms used by neural networks are based on vector multiplication, and sign functions and their various approximations are widely adopted.
  • One known method of supporting multi-layer artificial neural network reverse training is to use a general-purpose processor.
  • One of the disadvantages of this approach is that the computing performance of a single general-purpose processor is low and cannot meet the performance requirements of typical multi-layer artificial neural network operations.
  • When multiple general-purpose processors execute in parallel, communication between them becomes a performance bottleneck.
  • The general-purpose processor also needs to decode the backward operations of the multi-layer artificial neural network into a long sequence of arithmetic and memory-access instructions, and the front-end decoding of the processor brings a large power consumption overhead.
  • Another known method is to use a graphics processing unit (GPU).
  • The GPU has only a small on-chip cache, and the model data (weights) of the multi-layer artificial neural network need to be repeatedly transferred from off-chip.
  • The off-chip bandwidth becomes the main performance bottleneck and brings a huge power consumption overhead.
  • an artificial neural network reverse training apparatus includes a controller unit, a storage unit, a learning rate adjustment unit, and an operation unit, where
  • a storage unit for storing neural network data, including instructions, weights, derivatives of activation functions, learning rates, gradient vectors, and learning rate adjustment data;
  • controller unit configured to read an instruction from the storage unit, and decode the instruction into a micro-instruction that controls a behavior of the storage unit, the learning rate adjustment unit, and the operation unit;
  • the learning rate adjustment unit, before each generation of training starts, computes the learning rate for the current generation of training from the previous generation's learning rate and the learning rate adjustment data;
  • the operation unit computes the current-generation weights from the gradient vector, the current-generation learning rate, the derivative of the activation function and the previous-generation weights.
  • the operation unit includes a main operation unit, an interconnection unit and a plurality of slave operation units, and the gradient vector includes an input gradient vector and an output gradient vector, wherein:
  • the main operation unit uses the output gradient vector of the current layer to complete subsequent calculations during the calculation of each layer;
  • the interconnection unit is used at the stage when the backward training calculation of each layer of the neural network begins: the main operation unit transmits the input gradient vector of the layer to all slave operation units through the interconnection unit, and
  • after the calculations of the slave operation units are completed, the interconnection unit adds the partial sums of the output gradient vectors of the slave operation units pairwise, stage by stage, to obtain the output gradient vector of the layer; the plurality of slave operation units use the same input gradient vector and their respective weight data to compute the corresponding partial sums of the output gradient vector in parallel.
  • the storage unit is an on-chip cache.
  • the instruction is a SIMD instruction.
  • the learning rate adjustment data includes a weight change amount and an error function.
  • an artificial neural network reverse training method comprising the steps of:
  • in step S4, it is judged whether the neural network has converged; if so, the operation ends; otherwise, the process returns to step S1.
  • step S2 includes:
  • in step S25, it is judged whether all layers have been updated; if so, the process proceeds to step S3; otherwise, it returns to step S21.
  • the weights use non-uniform learning rates.
  • the weights use a uniform learning rate.
  • FIG. 1 is an example block diagram of the overall structure of an artificial neural network reverse training device according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of an interconnection unit in the artificial neural network reverse training device of FIG. 1;
  • FIG. 3 is a schematic diagram of a reverse adjustment process of an artificial neural network according to an embodiment of the invention.
  • FIG. 4 is a schematic diagram of a reverse adjustment process using an artificial neural network according to an embodiment of the invention.
  • FIG. 5 is a flowchart of operations using an artificial neural network reverse training method according to an embodiment of the invention.
  • FIG. 6 is a flowchart of operations using an artificial neural network reverse training method according to another embodiment of the present invention.
  • the traditional artificial neural network training method is the back propagation algorithm.
  • the change of a weight between two generations is the gradient of the error function with respect to the weight multiplied by a constant; this constant is called the learning rate.
  • the learning rate determines the amount of weight change produced in each training cycle. If the value is too small, the effective update of the weights in each iteration is too small; a small learning rate leads to a long training time and rather slow convergence. If the value is too large, the iteration process oscillates and even diverges.
  • the artificial neural network reverse training device of the present invention is provided with a learning rate adjustment unit; before each generation of training starts, the learning rate for the current generation is computed from the previous generation's learning rate and the learning rate adjustment data. This determines the amount of weight change produced in each training cycle more appropriately, makes the training iteration process more stable, reduces the time required for the neural network training to stabilize, and improves training efficiency.
  • FIG. 1 is a block diagram showing an overall structure of an artificial neural network reverse training device according to an embodiment of the present invention.
  • An embodiment of the present invention provides an apparatus for artificial neural network reverse training that supports an adaptive learning rate, including:
  • a storage unit A for storing neural network data, including instructions, weights, derivatives of activation functions, learning rates, gradient vectors (which may include input gradient vectors and output gradient vectors), and learning rate adjustment data (which may include the network error value, the value change amount, etc.);
  • the storage unit may be an on-chip cache, which avoids repeatedly reading these data from memory and prevents memory bandwidth from becoming the performance bottleneck of multi-layer artificial neural network operations and their training algorithms.
  • controller unit B configured to read an instruction from the storage unit A, and decode the instruction into a micro-instruction that controls a behavior of the storage unit, the learning rate adjustment unit, and the operation unit;
  • the instructions may be SIMD instructions; by adopting dedicated SIMD instructions for multi-layer artificial neural network operations, the problems of insufficient computing performance of existing CPUs and GPUs and of large front-end decoding overhead are solved.
  • the learning rate adjustment unit E, before each generation of training starts, computes the learning rate for the current generation from the previous generation's learning rate and the learning rate adjustment data;
  • the operation unit computes the current-generation weights from the gradient vector, the current-generation learning rate, the derivative of the activation function and the previous-generation weights.
  • the storage unit A is used to store neural network data, including instructions as well as neuron inputs, weights, neuron outputs, learning rates, weight change amounts, activation function derivatives, the gradient vectors of each layer, and the like;
  • the controller unit B is used to read instructions from storage unit A and decode them into micro-instructions that control the behavior of each unit;
  • the arithmetic unit may include a main arithmetic unit C, an interconnect unit D, and a plurality of slave arithmetic units F.
  • the interconnect unit D is used to connect the main operation unit and the slave operation unit, and can be implemented into different interconnection topologies (such as a tree structure, a ring structure, a grid structure, a hierarchical interconnection, a bus structure, etc.).
  • the interconnection unit D is used at the stage when the backward training calculation of each layer of the neural network begins: the main operation unit C transmits the input gradient vector of the current layer to all slave operation units F through the interconnection unit D, and after the calculations of the slave operation units F are completed, the interconnection unit D adds the partial sums of the output gradient vectors of the slave operation units F pairwise, stage by stage, to obtain the output gradient vector of the layer.
  • the main operation unit C is configured to perform subsequent calculations by using the output gradient vector of the layer in the calculation process of each layer;
  • the learning rate adjustment unit E is used, before each generation of training starts, to compute the learning rate for this generation of training from the previous generation's learning rate, weights, network error value, weight change amount and other information (this information is stored in the storage unit in advance and can be retrieved).
  • FIG. 2 schematically shows one embodiment of the interconnection unit D: an interconnection structure.
  • the interconnection unit D constitutes a data path between the main operation unit C and the plurality of slave operation units F, and has an interconnection structure.
  • the interconnection includes a plurality of nodes that constitute a binary tree path, i.e., each node has one parent node and two child nodes. Each node sends the data received from upstream identically to its two downstream child nodes, merges the data returned by the two downstream child nodes, and returns the result to its upstream parent node.
  • the vectors returned by the two downstream nodes are added to a vector at the current node and returned to the upstream node.
  • the input gradient in the main operation unit C is sent to each slave operation unit F through the interconnection unit D; when the calculations of the slave operation units F are completed,
  • the partial sums of the output gradient vectors output by the slave operation units F are added pairwise, stage by stage, in the interconnection unit D, i.e., all partial sums of the output gradient vectors are summed to form the final output gradient vector.
  • in the learning rate adjustment unit E, the calculation performed on the data differs depending on the adaptive learning rate adjustment method.
  • w(k) is the current training weight, i.e., the current-generation weight, w(k+1) is the next-generation weight, η is a fixed learning rate, which is a predetermined constant, and g(w) is the gradient vector.
  • the method of adjusting the learning rate is to reduce the learning rate when the training error increases, and to increase the learning rate when the training error is reduced.
  • Several examples of adaptive learning rate adjustment rules are given below, but the rules are not limited to these.
  • η(k) is the current-generation learning rate
  • η(k+1) is the next-generation learning rate
  • a > 0, b > 0; a and b are appropriate constants.
  • η(k) is the current-generation learning rate
  • η(k+1) is the next-generation learning rate
  • η(k) is the current-generation learning rate
  • η(k+1) is the next-generation learning rate
  • a > 1, 0 < b < 1, c > 0; a, b and c are appropriate constants.
  • η(k) is the current-generation learning rate
  • η(k+1) is the next-generation learning rate
  • the learning rate η in the above four methods can be common to all weights, i.e., each weight of each layer uses the same learning rate in each generation of training; we denote this as the uniform adaptive learning rate training method.
  • the learning rate may also not be common, i.e., a different learning rate is used for each weight;
  • we denote this as the respective (per-weight) adaptive learning rate training method.
  • the respective adaptive learning rate training method can further improve training accuracy and reduce training time.
  • in FIG. 3, the connection weights w_jp1, w_jp2, ..., w_jpn between the output layer P and the hidden layer J are uniformly adjusted with the learning rate η during the reverse adjustment; in FIG. 4,
  • the connection weights w_jp1, w_jp2, ..., w_jpn between the output layer P and the hidden layer J are adjusted with the respective learning rates η_1, η_2, ..., η_n during the reverse adjustment.
  • differentiated reverse adjustment across different nodes can make the most of the adaptive ability of the learning rate and best satisfy the changing requirements of the weights during learning.
  • the iterative updating of the respective learning rates can still be performed according to methods one to four, and is likewise not limited to these four.
  • the learning rate η in the formulas is then the respective learning rate corresponding to each weight.
  • the present invention also provides an artificial neural network reverse training method, and the operation flow chart is as shown in FIG. 5, including the steps:
  • step S4 Determine whether the neural network converges. If yes, the operation ends. Otherwise, go to step S1.
  • the learning rate adjustment unit E calls the learning rate adjustment data in the storage unit A to adjust the learning rate, and obtains the learning rate for the current training.
  • Step S2 after the current generation of training begins, according to the learning rate of the current training, the weight is updated layer by layer.
  • Step S2 may include the following sub-steps (see FIG. 6):
  • in step S21, for each layer, the input gradient vector is first weighted and summed to compute the output gradient vector of the layer, where the weights of the weighted summation are the weights of the layer to be updated; this process is completed jointly by the main operation unit C, the interconnection unit D and the slave operation units F;
  • in step S22, in the main operation unit C, the output gradient vector is multiplied by the derivative values of the activation function of the next layer in the forward operation to obtain the input gradient vector of the next layer;
  • in step S23, in the main operation unit C, the input gradient vector is multiplied element-wise by the input neurons of the forward operation to obtain the gradient of the weights of the layer;
  • Step S24 finally, in the main operation unit C, the weight of the layer is updated according to the obtained gradient of the layer weight and the learning rate;
  • Step S25 It is judged whether the weights of all the layers are updated, and if yes, proceed to step S3, otherwise, go to step S21.
  • in step S3, after all weights have been updated, the main operation unit C computes the network error of the current generation and other data used to adjust the learning rate, and stores them in the storage unit A; this generation of training ends.
  • Step S4 It is judged whether the network converges, and if so, the operation ends, otherwise, the process proceeds to step S1.
  • the weights may use either non-uniform or uniform learning rates; for details, refer to the description above, which is not repeated here. A short sketch contrasting the two options is given below.
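
As a concrete illustration of the uniform versus non-uniform (per-weight) learning rate options above, the following Python sketch applies the same gradient update both ways; the array values and the names eta_uniform and eta_per_weight are illustrative assumptions, not taken from the patent:

    import numpy as np

    # gradient of the error with respect to one layer's weight matrix (illustrative values)
    weight_grad = np.array([[0.2, -0.1],
                            [0.4,  0.3]])
    weights = np.ones((2, 2))

    # uniform adaptive learning rate: a single scalar shared by every weight of the layer
    eta_uniform = 0.05
    weights_uniform = weights - eta_uniform * weight_grad

    # per-weight ("respective") adaptive learning rates: one rate per connection weight
    eta_per_weight = np.array([[0.05, 0.01],
                               [0.10, 0.02]])
    weights_respective = weights - eta_per_weight * weight_grad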

Abstract

An artificial neural network reverse training device and method. The device comprises a controller unit (B), a storage unit (A), a learning rate adjustment unit (E) and an operation unit (D, C, F). The storage unit (A) is used to store neural network data, including instructions, weights, derivatives of activation functions, learning rates, gradient vectors and learning rate adjustment data. The controller unit (B) is used to read instructions from the storage unit and decode them into micro-instructions that control the behavior of the storage unit (A), the learning rate adjustment unit (E) and the operation unit (D, C, F). The learning rate adjustment unit (E), before each generation of training starts, computes the learning rate for the current generation from the previous generation's learning rate and the learning rate adjustment data. The operation unit (D, C, F) computes the current-generation weights from the gradient vector, the current-generation learning rate, the derivative of the activation function and the previous-generation weights. The above device and method make the training iteration process more stable, reduce the time required for the neural network training to stabilize, and improve training efficiency.

Description

Artificial neural network reverse training device and method
Technical Field
The present invention relates to artificial neural networks, and in particular to an artificial neural network reverse training device and an artificial neural network reverse training method.
Background
Artificial Neural Networks (ANNs), referred to simply as Neural Networks (NNs), are algorithmic mathematical models that mimic the behavioral characteristics of animal neural networks and perform distributed, parallel information processing. Such a network depends on the complexity of the system and processes information by adjusting the interconnections among a large number of internal nodes. The algorithms used by neural networks are based on vector multiplication, and sign functions and their various approximations are widely adopted.
One known method of supporting backward training of multi-layer artificial neural networks is to use a general-purpose processor. One of the disadvantages of this approach is that the computing performance of a single general-purpose processor is low and cannot meet the performance requirements of typical multi-layer artificial neural network operations. When multiple general-purpose processors execute in parallel, communication between them becomes the performance bottleneck. In addition, the general-purpose processor needs to decode the backward operations of the multi-layer artificial neural network into a long sequence of arithmetic and memory-access instructions, and the front-end decoding of the processor brings a large power consumption overhead.
Another known method of supporting backward training of multi-layer artificial neural networks is to use a graphics processing unit (GPU). The GPU has only a small on-chip cache, so the model data (weights) of the multi-layer artificial neural network have to be repeatedly transferred from off-chip; the off-chip bandwidth becomes the main performance bottleneck and also brings a huge power consumption overhead.
Summary of the Invention
The object of the present invention is to provide a device and a method for artificial neural network reverse training that support an adaptive learning rate, so as to solve at least one of the technical problems of the prior art described above.
According to one aspect of the present invention, an artificial neural network reverse training device is provided, comprising a controller unit, a storage unit, a learning rate adjustment unit and an operation unit, wherein:
the storage unit is used to store neural network data, including instructions, weights, derivatives of activation functions, learning rates, gradient vectors and learning rate adjustment data;
the controller unit is used to read instructions from the storage unit and decode them into micro-instructions that control the behavior of the storage unit, the learning rate adjustment unit and the operation unit;
the learning rate adjustment unit, before each generation of training starts, computes the learning rate used for the current generation of training from the previous generation's learning rate and the learning rate adjustment data;
the operation unit computes the current-generation weights from the gradient vector, the current-generation learning rate, the derivative of the activation function and the previous-generation weights.
Further, the operation unit includes a main operation unit, an interconnection unit and a plurality of slave operation units, and the gradient vector includes an input gradient vector and an output gradient vector, wherein: the main operation unit uses the output gradient vector of the current layer to complete subsequent calculations during the calculation of each layer; the interconnection unit is used at the stage when the backward training calculation of each layer of the neural network begins, the main operation unit transmitting the input gradient vector of the layer to all slave operation units through the interconnection unit, and, after the calculations of the slave operation units are completed, the interconnection unit adding the partial sums of the output gradient vectors of the slave operation units pairwise, stage by stage, to obtain the output gradient vector of the layer; and the plurality of slave operation units use the same input gradient vector and their respective weight data to compute the corresponding partial sums of the output gradient vector in parallel.
Further, the storage unit is an on-chip cache.
Further, the instructions are SIMD instructions.
Further, the learning rate adjustment data includes the weight change amount and the error function.
According to another aspect of the present invention, an artificial neural network reverse training method is provided, comprising the steps of:
S1: before each generation of training starts, compute the learning rate used for the current generation of training from the previous generation's learning rate and the learning rate adjustment data;
S2: once training starts, update the weights layer by layer according to the learning rate of the current generation;
S3: after all weights have been updated, compute the learning rate adjustment data of the current-generation network and store it;
S4: judge whether the neural network has converged; if so, the operation ends; otherwise, return to step S1.
Further, step S2 includes:
S21: for each layer of the network, perform a weighted summation of the input gradient vector to compute the output gradient vector of the layer, where the weights of the weighted summation are the weights of the layer to be updated;
S22: multiply the output gradient vector of the layer by the derivative values of the activation function of the next layer in the forward operation to obtain the input gradient vector of the next layer;
S23: multiply the input gradient vector element-wise by the input neurons of the forward operation to obtain the gradient of the weights of the layer;
S24: update the weights of the layer according to the obtained gradient of the layer's weights and the learning rate;
S25: judge whether all layers have been updated; if so, proceed to step S3; otherwise, return to step S21.
Further, during the current generation of training, the weights use non-uniform learning rates.
Further, during the current generation of training, the weights use a uniform learning rate.
(1) By providing a learning rate adjustment unit and training the network with an adaptive learning rate, the amount of weight change produced in each training cycle is determined more appropriately, which not only makes the training iteration process more stable, but also reduces the time required for the neural network training to stabilize and improves training efficiency.
(2) By using a dedicated on-chip cache for multi-layer artificial neural network algorithms, the reusability of input neurons and weight data is fully exploited, avoiding repeated reads of these data from memory, reducing memory access bandwidth, and preventing memory bandwidth from becoming the performance bottleneck of multi-layer artificial neural network operations and their training algorithms.
(3) By using dedicated SIMD instructions for multi-layer artificial neural network operations and customized operation units, the problems of insufficient CPU and GPU computing performance and of large front-end decoding overhead are solved, effectively improving support for multi-layer artificial neural network algorithms.
Brief Description of the Drawings
FIG. 1 is an example block diagram of the overall structure of an artificial neural network reverse training device according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of the interconnection unit in the artificial neural network reverse training device of FIG. 1;
FIG. 3 is a schematic diagram of the reverse adjustment process of an artificial neural network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a reverse adjustment process using an artificial neural network according to an embodiment of the present invention;
FIG. 5 is an operation flowchart of an artificial neural network reverse training method according to an embodiment of the present invention;
FIG. 6 is an operation flowchart of an artificial neural network reverse training method according to another embodiment of the present invention.
Detailed Description
The training method used by traditional artificial neural networks is the back propagation algorithm. The change of a weight between two generations is the gradient of the error function with respect to the weight multiplied by a constant, and this constant is called the learning rate. The learning rate determines the amount of weight change produced in each training cycle. If its value is too small, the effective update of the weights in each iteration is too small; a small learning rate leads to a long training time and rather slow convergence. If its value is too large, the iteration process oscillates and even diverges. The artificial neural network reverse training device of the present invention is provided with a learning rate adjustment unit; before each generation of training starts, the learning rate for the current generation is computed from the previous generation's learning rate and the learning rate adjustment data. This determines the amount of weight change produced in each training cycle more appropriately, makes the training iteration process more stable, reduces the time required for the neural network training to stabilize, and improves training efficiency.
In order to make the object, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings.
FIG. 1 is an example block diagram of the overall structure of an artificial neural network reverse training device according to an embodiment of the present invention. The embodiment of the present invention provides a device for artificial neural network reverse training that supports an adaptive learning rate, including:
a storage unit A for storing neural network data, including instructions, weights, derivatives of activation functions, learning rates, gradient vectors (which may include input gradient vectors and output gradient vectors) and learning rate adjustment data (which may include the network error value, the value change amount, etc.); the storage unit may be an on-chip cache, which avoids repeatedly reading these data from memory and prevents memory bandwidth from becoming the performance bottleneck of multi-layer artificial neural network operations and their training algorithms;
a controller unit B for reading instructions from the storage unit A and decoding them into micro-instructions that control the behavior of the storage unit, the learning rate adjustment unit and the operation unit;
the instructions stored in and read by the storage unit A and the controller unit B may be SIMD instructions; by adopting dedicated SIMD instructions for multi-layer artificial neural network operations, the problems of insufficient computing performance of existing CPUs and GPUs and of large front-end decoding overhead are solved;
a learning rate adjustment unit E which, before each generation of training starts, computes the learning rate for the current generation from the previous generation's learning rate and the learning rate adjustment data;
an operation unit (D, C, F) which computes the current-generation weights from the gradient vector, the current-generation learning rate, the derivative of the activation function and the previous-generation weights.
The storage unit A is used to store neural network data including instructions as well as neuron inputs, weights, neuron outputs, learning rates, weight change amounts, activation function derivatives, the gradient vectors of each layer, and the like.
The controller unit B is used to read instructions from the storage unit A and decode them into micro-instructions that control the behavior of each unit.
The operation unit may include a main operation unit C, an interconnection unit D and a plurality of slave operation units F.
The interconnection unit D is used to connect the main operation unit and the slave operation units, and may be implemented with different interconnection topologies (such as a tree structure, a ring structure, a mesh structure, hierarchical interconnection, a bus structure, etc.).
At the stage when the backward training calculation of each layer of the neural network begins, the main operation unit C transmits the input gradient vector of the layer to all slave operation units F through the interconnection unit D; after the calculations of the slave operation units F are completed, the interconnection unit D adds the partial sums of the output gradient vectors of the slave operation units F pairwise, stage by stage, to obtain the output gradient vector of the layer.
The main operation unit C is used, during the calculation of each layer, to complete subsequent calculations using the output gradient vector of the layer.
The plurality of slave operation units F use the same input gradient vector and their respective weight data to compute the corresponding partial sums of the output gradient vector in parallel.
The learning rate adjustment unit E is used, before each generation of training starts, to compute the learning rate for this generation of training from information such as the previous generation's learning rate, weights, network error value and weight change amount (this information is stored in the storage unit in advance and can be retrieved).
FIG. 2 schematically shows one implementation of the interconnection unit D: an interconnection structure. The interconnection unit D forms the data path between the main operation unit C and the plurality of slave operation units F and has an interconnection structure. The interconnection includes a plurality of nodes that form a binary tree path, i.e., each node has one parent node and two child nodes. Each node sends the data received from upstream through its parent node identically to its two downstream child nodes, merges the data returned by its two downstream child nodes, and returns the result to its upstream parent node.
For example, during the backward operation of the neural network, the vectors returned by the two downstream nodes are added into one vector at the current node and returned to the upstream node. At the stage when each layer of the artificial neural network begins its calculation, the input gradient in the main operation unit C is sent to each slave operation unit F through the interconnection unit D; when the calculations of the slave operation units F are completed, the partial sums of the output gradient vectors output by each slave operation unit F are added pairwise, stage by stage, in the interconnection unit D, i.e., all partial sums of the output gradient vectors are summed to form the final output gradient vector.
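The following Python sketch (not part of the patent; the function name tree_reduce and the vector sizes are illustrative assumptions) mirrors this pairwise, stage-by-stage summation of the slave units' partial sums over a binary tree:

    import numpy as np

    def tree_reduce(partial_sums):
        """Pairwise, stage-by-stage summation of the slave units' partial
        output-gradient vectors, mirroring a binary-tree interconnect."""
        level = list(partial_sums)
        while len(level) > 1:
            next_level = []
            for i in range(0, len(level) - 1, 2):
                # each internal node adds the two vectors returned by its children
                next_level.append(level[i] + level[i + 1])
            if len(level) % 2 == 1:      # an odd leftover is passed up unchanged
                next_level.append(level[-1])
            level = next_level
        return level[0]

    # usage: 8 slave units each contribute a partial sum of the layer's output gradient
    partials = [np.random.randn(16) for _ in range(8)]
    output_gradient = tree_reduce(partials)
    assert np.allclose(output_gradient, np.sum(partials, axis=0))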
In the learning rate adjustment unit E, the calculation performed on the data differs depending on the adaptive learning rate adjustment method.
First, in the standard back propagation algorithm:
w(k+1) = w(k) - ηg(w(k))    (1)
In equation (1), w(k) is the current training weight, i.e., the current-generation weight, w(k+1) is the next-generation weight, η is a fixed learning rate, which is a predetermined constant, and g(w) is the gradient vector.
Here, we allow the learning rate, like the other network parameters, to be updated from generation to generation. The method of adjusting the learning rate is: when the training error increases, decrease the learning rate; when the training error decreases, increase the learning rate. Several concrete examples of adaptive learning rate adjustment rules are given below, but the rules are not limited to these.
Method one:
[Equation (2): given in the original only as the image Figure PCTCN2016110751-appb-000001]
In equation (2), η(k) is the current-generation learning rate, η(k+1) is the next-generation learning rate, ΔE = E(k) - E(k-1) is the change of the error function E, and a > 0, b > 0 are appropriate constants.
Method two:
η(k+1) = η(k)(1 - ΔE)     (3)
In equation (3), η(k) is the current-generation learning rate, η(k+1) is the next-generation learning rate, and ΔE = E(k) - E(k-1) is the change of the error function E.
Method three:
[Equation (4): given in the original only as the image Figure PCTCN2016110751-appb-000002]
In equation (4), η(k) is the current-generation learning rate, η(k+1) is the next-generation learning rate, ΔE = E(k) - E(k-1) is the change of the error function E, and a > 1, 0 < b < 1, c > 0 are appropriate constants.
Method four:
[Equation (5): given in the original only as the image Figure PCTCN2016110751-appb-000003]
In equation (5), η(k) is the current-generation learning rate, η(k+1) is the next-generation learning rate, ΔE = E(k) - E(k-1) is the change of the error function E, and 0 < a < 1, b > 1, 0 < α < 1 are appropriate constants, where
[the auxiliary quantity used in equation (5) is given in the original only as the image Figure PCTCN2016110751-appb-000004].
The learning rate η in the above four methods may be common to all weights, i.e., each weight of each layer uses the same learning rate in each generation of training; we denote this as the uniform adaptive learning rate training method. It may also not be common, i.e., a different learning rate is used for each weight; we denote this as the respective (per-weight) adaptive learning rate training method. The respective adaptive learning rate training method can further improve training accuracy and reduce training time.
For a clearer comparison, schematic diagrams of the two methods are given: the uniform adaptive learning rate training method and the respective adaptive learning rate training method correspond to FIG. 3 and FIG. 4, respectively.
In FIG. 3, the connection weights w_jp1, w_jp2, ..., w_jpn between the output layer P and the hidden layer J are uniformly adjusted with the learning rate η during the reverse adjustment; in FIG. 4, the connection weights w_jp1, w_jp2, ..., w_jpn between the output layer P and the hidden layer J are adjusted with the respective learning rates η_1, η_2, ..., η_n during the reverse adjustment. Differentiated reverse adjustment across different nodes can make the most of the adaptive ability of the learning rate and best satisfy the changing requirements of the weights during learning.
As for the adjustment of the respective adaptive learning rates, after the initial values of the individual learning rates have been chosen, the iterative updating of each learning rate can still follow methods one to four, and is likewise not limited to these four. In this case the learning rate η in the formulas is the respective learning rate corresponding to each weight.
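Equations (2), (4) and (5) appear only as images in this publication, so the Python sketch below implements only the qualitative rule stated above (raise the learning rate when the training error decreases, lower it when the error increases); the constants a and b and the name adjust_learning_rate are illustrative assumptions, not the patent's exact formulas:

    def adjust_learning_rate(eta_prev, error_curr, error_prev, a=0.01, b=0.5):
        """One plausible instantiation of the qualitative rule: raise the learning
        rate when the error drops, lower it when the error grows."""
        delta_e = error_curr - error_prev
        if delta_e < 0:            # training error decreased -> be more aggressive
            return eta_prev + a
        elif delta_e > 0:          # training error increased -> be more cautious
            return eta_prev * b
        return eta_prev            # unchanged error -> keep the learning rate

    # usage: per-generation update of a uniform learning rate
    eta = 0.1
    errors = [1.00, 0.80, 0.85, 0.60]
    for k in range(1, len(errors)):
        eta = adjust_learning_rate(eta, errors[k], errors[k - 1])
        print(f"generation {k}: eta = {eta:.4f}")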
Based on the same inventive concept, the present invention also provides an artificial neural network reverse training method, whose operation flowchart is shown in FIG. 5, including the steps of:
S1: before each generation of training starts, compute the learning rate used for the current generation of training from the previous generation's learning rate and the learning rate adjustment data;
S2: once training starts, update the weights layer by layer according to the learning rate of the current generation;
S3: after all weights have been updated, compute the learning rate adjustment data of the current-generation network and store it;
S4: judge whether the neural network has converged; if so, the operation ends; otherwise, return to step S1.
In step S1, before each generation of training starts, the learning rate adjustment unit E retrieves the learning rate adjustment data in the storage unit A to adjust the learning rate and obtains the learning rate for the current generation of training.
In step S2, the current generation of training then begins, and the weights are updated layer by layer according to the learning rate of the current generation. Step S2 may include the following sub-steps (see FIG. 6):
In step S21, for each layer, the input gradient vector is first weighted and summed to compute the output gradient vector of the layer, where the weights of the weighted summation are the weights of the layer to be updated; this process is completed jointly by the main operation unit C, the interconnection unit D and the slave operation units F.
In step S22, in the main operation unit C, this output gradient vector is multiplied by the derivative values of the activation function of the next layer in the forward operation to obtain the input gradient vector of the next layer.
In step S23, in the main operation unit C, the input gradient vector is multiplied element-wise by the input neurons of the forward operation to obtain the gradient of the weights of the layer.
In step S24, finally, in the main operation unit C, the weights of the layer are updated according to the obtained gradient of the layer's weights and the learning rate.
In step S25, it is judged whether the weights of all layers have been updated; if so, proceed to step S3; otherwise, return to step S21.
In step S3, after all weights have been updated, the main operation unit C computes the network error of the current generation and other data used to adjust the learning rate, and stores them in the storage unit A; this generation of training ends.
In step S4, it is judged whether the network has converged; if so, the operation ends; otherwise, return to step S1.
The weights may use non-uniform learning rates or a uniform learning rate; for details, refer to the description above, which is not repeated here.
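For illustration, the following NumPy sketch runs steps S21 to S24 for a stack of fully connected layers under the standard back propagation reading of those steps; the function backward_generation and the array layout are assumptions made for this sketch, not part of the patent, and eta may be a scalar (uniform learning rate) or an array with one rate per weight (non-uniform learning rate):

    import numpy as np

    def backward_generation(weights, inputs, act_derivs, top_gradient, eta):
        """Minimal sketch of steps S21-S24 for fully connected layers.
        weights[l]    : weight matrix of layer l, shape (out_l, in_l)
        inputs[l]     : input neurons of layer l saved from the forward pass, shape (in_l,)
        act_derivs[l] : activation derivative at the output of layer l, shape (out_l,)
        top_gradient  : input gradient vector arriving at the topmost layer
        eta           : learning rate for this generation (scalar or per-weight array)"""
        g_in = top_gradient
        for l in reversed(range(len(weights))):
            g_out = weights[l].T @ g_in              # S21: weighted sum with this layer's weights
            w_grad = np.outer(g_in, inputs[l])       # S23: gradient of this layer's weights
            weights[l] = weights[l] - eta * w_grad   # S24: update with the current-generation rate
            if l > 0:
                g_in = g_out * act_derivs[l - 1]     # S22: input gradient for the next layer down
        return weights

    # usage with a 2-layer network (4 -> 3 -> 2) and a uniform learning rate
    rng = np.random.default_rng(0)
    W = [rng.standard_normal((3, 4)), rng.standard_normal((2, 3))]
    xs = [rng.standard_normal(4), rng.standard_normal(3)]
    derivs = [np.ones(3)]
    W = backward_generation(W, xs, derivs, top_gradient=rng.standard_normal(2), eta=0.05)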
The specific embodiments described above further illustrate the object, technical solutions and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (9)

  1. An artificial neural network reverse training device, comprising a controller unit, a storage unit, a learning rate adjustment unit and an operation unit, wherein:
    the storage unit is used to store neural network data, the neural network data including instructions, weights, derivatives of activation functions, learning rates, gradient vectors and learning rate adjustment data;
    the controller unit is used to read instructions from the storage unit and decode them into micro-instructions that control the behavior of the storage unit, the learning rate adjustment unit and the operation unit;
    the learning rate adjustment unit, before each generation of training starts, computes the learning rate for the current generation from the previous generation's learning rate and the learning rate adjustment data;
    the operation unit computes the current-generation weights from the gradient vector, the current-generation learning rate, the derivative of the activation function and the previous-generation weights.
  2. The device according to claim 1, wherein the operation unit includes a main operation unit, an interconnection unit and a plurality of slave operation units, and the gradient vector includes an input gradient vector and an output gradient vector, wherein:
    the main operation unit is used, during the calculation of each layer, to complete subsequent calculations using the output gradient vector of the layer;
    the interconnection unit is used at the stage when the backward training calculation of each layer of the neural network begins: the main operation unit transmits the input gradient vector of the layer to all slave operation units through the interconnection unit, and after the calculations of the slave operation units are completed, the interconnection unit adds the partial sums of the output gradient vectors of the slave operation units pairwise, stage by stage, to obtain the output gradient vector of the layer;
    the plurality of slave operation units use the same input gradient vector and their respective weight data to compute the corresponding partial sums of the output gradient vector in parallel.
  3. The device according to claim 1, wherein the storage unit is an on-chip cache.
  4. The device according to claim 1, wherein the instructions are SIMD instructions.
  5. The device according to claim 1, wherein the learning rate adjustment data includes the weight change amount and the error function.
  6. An artificial neural network reverse training method, comprising the steps of:
    S1: before each generation of training starts, computing the learning rate used for the current generation of training from the previous generation's learning rate and the learning rate adjustment data;
    S2: once training starts, updating the weights layer by layer according to the learning rate of the current generation;
    S3: after all weights have been updated, computing the learning rate adjustment data of the current-generation network and storing it;
    S4: judging whether the neural network has converged; if so, ending the operation; otherwise, returning to step S1.
  7. The method according to claim 6, wherein step S2 includes:
    S21: for each layer of the network, performing a weighted summation of the input gradient vector to compute the output gradient vector of the layer, where the weights of the weighted summation are the weights of the layer to be updated;
    S22: multiplying the output gradient vector of the layer by the derivative values of the activation function of the next layer in the forward operation to obtain the input gradient vector of the next layer;
    S23: multiplying the input gradient vector element-wise by the input neurons of the forward operation to obtain the gradient of the weights of the layer;
    S24: updating the weights of the layer according to the obtained gradient of the layer's weights and the learning rate;
    S25: judging whether all layers have been updated; if so, proceeding to step S3; otherwise, returning to step S21.
  8. The method according to claim 6, wherein, during the current generation of training, the weights use non-uniform learning rates.
  9. The method according to claim 6, wherein, during the current generation of training, the weights use a uniform learning rate.
PCT/CN2016/110751 2016-12-19 2016-12-19 人工神经网络反向训练装置和方法 WO2018112699A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/110751 WO2018112699A1 (zh) 2016-12-19 2016-12-19 人工神经网络反向训练装置和方法

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/110751 WO2018112699A1 (zh) 2016-12-19 2016-12-19 人工神经网络反向训练装置和方法

Publications (1)

Publication Number Publication Date
WO2018112699A1 true WO2018112699A1 (zh) 2018-06-28

Family

ID=62624197

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/110751 WO2018112699A1 (zh) 2016-12-19 2016-12-19 人工神经网络反向训练装置和方法

Country Status (1)

Country Link
WO (1) WO2018112699A1 (zh)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110782017A (zh) * 2019-10-25 2020-02-11 北京百度网讯科技有限公司 用于自适应调整学习率的方法和装置
CN111222632A (zh) * 2018-11-27 2020-06-02 中科寒武纪科技股份有限公司 计算装置、计算方法及相关产品
CN111368987A (zh) * 2018-12-25 2020-07-03 上海寒武纪信息科技有限公司 一种神经网络计算装置和方法
CN111368985A (zh) * 2018-12-25 2020-07-03 上海寒武纪信息科技有限公司 一种神经网络计算装置和方法
CN111368990A (zh) * 2018-12-25 2020-07-03 上海寒武纪信息科技有限公司 一种神经网络计算装置和方法
CN111723834A (zh) * 2019-03-21 2020-09-29 杭州海康威视数字技术股份有限公司 语音深度学习训练方法及装置
CN111814965A (zh) * 2020-08-14 2020-10-23 Oppo广东移动通信有限公司 超参数调整方法、装置、设备及存储介质
CN112052939A (zh) * 2020-08-19 2020-12-08 国网山西省电力公司 一种基于神经网络算法的主动预警系统
CN112446485A (zh) * 2019-08-31 2021-03-05 安徽寒武纪信息科技有限公司 一种神经网络协同训练方法、装置以及相关产品
CN112907552A (zh) * 2021-03-09 2021-06-04 百度在线网络技术(北京)有限公司 图像处理模型的鲁棒性检测方法、设备及程序产品
US11934337B2 (en) 2019-08-31 2024-03-19 Anhui Cambricon Information Technology Co., Ltd. Chip and multi-chip system as well as electronic device and data transmission method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184366A (zh) * 2015-09-15 2015-12-23 中国科学院计算技术研究所 一种时分复用的通用神经网络处理器
CN105512723A (zh) * 2016-01-20 2016-04-20 南京艾溪信息科技有限公司 一种用于稀疏连接的人工神经网络计算装置和方法
CN105892989A (zh) * 2016-03-28 2016-08-24 中国科学院计算技术研究所 一种神经网络加速器及其运算方法
CN106022468A (zh) * 2016-05-17 2016-10-12 成都启英泰伦科技有限公司 人工神经网络处理器集成电路及该集成电路的设计方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184366A (zh) * 2015-09-15 2015-12-23 中国科学院计算技术研究所 一种时分复用的通用神经网络处理器
CN105512723A (zh) * 2016-01-20 2016-04-20 南京艾溪信息科技有限公司 一种用于稀疏连接的人工神经网络计算装置和方法
CN105892989A (zh) * 2016-03-28 2016-08-24 中国科学院计算技术研究所 一种神经网络加速器及其运算方法
CN106022468A (zh) * 2016-05-17 2016-10-12 成都启英泰伦科技有限公司 人工神经网络处理器集成电路及该集成电路的设计方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GAO: "Optimal Methods of Learning Rate for BP Neutral Network", JOURNAL OF CHANGCHUN TEACHERS COLLEGE (NATURAL SCIENCE), no. 2, 30 April 2010 (2010-04-30), pages 29 - 30 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111222632A (zh) * 2018-11-27 2020-06-02 中科寒武纪科技股份有限公司 计算装置、计算方法及相关产品
CN111368987B (zh) * 2018-12-25 2023-03-24 上海寒武纪信息科技有限公司 一种神经网络计算装置和方法
CN111368990B (zh) * 2018-12-25 2023-03-07 上海寒武纪信息科技有限公司 一种神经网络计算装置和方法
CN111368985B (zh) * 2018-12-25 2023-11-28 上海寒武纪信息科技有限公司 一种神经网络计算装置和方法
CN111368990A (zh) * 2018-12-25 2020-07-03 上海寒武纪信息科技有限公司 一种神经网络计算装置和方法
CN111368987A (zh) * 2018-12-25 2020-07-03 上海寒武纪信息科技有限公司 一种神经网络计算装置和方法
CN111368985A (zh) * 2018-12-25 2020-07-03 上海寒武纪信息科技有限公司 一种神经网络计算装置和方法
CN111723834A (zh) * 2019-03-21 2020-09-29 杭州海康威视数字技术股份有限公司 语音深度学习训练方法及装置
CN111723834B (zh) * 2019-03-21 2024-01-26 杭州海康威视数字技术股份有限公司 语音深度学习训练方法及装置
CN112446485A (zh) * 2019-08-31 2021-03-05 安徽寒武纪信息科技有限公司 一种神经网络协同训练方法、装置以及相关产品
CN112446485B (zh) * 2019-08-31 2023-06-02 安徽寒武纪信息科技有限公司 一种神经网络协同训练方法、装置以及相关产品
US11934337B2 (en) 2019-08-31 2024-03-19 Anhui Cambricon Information Technology Co., Ltd. Chip and multi-chip system as well as electronic device and data transmission method
CN110782017B (zh) * 2019-10-25 2022-11-22 北京百度网讯科技有限公司 用于自适应调整学习率的方法和装置
CN110782017A (zh) * 2019-10-25 2020-02-11 北京百度网讯科技有限公司 用于自适应调整学习率的方法和装置
CN111814965A (zh) * 2020-08-14 2020-10-23 Oppo广东移动通信有限公司 超参数调整方法、装置、设备及存储介质
CN112052939A (zh) * 2020-08-19 2020-12-08 国网山西省电力公司 一种基于神经网络算法的主动预警系统
CN112907552A (zh) * 2021-03-09 2021-06-04 百度在线网络技术(北京)有限公司 图像处理模型的鲁棒性检测方法、设备及程序产品
CN112907552B (zh) * 2021-03-09 2024-03-01 百度在线网络技术(北京)有限公司 图像处理模型的鲁棒性检测方法、设备及程序产品

Similar Documents

Publication Publication Date Title
WO2018112699A1 (zh) 人工神经网络反向训练装置和方法
US11568258B2 (en) Operation method
US20200111007A1 (en) Apparatus and methods for training in convolutional neural networks
JP6635265B2 (ja) 予測装置、予測方法および予測プログラム
US20190065958A1 (en) Apparatus and Methods for Training in Fully Connected Layers of Convolutional Networks
KR102331978B1 (ko) 인공 신경망 정방향 연산 실행용 장치와 방법
US20180260710A1 (en) Calculating device and method for a sparsely connected artificial neural network
KR102410820B1 (ko) 뉴럴 네트워크를 이용한 인식 방법 및 장치 및 상기 뉴럴 네트워크를 트레이닝하는 방법 및 장치
WO2020173237A1 (zh) 一种类脑计算芯片及计算设备
CN111788585B (zh) 一种深度学习模型的训练方法、系统
US20190073591A1 (en) Execution of a genetic algorithm having variable epoch size with selective execution of a training algorithm
WO2017185347A1 (zh) 用于执行循环神经网络和lstm运算的装置和方法
KR102152615B1 (ko) 활성화 함수를 사용하는 딥러닝 모델의 안정적인 학습을 위한 가중치 초기화 방법 및 장치
Chen et al. Neural-network based adaptive self-triggered consensus of nonlinear multi-agent systems with sensor saturation
US11775832B2 (en) Device and method for artificial neural network operation
CN108205706B (zh) 人工神经网络反向训练装置和方法
CN108009635A (zh) 一种支持增量更新的深度卷积计算模型
WO2017185248A1 (zh) 用于执行人工神经网络自学习运算的装置和方法
WO2022142179A1 (zh) 业务任务执行方法、装置以及计算机可读存储介质
US20190130274A1 (en) Apparatus and methods for backward propagation in neural networks supporting discrete data
US11915141B2 (en) Apparatus and method for training deep neural network using error propagation, weight gradient updating, and feed-forward processing
WO2020195940A1 (ja) ニューラルネットワークのモデル縮約装置
CN110610231A (zh) 一种信息处理方法、电子设备和存储介质
Xue et al. An improved extreme learning machine based on variable-length particle swarm optimization
KR20200097103A (ko) 딥러닝 알고리즘을 위한 활성화 함수를 실행하는 방법, 및 상기 방법을 실행하는 장치

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16924660

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16924660

Country of ref document: EP

Kind code of ref document: A1