CN108205706A - Artificial neural network reverse training device and method - Google Patents

Artificial neural network reverse training device and method

Info

Publication number
CN108205706A
Authority
CN
China
Prior art keywords
learning rate
generation
layer
weights
gradient vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611180607.8A
Other languages
Chinese (zh)
Other versions
CN108205706B (en)
Inventor
陈云霁
郝帆
郝一帆
刘少礼
陈天石
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN201611180607.8A priority Critical patent/CN108205706B/en
Publication of CN108205706A publication Critical patent/CN108205706A/en
Application granted granted Critical
Publication of CN108205706B publication Critical patent/CN108205706B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Feedback Control In General (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention provides an artificial neural network reverse training device and method. The device includes a controller unit, a storage unit, a learning rate adjustment unit, and an arithmetic unit. The storage unit stores neural network data, including instructions, weights, derivatives of the activation function, the learning rate, gradient vectors, and learning rate adjustment data. The controller unit reads instructions from the storage unit and decodes them into microinstructions that control the behavior of the storage unit, the learning rate adjustment unit, and the arithmetic unit. Before each generation of training starts, the learning rate adjustment unit computes this generation's learning rate from the previous generation's learning rate and the learning rate adjustment data. The arithmetic unit computes this generation's weights from the gradient vectors, this generation's learning rate, the derivative of the activation function, and the previous generation's weights. The device and method of the present invention make the training iteration process more stable, reduce the time needed for the neural network training to stabilize, and improve training efficiency.

Description

Artificial neural network reverse training device and method
Technical field
The present invention relates to artificial neural networks, and more particularly to an artificial neural network reverse training device and an artificial neural network reverse training method.
Background technology
An artificial neural network (Artificial Neural Networks, ANNs), also simply called a neural network (NNs), is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed parallel information processing. Depending on the complexity of the system, such a network processes information by adjusting the interconnections among a large number of internal nodes. The core computation used by a neural network is vector multiplication, together with widely used sign (activation) functions and their various approximations.
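As an illustrative aside (not part of the original patent text), the vector-multiplication-plus-activation computation described above can be sketched in a few lines of Python; the function name and the use of tanh as one smooth approximation of the sign function are assumptions for illustration:

    import numpy as np

    def neuron(w, x):
        # One artificial neuron: a vector multiplication w.x followed by an
        # activation; tanh stands in for a smooth approximation of the sign function.
        return np.tanh(np.dot(w, x))

    y = neuron(np.array([0.5, -0.2, 0.1]), np.array([1.0, 2.0, 3.0]))  # scalar output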
One known method of supporting multi-layer artificial neural network reverse training is to use a general-purpose processor. One drawback of this method is that the computational performance of a single general-purpose processor is relatively low and cannot meet the performance requirements of typical multi-layer artificial neural network operations. When multiple general-purpose processors execute in parallel, the communication among them becomes the performance bottleneck. In addition, a general-purpose processor must decode the reverse operations of a multi-layer artificial neural network into a long sequence of arithmetic and memory-access instructions, and the processor front-end decoding incurs a large power overhead.
Another known method of supporting multi-layer artificial neural network reverse training is to use a graphics processing unit (GPU). A GPU has only a small on-chip cache, so the model data (weights) of a multi-layer artificial neural network must be transferred from off-chip repeatedly; off-chip bandwidth becomes the main performance bottleneck and brings a huge power overhead.
Summary of the invention
(1) Technical problems to be solved
The object of the present invention is to provide a device and method for artificial neural network reverse training that supports an adaptive learning rate, solving at least one of the technical problems in the prior art described above.
(2) Technical solution
According to one aspect of the present invention, an artificial neural network reverse training device is provided, including a controller unit, a storage unit, a learning rate adjustment unit, and an arithmetic unit, wherein:
the storage unit is configured to store neural network data, including instructions, weights, derivatives of the activation function, the learning rate, gradient vectors, and learning rate adjustment data;
the controller unit is configured to read instructions from the storage unit and decode them into microinstructions that control the behavior of the storage unit, the learning rate adjustment unit, and the arithmetic unit;
the learning rate adjustment unit, before each generation of training starts, computes the learning rate used for this generation of training from the previous generation's learning rate and the learning rate adjustment data;
the arithmetic unit computes this generation's weights from the gradient vectors, this generation's learning rate, the derivative of the activation function, and the previous generation's weights.
Further, the arithmetic unit includes a master arithmetic unit, an interconnection unit, and multiple slave arithmetic units, and the gradient vectors include input gradient vectors and output gradient vectors, wherein: the master arithmetic unit, in the computation of each layer, uses this layer's output gradient vector to complete the subsequent calculations; the interconnection unit, at the stage where the reverse training of each network layer begins to compute, lets the master arithmetic unit transmit this layer's input gradient vector to all slave arithmetic units, and, after the slave arithmetic units finish their computations, adds the partial sums of the output gradient vector from the individual slave arithmetic units pairwise, stage by stage, to obtain this layer's output gradient vector; and the multiple slave arithmetic units use the same input gradient vector and their respective weight data to compute the corresponding partial sums of the output gradient vector in parallel.
Further, the storage unit is an on-chip cache.
Further, the instructions are SIMD instructions.
Further, the learning rate adjustment data include the weight variation and the error function.
According to another aspect of the present invention, an artificial neural network reverse training method is provided, including the steps:
S1: before each generation of training starts, compute the learning rate used for this generation of training from the previous generation's learning rate and the learning rate adjustment data;
S2: training starts; update the weights layer by layer according to this generation's learning rate;
S3: after all weights have been updated, compute and store the learning rate adjustment data of this generation's network;
S4: judge whether the neural network has converged; if so, the operation ends; otherwise, go to step S1.
Further, step S2 includes:
S21: for each layer of the network, perform a weighted summation of the input gradient vector to compute this layer's output gradient vector, where the weights of the weighted sum are this layer's weights to be updated;
S22: multiply this layer's output gradient vector by the derivative of the next layer's activation function from the forward operation to obtain the next layer's input gradient vector;
S23: multiply the input gradient vector element-wise with the input neurons of the forward operation to obtain the gradient of this layer's weights;
S24: update this layer's weights according to the obtained gradient of this layer's weights and the learning rate;
S25: judge whether all layers have been updated; if so, proceed to step S3; otherwise, go to step S21.
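As an illustrative aside, sub-steps S21 to S24 for a single layer can be rendered as a minimal NumPy sketch; the names, array shapes, and software form are assumptions for illustration (the patent targets dedicated hardware):

    import numpy as np

    def backward_layer(W, x_in, in_grad, act_deriv, lr):
        # W: this layer's weights (out_dim x in_dim); x_in: this layer's input
        # neurons saved from the forward operation; in_grad: input gradient vector;
        # act_deriv: derivative of the next layer's activation from the forward
        # operation; lr: this generation's learning rate.
        out_grad = W.T @ in_grad             # S21: weighted sum with this layer's weights
        next_in_grad = out_grad * act_deriv  # S22: multiply by the activation derivative
        W_grad = np.outer(in_grad, x_in)     # S23: align-multiply with the input neurons
        W_new = W - lr * W_grad              # S24: update this layer's weights
        return W_new, next_in_grad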
Further, during this generation's training, the weights may use non-unified (per-weight) learning rates.
Further, during this generation's training, the weights may use a unified learning rate.
(3) Advantageous effects
(1) By providing a learning rate adjustment unit and training the network with an adaptive learning rate, the weight variation generated in each round of training is determined more appropriately; this not only makes the training iteration process more stable but also reduces the time needed for the neural network training to stabilize, improving training efficiency;
(2) by using a dedicated on-chip cache for the multi-layer artificial neural network algorithm, the reusability of the input neurons and the weight data is fully exploited, avoiding repeated reads of these data from memory, reducing the memory access bandwidth, and preventing memory bandwidth from becoming the performance bottleneck of multi-layer artificial neural network operations and their training algorithms;
(3) by using dedicated SIMD instructions for multi-layer artificial neural network operations and a customized arithmetic unit, the problems of insufficient CPU and GPU computational performance and large front-end decoding overhead are solved, effectively improving support for multi-layer artificial neural network algorithms.
Description of the drawings
Fig. 1 is a block diagram of an example overall structure of an artificial neural network reverse training device according to an embodiment of the present invention;
Fig. 2 is a structural diagram of the interconnection unit in the artificial neural network reverse training device of Fig. 1;
Fig. 3 is a schematic diagram of the reverse adjustment process of an artificial neural network using a unified learning rate according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the reverse adjustment process of an artificial neural network using respective (per-weight) learning rates according to an embodiment of the present invention;
Fig. 5 is an operational flowchart of an artificial neural network reverse training method according to an embodiment of the present invention;
Fig. 6 is an operational flowchart of an artificial neural network reverse training method according to another embodiment of the present invention.
Specific embodiments
The training method traditionally used for artificial neural networks is the back-propagation algorithm, in which the weight variation between two iterations is the gradient of the error function with respect to the weights multiplied by a constant; this constant is called the learning rate. The learning rate determines the weight variation generated in each round of training. If its value is too small, the effective weight update in each iteration is too small; a small learning rate leads to a long training time and slow convergence. If its value is too large, the iteration process oscillates and may diverge. The artificial neural network reverse training device of the present invention is provided with a learning rate adjustment unit which, before each generation of training starts, computes this generation's learning rate from the previous generation's learning rate and the learning rate adjustment data. It determines the weight variation generated in each round of training more appropriately, making the training iteration process more stable, reducing the time needed for the neural network training to stabilize, and improving training efficiency.
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
Fig. 1 is a block diagram of an example overall structure of an artificial neural network reverse training device according to an embodiment of the present invention. An embodiment of the present invention provides a device for artificial neural network reverse training that supports an adaptive learning rate, including:
a storage unit A, configured to store neural network data, including instructions, weights, derivatives of the activation function, the learning rate, gradient vectors (which may include input gradient vectors and output gradient vectors), and learning rate adjustment data (which may include the network error value, the weight variation, etc.); the storage unit may be an on-chip cache, which avoids repeatedly reading these data from memory and prevents memory bandwidth from becoming the performance bottleneck of multi-layer artificial neural network operations and their training algorithms;
a controller unit B, configured to read instructions from storage unit A and decode them into microinstructions that control the behavior of the storage unit, the learning rate adjustment unit, and the arithmetic unit;
the instructions accessed and read by storage unit A and controller unit B may be SIMD instructions; by using dedicated SIMD instructions for multi-layer artificial neural network operations, the problems of insufficient CPU and GPU computational performance and large front-end decoding overhead are solved;
a learning rate adjustment unit E, which, before each generation of training starts, computes this generation's learning rate from the previous generation's learning rate and the learning rate adjustment data;
an arithmetic unit (D, C, F), which computes this generation's weights from the gradient vectors, this generation's learning rate, the derivative of the activation function, and the previous generation's weights.
Specifically, storage unit A stores neural network data including instructions, input neurons, weights, neuron outputs, the learning rate, weight variations, derivatives of the activation function, the gradient vectors of each layer, and so on;
controller unit B reads instructions from storage unit A and decodes each instruction into microinstructions that control the behavior of each unit;
the arithmetic unit may include a master arithmetic unit C, an interconnection unit D, and multiple slave arithmetic units F.
The interconnection unit D connects the master arithmetic unit and the slave arithmetic units and may be implemented as various interconnection topologies (such as a tree structure, a ring structure, a mesh structure, hierarchical interconnection, a bus structure, etc.).
At the stage where the reverse training of each network layer begins to compute, the master arithmetic unit C transmits this layer's input gradient vector to all slave arithmetic units F through the interconnection unit D; after the slave arithmetic units F finish their computations, the interconnection unit D adds the partial sums of the output gradient vector from the individual slave arithmetic units F pairwise, stage by stage, to obtain this layer's output gradient vector.
The master arithmetic unit C, in the computation of each layer, uses this layer's output gradient vector to complete the subsequent calculations;
the multiple slave arithmetic units F use the same input gradient vector and their respective weight data to compute the corresponding partial sums of the output gradient vector in parallel;
the learning rate adjustment unit E, before each generation of training starts, obtains the learning rate used for this generation of training from information such as the previous generation's learning rate, weights, network error value, and weight variation (this information is stored in advance in the storage unit and can be retrieved).
Fig. 2 schematically illustrates one embodiment of the interconnection unit D: a tree-shaped interconnection structure. The interconnection unit D forms the data path between the master arithmetic unit C and the multiple slave arithmetic units F and has a tree topology. The interconnection consists of multiple nodes forming a binary tree path: each node has one parent node and two child nodes. Each node sends upstream data identically to its two downstream child nodes, merges the data returned by its two downstream child nodes, and returns the result to its upstream parent node.
For example, in the reverse computation of the neural network, the vectors returned by the two downstream nodes are added into one vector at the current node and returned to the upstream node. At the stage where each layer of the artificial neural network begins to compute, the input gradient in the master arithmetic unit C is sent to each slave arithmetic unit F through the interconnection unit D; after the slave arithmetic units F finish their computations, the partial sums of the output gradient vector output by each slave arithmetic unit F are added pairwise, stage by stage, in the interconnection unit D, i.e., all partial sums of the output gradient vector are summed to form the final output gradient vector.
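As an illustrative aside, the pairwise, stage-by-stage summation performed by the interconnection unit can be mimicked in software as follows; this sketch covers only the reduction, assumes a power-of-two number of slave units, and is not the hardware implementation itself:

    import numpy as np

    def tree_reduce_partial_sums(partial_sums):
        # Each stage adds the vectors returned by pairs of child nodes,
        # as in the binary tree of Fig. 2, until one vector remains.
        level = list(partial_sums)
        while len(level) > 1:
            level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        return level[0]

    # e.g. four slave units, each holding a partial sum of the output gradient vector
    parts = [np.ones(3), 2 * np.ones(3), 3 * np.ones(3), np.zeros(3)]
    out_grad = tree_reduce_partial_sums(parts)  # array([6., 6., 6.])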
In the learning rate adjustment unit E, the computation performed on the data differs according to the adaptive learning rate adjustment method used.
First, in the standard back-propagation algorithm:
w(k+1) = w(k) - η g(w(k))    (1)
In formula (1), w(k) is the current training weight, i.e., this generation's weight, w(k+1) is the next generation's weight, η is the fixed learning rate, a pre-determined constant, and g(w) is the gradient vector.
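For illustration, formula (1) maps directly onto the following sketch (the function name is an assumption):

    import numpy as np

    def sgd_step(w, g, eta):
        # Formula (1): w(k+1) = w(k) - eta * g(w(k)), with eta a fixed constant.
        return w - eta * g

    w_next = sgd_step(np.array([0.5, -0.3]), np.array([0.1, -0.2]), eta=0.01)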
Here, we allow the learning rate to be updated generation by generation, like the other network parameters. The rule for adjusting the learning rate is: when the training error increases, decrease the learning rate; when the training error decreases, increase the learning rate. Several specific examples of adaptive learning rate adjustment rules are given below, but the invention is not limited to these.
Method one (formula (2) appears as an image in the original document and is not reproduced here):
In formula (2), η(k) is this generation's learning rate, η(k+1) is the next generation's learning rate, ΔE = E(k) - E(k-1) is the variation of the error function E, and a > 0, b > 0 are appropriate constants.
Method two:
η(k+1) = η(k)(1 - ΔE)    (3)
In formula (3), η(k) is this generation's learning rate, η(k+1) is the next generation's learning rate, and ΔE = E(k) - E(k-1) is the variation of the error function E.
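For illustration, method two follows directly from formula (3); the function name is an assumption:

    def adjust_lr_method_two(eta, E_curr, E_prev):
        # Formula (3): eta(k+1) = eta(k) * (1 - dE), with dE = E(k) - E(k-1).
        # The rate grows when the training error falls and shrinks when it rises.
        return eta * (1.0 - (E_curr - E_prev))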
Method three (formula (4) appears as an image in the original document and is not reproduced here):
In formula (4), η(k) is this generation's learning rate, η(k+1) is the next generation's learning rate, ΔE = E(k) - E(k-1) is the variation of the error function E, and a > 1, 0 < b < 1, c > 0 are appropriate constants.
Method four (formula (5) appears as an image in the original document and is not reproduced here):
In formula (5), η(k) is this generation's learning rate, η(k+1) is the next generation's learning rate, ΔE = E(k) - E(k-1) is the variation of the error function E, and 0 < a < 1, b > 1, 0 < α < 1 are appropriate constants.
The learning rate η in the above four methods may be common to all weights, i.e., every weight of every layer uses the same learning rate during each generation of training; we call this the unified adaptive learning rate training method. Alternatively, it may not be common, i.e., a different learning rate is used for each weight; we call this the respective adaptive learning rate training method. The respective adaptive learning rate training method can further improve training precision and reduce training time.
For a clearer comparison, we give schematic diagrams of the two methods: Fig. 3 and Fig. 4 correspond to the unified adaptive learning rate training method and the respective adaptive learning rate training method, respectively.
In Fig. 3, the connection weights w_jp1, w_jp2, ..., w_jpn between the output layer P and the hidden layer J are all adjusted with the same learning rate η during reverse adjustment; in Fig. 4, the connection weights w_jp1, w_jp2, ..., w_jpn between the output layer P and the hidden layer J are adjusted with the learning rates η_1, η_2, ..., η_n, respectively, during reverse adjustment. Adjusting different nodes differently in the reverse pass exploits the adaptive capability of the learning rate to the greatest extent and best satisfies the varying requirements of each weight during learning.
As for the adjustment method of the respective adaptive learning rates: after an initial value is taken for each learning rate, each learning rate can likewise be updated iteratively according to methods one to four, and is likewise not limited to these four; the learning rate η in these formulas is then the respective learning rate corresponding to each weight.
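As an illustrative aside, the respective (per-weight) update can be sketched with a learning rate array of the same shape as the weights, one rate per weight as in Fig. 4; all names and values are assumptions for illustration:

    import numpy as np

    def update_weights(W, W_grad, Eta):
        # Eta has the same shape as W, so each connection weight gets its own
        # rate; a unified rate (Fig. 3) is the special case Eta = eta * ones.
        return W - Eta * W_grad

    W   = np.array([[0.5, -0.3], [0.2, 0.7]])
    G   = np.array([[0.1,  0.0], [-0.2, 0.05]])
    Eta = np.array([[0.01, 0.02], [0.005, 0.01]])  # one learning rate per weight
    W_next = update_weights(W, G, Eta)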
Based on the same inventive concept, the present invention also provides an artificial neural network reverse training method, whose operational flowchart is shown in Fig. 5, including the steps:
S1: before each generation of training starts, compute the learning rate used for this generation of training from the previous generation's learning rate and the learning rate adjustment data;
S2: training starts; update the weights layer by layer according to this generation's learning rate;
S3: after all weights have been updated, compute and store the learning rate adjustment data of this generation's network;
S4: judge whether the neural network has converged; if so, the operation ends; otherwise, go to step S1.
For step S1: before each generation of training starts, the learning rate adjustment unit E retrieves the learning rate adjustment data from storage unit A and adjusts the learning rate accordingly, obtaining the learning rate used for this generation of training.
For step S2: this generation of training then starts, and the weights are updated layer by layer according to this generation's learning rate. Step S2 may include the following sub-steps (see Fig. 6):
In step S21, for each layer, a weighted summation of the input gradient vector is first performed to compute this layer's output gradient vector, where the weights of the weighted sum are this layer's weights to be updated; this process is completed jointly by the master arithmetic unit C, the interconnection unit D, and the slave arithmetic units F;
in step S22, the master arithmetic unit C multiplies this layer's output gradient vector by the derivative of the next layer's activation function from the forward operation to obtain the next layer's input gradient vector;
in step S23, the master arithmetic unit C multiplies the input gradient vector element-wise with the input neurons of the forward operation to obtain the gradient of this layer's weights;
in step S24, finally, the master arithmetic unit C updates this layer's weights according to the obtained gradient of this layer's weights and the learning rate;
step S25: judge whether the weights of all layers have been updated; if so, proceed to step S3; otherwise, go to step S21.
For step S3: after all weights have been updated, the master arithmetic unit C computes this generation's network error and the other data used for adjusting the learning rate, and stores them in storage unit A; this generation of training ends.
Step S4: judge whether the network has converged; if so, the operation ends; otherwise, go to step S1.
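As an illustrative aside, the overall flow of steps S1 to S4 (Fig. 5) can be sketched as a software loop; the network interface (update_weights, error) and the adjust_lr callback are assumed for illustration and are not part of the patented hardware:

    def train(network, data, eta0, adjust_lr, tol=1e-6, max_generations=1000):
        eta = eta0
        E_prev = network.error(data)
        for _ in range(max_generations):
            network.update_weights(data, eta)     # S2: layer-by-layer update (S21-S25)
            E_curr = network.error(data)          # S3: compute and store adjustment data
            if abs(E_curr - E_prev) < tol:        # S4: convergence check
                break
            eta = adjust_lr(eta, E_curr, E_prev)  # S1 of the next generation
            E_prev = E_curr
        return network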
The weights here may use non-unified learning rates or a unified learning rate; for the details, refer to the description above, which is not repeated here.
The specific embodiments described above further explain the objectives, technical solutions, and advantageous effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (9)

1. An artificial neural network reverse training device, including a controller unit, a storage unit, a learning rate adjustment unit, and an arithmetic unit, wherein:
the storage unit is configured to store neural network data, the neural network data including instructions, weights, derivatives of the activation function, the learning rate, gradient vectors, and learning rate adjustment data;
the controller unit is configured to read instructions from the storage unit and decode them into microinstructions that control the behavior of the storage unit, the learning rate adjustment unit, and the arithmetic unit;
the learning rate adjustment unit, before each generation of training starts, computes this generation's learning rate from the previous generation's learning rate and the learning rate adjustment data;
the arithmetic unit computes this generation's weights from the gradient vectors, this generation's learning rate, the derivative of the activation function, and the previous generation's weights.
2. The device according to claim 1, wherein the arithmetic unit includes a master arithmetic unit, an interconnection unit, and multiple slave arithmetic units, and the gradient vectors include input gradient vectors and output gradient vectors, wherein:
the master arithmetic unit, in the computation of each layer, uses this layer's output gradient vector to complete the subsequent calculations;
the interconnection unit, at the stage where the reverse training of each network layer begins, lets the master arithmetic unit transmit this layer's input gradient vector to all slave arithmetic units, and, after the slave arithmetic units finish their computations, adds the partial sums of the output gradient vector from the individual slave arithmetic units pairwise, stage by stage, to obtain this layer's output gradient vector;
the multiple slave arithmetic units use the same input gradient vector and their respective weight data to compute the corresponding partial sums of the output gradient vector in parallel.
3. The device according to claim 1, wherein the storage unit is an on-chip cache.
4. The device according to claim 1, wherein the instructions are SIMD instructions.
5. The device according to claim 1, wherein the learning rate adjustment data include the weight variation and the error function.
6. An artificial neural network reverse training method, including the steps:
S1: before each generation of training starts, compute the learning rate used for this generation of training from the previous generation's learning rate and the learning rate adjustment data;
S2: training starts; update the weights layer by layer according to this generation's learning rate;
S3: after all weights have been updated, compute and store the learning rate adjustment data of this generation's network;
S4: judge whether the neural network has converged; if so, the operation ends; otherwise, go to step S1.
7. The method according to claim 6, wherein step S2 includes:
S21: for each layer of the network, perform a weighted summation of the input gradient vector to compute this layer's output gradient vector, where the weights of the weighted sum are this layer's weights to be updated;
S22: multiply this layer's output gradient vector by the derivative of the next layer's activation function from the forward operation to obtain the next layer's input gradient vector;
S23: multiply the input gradient vector element-wise with the input neurons of the forward operation to obtain the gradient of this layer's weights;
S24: update this layer's weights according to the obtained gradient of this layer's weights and the learning rate;
S25: judge whether all layers have been updated; if so, proceed to step S3; otherwise, go to step S21.
8. The method according to claim 6, wherein, during this generation's training, the weights use non-unified learning rates.
9. The method according to claim 6, wherein, during this generation's training, the weights use a unified learning rate.
CN201611180607.8A 2016-12-19 2016-12-19 Artificial neural network reverse training device and method Active CN108205706B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611180607.8A CN108205706B (en) 2016-12-19 2016-12-19 Artificial neural network reverse training device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611180607.8A CN108205706B (en) 2016-12-19 2016-12-19 Artificial neural network reverse training device and method

Publications (2)

Publication Number Publication Date
CN108205706A true CN108205706A (en) 2018-06-26
CN108205706B CN108205706B (en) 2021-04-23

Family

ID=62601948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611180607.8A Active CN108205706B (en) 2016-12-19 2016-12-19 Artificial neural network reverse training device and method

Country Status (1)

Country Link
CN (1) CN108205706B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109445688A (en) * 2018-09-29 2019-03-08 上海百功半导体有限公司 Storage control method, storage controller, storage device, and storage system
CN110309918A (en) * 2019-07-05 2019-10-08 北京中科寒武纪科技有限公司 Neural network online model verification method, device, and computer equipment
CN110782017A (en) * 2019-10-25 2020-02-11 北京百度网讯科技有限公司 Method and device for adaptively adjusting learning rate
CN118016218A (en) * 2024-04-09 2024-05-10 山东新旋工业装备科技有限公司 Intelligent analysis method for high-temperature-resistant rotary joint material performance data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105204333A (en) * 2015-08-26 2015-12-30 东北大学 Energy consumption prediction method for improving energy utilization rate of iron and steel enterprise
CN105468335A (en) * 2015-11-24 2016-04-06 中国科学院计算技术研究所 Pipeline-level operation device, data processing method and network-on-chip chip
CN105512723A (en) * 2016-01-20 2016-04-20 南京艾溪信息科技有限公司 Artificial neural network calculating device and method for sparse connection
CN105654729A (en) * 2016-03-28 2016-06-08 南京邮电大学 Short-term traffic flow prediction method based on convolutional neural network
CN106203627A (en) * 2016-07-08 2016-12-07 中国电子科技集团公司电子科学研究院 Method for evaluating a network target range (cyber range)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105204333A (en) * 2015-08-26 2015-12-30 东北大学 Energy consumption prediction method for improving energy utilization rate of iron and steel enterprise
CN105468335A (en) * 2015-11-24 2016-04-06 中国科学院计算技术研究所 Pipeline-level operation device, data processing method and network-on-chip chip
CN105512723A (en) * 2016-01-20 2016-04-20 南京艾溪信息科技有限公司 Artificial neural network calculating device and method for sparse connection
CN105654729A (en) * 2016-03-28 2016-06-08 南京邮电大学 Short-term traffic flow prediction method based on convolutional neural network
CN106203627A (en) * 2016-07-08 2016-12-07 中国电子科技集团公司电子科学研究院 Method for evaluating a network target range (cyber range)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
贺昱曜等: "A combined learning rate strategy for deep learning models" (一种组合型的深度学习模型学习率策略), 《自动化学报》 (Acta Automatica Sinica) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109445688A (en) * 2018-09-29 2019-03-08 上海百功半导体有限公司 Storage control method, storage controller, storage device, and storage system
CN110309918A (en) * 2019-07-05 2019-10-08 北京中科寒武纪科技有限公司 Neural network online model verification method, device, and computer equipment
CN110309918B (en) * 2019-07-05 2020-12-18 安徽寒武纪信息科技有限公司 Neural network online model verification method and device and computer equipment
CN110782017A (en) * 2019-10-25 2020-02-11 北京百度网讯科技有限公司 Method and device for adaptively adjusting learning rate
CN110782017B (en) * 2019-10-25 2022-11-22 北京百度网讯科技有限公司 Method and device for adaptively adjusting learning rate
CN118016218A (en) * 2024-04-09 2024-05-10 山东新旋工业装备科技有限公司 Intelligent analysis method for high-temperature-resistant rotary joint material performance data

Also Published As

Publication number Publication date
CN108205706B (en) 2021-04-23

Similar Documents

Publication Publication Date Title
WO2018112699A1 (en) Artificial neural network reverse training device and method
Mao et al. Routing or computing? The paradigm shift towards intelligent computer network packet transmission based on deep learning
US10713567B2 (en) Apparatus and method for executing reversal training of artificial neural network
US11574195B2 (en) Operation method
CN107341541A (en) Apparatus and method for performing fully connected layer neural network training
CN107341547A (en) Apparatus and method for performing convolutional neural network training
CN111860811B (en) Device and method for executing full-connection layer forward operation of artificial neural network
KR102331978B1 (en) Device and method for executing forward calculation of artificial neural network
US20170193368A1 (en) Conditional parallel processing in fully-connected neural networks
Wang et al. Drl-sfcp: Adaptive service function chains placement with deep reinforcement learning
CN111788585B (en) Training method and system for deep learning model
CN112737854B (en) Service chain migration method and device based on energy consumption and service quality
CN110991483A (en) High-order neighborhood mixed network representation learning method and device
CN108205706A (en) Artificial neural network reverse training device and method
CN112532530B (en) Method and device for adjusting congestion notification information
CN108009635A (en) Deep convolution computation model supporting incremental update
CN111126590A (en) Artificial neural network operation device and method
CN109359542A (en) Neural-network-based vehicle damage level determination method and terminal device
CN110610231A (en) Information processing method, electronic equipment and storage medium
CN113112400A (en) Model training method and model training device
JP2825133B2 (en) Parallel data processing method
CN109583577A (en) Arithmetic unit and method
Huang et al. The optimal routing algorithm of communication networks based on neural network
US12020141B2 (en) Deep learning apparatus for ANN having pipeline architecture
Krawczak et al. Modeling parallel optimization of the early stopping method of multilayer perceptron

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant