CN110880037A - Neural network operation module and method - Google Patents


Info

Publication number
CN110880037A
CN110880037A (application CN201811040961.XA)
Authority
CN
China
Prior art keywords
precision
gradient
output neuron
neuron
output
Prior art date
Legal status
Pending
Application number
CN201811040961.XA
Other languages
Chinese (zh)
Inventor
Not disclosed
Current Assignee
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN201811040961.XA priority Critical patent/CN110880037A/en
Priority to EP19803375.5A priority patent/EP3624020A4/en
Priority to PCT/CN2019/085844 priority patent/WO2019218896A1/en
Priority to US16/718,742 priority patent/US11409575B2/en
Priority to US16/720,145 priority patent/US11442785B2/en
Priority to US16/720,171 priority patent/US11442786B2/en
Publication of CN110880037A publication Critical patent/CN110880037A/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a neural network operation module, which comprises a storage unit, a controller unit and an operation unit. The controller unit is used for acquiring the input neuron precision, the weight precision and the output neuron gradient precision of the L-th layer from the storage unit; acquiring a gradient update precision T according to the input neuron precision, the weight precision and the output neuron gradient precision; and, when the gradient update precision T is greater than a preset precision Tr, adjusting the input neuron precision, the weight precision and the output neuron gradient precision. The operation unit is used for representing the output neurons and weights of the L-th layer according to the adjusted input neuron precision and weight precision, and representing the L-th layer output neuron gradients obtained by operation according to the adjusted output neuron gradient precision, so as to perform subsequent operations. By adopting the embodiments of the invention, the operation requirements can be met while the error of the operation result and the operation overhead are reduced and operation resources are saved.

Description

Neural network operation module and method
Technical Field
The invention relates to the field of neural networks, in particular to a neural network operation module and a method.
Background
The fixed-point number is a data format in which the position of the decimal point can be specified, and the bit width is usually used to denote the data length of a fixed-point number. For example, the bit width of a 16-bit fixed-point number is 16. For a fixed-point number of given bit width, the precision of the representable data and the range of representable numbers trade off against each other: the greater the representable precision, the smaller the representable range of numbers. As shown in FIG. 1a, for a fixed-point data format with bit width bitnum, the first bit is the sign bit, the integer part occupies x bits, and the fractional part occupies s bits; the maximum precision the format can represent is 2^-s. The format can represent the range [neg, pos], where pos = (2^(bitnum-1) - 1) * 2^-s and neg = -(2^(bitnum-1)) * 2^-s.
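As an illustration of this trade-off, the following Python sketch (illustrative code, not part of the patent) computes the precision and representable range of the format in FIG. 1a:

def fixed_point_range(bitnum: int, s: int):
    """Return (precision, neg, pos) for a fixed-point format with
    `bitnum` total bits and `s` fractional bits."""
    precision = 2.0 ** -s                      # smallest representable step, 2^-s
    pos = (2 ** (bitnum - 1) - 1) * precision  # largest representable value
    neg = -(2 ** (bitnum - 1)) * precision     # smallest representable value
    return precision, neg, pos

# 16-bit format with 8 fractional bits:
print(fixed_point_range(16, 8))   # (0.00390625, -128.0, 127.99609375)
# Same bit width with more fractional bits: finer precision, smaller range.
print(fixed_point_range(16, 12))  # (0.000244140625, -8.0, 7.999755859375)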
In neural network operations, data can be represented and operated on in the fixed-point data format. For example, during the forward operation, the data of the L-th layer includes the input neurons X(l), the output neurons Y(l) and the weights W(l); during the reverse operation, the data of the L-th layer includes the input neuron gradients ∇X(l), the output neuron gradients ∇Y(l) and the weight gradients ∇W(l). All of the above data may be represented by fixed-point numbers and operated on as fixed-point numbers.
The training process of a neural network generally comprises two steps, a forward operation and a reverse operation. During the reverse operation, the precision required by the input neuron gradients, the weight gradients and the output neuron gradients may change, and may increase as training proceeds. If the precision of the fixed-point numbers is insufficient, a large error occurs in the operation result, and training may even fail.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is that, during the neural network operation, insufficient input neuron precision, weight precision or output neuron gradient precision causes errors in the operation or training result.
In a first aspect, the present invention provides a neural network operation module, configured to perform operations on a multilayer neural network, including:
a storage unit, configured to store the input neuron precision, the weight precision and the output neuron gradient precision;
a controller unit, configured to acquire the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l) of the L-th layer of the multilayer neural network from the storage unit, wherein L is an integer greater than 0; obtain a gradient update precision T according to the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l); and, when the gradient update precision T is greater than a preset precision Tr, adjust the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l) so that the absolute value of the difference between the gradient update precision T and the preset precision Tr is minimized;
an operation unit, configured to represent the output neurons and weights of the L-th layer according to the adjusted input neuron precision S_x(l) and weight precision S_w(l), and to represent the L-th layer output neuron gradients obtained by operation according to the adjusted output neuron gradient precision S_∇(l), so as to perform subsequent operations.
In a possible embodiment, the controller unit obtaining the gradient update precision T according to the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l) specifically includes:
the controller unit calculating the gradient update precision T from the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l) according to a first preset formula;
wherein the first preset formula is a preset combination of S_x(l), S_w(l) and S_∇(l) (the formula itself appears only as an image in the original publication).
in a possible embodiment, the controller unit adjusts the input neuron precision Sx(l)Weight accuracy Sw(l)And output neuron gradient accuracy
Figure BDA0001791206350000024
The method comprises the following steps:
the controller unit maintains the input neuron precision Sx(l)And the weight precision Sw(l)Unchanged, reduced gradient precision of the output neuron
Figure BDA0001791206350000025
In a possible embodiment, the controller unit reducing the output neuron gradient precision S_∇(l) includes increasing the bit width of the fixed-point data format representing the output neuron gradients.
In a possible embodiment, after the controller unit reduces the output neuron gradient precision S_∇(l), the controller unit is further configured to:
judge whether the output neuron gradients overflow when represented in the fixed-point data format representing the output neuron gradients;
and, when overflow is determined, increase the bit width of the fixed-point data format representing the output neuron gradients.
In one possible embodiment, the controller unit increases a bit width of a fixed-point data format representing the output neuron gradient, including:
the controller unit increases the bit width of the fixed point data format representing the gradient of the output neuron according to a first preset step length N1;
the first preset step N1 is 1, 2, 4, 6, 7, 8 or other positive integer.
In one possible embodiment, the controller unit increases a bit width of a fixed-point data format representing the output neuron gradient, including:
the controller unit increases the bit width of the fixed-point data format representing the gradient of the output neuron in a 2-fold increasing manner.
In a possible embodiment, the controller unit is further configured to:
obtain the preset precision Tr according to a machine learning method; or
obtain the preset precision Tr according to the number of output neurons of the (L-1)-th layer, the learning rate and the number of samples in batch processing, wherein the greater the number of (L-1)-th layer output neurons, the larger the number of samples in batch processing and the higher the learning rate, the larger the preset precision Tr.
In a second aspect, an embodiment of the present invention provides a neural network operation module, where the neural network operation module is configured to perform operations on a multilayer neural network, and includes:
a storage unit, configured to store the output neuron gradients of the multilayer neural network;
a controller unit, configured to acquire the output neuron gradients of the L-th layer of the multilayer neural network from the storage unit, wherein L is an integer greater than 0; obtain the number n1 of output neuron gradients whose absolute values are smaller than a first preset threshold among the L-th layer output neuron gradients; obtain proportion data a according to the number n1 and the number n2 of the L-th layer output neuron gradients, wherein a = n1/n2; and, when the proportion data a is greater than a second preset threshold, reduce the L-th layer output neuron gradient precision S_∇(l);
an operation unit, configured to represent the L-th layer output neuron gradients according to the reduced output neuron gradient precision S_∇(l), so as to perform subsequent operations.
In one possible embodiment, the controller unit reducing the L-th layer output neuron gradient precision S_∇(l) includes increasing the bit width of the fixed-point data format representing the L-th layer output neuron gradients.
In one possible embodiment, after the controller unit reduces the L-th layer output neuron gradient precision S_∇(l), the controller unit is further configured to:
judge whether the L-th layer output neuron gradients overflow when represented in the fixed-point data format representing the L-th layer output neuron gradients;
and, when overflow is determined, increase the bit width of the fixed-point data format representing the L-th layer output neuron gradients.
In one possible embodiment, the increasing the bit width of the fixed-point data format representing the L-th layer output neuron gradient includes:
and the controller unit increases the bit width of the fixed point data format representing the L-th layer output neuron gradient according to a second preset step length N2.
In one possible embodiment, the controller unit increasing a bit width of the fixed-point data format representing the L-th layer output neuron gradient comprises:
the controller unit increasing the bit width of the fixed-point data format representing the L-th layer output neuron gradients in a 2-fold increasing manner.
In a third aspect, an embodiment of the present invention provides a neural network operation method, including:
obtaining the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l) of the L-th layer of the neural network;
calculating a gradient update precision T according to the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l);
when the gradient update precision T is greater than a preset precision Tr, adjusting the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l) so that the absolute value of the difference between the gradient update precision T and the preset precision Tr is minimized;
representing the output neurons and weights of the L-th layer according to the adjusted input neuron precision S_x(l) and weight precision S_w(l), and representing the L-th layer output neuron gradients obtained by operation according to the adjusted output neuron gradient precision S_∇(l), so as to perform subsequent operations.
In one possible embodiment, the calculating the gradient update precision T according to the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l) includes:
calculating the gradient update precision T from the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l) according to a preset formula;
wherein the preset formula is a preset combination of S_x(l), S_w(l) and S_∇(l) (the formula itself appears only as an image in the original publication).
in one possible embodiment, the adjusting the input neuron precision Sx(l)Weight accuracy Sw(l)And output neuron gradient accuracy
Figure BDA0001791206350000039
The method comprises the following steps:
maintaining the input neuron precision Sx(l)And the weight precision Sw(l)Unchanged, reduced gradient precision of the output neuron
Figure BDA00017912063500000310
In one possible embodiment, the reducing the output neuron gradient precision S_∇(l) includes increasing the bit width of the fixed-point data format representing the output neuron gradients.
In one possible embodiment, after the reducing the output neuron gradient precision S_∇(l), the method further includes:
judging whether the output neuron gradients overflow when represented in the fixed-point data format representing the output neuron gradients;
and, when overflow is determined, increasing the bit width of the fixed-point data format representing the output neuron gradients.
In one possible embodiment, the increasing the bit width of the fixed-point data format representing the output neuron gradient comprises:
increasing the bit width of the fixed point data format representing the gradient of the output neuron according to a first preset step length N1;
the first preset step N1 is 1, 2, 4, 6, 7, 8 or other positive integer.
In one possible embodiment, the increasing the bit width of the fixed-point data format representing the output neuron gradient comprises:
and increasing the bit width of the fixed point data format representing the output neuron gradient in a 2-time increment mode.
In a possible embodiment, the method further includes:
obtaining the preset precision Tr according to a machine learning method; or
obtaining the preset precision Tr according to the number of output neurons of the (L-1)-th layer, the learning rate and the number of samples in batch processing, wherein the greater the number of (L-1)-th layer output neurons, the larger the number of samples in batch processing and the higher the learning rate, the larger the preset precision Tr.
In a fourth aspect, an embodiment of the present invention provides a neural network operation method, including:
obtaining the output neuron gradients of the L-th layer of the multilayer neural network, wherein L is an integer greater than 0;
acquiring the number n1 of output neuron gradients whose absolute values are smaller than a first preset threshold among the L-th layer output neuron gradients;
acquiring proportion data a according to the number n1 and the number n2 of the L-th layer output neuron gradients, wherein a = n1/n2;
when the proportion data a is greater than a second preset threshold, reducing the L-th layer output neuron gradient precision S_∇(l);
representing the L-th layer output neuron gradients according to the reduced output neuron gradient precision S_∇(l), so as to perform subsequent operations.
In one possible embodiment, the reducing the L-th layer output neuron gradient precision S_∇(l) includes increasing the bit width of the fixed-point data format representing the L-th layer output neuron gradients.
In one possible embodiment, after the reducing the L-th layer output neuron gradient precision S_∇(l), the method further includes:
judging whether the L-th layer output neuron gradients overflow when represented in the fixed-point data format representing the L-th layer output neuron gradients;
and, when overflow is determined, increasing the bit width of the fixed-point data format representing the L-th layer output neuron gradients.
In one possible embodiment, the increasing the bit width of the fixed-point data format representing the L-th layer output neuron gradients includes:
increasing the bit width of the fixed-point data format representing the L-th layer output neuron gradients by a third preset step N3.
In one possible embodiment, the increasing the bit width of the fixed-point data format representing the L-th layer output neuron gradient includes:
and increasing the bit width of the fixed point data format representing the L-th layer output neuron gradient in a 2-time incremental mode.
It can be seen that, in the solutions of the embodiments of the present invention, the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l) are dynamically adjusted (increased or decreased) during the neural network operation, so that the error of the operation result is reduced and the precision of the operation result is improved while the operation requirements are met.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1a is a schematic diagram of a fixed-point data format;
fig. 1b is a schematic structural diagram of a neural network operation module according to an embodiment of the present invention;
fig. 2 is a schematic flow chart illustrating a neural network operation method according to an embodiment of the present invention;
fig. 3 is a schematic flow chart of another neural network operation method according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It is to be understood that the terminology used in the embodiments of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
In the neural network operation process, because of a series of operations such as addition, subtraction, multiplication, division and convolution, the input neurons, weights and output neurons involved in the forward operation and the input neuron gradients, weight gradients and output neuron gradients involved in the reverse training keep changing. The precision with which the input neurons, weights, output neurons, input neuron gradients, weight gradients and output neuron gradients are represented in the fixed-point data format may therefore need to be increased or decreased. If the precision of these data is insufficient, a large error occurs in the operation result, and reverse training may even fail; if the precision is redundant, unnecessary operation overhead is incurred and operation resources are wasted. The present application provides a neural network operation module and method that dynamically adjust the precision of these data during the neural network operation, so as to reduce the error of the operation result and improve its precision while meeting the operation requirements.
In the embodiments of the present application, data precision is adjusted by adjusting the bit width of the data. For example, when the precision of the fixed-point data format does not meet the requirements of the operation, the precision can be increased by increasing the bit width of the fractional part of the fixed-point data format, i.e., increasing s in FIG. 1a. However, since the total bit width of the fixed-point data format is fixed, increasing the fractional bit width decreases the integer bit width, and the data range that the fixed-point data format can represent therefore shrinks.
Referring to fig. 1b, fig. 1b is a schematic structural diagram of a neural network operation module according to an embodiment of the present invention. The neural network operation module is used for performing operation of a multilayer neural network. As shown in fig. 1b, the neural network operation module 100 includes:
and the storage unit 101 is used for storing the input neuron precision, the weight precision and the output neuron gradient precision.
The controller unit 102 is configured to acquire the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l) of the L-th layer of the multilayer neural network from the storage unit 101, wherein L is an integer greater than 0; obtain a gradient update precision T according to the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l); and, when the gradient update precision T is greater than a preset precision Tr, adjust the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l).
In a possible embodiment, the storage unit 101 is further configured to store the input neurons, weights, output neurons and output neuron gradients. The controller unit 102 acquires the L-th layer input neurons, weights and output neuron gradients from the storage unit 101, and obtains the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l) according to the L-th layer input neurons, weights and output neuron gradients. The bit width of the fixed-point data format used to represent the input neurons and of the fixed-point data format used to represent the weights is a first bit width, and the bit width of the fixed-point data format used to represent the output neuron gradients is a second bit width.
Optionally, the second bit width is greater than the first bit width.
Further, the second bit width is twice the first bit width, so as to facilitate processing by an electronic computer.
Further, the first bit width is preferably 8 bits, and the second bit width is preferably 16 bits.
The controller unit 102 may set the preset precision Tr empirically in advance; or obtain, by changing an input parameter, a Tr matched with the input parameter through a second preset formula; or obtain Tr by a machine learning method.
Alternatively, the controller unit 102 sets the preset precision Tr according to the learning rate and the batch size (the number of samples in batch processing).
Further, if there are parameter-sharing layers in the neural network (such as convolutional layers and recurrent neural network layers), the controller unit 102 sets the preset precision Tr according to the number of output neurons of the previous layer, the batch size and the learning rate: the greater the number of output neurons of the previous layer, the larger the batch size and the higher the learning rate, the larger the preset precision Tr.
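The text states only the direction of this dependence, not a formula. The following Python sketch implements one such monotonic rule as an illustration; the concrete expression and the constant k are assumptions for demonstration, not the patent's rule:

def preset_precision_tr(n_prev_out: int, batch_size: int,
                        learning_rate: float, k: float = 1e-6) -> float:
    """Hypothetical heuristic: Tr grows with the number of output neurons of
    the previous layer, the batch size and the learning rate; k is a tuning
    constant chosen for illustration."""
    return k * n_prev_out * batch_size * learning_rate

print(preset_precision_tr(n_prev_out=512, batch_size=128, learning_rate=0.01))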
Specifically, after obtaining the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l), the controller unit 102 calculates the gradient update precision T from the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l) according to the first preset formula, which is a preset combination of S_x(l), S_w(l) and S_∇(l) (the formula itself appears only as an image in the original publication).
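Since the first preset formula is reproduced only as an image in the source, any executable form of it is an assumption. The Python sketch below uses an assumed combination that simply adds the fractional bit widths of the three precisions, chosen only so that reducing the output neuron gradient precision (i.e., increasing its fractional bit width) reduces T, as the surrounding text requires:

def gradient_update_precision(s_x: int, s_w: int, s_grad: int) -> float:
    """Assumed stand-in for the first preset formula.
    s_x, s_w, s_grad are fractional bit widths; the precisions are 2^-s."""
    return 2.0 ** -(s_x + s_w + s_grad)

T = gradient_update_precision(s_x=8, s_w=8, s_grad=8)
Tr = 2.0 ** -26  # preset precision (illustrative value)
if T > Tr:
    # per the text: keep S_x(l) and S_w(l) unchanged and reduce S_grad(l),
    # i.e. increase the fractional bit width of the gradient format
    print("reduce the output neuron gradient precision")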
wherein the controller unit 102 adjusts the input neuron precision Sx(l)Weight accuracy Sw(l)And output neuron gradient accuracy
Figure BDA0001791206350000066
The method comprises the following steps:
the controller unit 102 maintains the input neuron precision Sx(l)Sum weight precision Sw(l)The gradient precision of the output neuron is not changed, and the gradient precision of the output neuron is reduced
Figure BDA0001791206350000067
It should be noted that, since the output neuron gradient precision S_∇(l) is 2^(-s1), the controller unit 102 reducing the output neuron gradient precision S_∇(l) means increasing the fractional bit width s1 of the fixed-point data format representing the output neuron gradients.
Optionally, the controller unit 102 may increase the fractional bit width s1 of the fixed-point data format representing the output neuron gradients by a first preset step N1 according to the value of Tr - T.
Specifically, the controller unit 102 increases the fractional bit width s1 of the fixed-point data format representing the output neuron gradients by N1 bits at a time, i.e., to s1 + N1, obtaining a new output neuron gradient precision 2^(-(s1+N1)), and then determines, according to the above preset formula, whether the absolute value of the difference between the gradient update precision T and the preset precision Tr has decreased. When it has, the controller unit 102 continues to increase the fractional bit width of the fixed-point data format representing the output neuron gradients by N1, i.e., to s1 + 2*N1, obtains the new output neuron gradient precision, and again judges whether the absolute value of the difference between the gradient update precision T and the preset precision Tr has decreased; if it has, it continues processing in this manner. If at the n-th processing the absolute value of the difference between the gradient update precision T and the preset precision Tr increases instead, the controller unit 102 takes the bit width obtained at the (n-1)-th processing, i.e., s1 + (n-1)*N1, as the fractional bit width of the fixed-point data format representing the output neuron gradients, and the output neuron gradient precision after the fractional bit width increase is 2^(-(s1+(n-1)*N1)).
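The following Python sketch illustrates this stepwise search. The inner function T() inlines the same assumed stand-in formula as the earlier sketch, not the patent's actual first preset formula:

def adjust_fractional_width(s_x: int, s_w: int, s1: int, Tr: float,
                            N1: int = 2, max_steps: int = 32) -> int:
    def T(s_grad: int) -> float:
        return 2.0 ** -(s_x + s_w + s_grad)  # assumed formula, see above

    best_err = abs(T(s1) - Tr)
    for _ in range(max_steps):
        err = abs(T(s1 + N1) - Tr)
        if err >= best_err:   # |T - Tr| stopped shrinking: keep previous width
            break
        s1, best_err = s1 + N1, err
    return s1

s1 = adjust_fractional_width(s_x=8, s_w=8, s1=8, Tr=2.0 ** -30)
print(s1, 2.0 ** -s1)  # final fractional width and output neuron gradient precision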
Optionally, the first preset step N1 is 1, 2, 4, 6, 7, 8 or other positive integer.
Alternatively, the controller unit 102 increases the bit width of the decimal part in the fixed point data format indicating the gradient of the output neuron in increments of 2 times.
For example, if the fractional bit width of the fixed-point data format representing the output neuron gradients is 3, i.e., the output neuron gradient precision is 2^(-3), then after the fractional bit width is increased in a 2-fold increasing manner it is 6, i.e., the reduced output neuron gradient precision is 2^(-6).
In one possible embodiment, after the controller unit 102 determines the total increase b of the fractional bit width of the fixed-point data format representing the output neuron gradients, the controller unit 102 increases the fractional bit width in multiple steps; for example, in two steps, with a first increase b1 and a second increase b2, where b = b1 + b2.
The b1 and b2 may be the same or different.
Optionally, when the controller unit 102 reduces the output neuron gradient precision S_∇(l), the bit width of the fixed-point data format representing the output neuron gradients is increased.
Further, reducing the output neuron gradient precision S_∇(l) means increasing the fractional bit width of the fixed-point data format representing the output neuron gradients. If the total bit width of that format stayed unchanged, increasing the fractional bit width would decrease the integer bit width, so the data range the format can represent would shrink. Therefore, after the controller unit 102 reduces the output neuron gradient precision S_∇(l), the controller unit 102 increases the total bit width of the fixed-point data format so that the integer bit width remains unchanged, i.e., the total bit width is increased by the same amount as the fractional bit width.
For example, the bit width of the fixed-point data format is 9, of which the sign bit occupies 1 bit, the integer part 5 bits and the fractional part 3 bits. After the controller unit 102 increases the fractional bit width to 6, the integer bit width is still 5 bits; that is, the fractional bit width is increased while the integer bit width remains unchanged.
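A Python sketch of this rule (the dataclass and field names are illustrative, not from the patent): the integer width is preserved, so the total width grows by exactly the fractional-width increase.

from dataclasses import dataclass

@dataclass
class FixedPointFormat:
    int_bits: int   # integer part bit width (sign bit kept separate)
    frac_bits: int  # fractional part bit width

    @property
    def total_bits(self) -> int:
        return 1 + self.int_bits + self.frac_bits  # 1 sign bit

def widen_fraction(fmt: FixedPointFormat, extra: int) -> FixedPointFormat:
    # integer width unchanged; total width grows by `extra`
    return FixedPointFormat(fmt.int_bits, fmt.frac_bits + extra)

fmt = FixedPointFormat(int_bits=5, frac_bits=3)  # 9-bit format from the example
wide = widen_fraction(fmt, 3)                    # fractional part 3 -> 6
print(wide.total_bits, wide.int_bits)            # 12 5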
In one possible embodiment, after the controller unit 102 reduces the output neuron gradient precision S_∇(l), the controller unit 102 is further configured to:
judge whether the output neuron gradients overflow when represented in the fixed-point data format representing the output neuron gradients;
and, when overflow is determined, increase the bit width of the fixed-point data format representing the output neuron gradients.
Specifically, as can be seen from the above description, when the controller unit 102 reduces the output neuron gradient precision S_∇(l), the range representable by the fixed-point data format representing the output neuron gradients is narrowed. Therefore, after reducing the output neuron gradient precision S_∇(l), the controller unit 102 judges whether the output neuron gradients overflow when represented in the fixed-point data format; when overflow is determined, the controller unit 102 increases the bit width of the fixed-point data format, thereby expanding the representable data range so that the output neuron gradients do not overflow when represented in the fixed-point data format.
It should be noted that the controller unit 102 increases the bit width of the fixed-point data format, specifically, increases the bit width of the integer part of the fixed-point data format.
Further, the increasing, by the controller unit 102, the bit width of the fixed-point data format indicating the gradient of the output neuron includes:
the controller unit 102 increases the bit width of the fixed-point data format representing the gradient of the output neuron according to a second preset step N2, where the second preset step N2 may be 1, 2, 3, 4, 5, 7, 8 or other positive integer.
Specifically, when determining to increase the bit width of the fixed-point data format, the controller unit 102 increases the bit width of the fixed-point data format by the second preset step N2 each time.
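A Python sketch of this overflow handling, using the range formula from the background section; the gradient list and starting widths are assumed for demonstration:

def format_range(int_bits: int, frac_bits: int):
    step = 2.0 ** -frac_bits
    bitnum = 1 + int_bits + frac_bits
    return -(2 ** (bitnum - 1)) * step, (2 ** (bitnum - 1) - 1) * step

def widen_until_fits(grads, int_bits: int, frac_bits: int, N2: int = 2) -> int:
    neg, pos = format_range(int_bits, frac_bits)
    while any(g < neg or g > pos for g in grads):  # overflow in current format
        int_bits += N2                             # widen the integer part by step N2
        neg, pos = format_range(int_bits, frac_bits)
    return int_bits

print(widen_until_fits([3.5, -40.0, 130.0], int_bits=5, frac_bits=6))  # 9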
In one possible embodiment, the controller unit 102 increases the bit width of the fixed-point data format representing the gradient of the output neuron, including:
the controller unit 102 increases the bit width of the fixed-point data format indicating the gradient of the output neuron in increments of 2 times.
For example, if the bit width of the fixed-point data format excluding the sign bit is 8, then after the bit width is increased in a 2-fold increasing manner, the bit width excluding the sign bit is 16; after it is increased in a 2-fold increasing manner again, the bit width excluding the sign bit is 32.
In one possible embodiment, the controller unit 102 adjusting the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l) includes:
the controller unit 102 reducing the input neuron precision S_x(l) and/or the output neuron gradient precision S_∇(l) while keeping the weight precision S_w(l) unchanged; or
the controller unit 102 reducing the input neuron precision S_x(l) and increasing the output neuron gradient precision S_∇(l) while keeping the weight precision S_w(l) unchanged, wherein the reduction of the input neuron precision S_x(l) is greater than the increase of the output neuron gradient precision S_∇(l); or
the controller unit 102 increasing the output neuron gradient precision S_∇(l) and reducing the input neuron precision S_x(l) while keeping the weight precision S_w(l) unchanged, wherein the increase of the output neuron gradient precision S_∇(l) is smaller than the reduction of the input neuron precision S_x(l); or
the controller unit 102 increasing or decreasing the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l) so that the absolute value of the difference between the gradient update precision T and the preset precision Tr is minimized.
It should be noted that, for the specific process by which the controller unit 102 increases or decreases any one of the weight precision S_w(l), the input neuron precision S_x(l) and the output neuron gradient precision S_∇(l), reference may be made to the related operations of the controller unit 102 above, which are not repeated here.
After the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l) are adjusted according to the above method, the operation unit 103 represents the L-th layer input neurons, weights and output neuron gradients in fixed-point data format according to the adjusted input neuron precision S_x(l), weight precision S_w(l) and output neuron gradient precision S_∇(l) during the operation, and then performs subsequent operations.
It should be noted that the frequency of calculating the gradient update accuracy T by the controller unit 102 can be flexibly set according to the requirement.
The controller unit 102 may adjust and calculate the frequency of the gradient update precision T according to the number of training iterations in the neural network training process.
Optionally, the controller unit 102 recalculates the gradient update precision T once per iteration during the neural network training process; or recalculates it every preset number of iterations; or sets the frequency according to the change of the gradient update precision T.
Alternatively, the controller unit 102 sets the frequency of calculating the gradient update accuracy T according to the number of training iterations in the neural network training.
The operation unit 103 is configured to represent the L-th layer input neurons and weights according to the increased or decreased input neuron precision S_x(l) and weight precision S_w(l), and to represent the L-th layer output neuron gradients obtained by operation according to the increased or decreased output neuron gradient precision S_∇(l).
In other words, the operation unit represents the L-th layer input neurons in a fixed-point data format with the increased or decreased input neuron precision S_x(l), represents the L-th layer weights in a fixed-point data format with the increased or decreased weight precision S_w(l), and represents the L-th layer output neuron gradients in a fixed-point data format with the increased or decreased output neuron gradient precision S_∇(l), for subsequent operations.
By dynamically adjusting (increasing or decreasing) the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l) during the neural network operation, the error of the operation result and the operation overhead are reduced and operation resources are saved while the operation requirements are met.
In another alternative embodiment, the controller unit 102 obtains the L-th layer output neuron gradients of the multilayer neural network.
In one possible embodiment, the controller unit 102 acquires the L-th layer output neurons and the (L-1)-th layer output neurons, and then obtains the L-th layer output neuron gradients according to the L-th layer output neurons and the (L-1)-th layer output neurons.
The controller unit 102 obtains the proportion data a of the output neuron gradients whose absolute values are smaller than a first preset threshold.
Optionally, the first preset threshold may be 0, 0.01, 0.05, 0.1, 0.12 or another value.
Specifically, after acquiring the L-th layer output neuron gradients, the controller unit 102 obtains the number n1 of gradient values whose absolute values are smaller than the first preset threshold among the L-th layer output neuron gradients, and then obtains the proportion data a according to the number n1 and the number n2 of the L-th layer output neuron gradients, i.e., a = n1/n2.
Optionally, the second preset threshold may be 50%, 60%, 65%, 70%, 80%, 85%, 90% or another value.
Optionally, the second preset threshold is 80%.
When the proportion data a is greater than the second preset threshold, the controller unit 102 reduces the L-th layer output neuron gradient precision S_∇(l).
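A Python sketch of this proportion test; the threshold values follow the examples given in the text, and the gradient list is illustrative:

def should_reduce_gradient_precision(grads, first_threshold: float = 0.01,
                                     second_threshold: float = 0.80) -> bool:
    n1 = sum(1 for g in grads if abs(g) < first_threshold)  # near-zero gradients
    n2 = len(grads)
    a = n1 / n2                                              # proportion data
    return a > second_threshold

grads = [0.0001, -0.002, 0.9, 0.003, -0.0005, 0.004]
print(should_reduce_gradient_precision(grads))  # True: 5 of 6 are below 0.01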
In one possible embodiment, after reducing the L-th layer output neuron gradient precision S_∇(l), the controller unit 102 increases the bit width of the fixed-point data format representing the L-th layer output neuron gradients.
In one possible embodiment, after the controller unit 102 reduces the L-th layer output neuron gradient precision S_∇(l), the controller unit 102 is further configured to:
judge whether the L-th layer output neuron gradients overflow when represented in the fixed-point data format representing the L-th layer output neuron gradients;
and, when overflow is determined, increase the bit width of the fixed-point data format representing the L-th layer output neuron gradients.
In a possible embodiment, the controller unit 102 increases the bit width of the fixed-point data format representing the gradient of the L-th layer output neuron, including:
the controller unit 102 increases the bit width of the fixed point data format indicating the gradient of the L-th layer output neuron according to a third preset step N3.
In a possible embodiment, the controller unit 102 increasing the bit width of the fixed-point data format indicating the L-th layer output neuron gradient includes:
the controller unit 102 increases the bit width of the fixed-point data format indicating the gradient of the L-th layer output neuron by a 2-fold increment.
It should be noted that, for the specific process by which the controller unit 102 reduces the output neuron gradient precision S_∇(l), reference may be made to the above description, which is not repeated here.
After the output neuron gradient precision S_∇(l) is adjusted according to the above method, the operation unit 103 represents the L-th layer output neuron gradients in fixed-point form according to the adjusted output neuron gradient precision S_∇(l) during the operation, and then performs subsequent operations.
By adjusting the output neuron gradient precision according to the output neuron gradients during the neural network operation, the error of the output neurons is reduced and normal training is ensured.
Referring to fig. 2, fig. 2 is a schematic flow chart of a neural network operation method according to an embodiment of the present invention, and as shown in fig. 2, the method includes:
s201, the neural network operation module acquires the precision, the weight precision and the gradient precision of output neurons of the L-th layer of the neural network.
The input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l) may all be the same, partially the same, or pairwise different.
The neural network is a multilayer neural network, and the L-th layer input neuron precision S_x(l), weight precision S_w(l) and output neuron gradient precision S_∇(l) are the input neuron precision, the weight precision and the output neuron gradient precision of any layer of the multilayer neural network, respectively.
In a possible embodiment, the neural network operation module acquires the L-th layer input neurons, weights and output neurons, and obtains the L-th layer input neuron precision S_x(l), weight precision S_w(l) and output neuron gradient precision S_∇(l) according to the L-th layer input neurons, weights and output neurons.
S202, the neural network operation module calculates a gradient update precision T according to the L-th layer input neuron precision, weight precision and output neuron gradient precision.
Specifically, the neural network operation module calculates the gradient update precision T from the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l) according to the first preset formula, which is a preset combination of S_x(l), S_w(l) and S_∇(l) (the formula itself appears only as an image in the original publication).
S203, when the gradient update precision T is greater than the preset precision Tr, the neural network operation module adjusts the L-th layer input neuron precision, weight precision and output neuron gradient precision so that the absolute value of the difference between the gradient update precision T and the preset precision Tr is minimized.
The bit width of the fixed-point data format used for representing the input neuron and the fixed-point data format used for representing the weight is a first bit width, and the bit width of the fixed-point data format used for representing the gradient of the output neuron is a second bit width.
Optionally, the second bit width is greater than the first bit width.
Further, the second bit width is twice the first bit width, so as to facilitate processing by an electronic computer.
Further, the first bit width is preferably 8 bits, and the second bit width is preferably 16 bits.
The preset precision Tr may be set empirically in advance; or a Tr matched with an input parameter may be obtained by changing the input parameter through a second preset formula; or Tr may be obtained by a machine learning method.
Optionally, the neural network operation module sets the preset precision Tr according to the learning rate and the batch size (the number of samples in batch processing).
Further, if there are parameter-sharing layers in the neural network (such as convolutional layers and recurrent neural network layers), the preset precision Tr is set according to the number of output neurons of the previous layer, the batch size and the learning rate: the greater the number of output neurons of the previous layer, the larger the batch size and the higher the learning rate, the larger the preset precision Tr.
The neural network operation module adjusting the input neuron precision S_x(l), the weight precision S_w(l) and the output neuron gradient precision S_∇(l) includes:
keeping the input neuron precision S_x(l) and the weight precision S_w(l) unchanged and reducing the output neuron gradient precision S_∇(l).
It should be noted that the neural network operation module reducing the output neuron gradient precision S_∇(l) means increasing the fractional bit width s1 of the fixed-point data format representing the output neuron gradients.
Optionally, the controller unit of the neural network operation module increases the fractional bit width s1 of the fixed-point data format representing the output neuron gradients by a first preset step N1 according to the value of Tr - T.
Specifically, the neural network operation module increases the fractional bit width s1 of the fixed-point data format representing the output neuron gradients by N1 at a time, i.e., to s1 + N1, obtaining a new output neuron gradient precision 2^(-(s1+N1)), and then determines, according to the above preset formula, whether the absolute value of the difference between the gradient update precision T and the preset precision Tr has decreased. When it has, the neural network operation module continues to increase the fractional bit width of the fixed-point data format representing the output neuron gradients by N1, i.e., to s1 + 2*N1, obtains the new output neuron gradient precision, and again judges whether the absolute value of the difference between the gradient update precision T and the preset precision Tr has decreased; if it has, it continues processing in this manner. If at the n-th processing the absolute value of the difference between the gradient update precision T and the preset precision Tr increases instead, the neural network operation module takes the bit width obtained at the (n-1)-th processing, i.e., s1 + (n-1)*N1, as the fractional bit width of the fixed-point data format representing the output neuron gradients, and the output neuron gradient precision after the fractional bit width increase is 2^(-(s1+(n-1)*N1)).
Optionally, the first preset step N1 is 1, 2, 4, 6, 7, 8 or other positive integer.
Optionally, the neural network operation module increases a bit width of a decimal part in a fixed point data format indicating the gradient of the output neuron in a 2-fold increasing manner.
For example, if the fractional bit width of the fixed-point data format representing the output neuron gradients is 3, i.e., the output neuron gradient precision is 2^(-3), then after the fractional bit width is increased in a 2-fold increasing manner it is 6, i.e., the reduced output neuron gradient precision is 2^(-6).
In one possible embodiment, after the neural network operation module determines the total increase b of the fractional bit width of the fixed-point data format representing the output neuron gradients, the neural network operation module increases the fractional bit width in multiple steps; for example, in two steps, with a first increase b1 and a second increase b2, where b = b1 + b2.
The b1 and b2 may be the same or different.
Optionally, when the neural network operation module reduces the output neuron gradient precision, the bit width of the fixed-point data format representing the output neuron gradients is increased.
Further, reducing the output neuron gradient precision S_∇(l) means increasing the fractional bit width of the fixed-point data format representing the output neuron gradients. If the total bit width of that format stayed unchanged, increasing the fractional bit width would decrease the integer bit width, so the data range the format can represent would shrink. Therefore, after the neural network operation module reduces the output neuron gradient precision S_∇(l), the neural network operation module increases the total bit width of the fixed-point data format so that the integer bit width remains unchanged, i.e., the total bit width is increased by the same amount as the fractional bit width.
For example, the bit width of the fixed-point data format is 9, of which the sign bit occupies 1 bit, the integer part 5 bits and the fractional part 3 bits. After the neural network operation module increases the fractional bit width to 6, the integer bit width is still 5 bits; that is, the fractional bit width is increased while the integer bit width remains unchanged.
In a possible embodiment, after the neural network operation module reduces the gradient precision of the output neuron, the neural network operation module is further configured to:
judging whether the output neuron gradient overflows when in a fixed point data format for representing the output neuron gradient;
when overflow is determined, increasing a bit width of a fixed-point data format representing the output neuron gradient.
Specifically, as can be seen from the above description, when the neural network operation module reduces the output neuron gradient precision, the range representable by the fixed-point data format representing the output neuron gradients is narrowed. Therefore, after reducing the output neuron gradient precision, the neural network operation module judges whether the output neuron gradients overflow when represented in the fixed-point data format; when overflow is determined, the neural network operation module increases the bit width of the fixed-point data format, thereby expanding the representable data range so that the output neuron gradients do not overflow when represented in the fixed-point data format.
It should be noted that, the neural network operation module increases the bit width of the fixed-point data format, specifically, increases the bit width of the integer part of the fixed-point data format.
Further, the increasing, by the neural network operation module, a bit width of the fixed-point data format indicating the gradient of the output neuron includes:
the neural network operation module increases the bit width of the fixed point data format representing the gradient of the output neuron according to a second preset step N2, wherein the second preset step N2 may be 1, 2, 3, 4, 5, 7, 8 or other positive integers.
Specifically, when determining to increase the bit width of the fixed point data format, the neural network operation module increases the bit width of the fixed point data format by the second preset step N2 each time.
In one possible embodiment, the neural network operation module increases the bit width of the fixed-point data format representing the gradient of the output neuron, including:
the neural network operation module increases the bit width of the fixed point data format representing the gradient of the output neuron in a 2-fold increasing manner.
For example, if the bit width of the fixed-point data format excluding the sign bit is 8, the bit width of the fixed-point data format excluding the sign bit is 16 after the bit width of the fixed-point data format is increased in a 2-time increasing manner; after the bit width of the fixed-point data format is increased again in a 2-time increasing mode, the bit width of the fixed-point data format excluding the sign bit is 32.
In one embodiment, the neural network operation module adjusting the input neuron precision S_x(l), the weight precision S_w(l), and the output neuron gradient precision S_∇(l) includes:
reducing the input neuron precision S_x(l) and/or the output neuron gradient precision S_∇(l) while keeping the weight precision S_w(l) unchanged; or
reducing the input neuron precision S_x(l) and increasing the output neuron gradient precision S_∇(l) while keeping the weight precision S_w(l) unchanged, where the magnitude by which the input neuron precision S_x(l) is reduced is greater than the magnitude by which the output neuron gradient precision S_∇(l) is increased; or
increasing the output neuron gradient precision S_∇(l) and reducing the input neuron precision S_x(l) while keeping the weight precision S_w(l) unchanged, where the magnitude by which the output neuron gradient precision S_∇(l) is increased is smaller than the magnitude by which the input neuron precision S_x(l) is reduced; or
increasing or reducing the input neuron precision S_x(l), the weight precision S_w(l), and the output neuron gradient precision S_∇(l) so that the absolute value of the difference between the gradient update precision T and the preset precision T_r is minimized.
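Purely as an illustration of the "minimize |T − T_r|" criterion, the Python sketch below brute-forces small bit-width adjustments. Note that the gradient-update-precision formula used here is a stand-in that simply sums the three precisions, because the patent's preset formula appears only as an image; all names are hypothetical.

    import itertools

    def gradient_update_precision(sx, sw, sg):
        # STAND-IN formula only: the patent's preset formula is given as an image.
        return sx + sw + sg

    def adjust_precisions(sx, sw, sg, t_r, max_delta=3):
        # search all small adjustments and keep the one minimizing |T - T_r|
        deltas = range(-max_delta, max_delta + 1)
        dx, dw, dg = min(
            itertools.product(deltas, deltas, deltas),
            key=lambda d: abs(gradient_update_precision(sx + d[0], sw + d[1], sg + d[2]) - t_r))
        return sx + dx, sw + dw, sg + dg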
It should be noted that the specific process by which the neural network operation module increases any one of the weight precision S_w(l), the input neuron precision S_x(l), and the output neuron gradient precision S_∇(l) may refer to the related increasing operations of the neural network operation module described above, and is not repeated here.
S204: the neural network operation module represents the output neurons and weights of the L-th layer according to the adjusted input neuron precision and the adjusted weight precision, and represents the L-th layer output neuron gradient obtained by operation according to the adjusted output neuron gradient precision, for subsequent operations.
In other words, the above arithmetic unit represents the L-th layer input neurons in a fixed-point data format with the increased or decreased input neuron precision S_x(l), represents the L-th layer weights in a fixed-point data format with the increased or decreased weight precision S_w(l), and represents the L-th layer output neuron gradient in a fixed-point data format with the increased or decreased output neuron gradient precision S_∇(l), for subsequent operations.
After adjusting the input neuron precision S_x(l), the weight precision S_w(l), and the output neuron gradient precision S_∇(l) according to the above method, the neural network operation module recalculates the gradient update precision T; when the gradient update precision T is no longer greater than the preset precision T_r, the neural network operation module reduces the input neuron precision S_x(l), the weight precision S_w(l), and the output neuron gradient precision S_∇(l) with reference to the method of step S203.
It should be noted that the frequency at which the neural network operation module calculates the gradient update precision T can be flexibly set as required.
The neural network operation module may adjust the frequency of calculating the gradient update precision T according to the number of training iterations in the neural network training process.
Optionally, during neural network training, the neural network operation module recalculates the gradient update precision T once per iteration; or once every preset number of iterations; or sets the frequency according to the change in the gradient update precision T.
Optionally, the neural network operation module sets a frequency of calculating the gradient update precision T according to a training iteration number in the neural network training.
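A compact sketch of this scheduling, reusing the stand-in helpers from the previous sketch (the loop and all names are hypothetical): T is recomputed once every k iterations, with k = 1 corresponding to the once-per-iteration option.

    def train(num_iters=100, recompute_every=10):
        sx, sw, sg = 8, 8, 8      # example fractional bit widths for S_x, S_w, S_grad
        t_r = 20                  # example preset precision T_r
        for it in range(num_iters):
            # ... one forward/backward training iteration would run here ...
            if it % recompute_every == 0:     # recompute T once every k iterations
                t = gradient_update_precision(sx, sw, sg)
                if t > t_r:                   # adjust only while T exceeds T_r
                    sx, sw, sg = adjust_precisions(sx, sw, sg, t_r)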
It can be seen that, in the solution of the embodiment of the present invention, the input neuron precision S_x, the weight precision S_w, and the output neuron gradient precision S_∇ are dynamically adjusted during neural network operation, so that the operation requirement is met while the error of the operation result and the operation overhead are reduced, saving operation resources.
Fig. 3 is a schematic flow chart of a neural network operation method according to an embodiment of the present invention. As shown in Fig. 3, the method includes:
S301: the neural network operation module obtains the L-th layer output neuron gradient.
In a possible embodiment, the neural network operation module obtains the output neurons of the L-th layer and the output neurons of the L-1 th layer, and then obtains the gradient of the L-th layer output neurons according to the output neurons of the L-th layer and the output neurons of the L-1 th layer.
S302: the neural network operation module obtains proportion data a of the L-th layer output neuron gradients whose absolute value is smaller than a first preset threshold.
Optionally, the first preset threshold may be 0, 0.01, 0.05, 0.1, 0.12, or another value.
Specifically, after acquiring the L-th layer output neuron gradient, the neural network operation module counts the number n1 of gradient values whose absolute value is smaller than the first preset threshold, and then obtains the proportion data a from n1 and the total number n2 of L-th layer output neuron gradients, that is, a = n1/n2.
Optionally, the second preset threshold (against which the proportion data a is compared in step S303) may be 50%, 60%, 65%, 70%, 80%, 85%, 90%, or another value.
Optionally, the second preset threshold is 80%.
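The computation of a in S302 is a simple counting operation; a self-contained sketch follows (names and values are hypothetical):

    def proportion_small(gradients, first_threshold=0.01):
        # a = n1 / n2: the fraction of gradient values with |g| below the threshold
        n1 = sum(1 for g in gradients if abs(g) < first_threshold)
        return n1 / len(gradients)

    grads = [0.0, 0.004, -0.2, 0.03, -0.001]
    a = proportion_small(grads)        # 3/5 = 0.6
    # if a exceeds the second preset threshold (e.g. 0.8), reduce the
    # L-th layer output neuron gradient precision as described in S303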
S303: when the proportion data a is greater than a second preset threshold, the neural network operation module reduces the L-th layer output neuron gradient precision.
In one possible embodiment, when the neural network operation module reduces the L-th layer output neuron gradient precision S_∇(l), it increases the bit width of the fixed-point data format representing the L-th layer output neuron gradient.
In one possible embodiment, after the neural network operation module reduces the L-th layer output neuron gradient precision S_∇(l), the neural network operation module is further configured to:
judge whether the L-th layer output neuron gradient overflows when represented in the fixed-point data format used for the L-th layer output neuron gradient;
when overflow is determined, increase the bit width of the fixed-point data format representing the L-th layer output neuron gradient.
In a possible embodiment, the neural network operation module increasing the bit width of the fixed-point data format representing the L-th layer output neuron gradient includes:
increasing the bit width of the fixed-point data format representing the L-th layer output neuron gradient by a third preset step N3.
In a possible embodiment, the neural network operation module increasing the bit width of the fixed-point data format representing the L-th layer output neuron gradient includes:
increasing the bit width of the fixed-point data format representing the L-th layer output neuron gradient in a 2-fold increasing manner.
It should be noted that the specific process by which the controller unit 102 reduces the output neuron gradient precision S_∇(l) can be seen from the above description and is not repeated here.
After adjusting the output neuron gradient precision S_∇(l) according to the above method, the neural network operation module represents the L-th layer output neuron gradient in a fixed-point data format with the adjusted output neuron gradient precision S_∇(l) during operation, and then performs subsequent operations.
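Representing a gradient "according to" an adjusted precision amounts to rounding it to the format's step size and clamping it to the representable range; a self-contained round-trip sketch (the bit widths are example values, not taken from the patent):

    def quantize(value, int_bits=5, frac_bits=6):
        # round to the nearest multiple of 2**-frac_bits, clamp to the signed range
        step = 2.0 ** -frac_bits
        lo = -(2.0 ** int_bits)
        hi = 2.0 ** int_bits - step
        return min(hi, max(lo, round(value / step) * step))

    print(quantize(0.7071))   # 0.703125, the nearest multiple of 2**-6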
It can be seen that, in the scheme of the embodiment of the present invention, the output neuron gradient precision is adjusted according to the output neuron gradient during neural network operation, which reduces the error of the output neurons and ensures normal training.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (16)

1. A neural network operation module, wherein the neural network operation module is configured to perform operations of a multilayer neural network, and comprises:
a storage unit configured to store input neuron precision, weight precision, and output neuron gradient precision;
a controller unit configured to obtain, from the storage unit, input neuron precision S_x(l), weight precision S_w(l), and output neuron gradient precision S_∇(l) of an L-th layer of the multilayer neural network, wherein L is an integer greater than 0; obtain a gradient update precision T according to the input neuron precision S_x(l), the weight precision S_w(l), and the output neuron gradient precision S_∇(l); and, when the gradient update precision T is greater than a preset precision T_r, adjust the input neuron precision S_x(l), the weight precision S_w(l), and the output neuron gradient precision S_∇(l) so that the absolute value of the difference between the gradient update precision T and the preset precision T_r is minimized; and
an arithmetic unit configured to represent the input neurons and weights of the L-th layer according to the adjusted input neuron precision S_x(l) and weight precision S_w(l), and to represent the L-th layer output neuron gradient obtained by operation according to the adjusted output neuron gradient precision S_∇(l), for subsequent operations.
2. The module of claim 1, wherein the controller unit obtaining the gradient update precision T according to the input neuron precision S_x(l), the weight precision S_w(l), and the output neuron gradient precision S_∇(l) specifically comprises:
the controller unit calculating the gradient update precision T from the input neuron precision S_x(l), the weight precision S_w(l), and the output neuron gradient precision S_∇(l) according to a preset formula,
wherein the preset formula is given in the source only as an image and is not reproduced here.
3. The module of claim 2, wherein the controller unit adjusting the input neuron precision S_x(l), the weight precision S_w(l), and the output neuron gradient precision S_∇(l) comprises:
the controller unit keeping the input neuron precision S_x(l) and the weight precision S_w(l) unchanged and reducing the output neuron gradient precision S_∇(l).
4. The module of claim 3, wherein, when reducing the output neuron gradient precision S_∇(l), the controller unit increases the bit width of the fixed-point data format representing the output neuron gradient.
5. The module of claim 3 or 4, wherein, after the controller unit reduces the output neuron gradient precision S_∇(l), the controller unit is further configured to:
judge whether the output neuron gradient overflows when represented in the fixed-point data format used for the output neuron gradient;
when overflow is determined, increase the bit width of the fixed-point data format representing the output neuron gradient.
6. The module of claim 4 or 5, wherein the controller unit increasing the bit width of the fixed-point data format representing the output neuron gradient comprises:
the controller unit increasing the bit width of the fixed-point data format representing the output neuron gradient by a preset step N1,
wherein the preset step N1 is 1, 2, 4, 6, 7, 8, or another positive integer.
7. The module of claim 4 or 5, wherein the controller unit increasing the bit width of the fixed-point data format representing the output neuron gradient comprises:
the controller unit increasing the bit width of the fixed-point data format representing the output neuron gradient in a 2-fold increasing manner.
8. The module of any one of claims 1 to 7, wherein the controller unit is further configured to:
obtain the preset precision T_r according to a machine learning method; or
obtain the preset precision T_r according to the number of output neurons of the L-1-th layer, the learning rate, and the number of samples in a batch, wherein the greater the number of L-1-th layer output neurons and the number of samples in the batch, and the higher the learning rate, the larger the preset precision T_r.
9. A neural network operation method, comprising:
obtaining input neuron precision S_x(l), weight precision S_w(l), and output neuron gradient precision S_∇(l) of an L-th layer of a neural network;
calculating a gradient update precision T according to the input neuron precision S_x(l), the weight precision S_w(l), and the output neuron gradient precision S_∇(l);
when the gradient update precision T is greater than a preset precision T_r, adjusting the input neuron precision S_x(l), the weight precision S_w(l), and the output neuron gradient precision S_∇(l) so that the absolute value of the difference between the gradient update precision T and the preset precision T_r is minimized; and
representing the output neurons and weights of the L-th layer according to the adjusted input neuron precision S_x(l) and weight precision S_w(l), and representing the L-th layer output neuron gradient obtained by operation according to the adjusted output neuron gradient precision S_∇(l), for subsequent operations.
10. The method of claim 9, wherein the calculating the gradient update precision T according to the input neuron precision S_x(l), the weight precision S_w(l), and the output neuron gradient precision S_∇(l) comprises:
calculating the gradient update precision T from the input neuron precision S_x(l), the weight precision S_w(l), and the output neuron gradient precision S_∇(l) according to a preset formula,
wherein the preset formula is given in the source only as an image and is not reproduced here.
11. The method of claim 10, wherein the adjusting the input neuron precision S_x(l), the weight precision S_w(l), and the output neuron gradient precision S_∇(l) comprises:
keeping the input neuron precision S_x(l) and the weight precision S_w(l) unchanged and reducing the output neuron gradient precision S_∇(l).
12. The method of claim 11, wherein, when reducing the output neuron gradient precision S_∇(l), the bit width of the fixed-point data format representing the output neuron gradient is increased.
13. The method of claim 11 or 12, wherein, after the reducing the output neuron gradient precision S_∇(l), the method further comprises:
judging whether the output neuron gradient overflows when represented in the fixed-point data format used for the output neuron gradient;
when overflow is determined, increasing the bit width of the fixed-point data format representing the output neuron gradient.
14. The method of claim 12 or 13, wherein the increasing the bit width of the fixed-point data format representing the output neuron gradient comprises:
increasing the bit width of the fixed-point data format representing the output neuron gradient by a preset step N1,
wherein the preset step N1 is 1, 2, 4, 6, 7, 8, or another positive integer.
15. The method of claim 12 or 13, wherein the increasing the bit width of the fixed-point data format representing the output neuron gradient comprises:
increasing the bit width of the fixed-point data format representing the output neuron gradient in a 2-fold increasing manner.
16. The method of any one of claims 9 to 15, further comprising:
obtaining the preset precision T_r according to a machine learning method; or
obtaining the preset precision T_r according to the number of output neurons of the L-1-th layer, the learning rate, and the number of samples in a batch, wherein the greater the number of L-1-th layer output neurons and the number of samples in the batch, and the higher the learning rate, the larger the preset precision T_r.
CN201811040961.XA 2018-05-18 2018-09-06 Neural network operation module and method Pending CN110880037A (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
CN201811040961.XA CN110880037A (en) 2018-09-06 2018-09-06 Neural network operation module and method
EP19803375.5A EP3624020A4 (en) 2018-05-18 2019-05-07 Computing method and related product
PCT/CN2019/085844 WO2019218896A1 (en) 2018-05-18 2019-05-07 Computing method and related product
US16/718,742 US11409575B2 (en) 2018-05-18 2019-12-18 Computation method and product thereof
US16/720,145 US11442785B2 (en) 2018-05-18 2019-12-19 Computation method and product thereof
US16/720,171 US11442786B2 (en) 2018-05-18 2019-12-19 Computation method and product thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811040961.XA CN110880037A (en) 2018-09-06 2018-09-06 Neural network operation module and method

Publications (1)

Publication Number Publication Date
CN110880037A true CN110880037A (en) 2020-03-13

Family

ID=69727298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811040961.XA Pending CN110880037A (en) 2018-05-18 2018-09-06 Neural network operation module and method

Country Status (1)

Country Link
CN (1) CN110880037A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020192582A1 (en) * 2019-03-26 2020-10-01 上海寒武纪信息科技有限公司 Neural network operation module and method

Similar Documents

Publication Publication Date Title
JP7146955B2 (en) DATA PROCESSING METHOD, APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
JP7146952B2 (en) DATA PROCESSING METHOD, APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
CN112085183A (en) Neural network operation method and device and related product
CN111656315A (en) Data processing method and device based on convolutional neural network architecture
CN111985523A (en) Knowledge distillation training-based 2-exponential power deep neural network quantification method
CN108537327B (en) Neural network prediction method and device based on time series BP
CN111160531B (en) Distributed training method and device for neural network model and electronic equipment
CN111758104B (en) Neural network parameter optimization method and neural network calculation method and device suitable for hardware implementation
CN114462594A (en) Neural network training method and device, electronic equipment and storage medium
US20230037498A1 (en) Method and system for generating a predictive model
CN110109646A (en) Data processing method, device and adder and multiplier and storage medium
CN110880037A (en) Neural network operation module and method
CN113642711B (en) Processing method, device, equipment and storage medium of network model
CN107666107B (en) Method of correcting laser power, laser, storage medium, and electronic apparatus
CN115759238B (en) Quantization model generation method and device, electronic equipment and storage medium
US10984163B1 (en) Systems and methods for parallel transient analysis and simulation
CN110880033A (en) Neural network operation module and method
WO2020021396A1 (en) Improved analog computing implementing arbitrary non-linear functions using chebyshev-polynomial- interpolation schemes and methods of use
CN111753971A (en) Neural network operation module and method
CN111753972A (en) Neural network operation module and method
US20220156562A1 (en) Neural network operation module and method
TWI743710B (en) Method, electric device and computer program product for convolutional neural network
CN111753970A (en) Neural network operation module and method
CN110580523B (en) Error calibration method and device for analog neural network processor
US20200371746A1 (en) Arithmetic processing device, method for controlling arithmetic processing device, and non-transitory computer-readable storage medium for storing program for controlling arithmetic processing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination