CN110880033A - Neural network operation module and method - Google Patents

Neural network operation module and method

Info

Publication number
CN110880033A
CN110880033A (application CN201811041573.3A)
Authority
CN
China
Prior art keywords
precision
gradient
output neuron
neuron
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811041573.3A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed (不公告发明人)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN201811041573.3A priority Critical patent/CN110880033A/en
Priority to EP19803375.5A priority patent/EP3624020A4/en
Priority to PCT/CN2019/085844 priority patent/WO2019218896A1/en
Priority to US16/718,742 priority patent/US11409575B2/en
Priority to US16/720,145 priority patent/US11442785B2/en
Priority to US16/720,171 priority patent/US11442786B2/en
Publication of CN110880033A publication Critical patent/CN110880033A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a neural network operation module comprising a storage unit, a controller unit, and an operation unit. The controller unit acquires the input neuron precision, the weight precision, and the output neuron gradient precision of the L-th layer from the storage unit; obtains a gradient update precision T according to the input neuron precision, the weight precision, and the output neuron gradient precision; and, when the gradient update precision T is smaller than a preset precision Tr, adjusts the input neuron precision, the weight precision, and the output neuron gradient precision. The operation unit represents the output neurons and weights of the L-th layer according to the adjusted input neuron precision and weight precision, and represents the L-th layer output neuron gradients obtained by operation according to the adjusted output neuron gradient precision, so as to perform subsequent operations. Embodiments of the invention meet the operation requirements while reducing the error of the operation result and the operation overhead, thereby saving operation resources.

Description

Neural network operation module and method
Technical Field
The invention relates to the field of neural networks, in particular to a neural network operation module and a method.
Background
A fixed-point number is a data format in which the position of the decimal point can be specified; the bit width is usually used to denote the data length of a fixed-point number. For example, the bit width of a 16-bit fixed-point number is 16. For a fixed-point number of given bit width, the precision of the representable data and the range of representable numbers trade off against each other: the finer the precision that can be represented, the smaller the range of numbers that can be represented. As shown in FIG. 1a, for a fixed-point data format with bit width bitnum, the first bit is the sign bit, the integer part occupies x bits, and the fractional part occupies s bits; the finest precision S that this fixed-point data format can represent is 2^(-s). The fixed-point data format can represent the range [neg, pos], where pos = (2^(bitnum-1) - 1) * 2^(-s) and neg = -(2^(bitnum-1)) * 2^(-s).
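As a minimal illustration of these formulas (not part of the patent text), the following Python sketch computes the precision and representable range of such a format; the function name is an assumption for this example only:

```python
def fixed_point_range(bitnum: int, s: int):
    """Precision and representable range of a signed fixed-point format.

    Layout assumed from FIG. 1a: 1 sign bit, (bitnum - 1 - s) integer
    bits, and s fractional bits.
    """
    precision = 2.0 ** -s                      # S = 2^(-s)
    pos = (2 ** (bitnum - 1) - 1) * precision  # largest representable value
    neg = -(2 ** (bitnum - 1)) * precision     # smallest representable value
    return precision, neg, pos

# A 16-bit format with 10 fractional bits:
S, neg, pos = fixed_point_range(16, 10)
print(S, neg, pos)  # 0.0009765625 -32.0 31.9990234375
```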
In neural network operations, data can be represented and operated on in a fixed-point data format. For example, during the forward operation, the data of the L-th layer includes the input neurons X(l), the output neurons Y(l), and the weights W(l). During the inverse operation, the data of the L-th layer includes the input neuron gradients ∇X(l), the output neuron gradients ∇Y(l), and the weight gradients ∇W(l). All of the above data may be represented by fixed-point numbers and operated on as fixed-point numbers.
The training process of a neural network generally comprises two steps: a forward operation and an inverse operation. During the inverse operation, the precision required by the input neuron gradients, the weight gradients, and the output neuron gradients may change, and may decrease as training proceeds. If the precision of the fixed-point numbers is redundant, the operation overhead increases and operation resources are wasted.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is that, in the neural network operation process, insufficient input neuron precision, weight precision, or output neuron gradient precision causes errors in the result of the operation or the training.
In a first aspect, the present invention provides a neural network operation module, configured to perform operations on a multilayer neural network, including:
the storage unit is used for storing the input neuron precision, the weight precision and the output neuron gradient precision;
a controller unit, configured to obtain the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) of the L-th layer of the multilayer neural network from the storage unit, wherein L is an integer greater than 0; obtain a gradient update precision T according to the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l); and, when the gradient update precision T is smaller than a preset precision Tr, adjust the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) so that the absolute value of the difference between the gradient update precision T and the preset precision Tr is minimized;
an operation unit, configured to represent the output neurons and weights of the L-th layer according to the adjusted input neuron precision Sx(l) and weight precision Sw(l), and to represent the L-th layer output neuron gradients obtained by operation according to the adjusted output neuron gradient precision S∇x(l), so as to perform subsequent operations.
In a possible embodiment, the controller unit being configured to obtain the gradient update precision T according to the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) specifically comprises:

the controller unit is configured to calculate the gradient update precision T from the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) according to a first preset formula, which relates the three precisions to T (the formula itself is shown only as an image in the original).
In a possible embodiment, the controller unit adjusting the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) comprises:

the controller unit keeps the input neuron precision Sx(l) and the weight precision Sw(l) unchanged, and increases the output neuron gradient precision S∇x(l).
In a possible embodiment, when increasing the output neuron gradient precision S∇x(l), the controller unit reduces the bit width of the fixed-point data format representing the output neuron gradients.
In a possible embodiment, after the controller unit increases the output neuron gradient precision S∇x(l), the controller unit is further configured to:

judge whether the output neuron gradient precision S∇x(l) is smaller than a required precision, wherein the required precision is the minimum precision of the output neuron gradients when performing the multilayer neural network operation;

when the output neuron gradient precision S∇x(l) is smaller than the required precision, reduce the bit width of the fixed-point data format representing the output neuron gradients.
In a possible embodiment, the controller unit reducing the bit width of the fixed-point data format representing the output neuron gradients comprises:

the controller unit reduces the bit width of the fixed-point data format representing the output neuron gradients according to a first preset step N1, where the first preset step N1 is 1, 2, 4, 6, 7, 8, or another positive integer.
In a possible embodiment, the controller unit reducing the bit width of the fixed-point data format representing the output neuron gradients comprises:

the controller unit reduces the bit width of the fixed-point data format representing the output neuron gradients in a 2-fold decreasing manner, i.e., by halving it.
In a possible embodiment, the controller unit is further configured to:

obtain the preset precision Tr according to a machine learning method; or

obtain the preset precision Tr according to the number of output neurons of the (L-1)-th layer, the learning rate, and the batch size (the number of samples in batch processing); the larger the number of output neurons of the (L-1)-th layer and the batch size, and the higher the learning rate, the larger the preset precision Tr.
In a second aspect, an embodiment of the present invention provides a neural network operation method, including:
obtaining the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) of the L-th layer of the neural network;

calculating a gradient update precision T according to the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l);

when the gradient update precision T is smaller than a preset precision Tr, adjusting the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) so that the absolute value of the difference between the gradient update precision T and the preset precision Tr is minimized;

representing the output neurons and weights of the L-th layer according to the adjusted input neuron precision Sx(l) and weight precision Sw(l), and representing the L-th layer output neuron gradients obtained by operation according to the adjusted output neuron gradient precision S∇x(l), so as to perform subsequent operations.
In a possible embodiment, the calculating the gradient update precision T according to the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) comprises:

calculating the gradient update precision T from the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) according to a preset formula, which relates the three precisions to T (the formula itself is shown only as an image in the original).
In a possible embodiment, the adjusting the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) comprises:

keeping the input neuron precision Sx(l) and the weight precision Sw(l) unchanged, and increasing the output neuron gradient precision S∇x(l).
In a possible embodiment, the increasing the output neuron gradient precision S∇x(l) comprises reducing the bit width of the fixed-point data format representing the output neuron gradients.
In a possible embodiment, after the increasing the output neuron gradient precision S∇x(l), the method further comprises:

judging whether the output neuron gradient precision S∇x(l) is smaller than a required precision, wherein the required precision is the minimum precision of the output neuron gradients when performing the multilayer neural network operation;

when the output neuron gradient precision S∇x(l) is smaller than the required precision, reducing the bit width of the fixed-point data format representing the output neuron gradients.
In a possible embodiment, the reducing the bit width of the fixed-point data format representing the output neuron gradients comprises:

reducing the bit width of the fixed-point data format representing the output neuron gradients according to a first preset step N1, where the first preset step N1 is 1, 2, 4, 6, 7, 8, or another positive integer.
In a possible embodiment, the reducing the bit width of the fixed-point data format representing the output neuron gradients comprises:

reducing the bit width of the fixed-point data format representing the output neuron gradients in a 2-fold decreasing manner, i.e., by halving it.
In a possible embodiment, the method further comprises:

obtaining the preset precision Tr according to a machine learning method; or

obtaining the preset precision Tr according to the number of output neurons of the (L-1)-th layer, the learning rate, and the batch size; the larger the number of output neurons of the (L-1)-th layer and the batch size, and the higher the learning rate, the larger the preset precision Tr.
It can be seen that, in the solutions of the embodiments of the present invention, the input neuron precision Sx, the weight precision Sw, and the output neuron gradient precision S∇x are dynamically adjusted (increased or decreased) during the neural network operation, so that precision redundancy is reduced while the operation requirements are met, which reduces the operation overhead and avoids wasting operation resources.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1a is a schematic diagram of a fixed-point data format;
fig. 1b is a schematic structural diagram of a neural network operation module according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a neural network operation method according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It is to be understood that the terminology used in the embodiments of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
During neural network operations, as a series of operations such as addition, subtraction, multiplication, division, and convolution are performed, the input neurons, weights, and output neurons involved in the forward operation, as well as the input neuron gradients, weight gradients, and output neuron gradients involved in the reverse training, keep changing. The precision with which input neurons, weights, output neurons, input neuron gradients, weight gradients, and output neuron gradients are represented in the fixed-point data format may therefore need to be increased or decreased. If the precision of these data is insufficient, a large error occurs in the operation result, and the reverse training may even fail; if the precision of these data is redundant, unnecessary operation overhead is added and operation resources are wasted. The present application provides a neural network operation module and method that dynamically adjust the precision of these data during the neural network operation, so as to reduce the error of the operation result and improve its precision while meeting the operation requirements.
In the embodiments of the present application, the data precision is adjusted by adjusting the bit width of the data. Since the precision S = 2^(-s) of a fixed-point data format is determined by the bit width s of its fractional part, the precision can be adjusted by increasing or decreasing the fractional bit width. For example, when the precision of the fixed-point data format exceeds what the operation requires, i.e., when 2^(-s) is smaller than the required precision, the fractional bit width can be reduced (reducing s in FIG. 1a), which increases the precision value 2^(-s); this removes the precision redundancy of the fixed-point data format, reduces the operation overhead, and avoids wasting operation resources.
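As a concrete illustration (not taken from the patent), the following Python sketch shows how shrinking the fractional bit width s coarsens the grid on which a value can be represented; the function name and the sample value are assumptions for this example:

```python
def quantize(value: float, s: int) -> float:
    """Round `value` to the grid of a fixed-point format with s fractional bits."""
    step = 2.0 ** -s          # precision value S = 2^(-s)
    return round(value / step) * step

g = 0.337
for s in (10, 8, 6):          # shrinking the fractional bit width coarsens the grid
    print(s, quantize(g, s))
# 10 0.3369140625
#  8 0.3359375
#  6 0.34375
```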
Referring to fig. 1b, fig. 1b is a schematic structural diagram of a neural network operation module according to an embodiment of the present invention. The neural network operation module is used for performing operation of a multilayer neural network. As shown in fig. 1b, the neural network operation module 100 includes:
The storage unit 101 is configured to store the input neuron precision, the weight precision, and the output neuron gradient precision.
The controller unit 102 is configured to obtain the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) of the L-th layer of the multilayer neural network from the storage unit 101, wherein L is an integer greater than 0; obtain the gradient update precision T according to the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l); and, when the gradient update precision T is smaller than the preset precision Tr, adjust the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l).
In a possible embodiment, the storage unit 101 is further configured to store the input neurons, the weights, the output neurons, and the output neuron gradients. The controller unit 102 obtains the L-th layer input neurons, weights, and output neuron gradients from the storage unit 101, and obtains the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) from them. The bit width of the fixed-point data format used to represent the input neurons and of the fixed-point data format used to represent the weights is a first bit width, and the bit width of the fixed-point data format used to represent the output neuron gradients is a second bit width.
Optionally, the second bit width is greater than the first bit width.
Further, the second bit width is twice the first bit width, so as to facilitate processing by an electronic computer.
Further, the first bit width is preferably 8 bits, and the second bit width is preferably 16 bits.
The controller unit 102 may set the preset precision Tr empirically in advance; or obtain a Tr matched to the input parameters by a second preset formula as the input parameters change; or obtain Tr by a machine learning method.

Optionally, the controller unit 102 sets the preset precision Tr according to the learning rate and the batch size (the number of samples in batch processing).

Further, if there is a layer with shared parameters in the neural network (such as a convolutional layer or a recurrent neural network layer), the controller unit 102 sets the preset precision Tr according to the number of output neurons of the previous layer, the batch size, and the learning rate: the larger the number of output neurons of the previous layer, the larger the batch size, and the higher the learning rate, the larger the preset precision Tr.
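The patent does not give the concrete dependence, but a heuristic with the stated monotonicity (larger previous-layer width, batch size, and learning rate give a larger Tr) could look like the following sketch; the `base` factor and the logarithmic form are purely illustrative assumptions:

```python
import math

def preset_precision_tr(n_prev_out: int, batch_size: int, learning_rate: float,
                        base: float = 2.0 ** -10) -> float:
    """Illustrative heuristic only: Tr grows with the number of output neurons
    of the previous layer, the batch size, and the learning rate, as the text
    requires; the exact dependence is an assumption, not the patented formula."""
    return base * math.log2(1 + n_prev_out * batch_size) * learning_rate

print(preset_precision_tr(n_prev_out=256, batch_size=32, learning_rate=0.01))
```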
Specifically, after obtaining the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l), the controller unit 102 calculates the gradient update precision T from them according to the first preset formula (shown only as an image in the original).
The controller unit 102 adjusting the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) comprises:

the controller unit 102 keeps the input neuron precision Sx(l) and the weight precision Sw(l) unchanged, and increases the output neuron gradient precision S∇x(l).
It should be noted that, since the output neuron gradient precision S∇x(l) equals 2^(-s1), the controller unit 102 increasing the output neuron gradient precision S∇x(l) means reducing the fractional bit width s1 of the fixed-point data format representing the output neuron gradients.
Optionally, the controller unit 102 reduces the fractional bit width s1 of the fixed-point data format representing the output neuron gradients by a first preset step N1, according to the value of Tr - T.
Specifically, the controller unit 102 reduces the fractional bit width s1 of the fixed-point data format representing the output neuron gradients by N1 bits at a time. After the first reduction the fractional bit width is s1 - N1, giving a new output neuron gradient precision S∇x(l) = 2^(-(s1-N1)); the controller unit 102 then recomputes the gradient update precision T according to the preset formula and judges whether the absolute value of the difference between T and the preset precision Tr has become smaller. If it has, the controller unit 102 continues, reducing the fractional bit width to s1 - 2*N1, obtaining the new output neuron gradient precision, and again judging whether the absolute value of the difference between T and Tr has decreased. If it keeps decreasing, the processing continues in the same way; if at the n-th reduction the absolute value of the difference between T and Tr increases, the controller unit 102 takes the bit width obtained at the (n-1)-th reduction, i.e., s1 - (n-1)*N1, as the fractional bit width of the fixed-point data format representing the output neuron gradients, and the output neuron gradient precision after the reduction is 2^(-(s1-(n-1)*N1)).
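A minimal Python sketch of this search loop, assuming a caller-supplied `compute_T` standing in for the patent's preset formula (which is shown only as an image in the original), might look like this:

```python
from typing import Callable

def shrink_fraction_bits(s1: int, n1: int, t_r: float,
                         compute_T: Callable[[int], float]) -> int:
    """Reduce the fractional bit width of the output-neuron-gradient format
    in steps of n1, stopping when |T - Tr| stops decreasing.

    `compute_T(frac_bits)` is a placeholder for the patent's preset formula,
    which maps the current precisions to the gradient update precision T.
    """
    best_gap = abs(compute_T(s1) - t_r)
    while s1 - n1 > 0:
        candidate = s1 - n1                 # try one more reduction of N1 bits
        gap = abs(compute_T(candidate) - t_r)
        if gap >= best_gap:                 # |T - Tr| grew: keep the previous width
            break
        s1, best_gap = candidate, gap
    return s1
```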
Optionally, the first preset step N1 is 1, 2, 4, 6, 7, 8 or other positive integer.
Alternatively, the controller unit 102 may reduce the fractional bit width of the fixed-point data format representing the output neuron gradients in a 2-fold decreasing manner, i.e., by halving it.

For example, if the fractional bit width of the fixed-point data format representing the output neuron gradients is 4, i.e., the output neuron gradient precision is 2^(-4), then after halving, the fractional bit width is 2, i.e., the output neuron gradient precision after the reduction is 2^(-2).
In a possible embodiment, after the controller unit 102 determines the reduction amount b of the fractional bit width of the fixed-point data format representing the output neuron gradients, the controller unit 102 may perform the reduction in several steps. For example, it may reduce the fractional bit width twice, with a first reduction of b1 and a second reduction of b2, where b = b1 + b2; b1 and b2 may be equal or different.
Optionally, when increasing the output neuron gradient precision S∇x(l), the controller unit 102 also reduces the bit width of the fixed-point data format representing the output neuron gradients.
Further, the output neuron gradient precision S∇x(l) is increased by reducing the fractional bit width of the fixed-point data format representing the output neuron gradients while the total bit width of that format stays the same; as a result, the integer part gains bits, the data range represented by the format grows, but the precision value represented by the format also grows. Therefore, after increasing the precision S∇x(l), the controller unit 102 reduces the total bit width of the fixed-point data format such that the bit width of the integer part remains unchanged, i.e., the reduction in the total bit width equals the reduction in the fractional bit width; this guarantees that the maximum value represented by the format stays the same even though the fractional bit width has changed.

For example, suppose the bit width of the fixed-point data format is 9, of which the sign bit occupies 1 bit, the integer part 5 bits, and the fractional part 4 bits. After the controller unit 102 reduces the fractional bit width and the total bit width, the fractional part occupies 2 bits while the integer part still occupies 5 bits; that is, the fractional bit width is reduced and the integer bit width remains unchanged.
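A small sketch of this bookkeeping (illustrative only; the type and field names are not from the patent):

```python
from dataclasses import dataclass

@dataclass
class FixedPointFormat:
    int_bits: int    # integer part (excluding the sign bit)
    frac_bits: int   # fractional part; precision value is 2^(-frac_bits)

    @property
    def total_bits(self) -> int:
        return 1 + self.int_bits + self.frac_bits  # 1 sign bit

    def coarsen(self, d: int) -> "FixedPointFormat":
        """Drop d fractional bits and d total bits, keeping int_bits fixed,
        so the maximum representable value is unchanged."""
        return FixedPointFormat(self.int_bits, self.frac_bits - d)

fmt = FixedPointFormat(int_bits=5, frac_bits=4)   # 9 bits total
print(fmt.total_bits, fmt.coarsen(2).total_bits)  # 9 7
```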
In a possible embodiment, after the controller unit 102 increases the output neuron gradient precision S∇x(l), the controller unit 102 is further configured to:

judge whether the output neuron gradient precision S∇x(l) is smaller than the required precision, wherein the required precision is the minimum precision of the output neuron gradients when performing the multilayer neural network operation;

when the output neuron gradient precision S∇x(l) is smaller than the required precision, reduce the bit width of the fixed-point data format representing the output neuron gradients.
It should be noted that the reason the controller unit 102 increases the output neuron gradient precision S∇x(l) is that the output neuron gradient precision S∇x(l) is smaller than the required precision, i.e., there is precision redundancy, which increases the operation overhead and wastes operation resources. Therefore, in order to reduce the operation overhead and avoid wasting operation resources, the output neuron gradient precision S∇x(l) needs to be increased.
Specifically, as described above, after the controller unit 102 increases the output neuron gradient precision S∇x(l), it is further necessary to judge whether precision redundancy remains, i.e., to judge whether the output neuron gradient precision S∇x(l) is still smaller than the required precision. When the output neuron gradient precision S∇x(l) is determined to be smaller than the required precision, the bit width of the fixed-point data format representing the output neuron gradients is reduced in order to further increase the output neuron gradient precision S∇x(l) and reduce the precision redundancy.
It should be noted that when the controller unit 102 reduces the bit width of the fixed-point data format here, it specifically reduces the bit width of the integer part of the fixed-point data format.
Further, the controller unit 102 reducing the bit width of the fixed-point data format representing the output neuron gradients comprises:

the controller unit 102 reduces the bit width of the fixed-point data format representing the output neuron gradients according to a second preset step N2, where the second preset step N2 may be 1, 2, 3, 4, 5, 7, 8, or another positive integer. Specifically, each time the controller unit 102 determines to reduce the bit width of the fixed-point data format, the reduction amount is the second preset step N2.
In a possible embodiment, the controller unit 102 reducing the bit width of the fixed-point data format representing the output neuron gradients comprises:

the controller unit 102 reduces the bit width of the fixed-point data format representing the output neuron gradients in a 2-fold decreasing manner, i.e., by halving it.

For example, if the bit width of the fixed-point data format excluding the sign bit is 16, then after one halving the bit width excluding the sign bit is 8, and after a further halving it is 4.
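A sketch of the two reduction policies just described (fixed step N2 versus halving), with illustrative function names:

```python
def reduce_by_step(bits: int, n2: int) -> int:
    """Fixed-step policy: drop N2 bits per reduction."""
    return bits - n2

def reduce_by_halving(bits: int) -> int:
    """2-fold decreasing policy: halve the non-sign bit width."""
    return bits // 2

print(reduce_by_halving(16), reduce_by_halving(8))  # 8 4
```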
In a possible embodiment, the controller unit 102 adjusting the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) comprises:

the controller unit 102 increases the input neuron precision Sx(l) and/or the output neuron gradient precision S∇x(l) while keeping the weight precision Sw(l) unchanged; or

the controller unit 102 increases the input neuron precision Sx(l) and reduces the output neuron gradient precision S∇x(l) while keeping the weight precision Sw(l) unchanged, the increase of the input neuron precision Sx(l) being greater than the decrease of the output neuron gradient precision S∇x(l); or

the controller unit 102 reduces the output neuron gradient precision S∇x(l) and increases the input neuron precision Sx(l) while keeping the weight precision Sw(l) unchanged, the decrease of the output neuron gradient precision S∇x(l) being smaller than the increase of the input neuron precision Sx(l); or

the controller unit 102 increases or decreases the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) so that the absolute value of the difference between the gradient update precision T and the preset precision Tr is minimized.
It should be noted here that the specific process by which the controller unit 102 reduces any of the weight precision Sw(l), the input neuron precision Sx(l), and the output neuron gradient precision S∇x(l) is analogous to the increasing operations of the controller unit 102 described above, and is not repeated here.
After the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) have been adjusted according to the above method, the operation unit 103 represents the L-th layer input neurons, weights, and output neuron gradients in the fixed-point data formats of the adjusted precisions during the operation, and then performs the subsequent operations.
It should be noted that the frequency with which the controller unit 102 calculates the gradient update precision T can be set flexibly according to requirements.

The controller unit 102 may adjust the frequency of calculating the gradient update precision T according to the number of training iterations in the neural network training process.

Optionally, in the neural network training process the controller unit 102 recalculates the gradient update precision T every iteration, or every preset number of iterations; or it sets the frequency according to how the gradient update precision T changes.

Optionally, the controller unit 102 sets the frequency of calculating the gradient update precision T according to the number of training iterations in the neural network training.
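For instance (an illustrative sketch, not the patent's implementation), recomputing T every `k` iterations inside a training loop could look like the following, where `compute_T` and `adjust_precisions` are placeholders for the patent's formula and adjustment policy:

```python
def train(num_iters: int, k: int, compute_T, adjust_precisions, t_r: float):
    """Recompute the gradient update precision T every k iterations and
    adjust the precisions when T < Tr."""
    for it in range(num_iters):
        # ... forward operation, inverse operation, weight update ...
        if it % k == 0:
            T = compute_T()
            if T < t_r:
                adjust_precisions()
```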
The operation unit 103 is configured to represent the L-th layer input neurons and weights according to the increased or decreased input neuron precision Sx(l) and weight precision Sw(l), and to represent the L-th layer output neuron gradients obtained by operation according to the increased or decreased output neuron gradient precision S∇x(l).

In other words, the operation unit is configured to represent the L-th layer input neurons in a fixed-point data format of the increased or decreased input neuron precision Sx(l), the L-th layer weights in a fixed-point data format of the increased or decreased weight precision Sw(l), and the L-th layer output neuron gradients in a fixed-point data format of the increased or decreased output neuron gradient precision S∇x(l), for the subsequent operations.
By dynamically adjusting (increasing or decreasing) the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) during the neural network operation, precision redundancy is reduced while the operation requirements are met, which reduces the operation overhead and avoids wasting operation resources.
Referring to fig. 2, fig. 2 is a schematic flow chart of a neural network operation method according to an embodiment of the present invention, and as shown in fig. 2, the method includes:
s201, the neural network operation module acquires the precision, the weight precision and the gradient precision of output neurons of the L-th layer of the neural network.
The input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) may all be equal, or partially equal, or pairwise unequal.

The neural network is a multilayer neural network, and the L-th layer input neuron precision Sx(l), weight precision Sw(l), and output neuron gradient precision S∇x(l) are, respectively, the input neuron precision, weight precision, and output neuron gradient precision of any layer of the multilayer neural network.
In a possible embodiment, the neural network operation module obtains the L-th layer input neurons, weights, and output neurons, and obtains the L-th layer input neuron precision Sx(l), weight precision Sw(l), and output neuron gradient precision S∇x(l) from them.
S202, the neural network operation module calculates the gradient update precision T according to the L-th layer input neuron precision, weight precision, and output neuron gradient precision.

Specifically, the neural network operation module calculates the gradient update precision T from the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) according to the first preset formula (shown only as an image in the original).
S203, when the gradient update precision T is smaller than the preset precision Tr, the neural network operation module adjusts the L-th layer input neuron precision, weight precision, and output neuron gradient precision so that the absolute value of the difference between the gradient update precision T and the preset precision Tr is minimized.
The bit width of the fixed-point data format used for representing the input neuron and the fixed-point data format used for representing the weight is a first bit width, and the bit width of the fixed-point data format used for representing the gradient of the output neuron is a second bit width.
Optionally, the second bit width is greater than the first bit width.
Further, the second bit width is twice the first bit width, so as to facilitate processing by an electronic computer.
Further, the first bit width is preferably 8 bits, and the second bit width is preferably 16 bits.
The preset precision Tr may be set empirically in advance; or a Tr matched to the input parameters may be obtained by a second preset formula as the input parameters change; or Tr may be obtained by a machine learning method.

Optionally, the neural network operation module sets the preset precision Tr according to the learning rate and the batch size (the number of samples in batch processing).

Further, if there is a layer with shared parameters in the neural network (such as a convolutional layer or a recurrent neural network layer), the preset precision Tr is set according to the number of output neurons of the previous layer, the batch size, and the learning rate: the larger the number of output neurons of the previous layer, the larger the batch size, and the higher the learning rate, the larger the preset precision Tr.
The neural network operation module adjusting the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) comprises:

the neural network operation module keeps the input neuron precision Sx(l) and the weight precision Sw(l) unchanged, and increases the output neuron gradient precision S∇x(l).
It should be noted that, since the output neuron gradient precision S∇x(l) equals 2^(-s1), the neural network operation module increasing the output neuron gradient precision S∇x(l) means reducing the fractional bit width s1 of the fixed-point data format representing the output neuron gradients.
Optionally, the controller unit of the neural network operation module reduces the fractional bit width s1 of the fixed-point data format representing the output neuron gradients by the first preset step N1, according to the value of Tr - T.
Specifically, the neural network operation module reduces the fractional bit width s1 of the fixed-point data format representing the output neuron gradients by N1 bits at a time. After the first reduction the fractional bit width is s1 - N1, giving a new output neuron gradient precision S∇x(l) = 2^(-(s1-N1)); the module then recomputes the gradient update precision T according to the preset formula and judges whether the absolute value of the difference between T and the preset precision Tr has become smaller. If it has, the neural network operation module continues, reducing the fractional bit width to s1 - 2*N1, obtaining the new output neuron gradient precision, and again judging whether the absolute value of the difference between T and Tr has decreased. If it keeps decreasing, the processing continues in the same way; if at the n-th reduction the absolute value of the difference between T and Tr increases, the module takes the bit width obtained at the (n-1)-th reduction, i.e., s1 - (n-1)*N1, as the fractional bit width of the fixed-point data format representing the output neuron gradients, and the output neuron gradient precision after the reduction is 2^(-(s1-(n-1)*N1)).
Optionally, the first preset step N1 is 1, 2, 4, 6, 7, 8 or other positive integer.
Optionally, the neural network operation module reduces the fractional bit width of the fixed-point data format representing the output neuron gradients in a 2-fold decreasing manner, i.e., by halving it.

For example, if the fractional bit width of the fixed-point data format representing the output neuron gradients is 4, i.e., the output neuron gradient precision is 2^(-4), then after halving, the fractional bit width is 2, i.e., the output neuron gradient precision after the reduction is 2^(-2).
In a possible embodiment, after the neural network operation module determines the reduction amount b of the fractional bit width of the fixed-point data format representing the output neuron gradients, it may perform the reduction in several steps; for example, it may reduce the fractional bit width twice, with a first reduction of b1 and a second reduction of b2, where b = b1 + b2; b1 and b2 may be equal or different.
Optionally, when increasing the output neuron gradient precision S∇x(l), the neural network operation module also reduces the bit width of the fixed-point data format representing the output neuron gradients.
Further, the output neuron gradient precision S∇x(l) is increased by reducing the fractional bit width of the fixed-point data format representing the output neuron gradients while the total bit width of that format stays the same; as a result, when the fractional bit width is reduced, the integer part gains bits, the data range represented by the format grows, but the precision value represented by the format also grows. Therefore, after increasing the precision S∇x(l), the neural network operation module reduces the total bit width of the fixed-point data format such that the bit width of the integer part remains unchanged, i.e., the reduction in the total bit width equals the reduction in the fractional bit width; this guarantees that the maximum value represented by the format stays the same even though the fractional bit width has changed.

For example, suppose the bit width of the fixed-point data format is 9, of which the sign bit occupies 1 bit, the integer part 5 bits, and the fractional part 4 bits. After the neural network operation module reduces the fractional bit width and the total bit width, the fractional part occupies 2 bits while the integer part still occupies 5 bits; that is, the fractional bit width is reduced and the integer bit width remains unchanged.
In a possible embodiment, after the neural network operation module increases the output neuron gradient precision S∇x(l), the neural network operation module is further configured to:

judge whether the output neuron gradient precision S∇x(l) is smaller than the required precision, wherein the required precision is the minimum precision of the output neuron gradients when performing the multilayer neural network operation;

when the output neuron gradient precision S∇x(l) is smaller than the required precision, reduce the bit width of the fixed-point data format representing the output neuron gradients.
It should be noted that the reason the neural network operation module increases the output neuron gradient precision S∇x(l) is that the output neuron gradient precision S∇x(l) is smaller than the required precision, i.e., there is precision redundancy, which increases the operation overhead and wastes operation resources. Therefore, in order to reduce the operation overhead and avoid wasting operation resources, the output neuron gradient precision S∇x(l) needs to be increased.
Specifically, as described above, after the neural network operation module increases the output neuron gradient precision S∇x(l), it is further necessary to judge whether precision redundancy remains, i.e., to judge whether the output neuron gradient precision S∇x(l) is still smaller than the required precision. When the output neuron gradient precision S∇x(l) is determined to be smaller than the required precision, the bit width of the fixed-point data format representing the output neuron gradients is reduced in order to further increase the output neuron gradient precision S∇x(l) and reduce the precision redundancy.
It should be noted that when the neural network operation module reduces the bit width of the fixed-point data format here, it specifically reduces the bit width of the integer part of the fixed-point data format.
Further, the neural network operation module reducing the bit width of the fixed-point data format representing the output neuron gradients comprises:

the neural network operation module reduces the bit width of the fixed-point data format representing the output neuron gradients according to the second preset step N2, where the second preset step N2 may be 1, 2, 3, 4, 5, 7, 8, or another positive integer. Specifically, each time the neural network operation module determines to reduce the bit width of the fixed-point data format, the reduction amount is the second preset step N2.
In a possible embodiment, the neural network operation module reducing the bit width of the fixed-point data format representing the output neuron gradients comprises:

the neural network operation module reduces the bit width of the fixed-point data format representing the output neuron gradients in a 2-fold decreasing manner, i.e., by halving it.

For example, if the bit width of the fixed-point data format excluding the sign bit is 16, then after one halving the bit width excluding the sign bit is 8, and after a further halving it is 4.
In one embodiment, the neural network operation module adjusting the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) comprises:

the neural network operation module increases the input neuron precision Sx(l) and/or the output neuron gradient precision S∇x(l) while keeping the weight precision Sw(l) unchanged; or

the neural network operation module increases the input neuron precision Sx(l) and reduces the output neuron gradient precision S∇x(l) while keeping the weight precision Sw(l) unchanged, the increase of the input neuron precision Sx(l) being greater than the decrease of the output neuron gradient precision S∇x(l); or

the neural network operation module reduces the output neuron gradient precision S∇x(l) and increases the input neuron precision Sx(l) while keeping the weight precision Sw(l) unchanged, the decrease of the output neuron gradient precision S∇x(l) being smaller than the increase of the input neuron precision Sx(l); or

the neural network operation module increases or decreases the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) so that the absolute value of the difference between the gradient update precision T and the preset precision Tr is minimized.
It should be noted that the specific process by which the neural network operation module reduces any of the weight precision Sw(l), the input neuron precision Sx(l), and the output neuron gradient precision S∇x(l) is analogous to the increasing operations of the neural network operation module described above, and is not repeated here.
S204, the neural network operation module represents the output neurons and weights of the L-th layer according to the adjusted input neuron precision and weight precision, and represents the L-th layer output neuron gradients obtained by operation according to the adjusted output neuron gradient precision, so as to perform the subsequent operations.
In other words, the operation unit represents the L-th layer input neurons in a fixed-point data format of the increased or decreased input neuron precision Sx(l), the L-th layer weights in a fixed-point data format of the increased or decreased weight precision Sw(l), and the L-th layer output neuron gradients in a fixed-point data format of the increased or decreased output neuron gradient precision S∇x(l), for the subsequent operations.
After adjusting the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) according to the above method, the neural network operation module recalculates the gradient update precision T during the operation; when the gradient update precision T is no longer greater than the preset precision Tr, the neural network operation module reduces the input neuron precision Sx(l), the weight precision Sw(l), and the output neuron gradient precision S∇x(l) by referring to the method of step S203.
It should be noted that the frequency with which the neural network operation module calculates the gradient update precision T can be set flexibly according to requirements.

The neural network operation module may adjust the frequency of calculating the gradient update precision T according to the number of training iterations in the neural network training process.

Optionally, in the neural network training process the neural network operation module recalculates the gradient update precision T every iteration, or every preset number of iterations; or it sets the frequency according to how the gradient update precision T changes.

Optionally, the neural network operation module sets the frequency of calculating the gradient update precision T according to the number of training iterations in the neural network training.
It can be seen that, in the solutions of the embodiments of the present invention, the input neuron precision Sx, the weight precision Sw, and the output neuron gradient precision S∇x are dynamically adjusted during the neural network operation, so that precision redundancy is reduced while the operation requirements are met, which reduces the operation overhead and avoids wasting operation resources.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (16)

1. A neural network operation module, wherein the neural network operation module is used for performing operations of a multilayer neural network, and comprises:
a storage unit, used for storing the input neuron precision, the weight precision and the output neuron gradient precision;
a controller unit, used for obtaining the input neuron precision Sx(l), the weight precision Sw(l) and the output neuron gradient precision S∇x(l) of the L-th layer of the multilayer neural network from the storage unit, wherein L is an integer greater than 0; obtaining a gradient update precision T according to the input neuron precision Sx(l), the weight precision Sw(l) and the output neuron gradient precision S∇x(l); and, when the gradient update precision T is smaller than a preset precision Tr, adjusting the input neuron precision Sx(l), the weight precision Sw(l) and the output neuron gradient precision S∇x(l) so that the absolute value of the difference between the gradient update precision T and the preset precision Tr is minimized;
an arithmetic unit, used for representing the input neurons and weights of the L-th layer according to the adjusted input neuron precision Sx(l) and weight precision Sw(l), and for representing the L-th layer output neuron gradient obtained by operation according to the adjusted output neuron gradient precision S∇x(l), so as to perform subsequent operations.
2. The module of claim 1, wherein the controller unit obtaining the gradient update precision T according to the input neuron precision Sx(l), the weight precision Sw(l) and the output neuron gradient precision S∇x(l) specifically comprises:
the controller unit calculating the gradient update precision T from the input neuron precision Sx(l), the weight precision Sw(l) and the output neuron gradient precision S∇x(l) according to a preset formula;
wherein the preset formula is: [formula image in the original, relating T to Sx(l), Sw(l) and S∇x(l)].
3. The module of claim 2, wherein the controller unit adjusting the input neuron precision Sx(l), the weight precision Sw(l) and the output neuron gradient precision S∇x(l) comprises:
the controller unit keeping the input neuron precision Sx(l) and the weight precision Sw(l) unchanged, and increasing the output neuron gradient precision S∇x(l).
4. The module of claim 3, wherein, when increasing the output neuron gradient precision S∇x(l), the controller unit reduces the bit width of the fixed-point data format representing the output neuron gradient.
5. The module of claim 3 or 4, wherein, after increasing the output neuron gradient precision S∇x(l), the controller unit is further configured to:
judge whether the output neuron gradient precision S∇x(l) is smaller than a required precision, the required precision being the minimum precision of the output neuron gradient when performing the multilayer neural network operation; and
when the output neuron gradient precision S∇x(l) is smaller than the required precision, reduce the bit width of the fixed-point data format representing the output neuron gradient.
6. The module of claim 4 or 5, wherein the controller unit reducing the bit width of the fixed-point data format representing the output neuron gradient comprises:
the controller unit reducing the bit width of the fixed-point data format representing the output neuron gradient according to a preset step length N1,
wherein the preset step length N1 is 1, 2, 4, 6, 7, 8 or another positive integer.
7. The module of claim 4 or 5, wherein the controller unit reducing the bit width of the fixed-point data format representing the output neuron gradient comprises:
the controller unit reducing the bit width of the fixed-point data format representing the output neuron gradient in a 2-fold decreasing manner.
8. The module of any one of claims 1 to 7, wherein the controller unit is further configured to:
obtain the preset precision Tr according to a machine learning method; or
obtain the preset precision Tr according to the number of output neurons of the (L-1)-th layer, the learning rate and the number of samples in a batch, wherein the greater the number of (L-1)-th layer output neurons, the number of samples in a batch and the learning rate, the greater the preset precision Tr.
9. A neural network operation method, comprising:
obtaining the input neuron precision Sx(l), the weight precision Sw(l) and the output neuron gradient precision S∇x(l) of the L-th layer of a neural network;
calculating a gradient update precision T according to the input neuron precision Sx(l), the weight precision Sw(l) and the output neuron gradient precision S∇x(l);
when the gradient update precision T is smaller than a preset precision Tr, adjusting the input neuron precision Sx(l), the weight precision Sw(l) and the output neuron gradient precision S∇x(l) so that the absolute value of the difference between the gradient update precision T and the preset precision Tr is minimized; and
representing the output neurons and weights of the L-th layer according to the adjusted input neuron precision Sx(l) and weight precision Sw(l), and representing the L-th layer output neuron gradient obtained by operation according to the adjusted output neuron gradient precision S∇x(l), for subsequent operations.
10. The method of claim 9, wherein the calculating the gradient update precision T according to the input neuron precision Sx(l), the weight precision Sw(l) and the output neuron gradient precision S∇x(l) comprises:
calculating the gradient update precision T from the input neuron precision Sx(l), the weight precision Sw(l) and the output neuron gradient precision S∇x(l) according to a preset formula;
wherein the preset formula is: [formula image in the original, relating T to Sx(l), Sw(l) and S∇x(l)].
11. The method of claim 10, wherein the adjusting the input neuron precision Sx(l), the weight precision Sw(l) and the output neuron gradient precision S∇x(l) comprises:
keeping the input neuron precision Sx(l) and the weight precision Sw(l) unchanged, and increasing the output neuron gradient precision S∇x(l).
12. The method of claim 11, wherein, when increasing the output neuron gradient precision S∇x(l), the bit width of the fixed-point data format representing the output neuron gradient is reduced.
13. The method of claim 12, wherein, after the increasing the output neuron gradient precision S∇x(l), the method further comprises:
judging whether the output neuron gradient precision S∇x(l) is smaller than a required precision, the required precision being the minimum precision of the output neuron gradient when performing the multilayer neural network operation; and
when the output neuron gradient precision S∇x(l) is smaller than the required precision, reducing the bit width of the fixed-point data format representing the output neuron gradient.
14. The method of claim 12 or 13, wherein the reducing the bit width of the fixed-point data format representing the output neuron gradient comprises:
reducing the bit width of the fixed-point data format representing the output neuron gradient according to a preset step length N1,
wherein the preset step length N1 is 1, 2, 4, 6, 7, 8 or another positive integer.
15. The method of claim 12 or 13, wherein the reducing the bit width of the fixed-point data format representing the output neuron gradient comprises:
reducing the bit width of the fixed-point data format representing the output neuron gradient in a 2-fold decreasing manner.
16. The method of any one of claims 9 to 15, further comprising:
obtaining the preset precision Tr according to a machine learning method; or
obtaining the preset precision Tr according to the number of output neurons of the (L-1)-th layer, the learning rate and the number of samples in a batch, wherein the greater the number of (L-1)-th layer output neurons, the number of samples in a batch and the learning rate, the greater the preset precision Tr.
CN201811041573.3A 2018-05-18 2018-09-06 Neural network operation module and method Pending CN110880033A (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
CN201811041573.3A CN110880033A (en) 2018-09-06 2018-09-06 Neural network operation module and method
EP19803375.5A EP3624020A4 (en) 2018-05-18 2019-05-07 Computing method and related product
PCT/CN2019/085844 WO2019218896A1 (en) 2018-05-18 2019-05-07 Computing method and related product
US16/718,742 US11409575B2 (en) 2018-05-18 2019-12-18 Computation method and product thereof
US16/720,145 US11442785B2 (en) 2018-05-18 2019-12-19 Computation method and product thereof
US16/720,171 US11442786B2 (en) 2018-05-18 2019-12-19 Computation method and product thereof

Publications (1)

Publication Number Publication Date
CN110880033A true CN110880033A (en) 2020-03-13

Family

ID=69727245

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811041573.3A Pending CN110880033A (en) 2018-05-18 2018-09-06 Neural network operation module and method

Country Status (1)

Country Link
CN (1) CN110880033A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination