This application claims priority to the previously filed Chinese patent application No. 201610663201.9, entitled "A Method for Optimizing an Artificial Neural Network," and Chinese patent application No. 201610663563.8, entitled "A Deep Processing Unit for Implementing an Artificial Neural Network."
Embodiments
Part of the content of this application was previously published in the inventor Song Yao's academic article "Going Deeper with Embedded FPGA Platform for Convolutional Neural Network" (February 2016). This application incorporates the content of that article and makes further improvements on its basis.
In this application, the improvements that the present invention brings to CNNs are mainly illustrated by taking image processing as an example. The solution of this application is applicable to various artificial neural networks, including deep neural networks (DNNs), recurrent neural networks (RNNs), and convolutional neural networks (CNNs). The following description takes a CNN as an example.
Basic concepts of CNNs
CNNs achieve state-of-the-art performance in a wide range of vision-related tasks. To help understand the CNN-based image classification algorithms analyzed in this application, we first describe the basics of CNNs, and then introduce the ImageNet dataset and existing CNN models.
As shown in Fig. 1, a typical CNN consists of a series of layers that are executed in order.
The parameters of a CNN model are called "weights". The first layer of a CNN reads the input image and outputs a series of feature maps. Each following layer reads the feature maps produced by the previous layer and outputs new feature maps. Finally, a classifier outputs the probability of each category to which the input image may belong. The CONV layer (convolutional layer) and the FC layer (fully connected layer) are the two basic layer types in a CNN; a CONV layer is usually followed by a pooling layer.
In this application, for a given CNN layer, f_j^in denotes the j-th input feature map, f_i^out denotes the i-th output feature map, and b_i denotes the bias term of the i-th output map.
For a CONV layer, n_in and n_out denote the numbers of input and output feature maps, respectively.
For an FC layer, n_in and n_out denote the lengths of the input and output feature vectors, respectively.
Definition of the CONV layer (Convolutional layer): a CONV layer takes a series of feature maps as input and produces output feature maps by convolving the input with convolution kernels.
A nonlinear layer, i.e., a nonlinear activation function, is usually attached to the CONV layer and applied to each element of the output feature maps.
A CONV layer can be expressed by Expression 1:

f_i^out = Σ_{j=1}^{n_in} ( f_j^in ⊗ g_{i,j} ) + b_i,  1 ≤ i ≤ n_out   (1)
where g_{i,j} is the convolution kernel applied to the j-th input feature map to produce the i-th output feature map.
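As an illustrative sketch only, the computation of Expression 1 can be written in Python/NumPy as follows; the function signature, the shapes, and the "valid" (no-padding) convolution are assumptions made for this example.

```python
import numpy as np

def conv_layer(f_in, g, b):
    """Expression 1: f_i^out = sum_j (f_j^in conv g_ij) + b_i.

    f_in: (n_in, H, W) input feature maps
    g:    (n_out, n_in, k, k) convolution kernels g_ij
    b:    (n_out,) bias terms b_i
    Returns (n_out, H-k+1, W-k+1) output feature maps.
    """
    n_out, n_in, k, _ = g.shape
    H, W = f_in.shape[1:]
    f_out = np.zeros((n_out, H - k + 1, W - k + 1))
    for i in range(n_out):
        for j in range(n_in):
            # slide the k x k kernel over input map j and accumulate
            for y in range(H - k + 1):
                for x in range(W - k + 1):
                    f_out[i, y, x] += np.sum(f_in[j, y:y+k, x:x+k] * g[i, j])
        f_out[i] += b[i]          # add the bias of output map i
    return f_out
```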
Definition of the FC layer (Fully-Connected layer): an FC layer applies a linear transformation to the input feature vector:
f^out = W · f^in + b   (2)
where W is an n_out × n_in transformation matrix and b is the bias term. Note that for an FC layer, the input is not a combination of several two-dimensional feature maps, but a single feature vector. Therefore, in Expression 2, the parameters n_in and n_out actually correspond to the lengths of the input and output feature vectors.
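As an illustrative sketch, Expression 2 maps directly onto a matrix-vector product; the sizes below are assumptions chosen for the example.

```python
import numpy as np

n_in, n_out = 1024, 256                # illustrative vector lengths
W = np.random.rand(n_out, n_in)        # n_out x n_in transformation matrix
b = np.random.rand(n_out)              # bias term
f_in = np.random.rand(n_in)            # input feature vector
f_out = W @ f_in + b                   # Expression 2: output feature vector
```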
Pooling layer: usually connected to a CONV layer, it outputs the maximum or the average value of each subarea of each feature map. Max pooling can be expressed by Expression 3:

f_i^out(x, y) = max_{0 ≤ m, n < p} f_i^in(x·p + m, y·p + n)   (3)
where p is the size of the pooling kernel. This nonlinear "down-sampling" not only reduces the feature map size and the computation for the next layer, but also provides a form of translation invariance.
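A minimal sketch of Expression 3 in the same style, assuming non-overlapping p × p windows and dimensions divisible by p:

```python
import numpy as np

def max_pool(f_in, p):
    """Expression 3: p x p max pooling applied to each feature map.

    f_in: (n, H, W) feature maps; H and W are assumed divisible by p.
    """
    n, H, W = f_in.shape
    f_out = np.zeros((n, H // p, W // p))
    for i in range(n):
        for y in range(H // p):
            for x in range(W // p):
                # maximum over the p x p subarea
                f_out[i, y, x] = f_in[i, y*p:(y+1)*p, x*p:(x+1)*p].max()
    return f_out
```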
A CNN can be used to perform image classification in the forward inference process. But before a CNN is used for any task, it should first be trained on a dataset. Recent research has shown that a CNN model trained on a large dataset for a given task can be used for other tasks and achieves high accuracy with minor adjustments of the network weights; such minor adjustment is called "fine-tuning" (fine-tune). The training of a CNN is mostly realized on large servers. For the embedded FPGA platform, we focus on accelerating the inference process of a CNN.
Fig. 2 shows the complete technical solution proposed for accelerating CNNs, from the perspectives of the processing flow and the hardware architecture. The left side of Fig. 2 shows the artificial neural network model, i.e., the target to be optimized. The middle of Fig. 2 illustrates how the CNN model is compressed, quantized, and compiled so as to reduce memory footprint and the amount of computation while minimizing the loss of accuracy. The right side of Fig. 2 shows the dedicated hardware provided for the compressed CNN.
Dynamic quantization scheme for serial neural networks
Fig. 3 shows more details of the quantization step of Fig. 2.
For a fixed-point number, its value can be expressed as follows:

value = Σ_{i=0}^{bw−1} B_i · 2^{−f_l} · 2^i   (4)

where bw is the bit width of the number, B_i is the i-th bit, and f_l is the fractional length, which may be negative.
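A minimal sketch of this representation, assuming signed two's-complement numbers with round-to-nearest and saturation (details that are not specified above):

```python
import numpy as np

def quantize(x, bw, fl):
    """Quantize floating-point data x to bw-bit fixed-point numbers with
    fractional length fl (Expression 4), returned as the float values
    that the fixed-point numbers represent."""
    step = 2.0 ** (-fl)                    # value of one least-significant bit
    q_min = -(2.0 ** (bw - 1)) * step      # most negative representable value
    q_max = (2.0 ** (bw - 1) - 1) * step   # most positive representable value
    return np.clip(np.round(x / step) * step, q_min, q_max)
```

For example, with bw = 8 and fl = 5, the representable values run from −4.0 to 3.96875 in steps of 0.03125.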
In order to obtain the highest accuracy while converting floating-point numbers into fixed-point numbers, we propose a dynamic-precision data quantization strategy and an automatic workflow.
Unlike previous static-precision quantization strategies, in the proposed data quantization flow, f_l changes dynamically across different layers and feature map sets while remaining static within one layer, so as to minimize the truncation error of each layer.
As shown in Fig. 3, the quantization flow proposed in this application mainly consists of two phases: the weight quantization phase and the data quantization phase.
The purpose of the weight quantization phase is to find the optimal f_l for the weights of one layer, as in Expression 5:

f_l = argmin_{f_l} Σ | W_float − W(bw, f_l) |   (5)

where W is a weight and W(bw, f_l) denotes the fixed-point format of W under the given bw and f_l.
Optionally, the dynamic range of the weights of each layer is analyzed first, for example by sampling. Afterwards, f_l is initialized so as to avoid data overflow. In addition, we search for the optimal f_l in the neighborhood of the initial f_l.
Optionally, in the weight fixed-point quantization step, the optimal f_l is found in another way, as in Expression 6:

f_l = argmin_{f_l} Σ_i k_i · | W_float − W(bw, f_l) |_i   (6)

where i denotes a position among the bw bits and k_i is the weight given to that position. In the manner of Expression 6, different positions are given different weights, and the optimal f_l is then computed.
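A minimal sketch of the neighborhood search of Expression 5, reusing the quantize() helper sketched above; the search radius is an assumption, and the optional error-weighting array only loosely mirrors the position weighting of Expression 6:

```python
import numpy as np

def find_weight_fl(W, bw, fl_init, radius=2, err_weights=None):
    """Search around fl_init for the fl that minimizes the total
    quantization error of the weights W (Expression 5)."""
    best_fl, best_err = fl_init, float("inf")
    for fl in range(fl_init - radius, fl_init + radius + 1):
        err = np.abs(W - quantize(W, bw, fl))
        if err_weights is not None:
            err = err * err_weights        # weighted error, cf. Expression 6
        total = err.sum()
        if total < best_err:
            best_fl, best_err = fl, total
    return best_fl
```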
The data quantization phase aims to find the optimal f_l for the feature map sets between two layers of the CNN model. In this phase, the CNN can be run on a training dataset (benchmark). The training dataset may be, for example, data set 0.
Optionally, the weight quantization of all CONV layers and FC layers of the CNN is completed first, and data quantization is then carried out. At this point, the training dataset is input to the CNN whose weights have been quantized and is processed layer by layer through the CONV layers and FC layers, so as to obtain the input feature maps of each layer.
For the input feature maps of each layer, a greedy algorithm is used to compare, layer by layer, the data of the fixed-point CNN model against that of the floating-point CNN model, so as to reduce the loss of accuracy. The optimization target of each layer is shown in Expression 7:

f_l = argmin_{f_l} Σ | x^+_float − x^+(bw, f_l) |   (7)

In Expression 7, A denotes the computation of one layer (e.g., a certain CONV layer or FC layer), x denotes the input, and with x^+ = A·x, x^+ denotes the output of this layer. It is worth noting that for a CONV layer or an FC layer, the direct result x^+ has a longer bit width than the given standard; it therefore needs to be truncated when the optimal f_l is selected. Finally, the whole data quantization configuration is generated.
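A minimal sketch of this greedy, layer-by-layer selection, reusing the quantize() helper sketched above; the layer interface (plain callables) and the candidate list for f_l are assumptions for this example:

```python
import numpy as np

def data_quantization(layers, x, bw, fl_candidates):
    """Expression 7: for each layer in series, pick the fl whose
    truncated fixed-point output stays closest to the floating-point
    output, then feed the truncated result to the next layer."""
    fls, x_float, x_fixed = [], x, x
    for layer in layers:
        ref = layer(x_float)               # floating-point reference x+
        out = layer(x_fixed)               # result computed on quantized inputs
        errs = [np.abs(ref - quantize(out, bw, fl)).sum() for fl in fl_candidates]
        fl = fl_candidates[int(np.argmin(errs))]
        fls.append(fl)
        x_float, x_fixed = ref, quantize(out, bw, fl)   # truncate x+
    return fls
```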
According to another embodiment of the invention, in the data fixed-point quantization step, the optimal f_l is found in another way, as in Expression 8:

f_l = argmin_{f_l} Σ_i k_i · | x^+_float − x^+(bw, f_l) |_i   (8)

where i denotes a position among the bw bits and k_i is the weight of that position. Similarly to Expression 6, different positions are given different weights, and the optimal f_l is then computed.
The above data quantization step yields the optimal f_l.
In addition, weight quantization and data quantization may alternate. In terms of the flow order of data processing, the convolutional layers (CONV layers) and fully connected layers (FC layers) of the ANN are in a series relationship, and the training dataset is processed successively by the CONV layers and FC layers of the ANN to obtain the respective feature map sets.
Specifically, the weight quantization step and the data quantization step alternate according to this series relationship: after the fixed-point quantization of the current layer is completed in the weight quantization step, and before the fixed-point quantization of the next layer begins, the data quantization step is performed on the feature map set output by the current layer.
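A minimal sketch of this alternation, reusing the quantize() and find_weight_fl() helpers sketched above; the layer objects (with a W attribute and a forward() method) are assumptions for this example:

```python
import numpy as np

def alternating_quantization(layers, x, bw, fl_candidates):
    """Alternate per layer: quantize the layer's weights first, then
    quantize the feature maps it outputs, before moving on to the
    next layer in the series."""
    x_float, x_fixed = x, x
    for layer in layers:
        # 1) weight fixed-point quantization of the current layer
        w_fl = find_weight_fl(layer.W, bw, fl_init=0)
        layer.W = quantize(layer.W, bw, w_fl)
        # 2) data quantization of the current layer's output
        ref = layer.forward(x_float)
        out = layer.forward(x_fixed)
        d_fl = min(fl_candidates,
                   key=lambda fl: float(np.abs(ref - quantize(out, bw, fl)).sum()))
        x_float, x_fixed = ref, quantize(out, bw, d_fl)
```

Note that this sketch computes the reference with the already-quantized weights in place, which is one simplified reading of the flow described above.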
The above scheme of the layer-by-layer variable-precision fixed-point method and apparatus applies to simple, branchless neural networks.
Fig. 4 shows a purely serial neural network: any layer "Layer N" of the neural network has one and only one predecessor layer and one and only one successor layer. The basic flow is: for the input neural network, go from the input to the output, minimizing the error layer by layer by means of a function and deciding the precision of each layer, until the last layer.
The positioning method shown in Fig. 5: the most suitable fixed-point position is found in a layer-by-layer manner.
It can be seen that the method of Fig. 5 requires generating the fixed-point neural network online. "Online" means that some typical pictures are selected and this series of pictures is tested; only while these pictures are being tested do the intermediate results become known.
Because the scheme of Fig. 5 adopts the layer-by-layer fixed-point approach, a testing tool that supports fixed-point neural networks is needed: the input of the tool is the output that has already passed through the fixed-point quantization of the previous layer, and the output of the tool is the result of this layer of the fixed-point network.
Fixed-point dynamic quantization scheme for complex networks
The scheme of Fig. 5 propagates the fixed-point method layer by layer, so the fixed-point quantization of each layer depends on the layer before it. It has no way to handle network structures in which branches exist and branches merge.
The scheme of Fig. 5 therefore does not apply to currently popular networks (GoogLeNet, SqueezeNet, etc.). In the method of Fig. 5, the fixed-point operation of each layer relies on the fixed-point result of the previous layer, which places considerable restrictions on the network structure.
Figs. 6a-6c show GoogLeNet, an example of a complex neural network, in which the network has multiple branches and contains both series and parallel relationships; Fig. 6c is the input of the GoogLeNet model, and Fig. 6a is the output of the GoogLeNet model. For more information on the complex network GoogLeNet shown in Fig. 6, refer to the article "Going Deeper with Convolutions" by Christian Szegedy et al.
For a network with branches (such as GoogLeNet), there is a concatenation (CONCAT) operation over multiple layers: the outputs of multiple upper layers are connected as the inputs of the CONCAT.
The CONCAT operation connects (concatenates) the data of the input layers along the channel dimension into a new layer, which is then output to the next layer. For example, suppose the CONCAT has two input layers, input A and input B: the feature map size of input A is W×H with C1 channels, and the feature map size of input B is W×H with C2 channels. After the CONCAT layer, the feature map dimension is W×H×(C1+C2).
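A minimal sketch of this channel-wise concatenation; the sizes are assumptions chosen for the example:

```python
import numpy as np

W, H, C1, C2 = 28, 28, 64, 32          # illustrative dimensions
a = np.random.rand(W, H, C1)           # input A: W x H x C1
b = np.random.rand(W, H, C2)           # input B: W x H x C2
out = np.concatenate([a, b], axis=2)   # CONCAT along the channel dimension
assert out.shape == (W, H, C1 + C2)    # result: W x H x (C1 + C2)
```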
In the example shown in Fig. 7, the CONCAT layer has four inputs: a 1×1 convolutional layer, a 3×3 convolutional layer, a 5×5 convolutional layer, and a 3×3 max pooling layer. The CONCAT layer concatenates these four inputs and provides one output. A complex neural network with branches needs the CONCAT operation, and therefore a corresponding CONCAT layer exists in the neural network model.
Fig. 8 shows an example of the operation performed by a CONCAT layer.
A BLOB (binary large object) is a container that can store binary files. In computing, a BLOB is often a database field type used for storing binary files. A BLOB can be understood as a single large file; a typical BLOB is a picture or an audio file. Because of their size, BLOBs need to be handled in a special way (e.g., uploaded, downloaded, or stored in a database).
In the embodiments of the present invention, a BLOB can be understood as a four-dimensional data structure. A CONCAT layer concatenates the BLOBs BLOB1, BLOB2, ..., BLOBn output by the multiple layers of the previous stage into one output.
Further, when the CONCAT operation is implemented in hardware, the merging of the branches is realized by changing the position (memory address) of each input BLOB1, BLOB2, ..., BLOBn in memory.
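One way to picture this, sketched here under the assumption of a channel-first four-dimensional BLOB layout: each branch writes its result directly into its own channel slice of a single pre-allocated output buffer, so the merge requires no extra copy:

```python
import numpy as np

n, C1, C2, H, W = 1, 64, 32, 28, 28        # illustrative BLOB dimensions
buf = np.empty((n, C1 + C2, H, W))         # one buffer holds the merged BLOB
view1 = buf[:, :C1]                        # memory region assigned to BLOB1
view2 = buf[:, C1:]                        # memory region assigned to BLOB2
view1[...] = np.random.rand(n, C1, H, W)   # branch 1 writes in place
view2[...] = np.random.rand(n, C2, H, W)   # branch 2 writes in place
```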
As shown in Fig. 8, the fixed-point configuration information of BLOBs 1, 2, 3, ..., n may be inconsistent. In actual hardware, however, the fixed-point configurations of all inputs of a CONCAT layer are required to be consistent. If the fixed-point configuration information is inconsistent, it causes a data conflict at the CONCAT layer, and the neural network can no longer run layer by layer to the next layer.
As shown in Fig. 9, in order to solve the above problem, we use a new method to determine the fixed-point positions of the input ranges of the neural network.
In the method shown in Fig. 9, the CNN (convolutional neural network) is a neural network with branches, comprising at least: the 1st, 2nd, ..., n-th convolutional layers (CONV layers); the 1st, 2nd, ..., m-th fully connected layers (FC layers); and the 1st, 2nd, ..., l-th CONCAT layers, where n, m, and l are positive integers.
The weight parameter flow shown in the left branch of Fig. 9 is roughly the same as in Fig. 5. Unlike the method shown in Fig. 5, the data quantization flow in the right branch of Fig. 9 comprises the following steps.
First, the numerical range of the output of each layer of the CNN (each of the CONV layers, FC layers, and CONCAT layers) is estimated, the values being floating-point numbers.
According to one embodiment of the present invention, the first step includes: supplying input data to the CNN, where the input data is processed by the n convolutional layers (CONV layers), the m fully connected layers (FC layers), and the l CONCAT layers of the CNN to obtain the output of each layer.
Second, the numerical range of the above outputs is quantized from floating-point numbers into fixed-point numbers.
The above step quantizes the output of every layer from floating-point into fixed-point numbers, where the quantization range is chosen dynamically for each layer's output and remains constant within said layer.
According to one embodiment of the present invention, the optimal f_l can be calculated in the manner of Expression 7 or 8, thereby determining the fixed-point quantization range.
Third, based on the fixed-point quantization range of the output of a CONCAT layer, the fixed-point quantization range of each input of said CONCAT layer is modified.
The third step includes: determining each CONCAT layer in the CNN, where each CONCAT layer merges the outputs of multiple layers of its previous stage into its own output. For example, multiple sub-networks can be identified in the network model of the CNN, each sub-network taking a CONCAT layer as its last layer, so that processing can be performed with the sub-network as the unit.
According to one embodiment of the present invention, the third step further includes: for the outputs of the multiple previous-stage layers of a CONCAT layer, comparing the fixed-point quantization range of the output of each previous-stage layer with the fixed-point quantization range of the output of the CONCAT layer. If they are not the same, the fixed-point quantization range of that input is modified to the fixed-point quantization range of the output of the CONCAT layer.
According to one embodiment of the present invention, the third step further includes: if some previous-stage input of a CONCAT layer is itself another CONCAT layer, said other CONCAT layer is taken as another sub-network and the third step is performed on it, iteratively.
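A minimal sketch of this third step; the Layer structure and field names are assumptions for this example, and the final lines reproduce the situation of Fig. 10 described below:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Layer:
    name: str
    out_fl: int                       # fl of this layer's output
    is_concat: bool = False
    inputs: List["Layer"] = field(default_factory=list)

def unify_concat_fl(concat: Layer) -> None:
    """Force every input of a CONCAT layer to adopt the fl of the
    CONCAT output; iterate when an input is itself a CONCAT layer."""
    for src in concat.inputs:                  # previous-stage layers
        if src.out_fl != concat.out_fl:        # compare the two ranges
            src.out_fl = concat.out_fl         # modify the input's range
        if src.is_concat:                      # nested CONCAT: recurse
            unify_concat_fl(src)

conv3 = Layer("CONV3", out_fl=5)
conv4 = Layer("CONV4", out_fl=3)
concat = Layer("CONCAT", out_fl=4, is_concat=True, inputs=[conv3, conv4])
unify_concat_fl(concat)            # now conv3.out_fl == conv4.out_fl == 4
```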
As shown in the left branch of Fig. 9, according to one embodiment of the present invention, the method further includes a weight fixed-point quantization step: the weights of each of the CONV layers, FC layers, and CONCAT layers are quantized from floating-point into fixed-point numbers.
In addition, the weight quantization flow of the left branch of Fig. 9 and the data quantization flow of the right branch may be performed simultaneously, or may be performed alternately.
For example, before the data fixed-point quantization step is performed, the weight fixed-point quantization step is first completed for all CONV layers, FC layers, and CONCAT layers.
Alternatively, the weight quantization step and the data quantization step may alternate. Following the order of input data processing, after the weight quantization step completes the fixed-point quantization of the current layer among the convolutional layers (CONV layers), fully connected layers (FC layers), and CONCAT layers, and before the fixed-point quantization of the next layer starts, the data quantization step is performed on the output of said current layer.
According to one embodiment of the present invention, a fourth step is additionally included: after the third step, outputting the fixed-point quantization range of the output of each of the CONV layers, FC layers, and CONCAT layers.
Fig. 10 shows an example, based on an embodiment of the present invention, of adjusting the fixed-point quantization configuration of the previous-stage inputs according to the CONCAT layer.
In the example of Fig. 10, the CONCAT layer has two inputs, the convolutional layers CONV3 and CONV4, respectively; the input of the CONV3 layer is CONV2, and the input of the CONV2 layer is CONV1.
Following the flow of Fig. 9, the first step includes: inputting data into the neural network, obtaining the output data of each layer, and determining the numerical range of each layer's output. For example, Fig. 10 shows the numerical range of the output of each CONV layer and of the CONCAT layer, e.g., following a Gaussian distribution.
In the second step, the numerical range of each layer's output is quantized from floating-point into fixed-point numbers. For example, referring to Expression 4, assume that floating-point numbers are quantized into 8-bit fixed-point numbers, i.e., bw = 8. For example, the output of the CONV3 layer is fixed-point quantized with f_l = 5, the output of the CONV4 layer with f_l = 3, and the output of the CONCAT layer with f_l = 4.
In the third step, based on the fixed-point configuration information of the output of the CONCAT layer, the fixed-point quantization range of each input of the CONCAT layer, CONV3 and CONV4, is modified.
For example, the fixed-point quantization range of the output of CONV3 (f_l = 5) is compared with the fixed-point quantization range of the output of the CONCAT layer (f_l = 4). The two differ, so the fixed-point quantization range of the output of CONV3 is modified to the fixed-point quantization range of the output of the CONCAT layer. Accordingly, the fixed-point quantization range of the output of CONV3 is modified to f_l = 4.
Next, the fixed-point quantization range of the output of CONV4 (f_l = 3) is compared with the fixed-point quantization range of the output of the CONCAT layer (f_l = 4). The two differ, so the fixed-point quantization range of the output of CONV4 is modified to the fixed-point quantization range of the output of the CONCAT layer. Accordingly, the fixed-point quantization range of the output of CONV4 is modified to f_l = 4.
If the CONCAT layer has other inputs, they are modified in a similar manner.
In addition, if some input of a CONCAT layer CONCAT1 is itself another CONCAT layer CONCAT2, an iterative operation is performed. First, CONCAT2 is regarded as a previous-stage input and is modified according to the fixed-point configuration of the output of CONCAT1; then, taking the modified CONCAT2 as the output in turn, the fixed-point configuration of each previous-stage input of CONCAT2 is modified.
Moreover, it should be understood that the solution of the present invention applies to complex artificial neural networks of various forms and is by no means limited to branched artificial neural networks with CONCAT cascade operations. In addition, the CONCAT operation should also be understood in a broad sense, i.e., as an operation that combines different sub-networks (or network branches) into one network.
In addition, "multiple" in the description and claims of the present invention means two or more.
It should be noted that the embodiments in this specification are described in a progressive manner; the emphasis of each embodiment is on its differences from the other embodiments, and for the identical or similar parts among the embodiments, reference may be made to one another.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus embodiments described above are merely schematic. For example, the flowcharts and block diagrams in the accompanying drawings show the possible architectures, functions, and operations of the apparatuses, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, and said module, program segment, or part of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified function or action, or by a combination of dedicated hardware and computer instructions.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the present invention; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included in the scope of protection of the present invention. It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined and explained in subsequent drawings.
The foregoing are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any person familiar with the technical field can readily conceive of changes or substitutions within the technical scope disclosed by the present invention, and these shall all be covered by the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be defined by the scope of the claims.