CN109993279A - A double-layer XNOR ("same-or") binary neural network compression method based on look-up-table computation
- Publication number
- CN109993279A (application CN201910178528.0A)
- Authority
- CN
- China
- Prior art keywords
- convolution
- double-layer convolution
- look-up table
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a double-layer XNOR binary neural network compression method based on look-up-table computation. The compression is performed by a double-layer convolution structure whose algorithm comprises the following steps: first, the input feature map, after non-linear activation, batch normalization and binary activation, is convolved by a grouped first layer with different kernel sizes to obtain the first-layer output; then a second-layer 1 × 1 convolution is applied to the first-layer output to obtain the output feature map. In the hardware implementation, the improved double-layer convolution replaces the conventional two sequential computations with a three-input XNOR operation that evaluates both layers in parallel, and all double-layer convolution operations are completed in look-up tables, which improves hardware resource utilization. The invention provides a hardware-algorithm co-compression scheme that combines full-precision efficient-network techniques with look-up-table computation; it achieves good structural compression and also reduces logic-resource consumption in hardware.
Description
Technical field
The present invention relates to FPGA design optimization of binary neural networks, and belongs to the technical field of digital image processing.
Background art
With the flourishing of deep learning, convolutional neural networks (CNNs) are applied ever more widely in digital image processing. From the classic AlexNet to the ResNet residual network proposed by Microsoft Research, deep convolutional neural networks have entered a period of rapid development, and their performance has risen steadily. In practical applications, Google has achieved remarkable results using convolutional neural networks in autonomous driving and other areas. At the same time, CNNs have encountered challenges during their development: their high computational cost and high complexity make them difficult to deploy in embedded devices.
With the spread of mobile intelligent terminals, it is desirable to run neural-network algorithms even on devices with only low-performance processors. The CNN variant BCNN (binary convolutional neural network), which can extract features without any multiplication, has therefore attracted attention for its light weight and low power consumption. In 2016, Courbariaux et al. of the University of Montreal proposed a novel binarization method for convolutional neural networks in "Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1" (arXiv preprint arXiv:1602.02830, 2016). They binarize the network weights and the activations of every layer, saving large amounts of memory, computing resources and forward-propagation time; by compensating the convolution weights and the output feature maps with scaling coefficients, the computational complexity can in theory be reduced by 60% without greatly lowering model accuracy. This shows that binarization can effectively reduce hardware resource consumption and computation cost, speed up neural-network processing, and help bring neural-network algorithms on chip. In the same year, the XNOR-Net proposed by Mohammad Rastegari of the University of Washington turned the multiplications of traditional convolution into XNOR operations, making the hardware implementation of binary neural networks much easier. Compared with full-precision convolutional networks, however, the feature-extraction ability of binary networks is still lacking: binarization is equivalent to a regularization of the full-precision network that further sparsifies it, and the features extracted by the network suffer a considerable loss after each binary activation. How to extract more valid features under binarization has become a key problem for binary neural networks. In the past two years, various dedicated binary algorithms have been proposed, such as the parallel network PC-BNN and ABC-Net, with good results; but while the recognition performance of binary algorithms improved, the cost of hardware implementation was not greatly reduced, and algorithms appeared that are simple yet ill-suited to hardware. In summary, the algorithms of binary convolutional networks have begun to take shape, and developing binary-network algorithms that favour hardware implementation is a future direction of the field.
Because neural-network algorithms involve huge amounts of computation, implementing them purely in software on a terminal is extremely difficult, and research into dedicated acceleration hardware is a current trend. Accordingly, various dedicated accelerator structures have been proposed. The main considerations in designing acceleration hardware are how to run faster and how to save hardware resources. For speed, experts mainly study parallel execution modes of the algorithms, matching the parallelism that hardware can provide in order to accelerate execution. For saving resources, the main research direction is exploiting the data reuse and function reuse contained in the algorithms, which reduces hardware overhead.
As for the hardware implementation of binarized networks, running BCNN on existing general-purpose full-precision AI accelerator chips is inefficient and costly, and such high-performance processors cannot be used in embedded systems and other low-power settings. With the rapid evolution of binary-network algorithms and their increasingly varied structures, FPGA implementations of these networks have become increasingly common. Tsinghua University implemented a general binary neural network accelerator in "FP-BNN: Binarized neural network on FPGA" (Neurocomputing, 2018, 275:1072-1086), achieving on the AlexNet structure 11.6 times the computing speed of a CPU and 2.75 times that of a GPU, with the whole model reaching 384 GOP/s/W on the FPGA; however, the power and logic-resource consumption of that structure are relatively large. Therefore, in order to run high-accuracy algorithms on low-power embedded devices, a software-hardware co-design method is needed that optimizes the algorithm for hardware implementation and deploys it on a dedicated FPGA.
Summary of the invention
Objective of the invention: to overcome the deficiencies of the prior art, the present invention provides a double-layer XNOR binary neural network compression method based on look-up-table computation, which reduces network parameters, improves computational efficiency and lowers resource consumption.
Technical solution: a double-layer XNOR binary neural network compression method based on look-up-table computation, wherein the compression is performed by a double-layer convolution structure whose algorithm comprises the following steps:
First, the input feature map, after non-linear activation, batch normalization and binary activation, is convolved by a grouped first layer with different kernel sizes, yielding the first-layer output.
Then a second-layer 1 × 1 convolution is applied to the first-layer output to obtain the output feature map.
The hardware implementation of the double-layer convolution structure comprises the following steps:
(1) after non-linear activation, batch normalization and binary activation are realized in hardware, the second-layer XNOR processing is performed at the same time as the XNOR processing of the first-layer convolution module, so that both convolution layers are computed simultaneously;
(2) the output values computed simultaneously in step (1) are accumulated in a pipelined fashion using five-into-three adders.
The simultaneous double-layer computation uses a three-input XNOR operation whose three inputs are the input feature-map value, the first-layer convolution weight value and the second-layer convolution weight value.
The double-layer convolution consists of grouped convolutions of different kernel sizes connected to a second-layer 1 × 1 convolution.
The simultaneous double-layer computation is realized with look-up tables: exploiting the basic multiple-input, single-output characteristic of a look-up table, the three-input XNOR processing unit that forms the simultaneous computation is implemented in a single look-up table.
Advantages: the invention proposes a double-layer XNOR binary network computed with look-up tables. A composite double-layer convolution kernel with stronger feature-extraction ability replaces the traditional kernel, and a three-input XNOR computation eliminates the non-binary intermediate values of the double-layer convolution, further reducing the parameter count and computational complexity of the binary neural network. The validity of the algorithm was verified on the CIFAR-10 dataset.
Brief description of the drawings
Fig. 1 is a schematic diagram of a single double-layer module;
Fig. 2 shows the sign function and the gradient-update function;
Fig. 3 shows the binary neural network algorithm and the improved structure;
Fig. 4 shows the conversion of the three-input XNOR hardware implementation;
Fig. 5 is the overall hardware implementation block diagram of the double-layer convolution.
Specific embodiment
The technical solution of the present invention is further described below with reference to the drawings.
A double-layer XNOR binary neural network compression method based on look-up-table computation, wherein the compression is performed by a double-layer convolution structure whose algorithm comprises the following steps: first, the input feature map, after non-linear activation, batch normalization and binary activation, is convolved by a grouped first layer with different kernel sizes to obtain the first-layer output; then a second-layer 1 × 1 convolution is applied to the first-layer output to obtain the output feature map.
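The two steps above can be sketched in software. The following is a minimal pure-Python illustration (not the patented hardware, and all function names are illustrative assumptions): layer 1 applies one binary convolution per group, layer 2 mixes the group outputs with 1 × 1 weights. For simplicity both demo groups use 3 × 3 kernels with "valid" convolution, whereas the patent mixes 3 × 3, 1 × 3 and 3 × 1 kernels.

```python
def xnor(a, b):
    """Binary 'same-or' on +1/-1 values: +1 when equal, -1 otherwise."""
    return 1 if a == b else -1

def conv2d_xnor(fmap, kernel):
    """Valid-mode 2-D convolution with multiply replaced by XNOR."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(fmap) - kh + 1):
        row = []
        for j in range(len(fmap[0]) - kw + 1):
            row.append(sum(xnor(fmap[i + di][j + dj], kernel[di][dj])
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

def double_layer_block(fmap, group_kernels, w2):
    """Layer 1: one grouped convolution per kernel; layer 2: 1x1 mix."""
    mids = [conv2d_xnor(fmap, k) for k in group_kernels]
    h, w = len(mids[0]), len(mids[0][0])
    return [[sum(w2[g] * mids[g][i][j] for g in range(len(mids)))
             for j in range(w)] for i in range(h)]

# Tiny demo: a 4x4 all-ones binary map, two 3x3 groups, 1x1 weights.
fmap = [[1] * 4 for _ in range(4)]
k_pos = [[1] * 3 for _ in range(3)]    # agrees everywhere -> +9 per window
k_neg = [[-1] * 3 for _ in range(3)]   # disagrees everywhere -> -9
out = double_layer_block(fmap, [k_pos, k_neg], w2=[1, -1])
print(out)   # [[18, 18], [18, 18]]
```

Note that in this software reference the layer-1 outputs are multi-bit sums; the hardware fusion described later avoids materializing them.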
Its hardware implementation comprises the following steps:
(1) after non-linear activation, batch normalization and binary activation are realized in hardware, the second-layer XNOR processing is performed at the same time as the XNOR processing of the first-layer convolution module, so that both convolution layers are computed simultaneously;
(2) the output values computed simultaneously in step (1) are accumulated in a pipelined fashion using five-into-three adders.
The simultaneous double-layer computation uses a three-input XNOR operation whose three inputs are the input feature-map value, the first-layer convolution weight value and the second-layer convolution weight value.
The double-layer convolution consists of grouped convolutions of different kernel sizes connected to a second-layer 1 × 1 convolution.
The simultaneous double-layer computation is realized with look-up tables: exploiting the basic multiple-input, single-output characteristic of a look-up table, the three-input XNOR processing unit that forms the simultaneous computation is implemented in a single look-up table.
The invention is further explained below with an example using convolution sizes 3 × 3, 1 × 3 and 3 × 1. As shown in Fig. 1, the ordinary 3 × 3 convolution on the left is replaced by the double-layer convolution on the right. Suppose the number of input activation channels is n and the number of output activation channels is m; then the traditional kernel on the left has n*m*9 parameters. On the right, pconv3 × 3 denotes 3 × 3 convolution, pconv1 × 3 denotes 1 × 3 convolution and pconv3 × 1 denotes 3 × 1 convolution, with parameter counts n/8*m/8*9, n/8*m/8*3 and n/8*m/8*3 respectively; the 1 × 1 convolution has n*m parameters. The total parameter count of the double-layer convolution on the right is n*m*1.75, about 1/5 of the ordinary convolution, greatly reducing both the number of parameters and the amount of convolution computation. That this parameter reduction brings no further accuracy loss is because no binary activation lies between the two convolution layers, which is proposed here for the first time; the binarization is realized in hardware. In a binary neural network, each binary activation turns the features extracted by the original convolution into new features retaining only part of the valid information, and it also seriously disturbs the backward gradient propagation of the network: at such points the gradient cannot propagate. As shown in Fig. 2(a), the sign function is 1 for inputs greater than zero and 0 otherwise; its gradient is infinite at zero and zero everywhere else. A new gradient function (shown in Fig. 2(b)) is therefore needed to solve the problem that the gradient cannot be backpropagated. This function, similar in shape to a Gaussian distribution, both matches the gradient distribution of the sign function to a certain extent and further corrects the usual binarization loss, so that both the training speed and the accuracy of the network are improved. But the gradient correction only solves the propagation problem; it cannot effectively remove the loss incurred during forward propagation. The loss caused by binary activation should therefore be reduced as much as possible during feature extraction, lowering the training difficulty and the accuracy loss of the network. It follows from the above that a binary network algorithm should extract as many valid features as possible with as few layers as possible; the double-layer network of Fig. 1(b) matches exactly this characteristic of binary networks. Experiments with neural networks show that feature extraction in the feature-map (spatial) direction is more important than feature extraction across channels; by increasing the number of first-layer channels and reducing the number of second-layer channels, more features can be extracted with fewer parameters.
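The parameter counts stated above can be checked by direct arithmetic. The sketch below assumes, as in the text, 8 groups in the first layer (4 of 3 × 3, 2 of 1 × 3, 2 of 3 × 1, each on n/8 input and m/8 output channels) followed by a full n*m second layer; the function name is illustrative.

```python
def double_layer_params(n, m):
    """Parameters of the replacement block: grouped first layer
    (4x 3x3, 2x 1x3, 2x 3x1 on n/8 x m/8 channels) plus an n*m 1x1 layer."""
    g = (n // 8) * (m // 8)
    layer1 = 4 * g * 9 + 2 * g * 3 + 2 * g * 3   # grouped first layer
    layer2 = n * m                                # 1x1 second layer
    return layer1 + layer2

n, m = 128, 128
standard = 9 * n * m                              # ordinary 3x3 convolution
print(double_layer_params(n, m) / (n * m))        # 1.75, as stated in the text
print(double_layer_params(n, m) / standard)       # ~0.194, i.e. about 1/5
```

The ratio 1.75/9 ≈ 0.194 is independent of n and m, which is why the text can state the 1/5 figure without fixing the channel counts.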
To verify the algorithmic part of the invention, a binary double-layer XNOR convolutional network was built in TensorFlow, using 4 parallel 3 × 3 kernels, 2 of 1 × 3, 2 of 3 × 1 and 1 of 1 × 1 to replace each 3 × 3 kernel. The comparison uses the network structures shown in Fig. 3. Fig. 3(a) is an ordinary residual network with seven convolution modules: in the first module, a binary-weight convolution is applied after batch normalization, with 128 channels; each of modules two to seven contains one 3 × 3 convolution, with channel numbers 128, 256, 256 and 512 respectively. Every convolution is followed by a PBA layer consisting of a non-linear activation layer, a batch-normalization layer and a binary-activation layer, and a max-pooling layer follows the second, fourth and seventh convolution modules. The seventh convolution module is followed by a fully connected layer. Since training and testing use the CIFAR-10 dataset of 32 × 32 three-channel colour images, and CIFAR-10 has 10 classes, the last fully connected layer has 10 output channels; finally a normalized exponential (softmax) layer completes the classification. Fig. 3(b) is the network improved by the present invention on the basis of Fig. 3(a): as shown by the dashed boxes, from the second convolution module on, the original 3 × 3 kernels are replaced by the double-layer XNOR kernels. The number of intermediate feature channels of the double-layer structure can be chosen freely, and increasing it moderately enhances the overall performance of the network; the other parts are kept as in the original network. The models of Fig. 3(a) and (b) were built in TensorFlow, trained and tested; Table 1 compares the models after 250 training epochs at the same depth.
As shown in Table 1, at the same number of layers and after 250 training epochs on CIFAR-10, the ResNet (7-layer residual network) reaches 87% test accuracy, while the improved binary residual network of the invention (PM-ResNet-7) reaches 86.1%. Table 1 also compares the parameter counts on CIFAR-10: the original network has 2.83M parameters while the improved network has only 1.08M, a reduction of 63%. Fewer parameters necessarily mean fewer convolution operations, so while maintaining test accuracy and full binarization, the computational complexity of the network is greatly reduced and computation time is saved.
Table 1. Parameter counts and accuracy of the network models
Dataset | Model | Parameters | Accuracy
---|---|---|---
CIFAR-10 | ResNet-7 | 2.83M | 87%
CIFAR-10 | PM-ResNet-7 | 1.08M | 86.1%
On the hardware side, as shown in Fig. 4(a), the convolution of an ordinary double-layer network first computes the first-layer convolution outputs O11, O12, O13, and then performs the 1 × 1 convolution on those values to obtain O1, O2, O3. This first mode of computation is given by formulas (1)-(4); the first-layer convolution values must be accumulated, so each result is a value of more than one bit:
O11 = I11 W111 + I12 W112 + ... + I21 W121 + I22 W122 + ... + I31 W131 + I32 W132 + ... + I39 W139   (1)
O12 = I11 W211 + I12 W212 + ... + I21 W221 + I22 W222 + ... + I31 W231 + I32 W232 + ... + I39 W239   (2)
O13 = I11 W311 + I12 W312 + ... + I21 W321 + I22 W322 + ... + I31 W331 + I32 W332 + ... + I39 W339   (3)
O1 = O11 x11 + O12 x12 + O13 x13   (4)
When the next result O1 is computed, although the weights x11, x12, x13 are single-bit, the first-layer outputs O11, O12, O13 are multi-bit, so only additions and subtractions can be performed on them: the advantage that convolution can be carried out with XNOR operations is lost in the hardware implementation of the binary network, and this method also has a large hardware-resource cost. The improved mode of computation is shown in Fig. 4(b): using the fused XNOR method, a single XNOR decision over the three input values yields the result directly and removes the first-layer intermediate output. This not only overcomes the difficulty that the non-single-bit middle layer cannot be XNOR-ed again, but also eliminates the accuracy loss brought by binary activation, letting the binary network extract more features with fewer parameters. From formula (5),
O1 = Σk Σj (Ij ⊙ Wkj ⊙ x1k)   (5)
where ⊙ denotes the single-bit XNOR ("same-or") operation, the final result O1 is obtained simply by three-input single-bit XNOR operations followed by a 1-bit accumulation. Since a multi-bit adder consumes more resources than a 1-bit adder and brings more clock delay, avoiding it also lowers the complexity of the circuit.
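The fused three-input cell can be sanity-checked in software. A sketch under the usual bit encoding of binary networks (bit b stands for the value 2b - 1): chaining the two per-layer XNORs collapses into a single 3-input truth table, which is exactly what one look-up table can store. Function names are illustrative.

```python
def xnor(a, b):
    """2-input 'same-or' on bits: 1 when the bits are equal."""
    return 1 - (a ^ b)

def xnor3(i, w1, w2):
    """Fused cell of Fig. 4(b): both layers' weights applied at once."""
    return xnor(xnor(i, w1), w2)

def to_pm1(b):
    """Value encoded by a bit in a binary network: 0 -> -1, 1 -> +1."""
    return 2 * b - 1

# The fused single-bit cell must agree with the arithmetic product of the
# three +1/-1 values, i.e. with multiplying by both layers' weights.
for k in range(8):
    i, w1, w2 = (k >> 2) & 1, (k >> 1) & 1, k & 1
    assert to_pm1(xnor3(i, w1, w2)) == to_pm1(i) * to_pm1(w1) * to_pm1(w2)

# The whole cell is one 8-entry truth table -- exactly what one LUT stores.
print([xnor3((k >> 2) & 1, (k >> 1) & 1, k & 1) for k in range(8)])
# [0, 1, 1, 0, 1, 0, 0, 1]
```

The truth table shows that the chained XNORs reduce to the 3-input parity function, so a single LUT entry per input pattern suffices and no multi-bit intermediate value is ever produced.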
Moreover, the three-input single-bit XNOR network has advantages over the ordinary computation when realized on look-up tables. Taking an FPGA as an example, its programmable logic resources consist mainly of two parts: combinational circuits realized by look-up tables (LUTs), and sequential circuits realized by registers. What consumes the most resources in a neural-network hardware implementation is the convolution, which is mainly multiplication, addition and subtraction; in a binary network it is mainly addition/subtraction and XNOR, all of which consume a large amount of combinational logic, so optimizing the combinational logic is a key point. By the nature of a LUT, each LUT can realize a different logic function, but its input/output pattern is fixed; in the FPGA devices currently mainstream for neural-network implementation, the LUTs mostly have 4- or 6-input modes. A 6-input LUT supports three modes: one 6-input/1-output function, two 3-input/1-output functions, or one 5-input/2-output function. By the XNOR principle, the XNOR of two single-bit inputs yields a single-bit output, and a three-input XNOR still yields a single-bit output. Because of the LUT output-bit limit, three two-input XNOR operations cannot be realized in one LUT; thus, in general, two two-input XNORs require the same number of LUTs as two three-input XNORs, and the ordinary first-mode XNOR convolution additionally needs a second convolution pass, so it consumes more logic resources than the second mode, which computes the double-layer convolution in one pass. The second mode disclosed by the invention therefore saves considerable hardware resources.
The overall hardware implementation of the double-layer network is shown in Fig. 5. This example computes with 8 parallel modules: a convolution module with p input channels and feature-map size m*n is divided into 8 modules, each with input feature-map size p/8*m*n. Taking the 3 × 3 convolution module as an example: first, the top-left portion of the input feature map, of size p/8*3*3, is convolved with the weight matrices of the 4 different 3 × 3 kernels, each of size p/8*3*3. As the figure shows, the number of first-stage output values equals the number of weights: the XNOR operation does not reduce the number of outputs, and the resulting matrix, shown as the intermediate matrix of Fig. 5, has size p/8*4*9. The intermediate result then undergoes convolution with the next 1 × 1 kernels: 128 matrices of size 4*1 slide over the intermediate matrix performing XNOR operations, producing an output matrix of 128 channels with size 128*p/8*4*9. The obtained matrix is summed along the non-channel direction; each matrix sums p/8*4*9 values, i.e. no accumulation is needed along the channel direction. The adder of this matrix uses five-into-three compression: first, thanks to the 5-input/2-output characteristic of the LUT, summing 5 one-bit values consumes only 2 LUTs; second, in the five-into-three pipelined addition, the more values are input in parallel, the higher the resource utilization, so performing the parallel addition on the double-layer XNOR results greatly improves resource utilization. The 1 × 3 and 3 × 1 convolution modules are processed in the same way, and finally the parallel computation sums the 128*1 matrix values to obtain the final 128 output feature-map values.
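The five-into-three stage above can be modelled behaviourally: five 1-bit addends are reduced to a 3-bit count (0..5), matching the 5-input/2-output LUT packing. This is a software analogue for checking correctness, not the LUT netlist, and the function name is illustrative.

```python
from itertools import product

def compress_5_to_3(b4, b3, b2, b1, b0):
    """Reduce five 1-bit addends to a 3-bit count (c2, c1, c0)."""
    s = b4 + b3 + b2 + b1 + b0
    return (s >> 2) & 1, (s >> 1) & 1, s & 1

# The 3-bit result always re-encodes the exact sum of the five inputs,
# so later pipeline stages can keep accumulating narrow values.
for bits in product((0, 1), repeat=5):
    c2, c1, c0 = compress_5_to_3(*bits)
    assert 4 * c2 + 2 * c1 + c0 == sum(bits)
print("5:3 compression verified for all 32 input patterns")
```

Because the count never exceeds 5, the top bit c2 is set only for sums of 4 or 5, which is why two LUT outputs plus one shared bit suffice in the hardware packing described above.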
It should be pointed out that, for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention. Any component not specified in this embodiment can be realized with the available prior art.
Claims (5)
1. A double-layer XNOR binary neural network compression method based on look-up-table computation, characterized in that: the compression is performed by a double-layer convolution structure whose algorithm comprises the following steps:
First, the input feature map, after non-linear activation, batch normalization and binary activation, is convolved by a grouped first layer with different kernel sizes to obtain the first-layer output;
Then a second-layer 1 × 1 convolution is applied to the first-layer output to obtain the output feature map.
2. The double-layer XNOR binary neural network compression method based on look-up-table computation according to claim 1, characterized in that the hardware implementation of the double-layer convolution structure comprises:
(1) after non-linear activation, batch normalization and binary activation are realized in hardware, the second-layer XNOR processing is performed at the same time as the XNOR processing of the first-layer convolution module, so that both convolution layers are computed simultaneously;
(2) the output values computed simultaneously in step (1) are accumulated in a pipelined fashion using five-into-three adders.
3. The double-layer XNOR binary neural network compression method based on look-up-table computation according to claim 1, characterized in that the simultaneous double-layer computation uses a three-input XNOR operation whose three inputs are the input feature-map value, the first-layer convolution weight value and the second-layer convolution weight value.
4. The double-layer XNOR binary neural network compression method based on look-up-table computation according to claim 1, characterized in that the double-layer convolution consists of grouped convolutions of different kernel sizes connected to a second-layer 1 × 1 convolution.
5. The double-layer XNOR binary neural network compression method based on look-up-table computation according to claim 1, characterized in that the simultaneous double-layer computation is realized with look-up tables: exploiting the basic multiple-input, single-output characteristic of a look-up table, the three-input XNOR processing unit that forms the simultaneous computation is implemented in a single look-up table.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910178528.0A CN109993279B (en) | 2019-03-11 | 2019-03-11 | Double-layer same-or binary neural network compression method based on lookup table calculation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109993279A true CN109993279A (en) | 2019-07-09 |
CN109993279B CN109993279B (en) | 2023-08-04 |
Family
ID=67130485
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910178528.0A Active CN109993279B (en) | 2019-03-11 | 2019-03-11 | Double-layer same-or binary neural network compression method based on lookup table calculation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109993279B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160148078A1 (en) * | 2014-11-20 | 2016-05-26 | Adobe Systems Incorporated | Convolutional Neural Network Using a Binarized Convolution Layer |
CN106355244A (en) * | 2016-08-30 | 2017-01-25 | 深圳市诺比邻科技有限公司 | CNN (convolutional neural network) construction method and system |
US20180247180A1 (en) * | 2015-08-21 | 2018-08-30 | Institute Of Automation, Chinese Academy Of Sciences | Deep convolutional neural network acceleration and compression method based on parameter quantification |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210150313A1 (en) * | 2019-11-15 | 2021-05-20 | Samsung Electronics Co., Ltd. | Electronic device and method for inference binary and ternary neural networks |
CN111445012A (en) * | 2020-04-28 | 2020-07-24 | 南京大学 | FPGA-based packet convolution hardware accelerator and method thereof |
CN111832718A (en) * | 2020-06-24 | 2020-10-27 | 上海西井信息科技有限公司 | Chip architecture |
CN112906886A (en) * | 2021-02-08 | 2021-06-04 | 合肥工业大学 | Result-multiplexing reconfigurable BNN hardware accelerator and image processing method |
CN113408713A (en) * | 2021-08-18 | 2021-09-17 | 成都时识科技有限公司 | Method for eliminating data copy, neural network processor and electronic product |
CN113408713B (en) * | 2021-08-18 | 2021-11-16 | 成都时识科技有限公司 | Method for eliminating data copy, neural network processor and electronic product |
Also Published As
Publication number | Publication date |
---|---|
CN109993279B (en) | 2023-08-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109993279A (en) | Double-layer same-or binary neural network compression method based on lookup table calculation | |
Guo et al. | FBNA: A fully binarized neural network accelerator | |
CN111459877B (en) | Winograd YOLOv2 target detection model method based on FPGA acceleration | |
US20190087713A1 (en) | Compression of sparse deep convolutional network weights | |
CN103176767B (en) | The implementation method of the floating number multiply-accumulate unit that a kind of low-power consumption height is handled up | |
CN110991631A (en) | Neural network acceleration system based on FPGA | |
CN108108809A (en) | Hardware architecture and working method for accelerating convolutional neural network inference | |
CN107092960A (en) | Improved parallel-channel convolutional neural network training method | |
CN109948784A (en) | Convolutional neural network accelerator circuit based on a fast filtering algorithm | |
CN110383300A (en) | Computing device and method | |
CN110163359A (en) | Computing device and method | |
CN109284824A (en) | Device for accelerating convolution and pooling operations based on reconfigurable technology | |
Li et al. | AlphaGo policy network: A DCNN accelerator on FPGA | |
CN113283587A (en) | Winograd convolution operation acceleration method and acceleration module | |
Duan et al. | Energy-efficient architecture for FPGA-based deep convolutional neural networks with binary weights | |
Li et al. | An efficient CNN accelerator using inter-frame data reuse of videos on FPGAs | |
Zhuang et al. | Vlsi architecture design for adder convolution neural network accelerator | |
Jiang et al. | Hardware implementation of depthwise separable convolution neural network | |
Zhan et al. | Field programmable gate array‐based all‐layer accelerator with quantization neural networks for sustainable cyber‐physical systems | |
Tsai et al. | A CNN accelerator on FPGA using binary weight networks | |
Liu et al. | Tcp-net: Minimizing operation counts of binarized neural network inference | |
Yang et al. | Data-aware adaptive pruning model compression algorithm based on a group attention mechanism and reinforcement learning | |
Paul et al. | Hardware-software co-design approach for deep learning inference | |
Kang et al. | Design of convolution operation accelerator based on FPGA | |
CN110163793B (en) | Convolution calculation acceleration method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||