CN109993279A - A double-layer XNOR ("same-or") binary neural network compression method based on look-up-table computation
- Publication number
- CN109993279A (application CN201910178528.0A)
- Authority
- CN
- China
- Prior art keywords
- convolution
- double-layer convolution
- look-up table
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a double-layer XNOR binary neural network compression method based on look-up-table computation. The compression is performed by a double-layer convolution structure whose algorithm comprises the following steps: first, the input feature map, after non-linear activation, batch normalization and binary activation, is convolved by a grouped first layer with different kernel sizes to obtain the first-layer output; then a second-layer 1 × 1 convolution is applied to the first-layer output to obtain the output feature map. In the hardware implementation, the improved double-layer convolution replaces the conventional two sequential computations with a three-input XNOR operation that evaluates both layers in parallel, and all double-layer convolution operations are completed in look-up tables, which improves hardware resource utilization. The invention provides a hardware-algorithm co-compression scheme that combines full-precision efficient-network techniques with look-up-table computation; it achieves good structural compression and also reduces logic-resource consumption in hardware.
Description
Technical field
The present invention relates to FPGA design optimization of binary neural networks, and belongs to the technical field of digital image processing.
Background art
With the flourishing of deep learning, convolutional neural networks (CNNs) are applied ever more widely in digital image processing. From the classic AlexNet to the ResNet residual network proposed by Microsoft Research, deep convolutional neural networks have entered a period of rapid development, and their performance has risen steadily. In practical applications, Google has achieved remarkable results using convolutional neural networks in autonomous driving and other areas. At the same time, CNNs have encountered challenges during their development: their high computational cost and high complexity make them difficult to deploy in embedded devices.
With the spread of mobile intelligent terminals, it is desirable to run neural-network algorithms even on devices with only low-performance processors. The CNN variant BCNN (binary convolutional neural network), which can extract features without any multiplication, has therefore attracted attention for its light weight and low power consumption. In 2016, Courbariaux et al. of the University of Montreal proposed a novel binarization method for convolutional neural networks in "Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1" (arXiv preprint arXiv:1602.02830, 2016). They binarize the network weights and the activations of every layer, saving large amounts of memory, computing resources and forward-propagation time; by compensating the convolution weights and the output feature maps with scaling coefficients, the computational complexity can in theory be reduced by 60% without greatly lowering model accuracy. This shows that binarization can effectively reduce hardware resource consumption and computation cost, speed up neural-network processing, and help bring neural-network algorithms on chip. In the same year, the XNOR-Net proposed by Mohammad Rastegari of the University of Washington turned the multiplications of traditional convolution into XNOR operations, making the hardware implementation of binary neural networks much easier. Compared with full-precision convolutional networks, however, the feature-extraction ability of binary networks is still lacking: binarization is equivalent to a regularization of the full-precision network that further sparsifies it, and the features extracted by the network suffer a considerable loss after each binary activation. How to extract more valid features under binarization has become a key problem for binary neural networks. In the past two years, various dedicated binary algorithms have been proposed, such as the parallel network PC-BNN and ABC-Net, with good results; but while the recognition performance of binary algorithms improved, the cost of hardware implementation was not greatly reduced, and algorithms appeared that are simple yet ill-suited to hardware. In summary, the algorithms of binary convolutional networks have begun to take shape, and developing binary-network algorithms that favour hardware implementation is a future direction of the field.
Because neural-network algorithms involve huge amounts of computation, implementing them purely in software on a terminal is extremely difficult, and research into dedicated acceleration hardware is a current trend. Accordingly, various dedicated accelerator structures have been proposed. The main considerations in designing acceleration hardware are how to run faster and how to save hardware resources. For speed, experts mainly study parallel execution modes of the algorithms, matching the parallelism that hardware can provide in order to accelerate execution. For saving resources, the main research direction is exploiting the data reuse and function reuse contained in the algorithms, which reduces hardware overhead.
As for the hardware implementation of binarized networks, running BCNN on existing general-purpose full-precision AI accelerator chips is inefficient and costly, and such high-performance processors cannot be used in embedded systems and other low-power settings. With the rapid evolution of binary-network algorithms and their increasingly varied structures, FPGA implementations of these networks have become increasingly common. Tsinghua University implemented a general binary neural network accelerator in "FP-BNN: Binarized neural network on FPGA" (Neurocomputing, 2018, 275:1072-1086), achieving on the AlexNet structure 11.6 times the computing speed of a CPU and 2.75 times that of a GPU, with the whole model reaching 384 GOP/s/W on the FPGA; however, the power and logic-resource consumption of that structure are relatively large. Therefore, in order to run high-accuracy algorithms on low-power embedded devices, a software-hardware co-design method is needed that optimizes the algorithm for hardware implementation and deploys it on a dedicated FPGA.
Summary of the invention
Objective of the invention: to overcome the deficiencies of the prior art, the present invention provides a double-layer XNOR binary neural network compression method based on look-up-table computation, which reduces network parameters, improves computational efficiency and lowers resource consumption.
Technical solution: a double-layer XNOR binary neural network compression method based on look-up-table computation, wherein the compression is performed by a double-layer convolution structure whose algorithm comprises the following steps:
First, the input feature map, after non-linear activation, batch normalization and binary activation, is convolved by a grouped first layer with different kernel sizes, yielding the first-layer output.
Then a second-layer 1 × 1 convolution is applied to the first-layer output to obtain the output feature map.
The hardware implementation of the double-layer convolution structure comprises the following steps:
(1) after non-linear activation, batch normalization and binary activation are realized in hardware, the second-layer XNOR processing is performed at the same time as the XNOR processing of the first-layer convolution module, so that both convolution layers are computed simultaneously;
(2) the output values computed simultaneously in step (1) are accumulated in a pipelined fashion using five-into-three adders.
The simultaneous double-layer computation uses a three-input XNOR operation whose three inputs are the input feature-map value, the first-layer convolution weight value and the second-layer convolution weight value.
The double-layer convolution consists of grouped convolutions of different kernel sizes connected to a second-layer 1 × 1 convolution.
The simultaneous double-layer computation is realized with look-up tables: exploiting the basic multiple-input, single-output characteristic of a look-up table, the three-input XNOR processing unit that forms the simultaneous computation is implemented in a single look-up table.
Advantages: the invention proposes a double-layer XNOR binary network computed with look-up tables. A composite double-layer convolution kernel with stronger feature-extraction ability replaces the traditional kernel, and a three-input XNOR computation eliminates the non-binary intermediate values of the double-layer convolution, further reducing the parameter count and computational complexity of the binary neural network. The validity of the algorithm was verified on the CIFAR-10 dataset.
Brief description of the drawings
Fig. 1 is a schematic diagram of a single double-layer module;
Fig. 2 shows the sign function and the gradient-update function;
Fig. 3 shows the binary neural network algorithm and the improved structure;
Fig. 4 shows the conversion of the three-input XNOR hardware implementation;
Fig. 5 is the overall hardware implementation block diagram of the double-layer convolution.
Specific embodiment
The technical solution of the present invention is further described below with reference to the drawings.
A double-layer XNOR binary neural network compression method based on look-up-table computation, wherein the compression is performed by a double-layer convolution structure whose algorithm comprises the following steps: first, the input feature map, after non-linear activation, batch normalization and binary activation, is convolved by a grouped first layer with different kernel sizes to obtain the first-layer output; then a second-layer 1 × 1 convolution is applied to the first-layer output to obtain the output feature map.
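The two steps above can be sketched in software. The following is a minimal pure-Python illustration (not the patented hardware, and all function names are illustrative assumptions): layer 1 applies one binary convolution per group, layer 2 mixes the group outputs with 1 × 1 weights. For simplicity both demo groups use 3 × 3 kernels with "valid" convolution, whereas the patent mixes 3 × 3, 1 × 3 and 3 × 1 kernels.

```python
def xnor(a, b):
    """Binary 'same-or' on +1/-1 values: +1 when equal, -1 otherwise."""
    return 1 if a == b else -1

def conv2d_xnor(fmap, kernel):
    """Valid-mode 2-D convolution with multiply replaced by XNOR."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(fmap) - kh + 1):
        row = []
        for j in range(len(fmap[0]) - kw + 1):
            row.append(sum(xnor(fmap[i + di][j + dj], kernel[di][dj])
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

def double_layer_block(fmap, group_kernels, w2):
    """Layer 1: one grouped convolution per kernel; layer 2: 1x1 mix."""
    mids = [conv2d_xnor(fmap, k) for k in group_kernels]
    h, w = len(mids[0]), len(mids[0][0])
    return [[sum(w2[g] * mids[g][i][j] for g in range(len(mids)))
             for j in range(w)] for i in range(h)]

# Tiny demo: a 4x4 all-ones binary map, two 3x3 groups, 1x1 weights.
fmap = [[1] * 4 for _ in range(4)]
k_pos = [[1] * 3 for _ in range(3)]    # agrees everywhere -> +9 per window
k_neg = [[-1] * 3 for _ in range(3)]   # disagrees everywhere -> -9
out = double_layer_block(fmap, [k_pos, k_neg], w2=[1, -1])
print(out)   # [[18, 18], [18, 18]]
```

Note that in this software reference the layer-1 outputs are multi-bit sums; the hardware fusion described later avoids materializing them.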
Its hardware implementation comprises the following steps:
(1) after non-linear activation, batch normalization and binary activation are realized in hardware, the second-layer XNOR processing is performed at the same time as the XNOR processing of the first-layer convolution module, so that both convolution layers are computed simultaneously;
(2) the output values computed simultaneously in step (1) are accumulated in a pipelined fashion using five-into-three adders.
The simultaneous double-layer computation uses a three-input XNOR operation whose three inputs are the input feature-map value, the first-layer convolution weight value and the second-layer convolution weight value.
The double-layer convolution consists of grouped convolutions of different kernel sizes connected to a second-layer 1 × 1 convolution.
The simultaneous double-layer computation is realized with look-up tables: exploiting the basic multiple-input, single-output characteristic of a look-up table, the three-input XNOR processing unit that forms the simultaneous computation is implemented in a single look-up table.
The invention is further explained below with an example using convolution sizes 3 × 3, 1 × 3 and 3 × 1. As shown in Fig. 1, the ordinary 3 × 3 convolution on the left is replaced by the double-layer convolution on the right. Suppose the number of input activation channels is n and the number of output activation channels is m; then the traditional kernel on the left has n*m*9 parameters. On the right, pconv3 × 3 denotes 3 × 3 convolution, pconv1 × 3 denotes 1 × 3 convolution and pconv3 × 1 denotes 3 × 1 convolution, with parameter counts n/8*m/8*9, n/8*m/8*3 and n/8*m/8*3 respectively; the 1 × 1 convolution has n*m parameters. The total parameter count of the double-layer convolution on the right is n*m*1.75, about 1/5 of the ordinary convolution, greatly reducing both the number of parameters and the amount of convolution computation. That this parameter reduction brings no further accuracy loss is because no binary activation lies between the two convolution layers, which is proposed here for the first time; the binarization is realized in hardware. In a binary neural network, each binary activation turns the features extracted by the original convolution into new features retaining only part of the valid information, and it also seriously disturbs the backward gradient propagation of the network: at such points the gradient cannot propagate. As shown in Fig. 2(a), the sign function is 1 for inputs greater than zero and 0 otherwise; its gradient is infinite at zero and zero everywhere else. A new gradient function (shown in Fig. 2(b)) is therefore needed to solve the problem that the gradient cannot be backpropagated. This function, similar in shape to a Gaussian distribution, both matches the gradient distribution of the sign function to a certain extent and further corrects the usual binarization loss, so that both the training speed and the accuracy of the network are improved. But the gradient correction only solves the propagation problem; it cannot effectively remove the loss incurred during forward propagation. The loss caused by binary activation should therefore be reduced as much as possible during feature extraction, lowering the training difficulty and the accuracy loss of the network. It follows from the above that a binary network algorithm should extract as many valid features as possible with as few layers as possible; the double-layer network of Fig. 1(b) matches exactly this characteristic of binary networks. Experiments with neural networks show that feature extraction in the feature-map (spatial) direction is more important than feature extraction across channels; by increasing the number of first-layer channels and reducing the number of second-layer channels, more features can be extracted with fewer parameters.
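The parameter counts stated above can be checked by direct arithmetic. The sketch below assumes, as in the text, 8 groups in the first layer (4 of 3 × 3, 2 of 1 × 3, 2 of 3 × 1, each on n/8 input and m/8 output channels) followed by a full n*m second layer; the function name is illustrative.

```python
def double_layer_params(n, m):
    """Parameters of the replacement block: grouped first layer
    (4x 3x3, 2x 1x3, 2x 3x1 on n/8 x m/8 channels) plus an n*m 1x1 layer."""
    g = (n // 8) * (m // 8)
    layer1 = 4 * g * 9 + 2 * g * 3 + 2 * g * 3   # grouped first layer
    layer2 = n * m                                # 1x1 second layer
    return layer1 + layer2

n, m = 128, 128
standard = 9 * n * m                              # ordinary 3x3 convolution
print(double_layer_params(n, m) / (n * m))        # 1.75, as stated in the text
print(double_layer_params(n, m) / standard)       # ~0.194, i.e. about 1/5
```

The ratio 1.75/9 ≈ 0.194 is independent of n and m, which is why the text can state the 1/5 figure without fixing the channel counts.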
To verify the algorithmic part of the invention, a binary double-layer XNOR convolutional network was built in TensorFlow, using 4 parallel 3 × 3 kernels, 2 of 1 × 3, 2 of 3 × 1 and 1 of 1 × 1 to replace each 3 × 3 kernel. The comparison uses the network structures shown in Fig. 3. Fig. 3(a) is an ordinary residual network with seven convolution modules: in the first module, a binary-weight convolution is applied after batch normalization, with 128 channels; each of modules two to seven contains one 3 × 3 convolution, with channel numbers 128, 256, 256 and 512 respectively. Every convolution is followed by a PBA layer consisting of a non-linear activation layer, a batch-normalization layer and a binary-activation layer, and a max-pooling layer follows the second, fourth and seventh convolution modules. The seventh convolution module is followed by a fully connected layer. Since training and testing use the CIFAR-10 dataset of 32 × 32 three-channel colour images, and CIFAR-10 has 10 classes, the last fully connected layer has 10 output channels; finally a normalized exponential (softmax) layer completes the classification. Fig. 3(b) is the network improved by the present invention on the basis of Fig. 3(a): as shown by the dashed boxes, from the second convolution module on, the original 3 × 3 kernels are replaced by the double-layer XNOR kernels. The number of intermediate feature channels of the double-layer structure can be chosen freely, and increasing it moderately enhances the overall performance of the network; the other parts are kept as in the original network. The models of Fig. 3(a) and (b) were built in TensorFlow, trained and tested; Table 1 compares the models after 250 training epochs at the same depth.
As shown in Table 1, at the same number of layers and after 250 training epochs on CIFAR-10, the ResNet (7-layer residual network) reaches 87% test accuracy, while the improved binary residual network of the invention (PM-ResNet-7) reaches 86.1%. Table 1 also compares the parameter counts on CIFAR-10: the original network has 2.83M parameters while the improved network has only 1.08M, a reduction of 63%. Fewer parameters necessarily mean fewer convolution operations, so while maintaining test accuracy and full binarization, the computational complexity of the network is greatly reduced and computation time is saved.
Table 1. Parameter counts and accuracy of the network models
Dataset | Model | Parameters | Accuracy
---|---|---|---
CIFAR-10 | ResNet-7 | 2.83M | 87%
CIFAR-10 | PM-ResNet-7 | 1.08M | 86.1%
On the hardware side, as shown in Fig. 4(a), the convolution of an ordinary double-layer network first computes the first-layer convolution outputs O11, O12, O13, and then performs the 1 × 1 convolution on those values to obtain O1, O2, O3. This first mode of computation is given by formulas (1)-(4); the first-layer convolution values must be accumulated, so each result is a value of more than one bit:
O11 = I11 W111 + I12 W112 + ... + I21 W121 + I22 W122 + ... + I31 W131 + I32 W132 + ... + I39 W139   (1)
O12 = I11 W211 + I12 W212 + ... + I21 W221 + I22 W222 + ... + I31 W231 + I32 W232 + ... + I39 W239   (2)
O13 = I11 W311 + I12 W312 + ... + I21 W321 + I22 W322 + ... + I31 W331 + I32 W332 + ... + I39 W339   (3)
O1 = O11 x11 + O12 x12 + O13 x13   (4)
When the next result O1 is computed, although the weights x11, x12, x13 are single-bit, the first-layer outputs O11, O12, O13 are multi-bit, so only additions and subtractions can be performed on them: the advantage that convolution can be carried out with XNOR operations is lost in the hardware implementation of the binary network, and this method also has a large hardware-resource cost. The improved mode of computation is shown in Fig. 4(b): using the fused XNOR method, a single XNOR decision over the three input values yields the result directly and removes the first-layer intermediate output. This not only overcomes the difficulty that the non-single-bit middle layer cannot be XNOR-ed again, but also eliminates the accuracy loss brought by binary activation, letting the binary network extract more features with fewer parameters. From formula (5),
O1 = Σk Σj (Ij ⊙ Wkj ⊙ x1k)   (5)
where ⊙ denotes the single-bit XNOR ("same-or") operation, the final result O1 is obtained simply by three-input single-bit XNOR operations followed by a 1-bit accumulation. Since a multi-bit adder consumes more resources than a 1-bit adder and brings more clock delay, avoiding it also lowers the complexity of the circuit.
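The fused three-input cell can be sanity-checked in software. A sketch under the usual bit encoding of binary networks (bit b stands for the value 2b - 1): chaining the two per-layer XNORs collapses into a single 3-input truth table, which is exactly what one look-up table can store. Function names are illustrative.

```python
def xnor(a, b):
    """2-input 'same-or' on bits: 1 when the bits are equal."""
    return 1 - (a ^ b)

def xnor3(i, w1, w2):
    """Fused cell of Fig. 4(b): both layers' weights applied at once."""
    return xnor(xnor(i, w1), w2)

def to_pm1(b):
    """Value encoded by a bit in a binary network: 0 -> -1, 1 -> +1."""
    return 2 * b - 1

# The fused single-bit cell must agree with the arithmetic product of the
# three +1/-1 values, i.e. with multiplying by both layers' weights.
for k in range(8):
    i, w1, w2 = (k >> 2) & 1, (k >> 1) & 1, k & 1
    assert to_pm1(xnor3(i, w1, w2)) == to_pm1(i) * to_pm1(w1) * to_pm1(w2)

# The whole cell is one 8-entry truth table -- exactly what one LUT stores.
print([xnor3((k >> 2) & 1, (k >> 1) & 1, k & 1) for k in range(8)])
# [0, 1, 1, 0, 1, 0, 0, 1]
```

The truth table shows that the chained XNORs reduce to the 3-input parity function, so a single LUT entry per input pattern suffices and no multi-bit intermediate value is ever produced.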
Moreover, the three-input single-bit XNOR network has advantages over the ordinary computation when realized on look-up tables. Taking an FPGA as an example, its programmable logic resources consist mainly of two parts: combinational circuits realized by look-up tables (LUTs), and sequential circuits realized by registers. What consumes the most resources in a neural-network hardware implementation is the convolution, which is mainly multiplication, addition and subtraction; in a binary network it is mainly addition/subtraction and XNOR, all of which consume a large amount of combinational logic, so optimizing the combinational logic is a key point. By the nature of a LUT, each LUT can realize a different logic function, but its input/output pattern is fixed; in the FPGA devices currently mainstream for neural-network implementation, the LUTs mostly have 4- or 6-input modes. A 6-input LUT supports three modes: one 6-input/1-output function, two 3-input/1-output functions, or one 5-input/2-output function. By the XNOR principle, the XNOR of two single-bit inputs yields a single-bit output, and a three-input XNOR still yields a single-bit output. Because of the LUT output-bit limit, three two-input XNOR operations cannot be realized in one LUT; thus, in general, two two-input XNORs require the same number of LUTs as two three-input XNORs, and the ordinary first-mode XNOR convolution additionally needs a second convolution pass, so it consumes more logic resources than the second mode, which computes the double-layer convolution in one pass. The second mode disclosed by the invention therefore saves considerable hardware resources.
The overall hardware implementation of the double-layer network is shown in Fig. 5. This example computes with 8 parallel modules: a convolution module with p input channels and feature-map size m*n is divided into 8 modules, each with input feature-map size p/8*m*n. Taking the 3 × 3 convolution module as an example: first, the top-left portion of the input feature map, of size p/8*3*3, is convolved with the weight matrices of the 4 different 3 × 3 kernels, each of size p/8*3*3. As the figure shows, the number of first-stage output values equals the number of weights: the XNOR operation does not reduce the number of outputs, and the resulting matrix, shown as the intermediate matrix of Fig. 5, has size p/8*4*9. The intermediate result then undergoes convolution with the next 1 × 1 kernels: 128 matrices of size 4*1 slide over the intermediate matrix performing XNOR operations, producing an output matrix of 128 channels with size 128*p/8*4*9. The obtained matrix is summed along the non-channel direction; each matrix sums p/8*4*9 values, i.e. no accumulation is needed along the channel direction. The adder of this matrix uses five-into-three compression: first, thanks to the 5-input/2-output characteristic of the LUT, summing 5 one-bit values consumes only 2 LUTs; second, in the five-into-three pipelined addition, the more values are input in parallel, the higher the resource utilization, so performing the parallel addition on the double-layer XNOR results greatly improves resource utilization. The 1 × 3 and 3 × 1 convolution modules are processed in the same way, and finally the parallel computation sums the 128*1 matrix values to obtain the final 128 output feature-map values.
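The five-into-three stage above can be modelled behaviourally: five 1-bit addends are reduced to a 3-bit count (0..5), matching the 5-input/2-output LUT packing. This is a software analogue for checking correctness, not the LUT netlist, and the function name is illustrative.

```python
from itertools import product

def compress_5_to_3(b4, b3, b2, b1, b0):
    """Reduce five 1-bit addends to a 3-bit count (c2, c1, c0)."""
    s = b4 + b3 + b2 + b1 + b0
    return (s >> 2) & 1, (s >> 1) & 1, s & 1

# The 3-bit result always re-encodes the exact sum of the five inputs,
# so later pipeline stages can keep accumulating narrow values.
for bits in product((0, 1), repeat=5):
    c2, c1, c0 = compress_5_to_3(*bits)
    assert 4 * c2 + 2 * c1 + c0 == sum(bits)
print("5:3 compression verified for all 32 input patterns")
```

Because the count never exceeds 5, the top bit c2 is set only for sums of 4 or 5, which is why two LUT outputs plus one shared bit suffice in the hardware packing described above.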
It should be pointed out that, for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention. Any component not specified in this embodiment can be realized with the available prior art.
Claims (5)
1. A double-layer XNOR binary neural network compression method based on look-up-table computation, characterized in that: the compression is performed by a double-layer convolution structure whose algorithm comprises the following steps:
First, the input feature map, after non-linear activation, batch normalization and binary activation, is convolved by a grouped first layer with different kernel sizes to obtain the first-layer output;
Then a second-layer 1 × 1 convolution is applied to the first-layer output to obtain the output feature map.
2. The double-layer XNOR binary neural network compression method based on look-up-table computation according to claim 1, characterized in that the hardware implementation of the double-layer convolution structure comprises:
(1) after non-linear activation, batch normalization and binary activation are realized in hardware, the second-layer XNOR processing is performed at the same time as the XNOR processing of the first-layer convolution module, so that both convolution layers are computed simultaneously;
(2) the output values computed simultaneously in step (1) are accumulated in a pipelined fashion using five-into-three adders.
3. The double-layer XNOR binary neural network compression method based on look-up-table computation according to claim 1, characterized in that the simultaneous double-layer computation uses a three-input XNOR operation whose three inputs are the input feature-map value, the first-layer convolution weight value and the second-layer convolution weight value.
4. The double-layer XNOR binary neural network compression method based on look-up-table computation according to claim 1, characterized in that the double-layer convolution consists of grouped convolutions of different kernel sizes connected to a second-layer 1 × 1 convolution.
5. The double-layer XNOR binary neural network compression method based on look-up-table computation according to claim 1, characterized in that the simultaneous double-layer computation is realized with look-up tables: exploiting the basic multiple-input, single-output characteristic of a look-up table, the three-input XNOR processing unit that forms the simultaneous computation is implemented in a single look-up table.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910178528.0A CN109993279B (en) | 2019-03-11 | 2019-03-11 | Double-layer same-or binary neural network compression method based on lookup table calculation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109993279A true CN109993279A (en) | 2019-07-09 |
CN109993279B CN109993279B (en) | 2023-08-04 |
Family
ID=67130485
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910178528.0A Active CN109993279B (en) | 2019-03-11 | 2019-03-11 | Double-layer same-or binary neural network compression method based on lookup table calculation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109993279B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160148078A1 (en) * | 2014-11-20 | 2016-05-26 | Adobe Systems Incorporated | Convolutional Neural Network Using a Binarized Convolution Layer |
CN106355244A (en) * | 2016-08-30 | 2017-01-25 | 深圳市诺比邻科技有限公司 | CNN (convolutional neural network) construction method and system |
US20180247180A1 (en) * | 2015-08-21 | 2018-08-30 | Institute Of Automation, Chinese Academy Of Sciences | Deep convolutional neural network acceleration and compression method based on parameter quantification |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210150313A1 (en) * | 2019-11-15 | 2021-05-20 | Samsung Electronics Co., Ltd. | Electronic device and method for inference binary and ternary neural networks |
CN111445012A (en) * | 2020-04-28 | 2020-07-24 | 南京大学 | FPGA-based packet convolution hardware accelerator and method thereof |
CN111832718A (en) * | 2020-06-24 | 2020-10-27 | 上海西井信息科技有限公司 | Chip architecture |
CN112906886A (en) * | 2021-02-08 | 2021-06-04 | 合肥工业大学 | Result-multiplexing reconfigurable BNN hardware accelerator and image processing method |
CN113408713A (en) * | 2021-08-18 | 2021-09-17 | 成都时识科技有限公司 | Method for eliminating data copy, neural network processor and electronic product |
CN113408713B (en) * | 2021-08-18 | 2021-11-16 | 成都时识科技有限公司 | Method for eliminating data copy, neural network processor and electronic product |
Also Published As
Publication number | Publication date |
---|---|
CN109993279B (en) | 2023-08-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109993279A (en) | Double-layer same-or binary neural network compression method based on lookup table calculation | |
Guo et al. | FBNA: A fully binarized neural network accelerator | |
CN111459877B (en) | Winograd YOLOv2 target detection model method based on FPGA acceleration | |
US20190087713A1 (en) | Compression of sparse deep convolutional network weights | |
CN103176767B (en) | The implementation method of the floating number multiply-accumulate unit that a kind of low-power consumption height is handled up | |
CN110991631A (en) | Neural network acceleration system based on FPGA | |
CN108108809A (en) | Hardware architecture and working method for accelerating convolutional neural network inference | |
CN107092960A (en) | Improved parallel-channel convolutional neural network training method | |
CN109948784A (en) | Convolutional neural network accelerator circuit based on a fast filtering algorithm | |
CN110383300A (en) | Computing device and method | |
CN110163359A (en) | Computing device and method | |
CN109284824A (en) | Device for accelerating convolution and pooling operations based on reconfigurable technology | |
Li et al. | AlphaGo policy network: A DCNN accelerator on FPGA | |
CN113283587A (en) | Winograd convolution operation acceleration method and acceleration module | |
Duan et al. | Energy-efficient architecture for FPGA-based deep convolutional neural networks with binary weights | |
Li et al. | An efficient CNN accelerator using inter-frame data reuse of videos on FPGAs | |
Zhuang et al. | Vlsi architecture design for adder convolution neural network accelerator | |
Jiang et al. | Hardware implementation of depthwise separable convolution neural network | |
Zhan et al. | Field programmable gate array‐based all‐layer accelerator with quantization neural networks for sustainable cyber‐physical systems | |
Tsai et al. | A CNN accelerator on FPGA using binary weight networks | |
Liu et al. | Tcp-net: Minimizing operation counts of binarized neural network inference | |
Yang et al. | Data-aware adaptive pruning model compression algorithm based on a group attention mechanism and reinforcement learning | |
Paul et al. | Hardware-software co-design approach for deep learning inference | |
Kang et al. | Design of convolution operation accelerator based on FPGA | |
CN110163793B (en) | Convolution calculation acceleration method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||