CN108805286A - High performance network accelerated method based on high-order residual quantization - Google Patents
- Publication number: CN108805286A (application CN201810604458.6A)
- Authority: CN (China)
- Legal status: Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The present invention provides a high-performance network acceleration method based on high-order residual quantization, comprising: step S1, obtaining a series of binary data at different scales through quantization and recursive operations; step S2, performing convolution operations on the binary data of the different scales and combining the obtained results. The invention is an effective and accurate deep network acceleration method. The concept of the residual is introduced to represent the information loss, and the residual of the input data is computed recursively at different quantization scales to reduce that loss. Using binarized weights and binarized inputs, the size of the network is reduced to about 1/32 of the original, and the training speed is improved by about 30 times. The proposed method also makes it possible to train deep convolutional networks on a CPU. Experimental results show that the proposed HORQ networks achieve good classification and acceleration performance.
Description
Technical field
The present invention relates to a method of deep network acceleration, and specifically to a high-performance network acceleration method based on high-order residual quantization.
Background technology
Binarizing the input tensor has been shown to be an effective network acceleration method. However, existing binarization methods can be regarded as simple thresholding operations on pixels (a first-order approximation) and incur a large loss of precision. Methods for accelerating deep network training fall broadly into three categories. The simplest is network pruning, followed by retraining of the trimmed structure. To achieve a higher compression rate, researchers later developed structured sparse approximation techniques that turn large sub-networks into shallow sub-networks; however, such methods require designing an appropriate approximation structure for each different network architecture. More recently, the academic community has proposed network binarization schemes that convert the network weights and the corresponding forward and backward data streams into binary representations, reducing both the amount of computation and the network storage space. Some methods also binarize the input image data through thresholding. Although this improves training speed, it simultaneously causes a sharp drop in classification accuracy. Previous input binarization simply applies a positive/negative threshold, which is a very coarse quantization of floating-point data; it can be regarded as a first-order binary approximation.
In summary, existing network acceleration methods cannot improve network speed both effectively and with high performance. No description or report of a technology similar to the present invention has been found, nor has similar data been collected at home or abroad.
Invention content
In view of the above shortcomings of the prior art, the purpose of the present invention is to propose a high-performance network acceleration method based on high-order residual quantization. The method is a high-order binarization scheme, specifically a new binarization quantization approach called high-order residual quantization (HORQ), which binarizes both the inputs and the weights. It approximates the computation more accurately while retaining the speed advantage of binarization. Specifically, the proposed scheme recursively performs residual quantization and generates a series of binarized input images of decreasing magnitude. In addition, the invention also provides high-order binary filtering and gradient propagation operations for the forward and backward computations.
The present invention is achieved by the following technical solutions.
A high-performance network acceleration method based on high-order residual quantization comprises the following steps:
Step S1: obtain a series of binary data at different scales through quantization and recursive operations;
Step S2: perform convolution operations on the binary data of the different scales, and combine the obtained operation results;
Step S3: use the results obtained in step S2 to train a convolutional neural network, thereby completing the accelerated training of the convolutional neural network.
Preferably, the step S1 includes the following sub-steps:
Step S11: compute the first-order residual, and further approximate it through a thresholding operation;
Step S12: recursively execute step S11 to obtain a series of binarized residual tensors corresponding to different quantization scales; these form the binary data.
Preferably, the step S11 includes the following sub-steps:
Step S111: suppose an input data tensor X is quantized by the following process, yielding the first-order approximation of X:

X ≈ β₁H₁

where β₁ is a real scalar, H₁ ∈ {+1, −1}ⁿ denotes the first-order binary tensor, and n is the dimension of H₁.

Step S112: optimize the quantization of step S111 by solving

min J(β₁, H₁) = ‖X − β₁H₁‖²

where J(·) denotes the squared-error loss function. The obtained optimum is the first-order binarization quantization result:

β₁* = (1/n)‖X‖₁,  H₁* = sign(X)

where ‖·‖₁ is the ℓ1 norm. These values are used hereafter.

Step S113: define the first-order binarization residual tensor R₁(X) as the difference between the actual input data X and the first-order binarization quantization result β₁H₁, which can then be approximated further:

R₁(X) = X − β₁H₁

R₁(X) represents the information loss caused by the approximation.
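As an illustration (not part of the patent text), steps S111 to S113 can be sketched in a few lines of NumPy; the function names are our own, and the closed-form optimum β₁ = (1/n)‖X‖₁, H₁ = sign(X) is the one derived above.

```python
import numpy as np

def first_order_quantize(x):
    """First-order binary quantization X ≈ beta1 * H1.

    H1 = sign(X) and beta1 = mean(|X|) minimize the squared error
    ||X - beta1 * H1||^2 over beta1 in R and H1 in {+1, -1}^n.
    """
    h1 = np.where(x >= 0, 1.0, -1.0)   # binary tensor H1
    beta1 = np.abs(x).mean()           # optimal real scalar beta1
    return beta1, h1

def first_order_residual(x):
    """Residual R1(X) = X - beta1*H1: the information the approximation loses."""
    beta1, h1 = first_order_quantize(x)
    return x - beta1 * h1

x = np.array([0.9, -0.3, 0.5, -0.7])
beta1, h1 = first_order_quantize(x)
print(round(beta1, 6))   # 0.6
print(h1)                # [ 1. -1.  1. -1.]
```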
Preferably, step S12 includes the following sub-steps:
Step S121: further quantize R₁(X) as

R₁(X) ≈ β₂H₂

where β₂ is a real scalar and H₂ ∈ {+1, −1}ⁿ denotes the second-order binary residual tensor, n being the dimension of H₂.

This yields the second-order residual quantization of the input data:

X = β₁H₁ + R₁(X) ≈ β₁H₁ + β₂H₂

where β₁ and β₂ are real scalars and H₁ and H₂ are binary residual tensors; β₁H₁ is called the first-order binarized input tensor, and β₂H₂ is called the second-order binarized input tensor.

Step S122: solve the optimization problem of the second-order binarized input tensor:

min J(β₂, H₂) = ‖R₁(X) − β₂H₂‖²

The obtained optimum is

β₂* = (1/n)‖R₁(X)‖₁,  H₂* = sign(R₁(X))

and these values are used hereafter. The second-order binarization residual tensor follows as

R₂(X) = R₁(X) − β₂H₂

During optimization, the following inequality is obtained:

‖R₂(X)‖² ≤ ‖R₁(X)‖²

where the L2 norm of the second-order binarization residual tensor represents the information loss.
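The inequality can be checked numerically with the closed-form solutions of steps S121 and S122. The snippet below is an illustration with our own names, not the patent's implementation:

```python
import numpy as np

def binarize(t):
    """One thresholding step: t ≈ beta*H with H = sign(t), beta = mean(|t|)."""
    h = np.where(t >= 0, 1.0, -1.0)
    return np.abs(t).mean(), h

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)

beta1, h1 = binarize(x)       # first-order term beta1*H1
r1 = x - beta1 * h1           # R1(X)
beta2, h2 = binarize(r1)      # second-order term beta2*H2
r2 = r1 - beta2 * h2          # R2(X)

# The second-order residual never carries more energy than the first-order one.
print(float(np.sum(r1**2)), float(np.sum(r2**2)))
```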
Preferably, the method further includes the following step:
further quantize the second-order binarization residual tensor R₂(X) up to a K-order residual quantization, obtaining the K-order residual quantization of the input data X:

X ≈ Σᵢ₌₁ᴷ βᵢHᵢ

where βᵢ = (1/n)‖Rᵢ₋₁(X)‖₁, Hᵢ = sign(Rᵢ₋₁(X)), and Rᵢ(X) = Rᵢ₋₁(X) − βᵢHᵢ, with R₀(X) = X.
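The K-order recursion can be sketched as follows. This is an illustrative reconstruction under the definitions above, with our own function names:

```python
import numpy as np

def horq(x, order):
    """K-order residual quantization: X ≈ sum_i beta_i * H_i.

    Recursively binarizes the running residual; returns the list of
    (beta_i, H_i) pairs and the final residual R_K(X).
    """
    terms, r = [], x.astype(float)
    for _ in range(order):
        h = np.where(r >= 0, 1.0, -1.0)
        beta = np.abs(r).mean()
        terms.append((beta, h))
        r = r - beta * h              # R_i(X) = R_{i-1}(X) - beta_i * H_i
    return terms, r

def reconstruct(terms):
    """Sum of the binarized terms, sum_i beta_i * H_i."""
    return sum(beta * h for beta, h in terms)

x = np.random.default_rng(1).standard_normal(256)
for k in (1, 2, 3):
    terms, r = horq(x, k)
    print(k, float(np.sum(r**2)))     # the information loss shrinks as the order grows
```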
Preferably, the step S2 includes the following sub-steps:
Step S21: reshape the binary data;
Step S22: perform the convolution computation using the reshaped binary data.
Preferably, the step S21 includes the following sub-steps:
Step S211: suppose the binary data form an input data tensor X of dimension c_in × w_in × h_in, and the convolution weight tensor W has dimension c_out × c_in × w × h. W is divided into c_out filters, and each filter is reshaped into 1 × (c_in × w × h); the entire convolution weight tensor W is thus reshaped into W_r, whose dimension after reshaping is c_out × (c_in × w × h). Let Y denote the output of the convolutional layer; its dimension is c_out × w_out × h_out. The input data tensor X contains h_out × w_out sub-tensors in total, and each sub-tensor is reshaped to the same dimension as a filter, so the X_r obtained after reshaping has dimension (c_in × w × h) × (w_out × h_out);
where c_in denotes the number of channels of the input tensor, w_in the width of each input channel, h_in the height of each input channel, c_out the number of convolution kernels, w the width of the convolution kernel, h the height of the convolution kernel, w_out the width of each output channel, and h_out the height of each output channel.
Step S212: express the result of the convolution operation as the matrix product Y_r = W_r X_r.
Step S213: reshape Y_r back to Y, completing the whole reshaping computation.
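The reshaping of steps S211 to S213 is essentially an im2col transformation. A sketch in NumPy follows (stride 1 and no padding are our assumptions, since the text does not state them):

```python
import numpy as np

def conv_as_matmul(x, w):
    """Reshape-and-multiply convolution (stride 1, no padding), as in step S21.

    x: (c_in, h_in, w_in) input tensor; w: (c_out, c_in, kh, kw) weights.
    Returns y of shape (c_out, h_out, w_out).
    """
    c_in, h_in, w_in = x.shape
    c_out, _, kh, kw = w.shape
    h_out, w_out = h_in - kh + 1, w_in - kw + 1

    wr = w.reshape(c_out, c_in * kh * kw)        # each filter -> one row of W_r
    cols = [x[:, i:i + kh, j:j + kw].ravel()     # each output position -> one column of X_r
            for i in range(h_out) for j in range(w_out)]
    xr = np.stack(cols, axis=1)                  # (c_in*kh*kw, h_out*w_out)

    yr = wr @ xr                                 # Y_r = W_r X_r
    return yr.reshape(c_out, h_out, w_out)       # reshape Y_r back to Y

rng = np.random.default_rng(2)
x = rng.standard_normal((3, 5, 5))
w = rng.standard_normal((4, 3, 3, 3))
y = conv_as_matmul(x, w)
print(y.shape)   # (4, 3, 3)
```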
Preferably, the step S22 includes the following sub-steps:
Step S221: quantize W_r:

W_r(i) ≈ αᵢBᵢ

where W_r(i) is the i-th row of W_r, αᵢ denotes a real constant, and Bᵢ denotes the first-order binary approximation of W_r(i).

Step S222: quantize X_r:

X_r(i) ≈ β₁(i)H₁(i)

where X_r(i) is the i-th column of X_r.

Step S223: perform the binarized convolution computation using the quantized W_r and X_r.
The binarized convolution problem above can be solved with the algorithm in Fig. 2.
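How the quantized factors combine can be sketched as below, using first-order quantization for the weight rows and second-order residual quantization for the input columns, as the detailed description later specifies. This is illustrative only: all names are ours, and real acceleration requires replacing the floating-point products of sign matrices with XNOR/popcount operations, which plain NumPy does not express.

```python
import numpy as np

def row_binarize(m):
    """Binarize each row of m independently: row_i ≈ alpha_i * B_i."""
    alpha = np.abs(m).mean(axis=1)               # one scale per row
    b = np.where(m >= 0, 1.0, -1.0)
    return alpha, b

def col_horq2(m):
    """Second-order residual quantization of each column of m."""
    beta1 = np.abs(m).mean(axis=0)
    h1 = np.where(m >= 0, 1.0, -1.0)
    r1 = m - h1 * beta1
    beta2 = np.abs(r1).mean(axis=0)
    h2 = np.where(r1 >= 0, 1.0, -1.0)
    return (beta1, h1), (beta2, h2)

def horq_matmul(wr, xr):
    """Approximate Y_r = W_r X_r with the binarized factors."""
    alpha, b = row_binarize(wr)
    (beta1, h1), (beta2, h2) = col_horq2(xr)
    y1 = (b @ h1) * np.outer(alpha, beta1)       # first-order contribution
    y2 = (b @ h2) * np.outer(alpha, beta2)       # second-order correction
    return y1 + y2

rng = np.random.default_rng(3)
wr = rng.standard_normal((4, 27))
xr = rng.standard_normal((27, 9))
approx = horq_matmul(wr, xr)
exact = wr @ xr
print(float(np.abs(approx - exact).mean()))  # small but nonzero approximation error
```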
The high-performance network acceleration method based on high-order residual quantization provided by the invention is a binary quantization method based on recursive thresholding operations. It proposes a high-order residual quantization framework: after one thresholding operation, the network can compute the residual error and then perform a new round of thresholding to approximate that residual further. The invention thus obtains a series of binary images corresponding to different quantization scales. Based on these binarized input tensors (stacked binary images of different scales), the invention develops effective binarized filtering operations for the forward and backward computations.
Compared with the prior art, the present invention has the following beneficial effects:
The invention both accelerates network training and reduces the information loss during quantization as much as possible. The method uses binarized input data together with binarized weights; after each quantization pass, the computed residual is used recursively to perform the next quantization operation. The method is highly flexible and can adapt to different experimental conditions.
The invention improves the running speed while guaranteeing the training quality of the network. Binarized inputs and weights accelerate the network while minimizing the information loss.
The invention exploits recursion: all tensors are generated by a recursive procedure, and the recursively generated tensors are combined so that quantization results of different scales are used.
The invention can adapt to different hardware requirements by selecting a suitable residual order for acceleration. With a residual order of two or three, the network can be accelerated while keeping the information loss as small as possible.
Description of the drawings
Other features, objects, and advantages of the present invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 is the tensor remodeling process schematic diagram of one embodiment of the invention;
Fig. 2 is the algorithm flow chart for second order two-value convolution of one embodiment of the invention;
Fig. 3 is the training flow chart of the HORQ networks of one embodiment of the invention;
Fig. 4 is Experimental comparison results figure of the one embodiment of the invention on CIFAR-10 data sets;
Fig. 5 is the experimental comparison result figure on the MNIST data set;
Fig. 6 is the acceleration rate of one embodiment of the invention and the relational graph of channel number;
Fig. 7 is the relational graph of the acceleration rate and quantization exponent number of one embodiment of the invention;
Fig. 8 is the relational graph of the acceleration rate and convolution kernel size of one embodiment of the invention;
Fig. 9 is the algorithm principle figure of the high-order residual quantization of one embodiment of the invention.
Specific implementation mode
The embodiments of the present invention are elaborated below. The following embodiments are implemented on the basis of the technical solution of the present invention and give detailed implementation modes and specific operating procedures, but the protection scope of the present invention is not limited to the following embodiments.
The present invention provides a high-performance network acceleration method based on high-order residual quantization. It is a high-order binarization scheme that approximates the computation more accurately while retaining the speed advantage of binarization. In the following embodiments, specifically, the proposed scheme recursively performs residual quantization and generates a series of binarized input images of decreasing magnitude. In addition, the embodiments also propose high-order binary filtering and gradient propagation operations for the forward and backward computations.
The method of the following embodiments is a new binarization quantization approach called high-order residual quantization (HORQ). It binarizes both the inputs and the weights. First, a series of binary data at different scales is obtained. Then convolution operations are performed on the binary data of the different scales, and the computed results are combined. The method successfully reduces the information loss.
Specifically:
The first-order residual is computed, and then a new round of thresholding is executed to approximate the first-order residual further. The binary approximation of the residual is regarded as a higher-order binary term. The above operations are executed recursively, finally yielding a series of binarized residual tensors corresponding to different quantization scales, i.e., the binary data. Based on these binarized input data, effective binarized filtering operations for the forward and backward computations are developed.
The input of a convolutional layer is a four-dimensional tensor. If the input tensor and the corresponding weight filters are reshaped into matrices, the convolution operation can be regarded as a matrix multiplication, and the operation on each element during the matrix multiplication can be regarded as a vector operation. Consider first the case where the input is an n-dimensional vector: suppose an input data tensor is a vector X ∈ Rⁿ; it can be quantized by the process X ≈ β₁H₁, where H₁ ∈ {+1, −1}ⁿ. The result is then obtained by solving the optimization problem

min J(β₁, H₁) = ‖X − β₁H₁‖²

whose solution is

β₁* = (1/n)‖X‖₁,  H₁* = sign(X)

where β₁ is a real number. The above problem can be regarded as the first-order binarization quantization problem; the first-order residual tensor R₁(X) is defined as the difference between the actual input and the first-order binarized result:

R₁(X) = X − β₁H₁

Once the above parameters are determined, R₁(X) can be used to represent the information loss caused by the approximation.
Since R₁(X) is a real-valued tensor, it can be further quantized as

R₁(X) ≈ β₂H₂

where β₂ is a real number and H₂ ∈ {+1, −1}ⁿ.
The second-order residual quantization of the input data is thus obtained:

X = β₁H₁ + R₁(X) ≈ β₁H₁ + β₂H₂

where β₁ and β₂ are real scalars and H₁ and H₂ are binarized residual tensors; β₁H₁ is called the first-order binarized input tensor, and β₂H₂ is called the second-order binarized input tensor. The problem can be solved with the same procedure as before. First, the corresponding optimization problem is solved:

min J(β₂, H₂) = ‖R₁(X) − β₂H₂‖²

whose solution is

β₂* = (1/n)‖R₁(X)‖₁,  H₂* = sign(R₁(X))

Here β₁*, H₁* are the best values of β₁ and H₁; for arbitrary β₁ and H₁ the corresponding R₁(X) can be computed, and likewise the corresponding R₂(X). These optimal values are used hereafter.
The binaryzation approximation method newly proposed in the embodiment of the present invention in theory in actual effect than already existing
Conventional method is good very much.
The information loss of single order and Two-order approximation method can be compared.In the binarization method of conventional first order, R1(X) quilt
It is defined as residual error approximation tensor.Similar, second order residual error tensor is defined as R in the method newly proposed2(X)=R1(X)-β2H2.In optimization process, it can obtain such as lower inequality:
Therefore, if indicating information loss using the L2 norms of residual error tensor, two proposed in embodiment are able to demonstrate that
Rank residual quantization method reduces information loss amount.
Furthermore, the second-order residual quantization method can be extended to a K-order residual quantization:

X ≈ Σᵢ₌₁ᴷ βᵢHᵢ

where βᵢ = (1/n)‖Rᵢ₋₁(X)‖₁, Hᵢ = sign(Rᵢ₋₁(X)), and R₀(X) = X.
Input tensors of higher order can be obtained by recursively computing residual tensors. In fact, as the order increases, the information loss decreases, but the amount of computation grows considerably. It can be found that the effect of second- and third-order residual quantization is already good enough. The embodiments mainly use second-order residual quantization, which keeps the computation moderate while ensuring the residual quantization effect.
The HORQ-binarized input is received, and high-order binarized filtering is performed during the forward and backward computations. The first step is the tensor reshaping process, as shown in Fig. 1. Suppose the binary data form a tensor X of dimension c_in × w_in × h_in, and the convolution filter W has dimension c_out × c_in × w × h. The weight tensor W is divided into c_out filters, and each filter can be reshaped into 1 × (c_in × w × h); the entire matrix W is thus reshaped into W_r of dimension c_out × (c_in × w × h). Let Y denote the output of the convolutional layer, of dimension c_out × w_out × h_out. The input tensor X contains h_out × w_out sub-tensors in total, each of which is reshaped to the same dimension as a filter, so the X_r obtained after reshaping has dimension (c_in × w × h) × (w_out × h_out). The result of the convolution operation is then expressed as the matrix product Y_r = W_r X_r. Finally, Y_r is reshaped back to Y to complete the whole reshaping computation.
After the tensor reshaping process, the next structure is the convolution computation using second-order residual quantization, which is used to compute the matrix product of W_r and X_r. W_r is first quantized as

W_r(i) ≈ αᵢBᵢ

where W_r(i) is the i-th row of W_r. Then second-order residual quantization is applied to the input matrix X_r:

X_r(i) ≈ β₁(i)H₁(i) + β₂(i)H₂(i)

where X_r(i) is the i-th column of X_r. The binarized convolution problem above can be solved with the algorithm in Fig. 2.
Fig. 3 shows the process of training a HORQ network using the second-order residual quantization method proposed in the above embodiments. The general procedure includes forward propagation, backward propagation, and parameter update. During the forward and backward passes, the binarized representations of the input and the weights are used. In fact, the high-order residual quantization method of the present invention can easily be applied in both convolutional and fully connected layers. During training, the inputs and weights are quantized, and the binarized convolution is computed layer by layer. After forward propagation, backpropagation is carried out with the binarized weights and inputs, and the gradients are computed using the sign function sign(·). The parameter update uses the real values of the parameters and inputs, because the update magnitude at each iteration is very small; if the binarized weights were updated instead, the updates would vanish during binarization and the training of the network would be affected.
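The update rule described above (binarized forward and backward passes, real-valued master weights) can be sketched on a toy single linear layer. Everything below, from the layer and loss to the learning rate, is an illustrative assumption rather than the patent's network:

```python
import numpy as np

def binarize(t):
    """t ≈ alpha * sign(t) with alpha = mean(|t|)."""
    return np.abs(t).mean(), np.where(t >= 0, 1.0, -1.0)

rng = np.random.default_rng(4)
w_real = rng.standard_normal((8, 4)) * 0.1   # full-precision master weights
x = rng.standard_normal(4)                   # one fixed toy input
target = np.ones(8)
lr = 0.01

losses = []
for step in range(200):
    alpha, wb = binarize(w_real)     # quantize weights for the forward pass
    y = alpha * (wb @ x)             # binarized forward pass
    losses.append(float(np.sum((y - target) ** 2)))
    grad_y = 2.0 * (y - target)      # gradient of the squared loss
    # The gradient computed with the binarized weights is applied to the
    # real-valued master weights: updating the binarized copy directly
    # would be lost on the next binarization, as the text explains.
    grad_w = alpha * np.outer(grad_y, x)
    w_real -= lr * grad_w

print(losses[0], losses[-1])
```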
The high-performance network acceleration method based on high-order residual quantization provided by the above embodiments is an effective and accurate deep network acceleration method based on high-order binarization approximation. The concept of the residual is introduced to represent the information loss, and the residual of the input data is computed recursively at different quantization scales to reduce that loss. Using binarized weights and binarized inputs, the size of the network is reduced to about 1/32 of the original, and the training speed is improved by about 30 times. The method proposed in the above embodiments also makes it possible to train deep convolutional networks on a CPU. Experimental results show that the HORQ networks proposed by the embodiments have good classification and acceleration performance.
Further description is given below with reference to specific examples.
The embodiments were tested on the MNIST and CIFAR-10 databases. MNIST is a database for image classification that contains handwritten images of the digits 0 to 9. For convenient comparison with other methods, an MLP structure was also used as the baseline in the experiments. The structure comprises a second-order residual quantization connection structure with three 4096-dimensional hidden layers and an L2-SVM output (using the hinge loss function). To train the MLP, no convolution processing, data preprocessing, data augmentation, or pre-training was used. The ADAM adaptive learning method was used, with a batch size of 200 and batch normalization to improve the training speed. An MLP with first-order binarized connections (XNOR) was also used in the experiments to compare the final test accuracy with the method of the embodiments of the present invention. Both methods used the same network structure; the final results show that the method proposed in the embodiments is 0.71% more accurate than the XNOR method, and the method of the embodiments converges faster. From the evolution of the hinge loss values, although both methods drive the hinge loss to relatively small values, the loss of the HORQ method declines more smoothly. Previous work by other researchers used binarized weights with floating-point inputs, whereas the method of the embodiments of the present invention uses both binarized weights and binarized inputs. The experimental results on the MNIST database show that the method used in the embodiments can accelerate the deep network while minimizing the loss of precision, as shown in Figs. 4 and 5.
On the CIFAR-10 database, 50000 images were used for training and 10000 for testing. No data preprocessing or data augmentation was used during the experiments. To show the difference between the second-order residual quantization method proposed by the embodiments and the traditional first-order binarization quantization method (XNOR), a convolutional neural network with fewer layers was used in the experiments. The batch size was 50, which makes training faster, and the training data were normalized and standardized.
In the comparison of the experimental results, because the convolutional neural network structure used in the experiments is not very complicated, the baseline result is not as good as some published comparison results, but the shallower network is convenient for comparing the embodiments' method against XNOR. Under the same network structure, the experimental result of the embodiments is about 5% higher than that of XNOR, and the convergence speed of the networks is essentially identical. This illustrates the greater effectiveness of the method used in the embodiments.
The amount of computation of the method proposed in the embodiments is analyzed below. For a convolutional neural network whose input is of dimension c_in × w_in × h_in and whose convolution weight tensor is of dimension c_out × c_in × w × h, the total number of operations is c_out × c_in × wh × w_in h_in. Using an existing CPU suitable for computing 64 binarized data operations at a time, the total number of operations needed by the K-order residual quantization used in the embodiments is

K × c_out × c_in × wh × w_in h_in + (K+1) × w_in h_in = K·N_p + (K+1)·N_n

Of these operations, the K·N_p operations are of binary precision and can be accelerated, while the other (K+1)·N_n operations are of floating-point precision and cannot. The acceleration ratio is therefore

γ = N_p / (K·N_p/64 + (K+1)·N_n)

For the second-order case, the acceleration ratio of the embodiments is

γ = N_p / (N_p/32 + 3·N_n)
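The arithmetic can be checked directly. The helper below (our own naming) evaluates γ for K-order quantization and reproduces the 31.98× figure quoted in the text for c_in·c_out = 64 × 256 and 3 × 3 kernels; since both N_p and N_n scale with w_in·h_in, the ratio is independent of the input size.

```python
def speedup(c_in, c_out, w, h, w_in, h_in, k, lanes=64):
    """Acceleration ratio of K-order residual quantization on a CPU that
    processes `lanes` binary values per operation."""
    n_p = c_out * c_in * w * h * w_in * h_in   # binary-precision operations
    n_n = w_in * h_in                          # floating-point operations
    return n_p / (k * n_p / lanes + (k + 1) * n_n)

# Configuration quoted in the text: c_in*c_out = 64*256, 3x3 kernels, K = 2.
gamma = speedup(c_in=64, c_out=256, w=3, h=3, w_in=1, h_in=1, k=2)
print(round(gamma, 2))   # 31.98
```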
It can be seen from the above formulas that the acceleration ratio does not depend on the width and height of the input tensor, but is related to the filter size and the number of channels. First, the number of channels is fixed at c_out·c_in = 10 × 10, and the effect of the filter size on the acceleration ratio is observed; the results are shown in Fig. 6. Then the filter size is fixed at w × h = 3 × 3 with 3 input channels, and the influence of the number of input channels on the acceleration ratio is observed; the results are shown in Fig. 7. The experimental results show that if the channel number and filter size are too small, the acceleration effect is negligible; therefore, when the binarization method is applied in a DCNN in the embodiments of the present invention, network layers with too few quantization channels can be avoided. If the parameters are set to c_in·c_out = 64 × 256 and w × h = 3 × 3, the second-order residual quantization method can achieve a 31.98× speedup. In practical situations, however, the acceleration ratio may be slightly lower than this value, owing to the limitations of memory access and data handling. The relationship between the quantization order and the speedup ratio is shown in Fig. 8. The experimental results show that the second- and third-order residual quantization methods proposed by the embodiments achieve very strong acceleration while guaranteeing accuracy. The complete framework of the present invention is shown in Fig. 9.
Specific embodiments of the present invention have been described above. It is to be understood that the invention is not limited to the particular implementations described; those skilled in the art can make various variations or modifications within the scope of the claims, which does not affect the substantive content of the present invention.
Claims (8)
1. A high-performance network acceleration method based on high-order residual quantization, characterized by comprising the following steps:
Step S1: obtaining a series of binary data at different scales through quantization and recursive operations;
Step S2: performing convolution operations on the binary data of the different scales, and combining the obtained operation results;
Step S3: using the results obtained in step S2 to train a convolutional neural network, thereby completing the accelerated training of the convolutional neural network.
2. The high-performance network acceleration method based on high-order residual quantization according to claim 1, characterized in that the step S1 comprises the following sub-steps:
Step S11: computing the first-order residual, and further approximating the first-order residual through a thresholding operation;
Step S12: recursively executing step S11 to obtain a series of binarized residual tensors corresponding to different quantization scales as the binary data.
3. The high-performance network acceleration method based on high-order residual quantization according to claim 2, characterized in that the step S11 comprises the following sub-steps:
Step S111: supposing an input data tensor X is quantized by the following process, yielding the first-order approximation of the input data tensor X:

X ≈ β₁H₁

where β₁ is a real number, H₁ ∈ {+1, −1}ⁿ denotes the first-order binary tensor, and n is the dimension of H₁;
Step S112: optimizing the quantization result of step S111 by solving

min J(β₁, H₁) = ‖X − β₁H₁‖²

where J(·) denotes the squared-error loss function; the obtained optimum is the first-order binarization quantization result:

β₁* = (1/n)‖X‖₁,  H₁* = sign(X)

where ‖·‖₁ is the ℓ1 norm, these values being used hereafter;
Step S113: defining the first-order binarization residual tensor R₁(X) as the difference between the actual input data X and the first-order binarization quantization result β₁H₁, and then further approximating the first-order residual:

R₁(X) = X − β₁H₁

R₁(X) is used to represent the information loss caused by the approximation.
4. The high-performance network acceleration method based on high-order residual quantization according to claim 3, characterized in that step S12 comprises the following sub-steps:
Step S121: R1(X) is further quantized as:
R1(X) ≈ β2H2,
where β2 is a real scalar; H2 ∈ {+1, -1}^n denotes the second-order binary residual tensor, and n is the dimension of the tensor H2.
This yields the second-order quantized residual of the input data:
X = β1H1 + R1(X) ≈ β1H1 + β2H2,
where β1 and β2 are real scalars, and H1 and H2 are binary residual tensors; β1H1 is called the first-order binarized input tensor, and β2H2 the second-order binarized input tensor.
Step S122: the optimization problem of the second-order binarized input tensor is solved:
min J(β2, H2) = ‖R1(X) - β2H2‖²;
the optimum is:
β2* = ‖R1(X)‖l1 / n, H2* = sign(R1(X)).
The second-order binarized residual tensor is then:
R2(X) = R1(X) - β2H2.
During optimization, the following inequality is obtained:
‖R2(X)‖₂² ≤ ‖R1(X)‖₂²,
where the L2 norm of the second-order binarized residual tensor represents the information loss.
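The inequality of step S122 can be checked numerically (a sketch assuming NumPy; in fact the identity ‖R2(X)‖₂² = ‖R1(X)‖₂² - n·β2² holds exactly, so each order strictly shrinks the loss whenever the residual is nonzero):

```python
import numpy as np

def binary_quantize(t):
    # One quantization step: t ≈ beta * sign(t), beta = ||t||_l1 / n
    h = np.where(t >= 0, 1.0, -1.0)
    return np.abs(t).sum() / t.size, h

np.random.seed(0)
x = np.random.randn(64)

beta1, h1 = binary_quantize(x)
r1 = x - beta1 * h1            # first-order residual R1(X)
beta2, h2 = binary_quantize(r1)
r2 = r1 - beta2 * h2           # second-order residual R2(X)

# ||R2(X)||_2^2 = ||R1(X)||_2^2 - n*beta2^2 <= ||R1(X)||_2^2
loss1, loss2 = np.sum(r1 ** 2), np.sum(r2 ** 2)
```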
5. The high-performance network acceleration method based on high-order residual quantization according to claim 4, characterized in that the method further comprises the following step:
the second-order binarized residual tensor R2(X) is further quantized up to a K-order residual quantization, giving the K-order residual quantization of the input data X:
X ≈ β1H1 + β2H2 + … + βKHK,
where βi = ‖R(i-1)(X)‖l1 / n, Hi = sign(R(i-1)(X)), and Ri(X) = R(i-1)(X) - βiHi, with R0(X) = X.
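The K-order recursion above can be sketched as follows (an illustrative NumPy version, not the patented implementation):

```python
import numpy as np

def horq(x, k):
    """K-order residual quantization: X ≈ sum_{i=1}^{K} beta_i * H_i,
    with beta_i = ||R_{i-1}(X)||_l1 / n and H_i = sign(R_{i-1}(X))."""
    approx = np.zeros_like(x)
    r = x.copy()                       # R_0(X) = X
    for _ in range(k):
        h = np.where(r >= 0, 1.0, -1.0)
        beta = np.abs(r).sum() / r.size
        approx += beta * h
        r -= beta * h                  # R_i(X) = R_{i-1}(X) - beta_i * H_i
    return approx, r

np.random.seed(1)
x = np.random.randn(256)
errors = [np.sum((x - horq(x, k)[0]) ** 2) for k in (1, 2, 4)]
# each additional order reduces the remaining information loss
```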
6. The high-performance network acceleration method based on high-order residual quantization according to claim 1, characterized in that said step S2 comprises the following sub-steps:
Step S21: the binary data is reshaped;
Step S22: the convolution computation is performed using the reshaped binary data.
7. The high-performance network acceleration method based on high-order residual quantization according to claim 6, characterized in that said step S21 comprises the following sub-steps:
Step S211: assume the binary data is an input data tensor X of dimension cin × win × hin, and the convolution weight tensor W has dimension cout × cin × w × h. W is divided into cout filters, and each filter is reshaped into 1 × (cin × w × h); the entire weight tensor W is thus reshaped into Wr, whose dimension after reshaping is cout × (cin × w × h). Let Y denote the output of the convolution layer; the dimension of Y is cout × wout × hout. The input tensor X contains wout × hout sub-tensors in total, and each sub-tensor is reshaped to the same dimension as a filter, so the reshaped Xr has dimension (cin × w × h) × (wout × hout);
where cin denotes the number of channels of the input tensor, win the width of each input channel, hin the height of each input channel, cout the number of convolution kernels, w the width of a convolution kernel, h the height of a convolution kernel, wout the width of each output channel, and hout the height of each output channel.
Step S212: the result of the convolution operation is expressed as the matrix product Yr = WrXr.
Step S213: Yr is reshaped into Y, completing the whole reshaping computation.
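The reshaping of steps S211–S213 is the familiar im2col transform. A minimal sketch (assuming NumPy, and assuming stride 1 with no padding, which the claim leaves unspecified):

```python
import numpy as np

def im2col(x, w, h):
    """Reshape X of shape (cin, win, hin) into Xr of shape
    (cin*w*h, wout*hout), one column per sub-tensor, so that
    convolution becomes the matrix product Yr = Wr @ Xr."""
    cin, win, hin = x.shape
    wout, hout = win - w + 1, hin - h + 1
    xr = np.empty((cin * w * h, wout * hout))
    for i in range(wout):
        for j in range(hout):
            xr[:, i * hout + j] = x[:, i:i + w, j:j + h].reshape(-1)
    return xr

cin, win, hin, cout, w, h = 3, 5, 5, 4, 3, 3
x = np.random.randn(cin, win, hin)
wt = np.random.randn(cout, cin, w, h)            # convolution weights W
wr = wt.reshape(cout, cin * w * h)               # Wr: each filter flattened
yr = wr @ im2col(x, w, h)                        # step S212: Yr = Wr Xr
y = yr.reshape(cout, win - w + 1, hin - h + 1)   # step S213: reshape Yr to Y
```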
8. The high-performance network acceleration method based on high-order residual quantization according to claim 7, characterized in that said step S22 comprises the following sub-steps:
Step S221: Wr is quantized:
Wr(i) ≈ αiBi,
where Wr(i) is the i-th row of Wr, αi is a real scaling constant, and Bi is the first-order binary approximation of Wr(i);
Step S222: Xr is quantized:
Xr(i) ≈ β1(i)H1(i),
where Xr(i) is the i-th column of Xr;
Step S223: the binarized convolution is computed using the quantized Wr and Xr.
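A sketch of step S22's binarized matrix product (illustrative and assuming NumPy, not the patented implementation): each row of Wr and each column of Xr gets its own first-order scale, so the {+1, -1} product (an XNOR/popcount in a real kernel) is rescaled by the outer product of the two scale vectors.

```python
import numpy as np

def quantize_rows(m):
    # First-order quantization of each row: m[i] ≈ alpha_i * B_i
    alpha = np.abs(m).mean(axis=1, keepdims=True)   # per-row scale
    b = np.where(m >= 0, 1.0, -1.0)                 # binary matrix
    return alpha, b

np.random.seed(2)
wr = np.random.randn(8, 27)    # reshaped weights Wr
xr = np.random.randn(27, 16)   # reshaped (im2col) input Xr

alpha, bw = quantize_rows(wr)        # Wr(i) ≈ alpha_i * B_i
beta, hx = quantize_rows(xr.T)       # columns of Xr ≈ beta_j * H_j
# Binarized convolution: {+1,-1} matmul, rescaled per (row, column) pair
yr_bin = (bw @ hx.T) * (alpha @ beta.T)
yr_full = wr @ xr                    # full-precision reference
```

On rows that are already binary up to a scale, the quantization is exact, so yr_bin reproduces yr_full exactly in that case.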
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810604458.6A CN108805286A (en) | 2018-06-12 | 2018-06-12 | High performance network accelerated method based on high-order residual quantization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108805286A true CN108805286A (en) | 2018-11-13 |
Family
ID=64087017
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107516129A (en) * | 2017-08-01 | 2017-12-26 | 北京大学 | The depth Web compression method decomposed based on the adaptive Tucker of dimension |
Non-Patent Citations (1)
Title |
---|
ZEFAN LI, BINGBING NI et al.: "Performance Guaranteed Network Acceleration via High-Order Residual Quantization", ICCV 2017 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111340201A (en) * | 2018-12-19 | 2020-06-26 | 北京地平线机器人技术研发有限公司 | Convolutional neural network accelerator and method for performing convolutional operation thereof |
WO2021044244A1 (en) * | 2019-09-03 | 2021-03-11 | International Business Machines Corporation | Machine learning hardware having reduced precision parameter components for efficient parameter update |
GB2600871A (en) * | 2019-09-03 | 2022-05-11 | Ibm | Machine learning hardware having reduced precision parameter components for efficient parameter update |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20181113 |