CN108805286A - High-performance network acceleration method based on high-order residual quantization - Google Patents

High-performance network acceleration method based on high-order residual quantization

Info

Publication number
CN108805286A
CN108805286A CN201810604458.6A CN201810604458A CN108805286A CN 108805286 A CN108805286 A CN 108805286A CN 201810604458 A CN201810604458 A CN 201810604458A CN 108805286 A CN108805286 A CN 108805286A
Authority
CN
China
Prior art keywords
tensor
order
binarization
quantization
indicate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810604458.6A
Other languages
Chinese (zh)
Inventor
Ni Bingbing (倪冰冰)
Li Zefan (李泽凡)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201810604458.6A priority Critical patent/CN108805286A/en
Publication of CN108805286A publication Critical patent/CN108805286A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention provides a high-performance network acceleration method based on high-order residual quantization, comprising: step S1, obtaining a series of binary data at different scales through quantization and recursive operations; step S2, performing convolution operations on the binary data at the different scales and combining the resulting outputs. The invention is an effective and accurate deep-network acceleration method. It introduces the concept of a residual to represent the information loss, and recursively computes the residual of the input data after quantization at different scales in order to reduce that loss. Using binarized weights and binarized inputs, the size of the network is reduced to roughly 1/32 of the original and the training speed is increased by roughly 30 times. The proposed method also makes it feasible to train deep convolutional networks on a CPU. Experimental results show that the proposed HORQ network achieves good classification and acceleration performance.

Description

High-performance network acceleration method based on high-order residual quantization
Technical field
The present invention relates to a deep-network acceleration method, and in particular to a high-performance network acceleration method based on high-order residual quantization.
Background technology
Binarizing the input tensor has been shown to be an effective network acceleration technique. Existing binarization methods, however, amount to a simple per-pixel thresholding operation (a first-order approximation) and therefore suffer a large loss of precision. Methods for accelerating deep-network learning fall broadly into three categories. The simplest approach is network pruning followed by retraining of the trimmed structure. To reach higher compression rates, researchers later developed structured sparse approximation techniques that turn large sub-networks into shallow ones; such methods, however, require a suitable approximate structure to be designed separately for each network architecture. More recently, network binarization schemes have been proposed that convert the network weights and the corresponding forward and backward data streams into binary representations, reducing both computation and storage. Some methods also binarize the input image data through a thresholding operation; although this improves training speed, it also causes a sharp drop in classification accuracy. Previous input-data binarization simply applies a positive/negative threshold, which is a very coarse quantization of floating-point data and can be regarded as a first-order binary approximation.
In summary, existing network acceleration methods cannot increase network speed both effectively and with high performance. No description or report of a technique similar to the present invention has been found, and no similar data have been collected at home or abroad.
Summary of the invention
In view of the above shortcomings of the prior art, the object of the present invention is to propose a high-performance network acceleration method based on high-order residual quantization. The method is a high-order binarization scheme, specifically a new binary quantization approach referred to as high-order residual quantization (HORQ). It binarizes both the inputs and the weights, approximating the computation more accurately while retaining the speed advantage of binarization. Concretely, the proposed scheme performs residual quantization recursively and generates a series of binarized input images of decreasing magnitude. The invention also provides high-order binary filtering and gradient-propagation operations for the forward and backward computations.
The present invention is achieved by the following technical solutions.
A high-performance network acceleration method based on high-order residual quantization comprises the following steps:
Step S1: obtain a series of binary data at different scales through quantization and recursive operations;
Step S2: perform convolution operations on the binary data at the different scales and combine the resulting outputs;
Step S3: use the results obtained in step S2 to train a convolutional neural network, thereby completing the accelerated training of the convolutional neural network.
Preferably, step S1 comprises the following sub-steps:
Step S11: compute the first-order residual, and further approximate it through a thresholding operation;
Step S12: execute step S11 recursively to obtain a series of binarized residual tensors corresponding to different quantization scales, i.e., the binary data.
Preferably, step S11 comprises the following sub-steps:
Step S111: assume an input data tensor matrix X is quantized by the following process, which yields the first-order residual of X:
X ≈ β₁H₁
where β₁ is a real number, H₁ ∈ {+1, −1}ⁿ is the first-order binary residual tensor, and n is the dimension of H₁;
Step S112: optimize the quantization result of step S111:
(β₁*, H₁*) = argmin over (β₁, H₁) of J(β₁, H₁) = ‖X − β₁H₁‖²
where J(·) denotes the squared-error loss function; the resulting optimum is the first-order binary quantization result:
H₁* = sign(X),  β₁* = (1/n)‖X‖ℓ1
where ℓ1 denotes the 1-norm; the value here is β₁* = (1/n)XᵀH₁*;
Step S113: define the first-order binarized residual tensor R₁(X) as the difference between the actual input data X and the first-order binary quantization result β₁*H₁*, and use it to further approximate the first-order residual:
R₁(X) = X − β₁H₁
R₁(X) represents the information loss caused by the approximation.
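The first-order quantization and its residual have a closed-form solution, so they are simple to state in code. The following is a minimal NumPy sketch of steps S111-S113; the function and variable names are illustrative, not taken from the patent:

```python
import numpy as np

def first_order_quantize(X):
    """Approximate X with beta1 * H1, H1 in {+1, -1}^n.

    The minimizer of ||X - beta * H||^2 over beta and H is H1 = sign(X)
    and beta1 = ||X||_l1 / n, i.e. the mean absolute value of X.
    """
    H1 = np.where(X >= 0, 1.0, -1.0)   # sign(X), with sign(0) taken as +1
    beta1 = np.abs(X).mean()           # (1/n) * l1-norm of X
    R1 = X - beta1 * H1                # first-order residual = information loss
    return beta1, H1, R1

X = np.random.randn(16)
beta1, H1, R1 = first_order_quantize(X)
# beta = 0 is always a feasible (worse) choice, so the residual norm
# never exceeds the norm of the original input.
assert np.linalg.norm(R1) <= np.linalg.norm(X)
```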
Preferably, step S12 comprises the following sub-steps:
Step S121: further quantize R₁(X) as:
R₁(X) ≈ β₂H₂
where β₂ is a real number and H₂ ∈ {+1, −1}ⁿ is the second-order binary residual tensor, n being the dimension of H₂;
this gives the second-order quantized residual expansion of the input data:
X = β₁H₁ + R₁(X) ≈ β₁H₁ + β₂H₂
where β₁ and β₂ are real scalars, H₁ and H₂ are binary residual tensors, β₁H₁ is called the first-order binarized input tensor, and β₂H₂ is called the second-order binarized input tensor;
Step S122: solve the optimization problem of the second-order binarized input tensor:
(β₂*, H₂*) = argmin over (β₂, H₂) of ‖R₁(X) − β₂H₂‖²
whose solution is:
H₂* = sign(R₁(X)),  β₂* = (1/n)‖R₁(X)‖ℓ1
the value here is β₂* = (1/n)R₁(X)ᵀH₂*; the second-order binarized residual tensor is then:
R₂(X) = R₁(X) − β₂H₂
In the optimization process, the following inequality is obtained:
‖R₂(X)‖² ≤ ‖R₁(X)‖²
where the L2 norm of the second-order binarized residual tensor is used to represent the information loss.
Preferably, the method further comprises the following step:
further quantize the second-order binarized residual tensor R₂(X), up to a K-order residual quantization, to obtain the K-order residual quantization of the input data X:
X ≈ β₁H₁ + β₂H₂ + … + β_K H_K
where, for i = 1, …, K: Hᵢ = sign(Rᵢ₋₁(X)), βᵢ = (1/n)‖Rᵢ₋₁(X)‖ℓ1, Rᵢ(X) = Rᵢ₋₁(X) − βᵢHᵢ, and R₀(X) = X.
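The K-order quantization is the first-order step applied repeatedly to the previous residual. A minimal sketch of that recursion, reusing first_order_quantize from the previous example (again with illustrative names), under the decomposition defined above:

```python
import numpy as np

def horq_decompose(X, K=2):
    """Decompose X into K binary terms, X ~ sum_i beta_i * H_i.

    Every pass binarizes the current residual, so the l2 norm of the
    residual (the information loss) is non-increasing in the order K.
    """
    terms, residual = [], X
    for _ in range(K):
        beta, H, residual = first_order_quantize(residual)  # sketch above
        terms.append((beta, H))
    approx = sum(beta * H for beta, H in terms)
    return terms, approx, residual

X = np.random.randn(64)
_, approx1, r1 = horq_decompose(X, K=1)
_, approx2, r2 = horq_decompose(X, K=2)
# Second order loses no more information than first order: ||R2|| <= ||R1||.
assert np.linalg.norm(r2) <= np.linalg.norm(r1)
```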
Preferably, step S2 comprises the following sub-steps:
Step S21: reshape the binary data;
Step S22: perform the convolution computation using the reshaped binary data.
Preferably, step S21 comprises the following sub-steps:
Step S211: assume the binary data form an input data tensor matrix X of dimension c_in × w_in × h_in, and the convolution weight tensor matrix W has dimension c_out × c_in × w × h. W is divided into c_out filters, and each filter is reshaped into 1 × (c_in × w × h); the entire convolution weight tensor matrix W is thus reshaped into W_r, whose dimension is c_out × (c_in × w × h). Let Y denote the output of the convolutional layer; its dimension is c_out × w_out × h_out. The input data tensor matrix X contains h_out × w_out sub-tensors in total, and each sub-tensor is reshaped to the same dimension as a filter, so the reshaped X_r has dimension (c_in × w × h) × (w_out × h_out);
where c_in is the number of channels of the input tensor, w_in the width and h_in the height of each input channel, c_out the number of convolution kernels, w the width and h the height of a convolution kernel, and w_out the width and h_out the height of each channel of the output tensor;
Step S212: express the result of the convolution operation as the matrix product Y_r = W_r X_r;
Step S213: reshape Y_r to the dimension of Y, completing the reshaping computation.
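Steps S211-S213 are the familiar im2col-style rearrangement that turns a convolution into a single matrix product. A minimal NumPy sketch, assuming stride 1 and no padding (the axis order and the names are illustrative choices, not taken from the patent):

```python
import numpy as np

def conv_as_matmul(X, W):
    """X: input tensor of shape (c_in, h_in, w_in); W: filters of shape
    (c_out, c_in, h, w). Returns Y of shape (c_out, h_out, w_out)."""
    c_in, h_in, w_in = X.shape
    c_out, _, h, w = W.shape
    h_out, w_out = h_in - h + 1, w_in - w + 1

    # Each filter becomes one row of length c_in*h*w  ->  Wr: (c_out, c_in*h*w)
    Wr = W.reshape(c_out, -1)

    # Each patch of X becomes one column  ->  Xr: (c_in*h*w, h_out*w_out)
    cols = [X[:, i:i + h, j:j + w].reshape(-1)
            for i in range(h_out) for j in range(w_out)]
    Xr = np.stack(cols, axis=1)

    # The convolution is now a single matrix product, reshaped back to Y.
    Yr = Wr @ Xr
    return Yr.reshape(c_out, h_out, w_out)

Y = conv_as_matmul(np.random.randn(3, 8, 8), np.random.randn(16, 3, 3, 3))
assert Y.shape == (16, 6, 6)
```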
Preferably, step S22 comprises the following sub-steps:
Step S221: quantize W_r:
W_r(i) ≈ αᵢBᵢ
where W_r(i) is the i-th row of W_r, αᵢ is a constant, and Bᵢ is the first-order binary approximation of W_r(i);
Step S222: quantize X_r:
X_r(i) ≈ β₁(i)H₁(i)
where X_r(i) is the i-th row of X_r;
Step S223: perform the binarized convolution computation using the quantized W_r and X_r.
This binarized convolution computation can be solved with the algorithm in Fig. 2.
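The arithmetic behind this binarized product can be sketched as follows: each row of W_r carries one scale, the reshaped input carries a pair of scales per patch when the second-order expansion of step S121 is applied to it, and the result is assembled from two binary matrix multiplications. The sketch below only illustrates how the scales and binary matrices combine; the per-patch (per-column) quantization granularity is an assumption here, and the actual speedup requires packed XNOR/popcount kernels rather than floating-point matrix products:

```python
import numpy as np

def binarize(v):
    """First-order binarization: v ~ scale * sign(v)."""
    s = np.where(v >= 0, 1.0, -1.0)
    return np.abs(v).mean(), s

def horq_matmul(Wr, Xr):
    """Approximate Wr @ Xr with binarized rows of Wr and a two-term
    (second-order) binarization of every column of Xr."""
    alphas, Bs = zip(*(binarize(row) for row in Wr))   # per-row weight scales
    B = np.stack(Bs)                                   # binary weight matrix

    b1, b2, H1c, H2c = [], [], [], []
    for col in Xr.T:
        beta1, H1 = binarize(col)                      # first-order input term
        beta2, H2 = binarize(col - beta1 * H1)         # quantized residual term
        b1.append(beta1); b2.append(beta2)
        H1c.append(H1);   H2c.append(H2)
    H1, H2 = np.stack(H1c, axis=1), np.stack(H2c, axis=1)

    # Two binary matrix products, rescaled per input column and per weight row.
    Y = (B @ H1) * np.array(b1) + (B @ H2) * np.array(b2)
    return Y * np.array(alphas)[:, None]

Wr, Xr = np.random.randn(4, 27), np.random.randn(27, 36)
err = np.linalg.norm(Wr @ Xr - horq_matmul(Wr, Xr)) / np.linalg.norm(Wr @ Xr)
print(f"relative approximation error: {err:.3f}")
```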
The high-performance network acceleration method based on high-order residual quantization provided by the present invention is a binary quantization method built on recursive thresholding operations, and it proposes a high-order residual quantization framework: after one thresholding operation the network computes the residual error, and a new round of thresholding is then performed to approximate that residual further. The invention therefore obtains a series of binary images corresponding to different quantization scales. Based on these binarized input tensors (stacks of binary images at different scales), the invention develops efficient binarized filtering operations for the forward and backward computations.
Compared with the prior art, the present invention has the following beneficial effects:
The invention both accelerates network training and reduces the information loss during quantization as much as possible. The method uses binarized input data together with binarized weights; after each quantization is completed, the computed residual is used recursively to perform the next quantization. The method is flexible and adapts to different experimental conditions.
The invention improves the running speed while preserving the training quality of the network: binarized inputs and weights accelerate the network while keeping the information loss to a minimum.
The invention makes use of recursion: all tensors are generated by the recursive procedure, the recursively generated tensors are combined, and the quantization results at different scales are exploited.
The invention adapts to different hardware requirements by selecting a suitable residual order for acceleration. With a residual order of two or three, the network can be accelerated while keeping the information loss as small as possible.
Description of the drawings
Other features, objects and advantages of the present invention will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
Fig. 1 is a schematic diagram of the tensor reshaping process of an embodiment of the invention;
Fig. 2 is the algorithm flow chart for second-order binary convolution of an embodiment of the invention;
Fig. 3 is the training flow chart of the HORQ network of an embodiment of the invention;
Fig. 4 shows the experimental comparison results of an embodiment of the invention on the CIFAR-10 data set;
Fig. 5 shows the experimental comparison results on the MNIST data set;
Fig. 6 shows the relationship between the acceleration rate and the number of channels for an embodiment of the invention;
Fig. 7 shows the relationship between the acceleration rate and the quantization order for an embodiment of the invention;
Fig. 8 shows the relationship between the acceleration rate and the convolution kernel size for an embodiment of the invention;
Fig. 9 is a schematic diagram of the high-order residual quantization algorithm of an embodiment of the invention.
Specific embodiments
The embodiments of the present invention are described in detail below. The following embodiments are implemented on the basis of the technical solution of the present invention and give detailed implementation modes and specific operating procedures, but the protection scope of the present invention is not limited to the following embodiments.
The present invention provides a high-performance network acceleration method based on high-order residual quantization. It is a high-order binarization scheme that approximates the computation more accurately while retaining the speed advantage of binarization. In the following embodiments, the proposed scheme recursively performs residual quantization and generates a series of binarized input images of decreasing magnitude. The embodiments also propose high-order binary filtering and gradient-propagation operations for the forward and backward computations.
The method of the following embodiments is a new binary quantization approach referred to as high-order residual quantization (HORQ). It binarizes both the inputs and the weights. First, a series of binary data at different scales are obtained. Then convolution operations are performed on the binary data at the different scales, and the results are combined. The method successfully reduces the information loss.
Specifically:
The first-order residual is computed, and a new round of thresholding is then performed to further approximate this first-order residual. The binary approximation of the residual is regarded as a higher-order binary term. The above operations are executed recursively, finally yielding a series of binarized residual tensors corresponding to different quantization scales, i.e., the binary data. Based on these binary input data, efficient binarized filtering operations are developed for the forward and backward computations.
The input of a convolutional layer is a four-dimensional tensor. If the input tensor and the corresponding weight filters are reshaped into matrices, the convolution operation can be regarded as a matrix multiplication, and each elementary operation within that multiplication can be regarded as a vector operation. Consider first the case where the input is an n-dimensional vector: assume the input data tensor matrix is a vector X ∈ Rⁿ; it can be quantized by the process X ≈ β₁H₁, where H₁ ∈ {+1, −1}ⁿ. The result is obtained by solving the following optimization problem:
(β₁*, H₁*) = argmin over (β₁, H₁) of ‖X − β₁H₁‖²
The solution of this problem is:
H₁* = sign(X),  β₁* = (1/n)‖X‖ℓ1
where β₁ is a real number. The above problem can be regarded as the first-order binary quantization problem. The first-order residual tensor R₁(X) is defined as the difference between the actual input and the result of the first-order binary quantization:
R₁(X) = X − β₁H₁
Once the above parameters are determined, R₁(X) represents the information loss caused by the approximation.
Since R₁(X) is a real-valued tensor, it can be quantized further as:
R₁(X) ≈ β₂H₂
where β₂ is a real number and H₂ ∈ {+1, −1}ⁿ.
The second-order quantized residual expansion of the input data is then obtained:
X = β₁H₁ + R₁(X) ≈ β₁H₁ + β₂H₂
where β₁ and β₂ are real scalars and H₁ and H₂ are binarized residual tensors; β₁H₁ is called the first-order binarized input tensor and β₂H₂ the second-order binarized input tensor. The problem can be solved with the same procedure as before.
First, the corresponding optimization problem is solved:
(β₂*, H₂*) = argmin over (β₂, H₂) of ‖R₁(X) − β₂H₂‖²
The solution of this formula is:
H₂* = sign(R₁(X)),  β₂* = (1/n)‖R₁(X)‖ℓ1
Here β₁*, H₁* are the optimal values of β₁ and H₁; for arbitrary β₁, H₁ the corresponding R₁(X) can be computed, and likewise the corresponding R₂(X). To state the value here explicitly, β₂* = (1/n)R₁(X)ᵀH₂*.
The binary approximation method newly proposed in the embodiments of the present invention is, both in theory and in practice, much better than existing conventional methods.
The information loss of the first-order and second-order approximation methods can be compared. In the conventional first-order binarization method, R₁(X) is defined as the residual approximation tensor. Similarly, in the newly proposed method the second-order residual tensor is defined as R₂(X) = R₁(X) − β₂H₂. In the optimization process, the following inequality can be obtained:
‖R₂(X)‖² ≤ ‖R₁(X)‖²
Therefore, if the L2 norm of the residual tensor is used to represent the information loss, it can be proved that the second-order residual quantization method proposed in the embodiments reduces the amount of information loss.
Furthermore, the second-order residual quantization method can be extended to K-order residual quantization:
X ≈ β₁H₁ + β₂H₂ + … + β_K H_K
where, for i = 1, …, K: Hᵢ = sign(Rᵢ₋₁(X)), βᵢ = (1/n)‖Rᵢ₋₁(X)‖ℓ1, Rᵢ(X) = Rᵢ₋₁(X) − βᵢHᵢ, and R₀(X) = X.
Input tensors of higher order can be obtained by computing the residual tensor recursively. In fact, as the order grows the information loss becomes smaller, but the amount of computation increases considerably. In practice, second-order and third-order residual quantization are found to be good enough. The embodiments mainly use second-order residual quantization, which keeps the amount of computation moderate while guaranteeing the quality of the residual quantization.
The HORQ-binarized input is received, and high-order binarized filtering is carried out in the forward and backward computations. The first stage is the tensor reshaping process shown in Fig. 1. Assume the binary data form a tensor matrix X of dimension c_in × w_in × h_in and the convolution filters W have dimension c_out × c_in × w × h. The weight tensor W is divided into c_out filters, each of which can be reshaped into 1 × (c_in × w × h); the whole matrix W is therefore reshaped into W_r of dimension c_out × (c_in × w × h). Let Y denote the output of the convolutional layer; its dimension is c_out × w_out × h_out. The input tensor X contains h_out × w_out sub-tensors in total, and each sub-tensor is reshaped to the same dimension as a filter, so the reshaped X_r has dimension (c_in × w × h) × (w_out × h_out). The matrix product Y_r = W_r X_r then expresses the result of the convolution operation. Finally, Y_r is reshaped to the dimension of Y, which completes the reshaping computation.
After the tensor reshaping process, the next stage is the convolution computation using second-order residual quantization, which is used to compute the matrix product of W_r and X_r. W_r is first quantized as:
W_r(i) ≈ αᵢBᵢ
where W_r(i) is the i-th row of W_r. The input matrix X_r is then quantized with second-order residual quantization:
X_r(i) ≈ β₁(i)H₁(i)
where X_r(i) is the i-th row of X_r. The above binarized convolution computation can be solved with the algorithm in Fig. 2.
The process of training a HORQ network with the second-order residual quantization method proposed in the above embodiments is shown in Fig. 3. The general procedure comprises forward propagation, backward propagation and parameter update. During the forward and backward passes, the inputs are the binarized representations of the inputs and the weights. In fact, the high-order residual quantization method of the present invention can easily be applied in both convolutional layers and fully connected layers. During training, the inputs and the weights are quantized, and the binarized convolution is computed layer by layer. After forward propagation, back-propagation is carried out with the binarized weights and inputs, and the gradients are computed through the sign function sign(·). The parameter update, however, uses the real values of the parameters and the inputs, because the update magnitude at each iteration is very small: if the binarized weights were updated directly, the updates would vanish during binarization and the training of the network would suffer.
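The training procedure is described only at this level of detail, so the following is a rough single-step sketch for one binarized fully connected layer, under the assumptions that sign() gradients are passed with a straight-through estimator and that real-valued master weights are kept for the update; the layer size, loss and learning rate are illustrative:

```python
import numpy as np

def binarize(v):
    s = np.where(v >= 0, 1.0, -1.0)
    return np.abs(v).mean(), s            # (scale, binary tensor)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 32)) * 0.1    # real-valued master weights
x = rng.standard_normal(32)
target = rng.standard_normal(8)
lr = 0.01

# Forward pass: binarized weights and a second-order binarized input.
alpha, Bw = binarize(W)
beta1, H1 = binarize(x)
beta2, H2 = binarize(x - beta1 * H1)
x_hat = beta1 * H1 + beta2 * H2
y = alpha * (Bw @ x_hat)

# Backward pass: squared-error loss; the gradient w.r.t. the binarized
# weights is passed straight through sign() onto the master weights (STE).
grad_y = 2.0 * (y - target) / y.size
grad_W = np.outer(grad_y, x_hat) * alpha

# Update the real-valued weights, not the binarized copies, so that the
# small per-iteration changes are not erased by the next binarization.
W -= lr * grad_W
```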
The high-performance network acceleration method based on high-order residual quantization provided by the above embodiments is an effective and accurate deep-network acceleration method built on high-order binary approximation. It introduces the concept of a residual to represent the information loss and recursively computes the residual of the input data after quantization at different scales in order to reduce that loss. Using binarized weights and binarized inputs, the size of the network is reduced to roughly 1/32 of the original and the training speed is increased by roughly 30 times. The method proposed in the above embodiments also makes it feasible to train deep convolutional networks on a CPU. Experimental results show that the HORQ network proposed in the embodiments achieves good classification and acceleration performance.
The method is further described below with reference to specific examples.
The embodiments are tested on two databases, MNIST and CIFAR-10. The MNIST database is an image classification database containing handwritten images of the digits 0-9. For convenient comparison with other methods, an MLP structure is used in the experiments. This structure contains three 4096-dimensional hidden layers with second-order residual-quantized connections and an L2-SVM output (using the hinge loss). To train the MLP, no convolution, data preprocessing, data augmentation or pre-training is used. The ADAM adaptive learning method is adopted with a batch size of 200, and batch normalization is applied to improve the training speed. An MLP with first-order binarized connections (XNOR) is also used in the experiments to compare the final test accuracy with the method of the embodiments of the present invention. Both methods use the same network structure; the final results show that the method proposed in the embodiments of the present invention is 0.71% higher in accuracy than the XNOR method and converges faster. Judging from the evolution of the hinge loss, both methods drive the loss to a relatively small value, but the loss of the HORQ method decreases more smoothly. Earlier work by other researchers uses binarized weights with floating-point inputs, whereas the method used in the embodiments of the present invention uses binarized weights and binarized inputs. The experimental results on the MNIST database show that the method used in the embodiments of the present invention accelerates the deep network while keeping the precision loss as small as possible, as shown in Figs. 4 and 5.
On the CIFAR-10 database, 50,000 images are used for training and 10,000 images for testing. No data preprocessing or data augmentation is used during the experiments. To highlight the difference between the second-order residual quantization method proposed in the embodiments of the present invention and the traditional first-order binary quantization method (XNOR), a convolutional neural network with relatively few layers is used. The batch size is 50, which makes training faster, and the training data are normalized and standardized.
In the comparison of the experimental results, because the convolutional neural network used in the experiments is not very complex, the baseline results are below some published comparison results; the shallower network, however, makes it easier to compare the method of the embodiments of the present invention with XNOR. Under the same network structure, the experimental result of the embodiments of the present invention is about 5% higher than that of XNOR, while the convergence speed of the two networks is essentially the same. This shows that the method used in the embodiments of the present invention is more effective.
The amount of computation of the method proposed in the embodiments of the present invention is analysed below. For a convolutional neural network whose input is c_in × w_in × h_in and whose convolution weight tensor matrix is c_out × c_in × w × h, the total number of operations is c_out × c_in × wh × w_in h_in. On a CPU that can perform 64 binary operations within one word, the K-order residual quantization used in the embodiments of the present invention requires a total of
K × c_out × c_in × wh × w_in h_in + (K+1) × w_in h_in = K·N_p + (K+1)·N_n
operations. Of these, the K·N_p operations are of binary precision and can be accelerated, while the remaining (K+1)·N_n operations are of floating-point precision and cannot be accelerated. The acceleration rate is therefore
γ = 64·N_p / (K·N_p + 64·(K+1)·N_n)
For the second-order case, the acceleration rate of the embodiments of the present invention is
γ = 64·N_p / (2·N_p + 192·N_n)
It can be seen from the above formula that the acceleration rate does not depend on the width and height of the input tensor, but only on the filter size and the number of channels. First, the number of channels is fixed at c_out·c_in = 10 × 10 and the effect of the filter size on the acceleration rate is observed; the results are shown in Fig. 6. Then the filter size is fixed at w × h = 3 × 3 with 3 input channels, and the influence of the number of input channels on the acceleration rate is observed; the results are shown in Fig. 7. The experimental results show that when the number of channels and the filter size are too small, the acceleration effect is very small; in the embodiments of the present invention the binarization method is therefore applied in a DCNN, which avoids quantizing layers with too few channels. With the parameters set to c_in·c_out = 64 × 256 and w × h = 3 × 3, the second-order residual quantization method can achieve a speedup of 31.98 times. In practice the acceleration rate may be slightly lower than this value because of the limitations of memory reads and data handling. The relationship between the quantization order and the speedup ratio is shown in Fig. 8. The experimental results show that the second-order and third-order residual quantization methods proposed in the embodiments of the present invention deliver very strong acceleration while guaranteeing the accuracy. The complete framework of the present invention is shown in Fig. 9.
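The quoted 31.98x figure follows directly from the acceleration-rate formula above; a short sketch that reproduces it (the input spatial size is chosen arbitrarily because it cancels out of the ratio):

```python
def speedup(K, c_in, c_out, w, h, w_in, h_in, word=64):
    """Acceleration rate gamma = 64*Np / (K*Np + 64*(K+1)*Nn) for K-order
    residual quantization, assuming 64 binary operations per CPU word."""
    Np = c_out * c_in * w * h * w_in * h_in   # binarizable multiply-accumulates
    Nn = w_in * h_in                          # remaining full-precision ops
    return word * Np / (K * Np + word * (K + 1) * Nn)

# K = 2, c_in x c_out = 64 x 256, 3x3 kernels: roughly the 31.98x quoted above.
print(round(speedup(2, 64, 256, 3, 3, 32, 32), 2))
```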
Specific embodiments of the present invention have been described above. It is to be understood that the invention is not limited to the above particular embodiments; those skilled in the art can make various variations or modifications within the scope of the claims, and this does not affect the substance of the present invention.

Claims (8)

1. A high-performance network acceleration method based on high-order residual quantization, characterized by comprising the following steps:
Step S1: obtaining a series of binary data at different scales through quantization and recursive operations;
Step S2: performing convolution operations on the binary data at the different scales and combining the resulting outputs;
Step S3: using the results obtained in step S2 to train a convolutional neural network, thereby completing the accelerated training of the convolutional neural network.
2. The high-performance network acceleration method based on high-order residual quantization according to claim 1, characterized in that step S1 comprises the following sub-steps:
Step S11: computing the first-order residual, and further approximating it through a thresholding operation;
Step S12: executing step S11 recursively to obtain a series of binarized residual tensors corresponding to different quantization scales, i.e., the binary data.
3. The high-performance network acceleration method based on high-order residual quantization according to claim 2, characterized in that step S11 comprises the following sub-steps:
Step S111: assuming an input data tensor matrix X is quantized by the following process, which yields the first-order residual of X:
X ≈ β₁H₁
where β₁ is a real number, H₁ ∈ {+1, −1}ⁿ is the first-order binary residual tensor, and n is the dimension of H₁;
Step S112: optimizing the quantization result of step S111:
(β₁*, H₁*) = argmin over (β₁, H₁) of J(β₁, H₁) = ‖X − β₁H₁‖²
where J(·) denotes the squared-error loss function; the resulting optimum is the first-order binary quantization result:
H₁* = sign(X),  β₁* = (1/n)‖X‖ℓ1
where ℓ1 denotes the 1-norm; the value here is β₁* = (1/n)XᵀH₁*;
Step S113: defining the first-order binarized residual tensor R₁(X) as the difference between the actual input data X and the first-order binary quantization result β₁*H₁*, and using it to further approximate the first-order residual:
R₁(X) = X − β₁H₁
where R₁(X) represents the information loss caused by the approximation.
4. The high-performance network acceleration method based on high-order residual quantization according to claim 3, characterized in that step S12 comprises the following sub-steps:
Step S121: further quantizing R₁(X) as:
R₁(X) ≈ β₂H₂
where β₂ is a real number and H₂ ∈ {+1, −1}ⁿ is the second-order binary residual tensor, n being the dimension of H₂;
this gives the second-order quantized residual expansion of the input data:
X = β₁H₁ + R₁(X) ≈ β₁H₁ + β₂H₂
where β₁ and β₂ are real scalars, H₁ and H₂ are binary residual tensors, β₁H₁ is called the first-order binarized input tensor, and β₂H₂ is called the second-order binarized input tensor;
Step S122: solving the optimization problem of the second-order binarized input tensor:
(β₂*, H₂*) = argmin over (β₂, H₂) of ‖R₁(X) − β₂H₂‖²
whose solution is:
H₂* = sign(R₁(X)),  β₂* = (1/n)‖R₁(X)‖ℓ1
the value here is β₂* = (1/n)R₁(X)ᵀH₂*; the second-order binarized residual tensor is then:
R₂(X) = R₁(X) − β₂H₂
and in the optimization process the following inequality is obtained:
‖R₂(X)‖² ≤ ‖R₁(X)‖²
where the L2 norm of the second-order binarized residual tensor is used to represent the information loss.
5. The high-performance network acceleration method based on high-order residual quantization according to claim 4, characterized by further comprising the following step:
further quantizing the second-order binarized residual tensor R₂(X), up to a K-order residual quantization, to obtain the K-order residual quantization of the input data X:
X ≈ β₁H₁ + β₂H₂ + … + β_K H_K
where, for i = 1, …, K: Hᵢ = sign(Rᵢ₋₁(X)), βᵢ = (1/n)‖Rᵢ₋₁(X)‖ℓ1, Rᵢ(X) = Rᵢ₋₁(X) − βᵢHᵢ, and R₀(X) = X.
6. The high-performance network acceleration method based on high-order residual quantization according to claim 1, characterized in that step S2 comprises the following sub-steps:
Step S21: reshaping the binary data;
Step S22: performing the convolution computation using the reshaped binary data.
7. The high-performance network acceleration method based on high-order residual quantization according to claim 6, characterized in that step S21 comprises the following sub-steps:
Step S211: assuming the binary data form an input data tensor matrix X of dimension c_in × w_in × h_in, and the convolution weight tensor matrix W has dimension c_out × c_in × w × h; W is divided into c_out filters, each filter is reshaped into 1 × (c_in × w × h), and the entire convolution weight tensor matrix W is thus reshaped into W_r, whose dimension is c_out × (c_in × w × h); Y denotes the output of the convolutional layer, whose dimension is c_out × w_out × h_out; the input data tensor matrix X contains h_out × w_out sub-tensors in total, each sub-tensor is reshaped to the same dimension as a filter, and the reshaped X_r therefore has dimension (c_in × w × h) × (w_out × h_out);
where c_in is the number of channels of the input tensor, w_in the width and h_in the height of each input channel, c_out the number of convolution kernels, w the width and h the height of a convolution kernel, and w_out the width and h_out the height of each channel of the output tensor;
Step S212: expressing the result of the convolution operation as the matrix product Y_r = W_r X_r;
Step S213: reshaping Y_r to the dimension of Y, completing the reshaping computation.
8. The high-performance network acceleration method based on high-order residual quantization according to claim 7, characterized in that step S22 comprises the following sub-steps:
Step S221: quantizing W_r:
W_r(i) ≈ αᵢBᵢ
where W_r(i) is the i-th row of W_r, αᵢ is a constant, and Bᵢ is the first-order binary approximation of W_r(i);
Step S222: quantizing X_r:
X_r(i) ≈ β₁(i)H₁(i)
where X_r(i) is the i-th row of X_r;
Step S223: performing the binarized convolution computation using the quantized W_r and X_r.
CN201810604458.6A 2018-06-12 2018-06-12 High performance network accelerated method based on high-order residual quantization Pending CN108805286A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810604458.6A CN108805286A (en) 2018-06-12 2018-06-12 High performance network accelerated method based on high-order residual quantization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810604458.6A CN108805286A (en) 2018-06-12 2018-06-12 High performance network accelerated method based on high-order residual quantization

Publications (1)

Publication Number Publication Date
CN108805286A true CN108805286A (en) 2018-11-13

Family

ID=64087017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810604458.6A Pending CN108805286A (en) 2018-06-12 2018-06-12 High performance network accelerated method based on high-order residual quantization

Country Status (1)

Country Link
CN (1) CN108805286A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340201A (en) * 2018-12-19 2020-06-26 北京地平线机器人技术研发有限公司 Convolutional neural network accelerator and method for performing convolutional operation thereof
WO2021044244A1 (en) * 2019-09-03 2021-03-11 International Business Machines Corporation Machine learning hardware having reduced precision parameter components for efficient parameter update

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107516129A (en) * 2017-08-01 2017-12-26 北京大学 The depth Web compression method decomposed based on the adaptive Tucker of dimension

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107516129A (en) * 2017-08-01 2017-12-26 北京大学 The depth Web compression method decomposed based on the adaptive Tucker of dimension

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZEFAN LI, BINGBING NI et al.: "Performance Guaranteed Network Acceleration via High-Order Residual Quantization", ICCV 2017 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340201A (en) * 2018-12-19 2020-06-26 北京地平线机器人技术研发有限公司 Convolutional neural network accelerator and method for performing convolutional operation thereof
WO2021044244A1 (en) * 2019-09-03 2021-03-11 International Business Machines Corporation Machine learning hardware having reduced precision parameter components for efficient parameter update
GB2600871A (en) * 2019-09-03 2022-05-11 Ibm Machine learning hardware having reduced precision parameter components for efficient parameter update

Similar Documents

Publication Publication Date Title
CN108985317B (en) Image classification method based on separable convolution and attention mechanism
CN112101190B (en) Remote sensing image classification method, storage medium and computing device
CN111882040B (en) Convolutional neural network compression method based on channel number search
CN109002889B (en) Adaptive iterative convolution neural network model compression method
CN105528638B (en) The method that gray relative analysis method determines convolutional neural networks hidden layer characteristic pattern number
CN107832787A (en) Recognition Method of Radar Emitters based on bispectrum own coding feature
CN110263863A (en) Fine granularity mushroom phenotype recognition methods based on transfer learning Yu bilinearity InceptionResNetV2
CN111915490A (en) License plate image super-resolution reconstruction model and method based on multi-scale features
CN114037844A (en) Global rank perception neural network model compression method based on filter characteristic diagram
CN110428045A (en) Depth convolutional neural networks compression method based on Tucker algorithm
CN111612143A (en) Compression method and system of deep convolutional neural network
CN109949200B (en) Filter subset selection and CNN-based steganalysis framework construction method
Nugroho et al. Hyper-parameter tuning based on random search for densenet optimization
CN111127490A (en) Medical image segmentation method based on cyclic residual U-Net network
CN112183742A (en) Neural network hybrid quantization method based on progressive quantization and Hessian information
CN109523016B (en) Multi-valued quantization depth neural network compression method and system for embedded system
CN115759237A (en) End-to-end deep neural network model compression and heterogeneous conversion system and method
CN108805286A (en) High performance network accelerated method based on high-order residual quantization
CN112836820A (en) Deep convolutional network training method, device and system for image classification task
CN113392871B (en) Polarized SAR (synthetic aperture radar) ground object classification method based on scattering mechanism multichannel expansion convolutional neural network
CN114140641A (en) Image classification-oriented multi-parameter self-adaptive heterogeneous parallel computing method
CN111582442A (en) Image identification method based on optimized deep neural network model
CN116403090A (en) Small-size target detection method based on dynamic anchor frame and transducer
CN116051861A (en) Non-anchor frame target detection method based on heavy parameterization
CN115035408A (en) Unmanned aerial vehicle image tree species classification method based on transfer learning and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20181113