CN108805286A - High performance network accelerated method based on high-order residual quantization - Google Patents

High performance network accelerated method based on high-order residual quantization Download PDF

Info

Publication number
CN108805286A
CN108805286A CN201810604458.6A CN201810604458A CN108805286A CN 108805286 A CN108805286 A CN 108805286A CN 201810604458 A CN201810604458 A CN 201810604458A CN 108805286 A CN108805286 A CN 108805286A
Authority
CN
China
Prior art keywords
tensor
order
quantization
residual
input data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810604458.6A
Other languages
Chinese (zh)
Inventor
Bingbing Ni (倪冰冰)
Zefan Li (李泽凡)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201810604458.6A priority Critical patent/CN108805286A/en
Publication of CN108805286A publication Critical patent/CN108805286A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention provides a high-performance network acceleration method based on high-order residual quantization, comprising: step S1, obtaining a series of binary data of different scales through quantization and recursive operations; step S2, performing convolution operations on the binary data of different scales and combining the resulting operation results. The invention is an effective and accurate deep network acceleration method. The concept of the residual is used to represent information loss, and the residuals of the quantized input data at different scales are computed recursively to reduce that loss. By using binarized weights and binarized inputs, the size of the network is reduced to about 1/32 of the original, and the training speed is improved by about 30 times. The proposed method also makes it possible to train deep convolutional networks on a CPU. Experimental results show that the proposed HORQ network achieves good classification and acceleration performance.

Description

High-performance network acceleration method based on high-order residual quantization
Technical Field
The invention relates to a deep network acceleration method, in particular to a high-performance network acceleration method based on high-order residual quantization.
Background
Binarizing the input tensor has proven to be an effective network acceleration technique, but the existing binarization methods amount to a simple thresholding operation on pixel values (a first-order approximation) and incur a large loss of precision. Methods for accelerating deep network training can be roughly divided into three categories. The simplest approach is network pruning, followed by retraining of the pruned network structure. To achieve higher compression rates, researchers later developed structured sparse approximation techniques that replace larger sub-networks with shallower ones; however, for networks with different structures, this approach requires designing a corresponding, suitable approximation structure for each of them. More recently, the academic community has proposed network binarization schemes that convert the network weights and the corresponding forward and backward data flows into binarized representations, reducing both the amount of computation and the network storage space. Some of these methods also binarize the input image data by a thresholding operation; although this improves training speed, the accuracy of the classification network drops sharply. Such earlier binarization of the input data, using only positive and negative thresholds, is a very coarse quantization of floating-point data and can be regarded as a first-order binary approximation.
In summary, none of the existing network acceleration methods improves network speed both effectively and with high performance, no description or report of a technology similar to the present invention has been found, and no similar data has been collected at home or abroad.
Disclosure of Invention
In view of the above shortcomings of the prior art, the present invention aims to provide a high-performance network acceleration method based on high-order residual quantization. It is a high-order binarization scheme, specifically a new binarization quantization mode called the high-order residual quantization (HORQ) method, which binarizes both the inputs and the weights, so that the speedup brought by binarization is obtained together with a more accurate approximation of the computation. In particular, the proposed scheme recursively performs residual quantization and produces a series of binarized input images of decreasing magnitude. The invention also provides the corresponding high-order binary filtering and gradient propagation operations for the forward and backward computations.
The invention is realized by the following technical scheme.
A high-performance network acceleration method based on high-order residual quantization comprises the following steps:
step S1, obtaining a series of binary input data with different scales through quantization and recursive operation;
step S2, carrying out convolution operation on binary input data with different scales, and combining the obtained operation results;
and step S3, training the convolutional neural network by using the result obtained in the step S2, and further completing the accelerated training of the convolutional neural network.
Preferably, the step S1 includes the following sub-steps:
step S11, calculating a first order residual error, and further approximating the first order residual error through thresholding operation;
in step S12, step S11 is recursively executed to obtain a series of binarized residual tensors corresponding to different quantization scales as binary input data.
Preferably, the step S11 includes the following sub-steps:
step S111, assuming an input data tensor matrix X, quantizing it by the following process to obtain a first-order residual of the input data tensor matrix X:
X ≈ β1H1
wherein β1 is a real number; H1 ∈ {+1, -1}n represents a first-order binary residual tensor; n is the dimension of tensor H1;
step S112, optimizing the quantization result in step S111 by minimizing
J(β1, H1) = ||X − β1H1||²
wherein J(·) represents a squared-error loss function; the obtained optimization result is the first-order binarization quantization result:
H1* = sign(X), β1* = ||X||l1 / n
wherein ||·||l1 is the l1 norm; the values β1 = β1*, H1 = H1* are used here;
step S113, calculating the difference between the actual input data X and the first-order binary quantization result β1H1 to define a first-order binarized residual tensor R1(X), which is then used for further approximation:
R1(X) = X − β1H1
R1(X) is used to indicate the information loss due to the approximation.
Preferably, step S12 includes the following sub-steps:
step S121, further quantizing R1(X), represented as follows:
R1(X) ≈ β2H2
wherein β2 is a real number; H2 ∈ {+1, -1}n represents a second-order binary residual tensor, n being the dimension of tensor H2;
obtaining a second-order residual quantization of the input data:
X = β1H1 + R1(X) ≈ β1H1 + β2H2
wherein β1 and β2 are respectively real scalars, H1 and H2 are respectively binary residual tensors, β1H1 is called the first-order binary input tensor, and β2H2 is referred to as the second-order binarized input tensor;
step S122, solving the optimization problem of the second-order binarization input tensor:
J(β2, H2) = ||R1(X) − β2H2||²
the obtained optimization results are as follows:
H2* = sign(R1(X)), β2* = ||R1(X)||l1 / n
the values β2 = β2*, H2 = H2* are used here; the second-order binarization residual tensor is further obtained as follows:
R2(X) = R1(X) − β2H2
in the optimization process, the following inequality is obtained:
||R2(X)||² ≤ ||R1(X)||²
where the L2 norm of the second-order binarized residual tensor is used to represent the information loss.
Preferably, the method further comprises the following step:
further quantizing the second-order binarized residual tensor R2(X) and continuing the residual quantization up to order K, to obtain the K-order residual quantization of the input data X:
X ≈ β1H1 + β2H2 + … + βKHK
wherein Hi = sign(Ri−1(X)), βi = ||Ri−1(X)||l1 / n, Ri(X) = Ri−1(X) − βiHi, and R0(X) = X.
preferably, the step S2 includes the following sub-steps:
step S21, reshaping the binary input data;
in step S22, convolution calculation is performed using the reshaped binary input data.
Preferably, the step S21 includes the following sub-steps:
step S211, assuming that the binary input data is an input data tensor matrix X of dimension cin × win × hin, and that the convolution weight tensor matrix W has dimension cout × cin × w × h; W is divided into cout filters, each filter is reshaped to 1 × (cin × w × h), and the entire convolution weight tensor matrix W is reshaped into Wr, the reshaped Wr having dimension cout × (cin × w × h); the output of the convolutional layer is denoted by Y, and the dimension of Y is cout × wout × hout; for the input data tensor matrix X there are in total hout × wout sub-tensors, each of which is reshaped into the same dimension as a filter, so that the reshaped Xr has dimension (cin × w × h) × (wout × hout);
wherein cin represents the number of channels of the input tensor, win represents the width of each channel of the input tensor, hin represents the height of each channel of the input tensor, cout represents the number of convolution kernels, w represents the width of the convolution kernel, h represents the height of the convolution kernel, wout represents the width of each channel of the output tensor, and hout represents the height of each channel of the output tensor;
step S212, using the matrix multiplication Yr = WrXr to represent the calculation result of the convolution operation;
step S213, reshaping Yr back into Y to complete the entire reshaping calculation process.
Preferably, the step S22 includes the following sub-steps:
step S221, quantizing Wr:
Wr(i) ≈ αiBi
wherein Wr(i) is row i of Wr, αi represents a real constant, and Bi represents a first-order binary approximation of Wr(i);
step S222, quantizing Xr:
Xr(i) ≈ β1(i)H1(i)
wherein Xr(i) is row i of Xr;
step S223, carrying out the binarization convolution calculation using the quantized Wr and Xr.
The above-described binarization convolution calculation problem can be solved by using the algorithm in fig. 2.
The invention provides a high-performance network acceleration method based on high-order residual quantization; it is a binary quantization method built on a recursive thresholding operation and provides a high-order residual quantization framework. The invention can therefore obtain a series of binarized images corresponding to different quantization scales. Based on these binarized input tensors (stacked binarized maps of different scales), the invention develops an efficient binarized filtering operation for the forward and backward computations.
Compared with the prior art, the invention has the following beneficial effects:
the invention can accelerate the training speed of the network and reduce the information loss in the quantization process as little as possible; the method of the invention uses the input data of the binaryzation and the weight of the binaryzation at the same time, and after one-time quantization is finished, the next quantization operation is carried out recursively by using the calculated residual error. The method has strong flexibility and can adapt to different experimental conditions.
The invention can improve the running speed and ensure the training effect of the network. And the binary input and the weight are used, so that the network is accelerated, and the information loss is reduced as much as possible.
The present invention utilizes the concept of recursion to fully utilize the tensors generated by the recursion process. Combining the tensors generated by recursion, the quantization results of different scales are used.
The method can adapt to different hardware requirements and accelerate the selection of a proper residual error order. When the residual error order is two or three, the network can be accelerated, and the information loss can be reduced as much as possible.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a schematic diagram of a tensor reshaping process in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart of an algorithm for second order binary convolution according to an embodiment of the present invention;
FIG. 3 is a flow chart of the training of the HORQ network according to an embodiment of the present invention;
FIG. 4 is a graph of experimental comparison results on a CIFAR-10 data set according to an embodiment of the present invention;
FIG. 5 is a graph of experimental comparison results on the MNIST data set;
FIG. 6 is a graph of acceleration rate versus number of channels in accordance with an embodiment of the present invention;
FIG. 7 is a graph of acceleration rate versus quantization order for one embodiment of the present invention;
FIG. 8 is a graph of acceleration rate versus convolution kernel size for one embodiment of the present invention;
fig. 9 is a schematic diagram of an algorithm of high order residual quantization according to an embodiment of the present invention.
Detailed Description
The following examples are given to illustrate the present invention. They are carried out on the premise of the technical solution of the present invention and give detailed implementation modes and specific procedures, but the scope of the present invention is not limited to these examples.
The invention provides a high-performance network acceleration method based on high-order residual quantization, a high-order binarization scheme that achieves a more accurate approximate calculation while retaining the speedup that binarization provides. In the following embodiments, the proposed scheme recursively performs residual quantization and produces a series of binarized input images of decreasing magnitude. The embodiments also provide the corresponding high-order binary filtering and gradient propagation operations for the forward and backward computations.
The method of the following embodiments is a new binarization quantization method called high-order residual quantization (HORQ). It binarizes both the input and the weights. First, a series of binary input data of different scales is obtained; then convolution operations are performed on the binary input data of different scales and the results are combined. This approach successfully reduces information loss.
The method specifically comprises the following steps:
a first order residual is calculated and then a new round of thresholding operation is performed to further approximate the first order residual. The binarized version approximation of the residual can be considered a higher order binary input. The above operations are performed recursively and finally a series of binarized residual tensors corresponding to different quantization scales can be obtained as binary input data. Based on these binary input data, an efficient binarization filtering operation for forward and backward computations is developed.
The input to a convolutional layer is a four-dimensional tensor. If the input tensor and the corresponding weight filters are reshaped into matrices, the convolution operation can be regarded as a matrix multiplication, and the computation of each element of that multiplication can be regarded as a vector operation. Consider first the case where the input is a one-dimensional vector: suppose the input data is a vector X ∈ Rn, which can be quantized as X ≈ β1H1 with H1 ∈ {+1, -1}n. The result is then obtained by solving the following optimization problem:
J(β1, H1) = ||X − β1H1||², β1*, H1* = argmin J(β1, H1)
The solution to this problem is:
H1* = sign(X), β1* = ||X||l1 / n
wherein β1 is a real number. The above problem can be considered a first-order binarization quantization problem, and a first-order residual tensor R1(X) can be defined by calculating the difference between the actual input and the result of the first-order binarization quantization:
R1(X) = X − β1H1
After the above parameters are determined, R1(X) can be used to represent the information loss due to the approximation.
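As an illustrative sketch only (not part of the original patent text), the first-order quantization and residual described above can be written in a few lines of Python/NumPy; the function name first_order_quantize and the toy vector are assumptions made purely for illustration.

```python
import numpy as np

def first_order_quantize(x):
    """First-order binary quantization X ~= beta1 * H1, with
    H1 = sign(X) and beta1 = ||X||_l1 / n, plus the residual R1(X)."""
    h1 = np.where(x >= 0, 1.0, -1.0)   # sign(X), mapping zeros to +1
    beta1 = np.abs(x).mean()           # ||X||_l1 / n
    r1 = x - beta1 * h1                # R1(X), the information loss
    return beta1, h1, r1

x = np.array([0.7, -1.2, 0.1, -0.4])
beta1, h1, r1 = first_order_quantize(x)
print(beta1, h1, r1)                   # beta1 = 0.6, H1 = [+1, -1, +1, -1]
```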
Since R1(X) is a real-valued tensor, R1(X) can be further quantized as follows:
R1(X) ≈ β2H2
wherein β2 is a real number and H2 ∈ {+1, -1}n.
A second-order residual quantization of the input data can then be obtained:
X = β1H1 + R1(X) ≈ β1H1 + β2H2
wherein β1 and β2 are real scalars, H1 and H2 are binarized residual tensors, β1H1 is called the first-order binary input tensor, and β2H2 is referred to as the second-order binarized input tensor. The above problem can be solved in the same way as the previous one.
First, the corresponding optimization problem is solved:
J(β2, H2) = ||R1(X) − β2H2||²
The solution of the above equation is:
H2* = sign(R1(X)), β2* = ||R1(X)||l1 / n
Here β1*, H1* are the optimal values of β1, H1; for arbitrary β1, H1 the corresponding R1(X) can be calculated, and the corresponding R2(X) can be calculated in the same way; in the following, the optimal values are used.
The binary approximation method provided by the embodiment of the invention has better theoretical and practical effects than the existing traditional method.
The information loss of the first- and second-order approximation methods can be compared. In the conventional first-order binarization method, R1(X) is defined as the residual approximation tensor. Similarly, the second-order residual tensor in the newly proposed method is defined as R2(X) = R1(X) − β2H2. In the optimization process, the following inequality can be obtained:
||R2(X)||² ≤ ||R1(X)||²
Therefore, if the L2 norm of the residual tensor is used to represent the information loss, it can be demonstrated that the second-order residual quantization method proposed in this embodiment reduces the amount of information loss.
Further, the second-order residual quantization method can be extended to K-order residual quantization:
X ≈ β1H1 + β2H2 + … + βKHK
wherein Hi = sign(Ri−1(X)), βi = ||Ri−1(X)||l1 / n, Ri(X) = Ri−1(X) − βiHi, and R0(X) = X.
Higher-order input tensors can be obtained by recursively calculating the residual tensor. In fact, a higher order means less information loss but a much larger amount of computation. It turns out that second- and third-order residual quantization are already good enough. This embodiment mainly uses second-order residual quantization, which keeps the amount of computation appropriate while guaranteeing the quality of the residual quantization.
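A minimal sketch of the recursive K-order residual quantization described above, again as an illustration rather than the embodiment itself; the helper name horq_quantize and the random test vector are assumed. Printing the squared L2 norms of the residuals shows the information loss shrinking as the order grows, consistent with the inequality above.

```python
import numpy as np

def horq_quantize(x, order):
    """K-order residual quantization: X ~= sum_i beta_i * H_i, where each
    (beta_i, H_i) is the first-order binarization of the previous residual."""
    residual = np.asarray(x, dtype=np.float64)
    betas, hs = [], []
    for _ in range(order):
        h = np.where(residual >= 0, 1.0, -1.0)   # H_i = sign(R_{i-1}(X))
        beta = np.abs(residual).mean()           # beta_i = ||R_{i-1}(X)||_l1 / n
        residual = residual - beta * h           # R_i(X) = R_{i-1}(X) - beta_i * H_i
        betas.append(beta)
        hs.append(h)
    return betas, hs, residual

x = np.random.randn(1000)
for k in (1, 2, 3):
    _, _, r = horq_quantize(x, k)
    print(k, np.linalg.norm(r) ** 2)   # the information loss shrinks as the order grows
```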
The HORQ binarized input is then used for high-order binarized filtering in the forward and backward computations. The first step is the tensor reshaping process shown in fig. 1. Assume the binary input data is a tensor matrix X of dimension cin × win × hin and the convolution filter W has dimension cout × cin × w × h. The weight tensor W is divided into cout filters, each of which can be reshaped to 1 × (cin × w × h), so that the entire matrix W is reshaped into Wr of dimension cout × (cin × w × h). The output of the convolutional layer is denoted by Y, whose dimension is cout × wout × hout. For the input tensor X there are in total hout × wout sub-tensors, each of which is reshaped into the same dimension as a filter, so that the reshaped Xr has dimension (cin × w × h) × (wout × hout). The matrix multiplication Yr = WrXr then represents the result of the convolution operation. Finally, Yr is reshaped back to Y to complete the entire reshaping calculation process.
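The reshaping step can be illustrated with the following NumPy sketch, which assumes stride 1, no padding, and a (cin, hin, win) memory layout; these choices are not fixed by the patent and are made only to keep the example short.

```python
import numpy as np

def reshape_inputs(X, W):
    """Reshape input X (cin, hin, win) and weights W (cout, cin, h, w)
    into Wr of shape (cout, cin*h*w) and Xr of shape (cin*h*w, hout*wout),
    so the convolution becomes the matrix product Yr = Wr @ Xr."""
    cin, hin, win = X.shape
    cout, _, h, w = W.shape
    hout, wout = hin - h + 1, win - w + 1
    Wr = W.reshape(cout, cin * h * w)
    cols = []
    for i in range(hout):
        for j in range(wout):
            cols.append(X[:, i:i + h, j:j + w].reshape(-1))   # one sub-tensor per output pixel
    Xr = np.stack(cols, axis=1)                               # (cin*h*w, hout*wout)
    return Wr, Xr, (cout, hout, wout)

X = np.random.randn(3, 8, 8)
W = np.random.randn(4, 3, 3, 3)
Wr, Xr, out_shape = reshape_inputs(X, W)
Y = (Wr @ Xr).reshape(out_shape)    # Yr reshaped back to Y, as in the text above
print(Y.shape)                      # (4, 6, 6)
```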
After the tensor reshaping process, the next stage is a convolution computation using second-order residual quantization, which calculates the matrix product of Xr and Wr. Wr is first quantized as:
Wr(i) ≈ αiBi
wherein Wr(i) is row i of Wr. Then the input matrix Xr is quantized with second-order residual quantization:
Xr(i) ≈ β1(i)H1(i) + β2(i)H2(i)
wherein Xr(i) is row i of Xr. The above binarized convolution calculation can be carried out with the algorithm in fig. 2.
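The quantized matrix product can be sketched as follows. The patent text indexes Xr by rows; in this sketch each receptive-field sub-tensor (a column of Xr in the reshaping above) is quantized instead, since that is what enters each dot product of Yr = WrXr, so treat it as an interpretation under stated assumptions rather than the exact algorithm of fig. 2.

```python
import numpy as np

def binarize_rows(M):
    """First-order binary approximation of each row: M[i] ~= alpha_i * B_i."""
    B = np.where(M >= 0, 1.0, -1.0)
    alpha = np.abs(M).mean(axis=1)            # one scale per row
    return alpha, B

def horq_columns(M, order=2):
    """Order-K residual quantization of each column of M."""
    residual = M.copy()
    terms = []
    for _ in range(order):
        H = np.where(residual >= 0, 1.0, -1.0)
        beta = np.abs(residual).mean(axis=0)  # one scale per column
        residual = residual - H * beta
        terms.append((beta, H))
    return terms

def horq_matmul(Wr, Xr, order=2):
    """Approximate Yr = Wr @ Xr with binarized Wr rows and
    order-K residual-quantized Xr columns."""
    alpha, B = binarize_rows(Wr)
    Yr = np.zeros((Wr.shape[0], Xr.shape[1]))
    for beta, H in horq_columns(Xr, order):
        Yr += (B @ H) * beta[None, :]         # binary matrix product, then rescale
    return Yr * alpha[:, None]

Wr = np.random.randn(4, 27)
Xr = np.random.randn(27, 36)
print(np.abs(horq_matmul(Wr, Xr) - Wr @ Xr).mean())   # approximation error
```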
The process of training a HORQ network with the second-order residual quantization method of the above embodiment is shown in fig. 3. The usual steps are forward propagation, backward propagation and parameter update. During forward and backward propagation, the binarized representations of the inputs and of the weights are used. In fact, the high-order residual quantization method of the present invention can be conveniently applied in convolutional layers and fully connected layers. During training, the inputs and weights are quantized and the binarized convolution is computed layer by layer. After forward propagation, backward propagation is performed with the binarized weights and inputs, and the gradient is calculated using the sign function sign(·). The real values of the parameters and inputs are used when updating the parameters, because the magnitude of the parameter update is small at each iteration: if the binarized weights were updated directly, the update would vanish in the next binarization and the training of the network would be affected.
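A toy sketch of one training step in the spirit of fig. 3 is given below. The layer shape, the squared-error loss, the learning rate and the use of first-order (rather than second-order) input binarization are all assumptions made for brevity; the only point illustrated is that the forward/backward pass runs on binarized quantities while the update is applied to the real-valued weights.

```python
import numpy as np

# Toy training step for one fully connected layer: binarize the weights and the
# input for the forward/backward pass, but apply the update to the real-valued
# weights, otherwise the small per-step changes would be erased by re-binarization.
rng = np.random.default_rng(0)
W_real = rng.standard_normal((10, 64)) * 0.1        # real-valued master weights

def binarize(m):
    """First-order binarization: sign(m) scaled by the mean absolute value."""
    return np.abs(m).mean() * np.where(m >= 0, 1.0, -1.0)

def train_step(x, y, lr=0.01):
    global W_real
    w_bin = binarize(W_real)                        # binarized weights for this pass
    x_bin = binarize(x)                             # binarized input (first order here)
    logits = w_bin @ x_bin                          # forward pass
    grad_logits = logits - y                        # gradient of a squared-error loss
    grad_w = np.outer(grad_logits, x_bin)           # backward pass through the layer
    W_real -= lr * grad_w                           # update the real-valued weights

train_step(rng.standard_normal(64), rng.standard_normal(10))
```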
The high-performance network acceleration method based on the high-order residual quantization provided by the embodiment is an effective and accurate deep network acceleration method of a high-order binary approximation method. The concept of residual is referred to represent information loss and the residual of quantized input data of different sizes is recursively calculated to reduce information loss. By using the binarization weight and the binarization input, the size of the network is reduced to about 1/32 of the original size, and the training speed is improved by about 30 times. The method proposed by the above embodiment also provides the possibility to train a deep convolutional network on the CPU. The experimental result shows that the HORQ network provided by the embodiment has good classification effect and acceleration effect.
This is further described below in conjunction with specific examples.
Example experiments were performed on the MNIST and CIFAR-10 databases. The MNIST database is an image classification database containing handwritten digit images of 0-9. For comparison with other methods, the same MLP structure as the comparison methods was used in the experiments. This structure contains three hidden layers, a 4096-dimensional second-order residual-quantized connection structure and an L2-SVM output layer (using the hinge loss function). To train the MLP, no convolution processing, data preprocessing, data augmentation or pre-training was used. ADAM adaptive learning was used, and batch normalization with a batch size of 200 was applied to improve the training speed. An MLP with first-order binarized connections (XNOR) was also trained to compare its final test accuracy with that of the method of this embodiment. The two methods use the same network structure; the final results show that the method of this embodiment is 0.71% more accurate than the XNOR method and converges faster during training. Judging from the evolution of the hinge loss, the error decrease of the HORQ method appears smoother, although both methods can reduce the hinge loss to a relatively small value. Earlier work by other researchers used binarized weights with floating-point inputs, whereas the method of this embodiment uses binarized weights and binarized inputs. The experimental results on the MNIST database show that the method of this embodiment accelerates the deep network while keeping the loss of accuracy as small as possible, as shown in fig. 4 and fig. 5.
On the CIFAR-10 database, 50000 pictures were used for training and 10000 for testing. No data preprocessing or data augmentation was used in the experiments. To show the difference between the second-order residual quantization method of this embodiment and the conventional first-order binarization quantization method (XNOR), a convolutional neural network with a relatively small number of layers was used. The batch size was 50, which allows faster training, and the training data were normalized.
In the comparison of experimental results, the baseline results are inferior to some published results because the convolutional neural network used in the experiments is not very complex; however, a shallower network makes it easier to compare the method of this embodiment with XNOR. With the same network structure, the accuracy of this embodiment is about 5% higher than that of XNOR, with essentially the same convergence speed. This demonstrates the greater effectiveness of the method adopted in the embodiments of the present invention.
The computational cost of the method proposed in the embodiment of the present invention is analyzed below. For a convolutional neural network whose input has dimension cin × win × hin and whose convolution weight tensor matrix has dimension cout × cin × w × h, the total number of operations is cout × cin × wh × winhin. On an existing CPU that can process 64-bit binary data in one operation, the total number of operations required for the K-order residual quantization used in the embodiment of the present invention is:
K × cout × cin × wh × winhin + (K+1) × winhin = KNp + (K+1)Nn
Among these operations, the KNp binary-precision operations can be accelerated, while the remaining (K+1)Nn operations are floating-point operations and cannot be accelerated. The acceleration rate is therefore
γ = 64Np / (KNp + 64(K+1)Nn)
For the second-order case, the acceleration rate of the embodiment of the present invention is
γ = 64Np / (2Np + 192Nn)
As can be seen from the above equation, the acceleration rate is independent of the width and height of the input tensor, but depends on the filter size and the number of channels. First, the number of channels is fixed at cout × cin = 10 × 10 and the relationship between filter size and acceleration rate is observed; the results are shown in fig. 6. Then the filter size is fixed at w × h = 3 × 3 with 3 input channels, and the influence of the number of channels on the acceleration rate is observed; the result is shown in fig. 7. The experimental results show that if the number of channels and the filter size are too small, the acceleration effect is small, so when the binarization method of this embodiment is applied to a DCNN, network layers with too few channels should not be quantized. With cin × cout = 64 × 256 and w × h = 3 × 3, the second-order residual quantization method can be accelerated by a factor of 31.98. In practice, however, the acceleration rate may be slightly lower than this value due to limitations of the memory reads and data processing. The relationship between the quantization order and the acceleration rate is shown in fig. 8. The experimental results show that the second- and third-order residual quantization methods proposed in the embodiments of the present invention have a strong acceleration effect while maintaining accuracy. The complete framework of the invention is shown in fig. 9.
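The acceleration-rate analysis can be checked with a short calculation. The formula coded below (binary operations executed 64 at a time on a 64-bit CPU, floating-point operations not accelerated) is a reconstruction that reproduces the quoted factor of 31.98, so it should be read as an interpretation of the text rather than a quoted formula; the input size 32 × 32 is arbitrary, since the rate does not depend on it.

```python
def speedup(c_in, c_out, w, h, w_in, h_in, order):
    """Acceleration rate for order-K residual quantization:
    Np = c_out*c_in*w*h*w_in*h_in binary-capable ops (64 per cycle on a 64-bit CPU),
    Nn = w_in*h_in floating-point ops that cannot be accelerated."""
    n_p = c_out * c_in * w * h * w_in * h_in
    n_n = w_in * h_in
    return 64 * n_p / (order * n_p + 64 * (order + 1) * n_n)

print(round(speedup(64, 256, 3, 3, 32, 32, order=2), 2))   # ~31.98 for the quoted setting
```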
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.

Claims (8)

1. A high-performance network acceleration method based on high-order residual quantization is characterized by comprising the following steps:
step S1, obtaining a series of binary input data with different scales through quantization and recursive operation;
step S2, carrying out convolution operation on binary input data with different scales, and combining the obtained operation results;
and step S3, training the convolutional neural network by using the result obtained in the step S2, and further completing the accelerated training of the convolutional neural network.
2. The high-performance network acceleration method based on high-order residual quantization of claim 1, characterized in that said step S1 comprises the following sub-steps:
step S11, calculating a first order residual error, and further approximating the first order residual error through thresholding operation;
in step S12, step S11 is recursively executed to obtain a series of binarized residual tensors corresponding to different quantization scales as binary input data.
3. The high-performance network acceleration method based on high-order residual quantization of claim 2, characterized in that said step S11 comprises the following sub-steps:
step S111, assuming an input data tensor matrix X, quantizing it by the following process to obtain a first-order residual of the input data tensor matrix X:
X ≈ β1H1
wherein β1 is a real number; H1 ∈ {+1, -1}n represents a first-order binary residual tensor; n is the dimension of tensor H1;
step S112, optimizing the quantization result in step S111 by minimizing
J(β1, H1) = ||X − β1H1||²
wherein J(·) represents a squared-error loss function; the obtained optimization result is the first-order binarization quantization result:
H1* = sign(X), β1* = ||X||l1 / n
wherein ||·||l1 is the l1 norm; the values β1 = β1*, H1 = H1* are used here;
step S113, calculating the difference between the actual input data X and the first-order binary quantization result β1H1 to define a first-order binarized residual tensor R1(X), which is then used for further approximation:
R1(X) = X − β1H1
R1(X) is used to indicate the information loss due to the approximation.
4. The high-performance network acceleration method based on high-order residual quantization of claim 3, characterized in that step S12 comprises the following sub-steps:
step S121, further quantizing R1(X), represented as follows:
R1(X) ≈ β2H2
wherein β2 is a real number; H2 ∈ {+1, -1}n represents a second-order binary residual tensor, n being the dimension of tensor H2;
obtaining a second-order residual quantization of the input data:
X = β1H1 + R1(X) ≈ β1H1 + β2H2
wherein β1 and β2 are respectively real scalars, H1 and H2 are respectively binary residual tensors, β1H1 is called the first-order binary input tensor, and β2H2 is referred to as the second-order binarized input tensor;
step S122, solving the optimization problem of the second-order binarization input tensor:
J(β2, H2) = ||R1(X) − β2H2||²
the obtained optimization results are as follows:
H2* = sign(R1(X)), β2* = ||R1(X)||l1 / n
the values β2 = β2*, H2 = H2* are used here; the second-order binarization residual tensor is further obtained as follows:
R2(X) = R1(X) − β2H2
in the optimization process, the following inequality is obtained:
||R2(X)||² ≤ ||R1(X)||²
where the L2 norm of the second-order binarized residual tensor is used to represent the information loss.
5. The high-performance network acceleration method based on high-order residual quantization of claim 4, characterized by further comprising the following step:
further quantizing the second-order binarized residual tensor R2(X) and continuing the residual quantization up to order K, to obtain the K-order residual quantization of the input data X:
X ≈ β1H1 + β2H2 + … + βKHK
wherein Hi = sign(Ri−1(X)), βi = ||Ri−1(X)||l1 / n, Ri(X) = Ri−1(X) − βiHi, and R0(X) = X.
6. the high-performance network acceleration method based on high-order residual quantization of claim 1, characterized in that said step S2 comprises the following sub-steps:
step S21, reshaping the binary input data;
in step S22, convolution calculation is performed using the reshaped binary input data.
7. The high-performance network acceleration method based on high-order residual quantization of claim 6, characterized in that said step S21 comprises the following sub-steps:
step S211, assuming that the binary input data is an input data tensor matrix X of dimension cin × win × hin, and that the convolution weight tensor matrix W has dimension cout × cin × w × h; W is divided into cout filters, each filter is reshaped to 1 × (cin × w × h), and the entire convolution weight tensor matrix W is reshaped into Wr, the reshaped Wr having dimension cout × (cin × w × h); the output of the convolutional layer is denoted by Y, and the dimension of Y is cout × wout × hout; for the input data tensor matrix X there are in total hout × wout sub-tensors, each of which is reshaped into the same dimension as a filter, so that the reshaped Xr has dimension (cin × w × h) × (wout × hout);
wherein cin represents the number of channels of the input tensor, win represents the width of each channel of the input tensor, hin represents the height of each channel of the input tensor, cout represents the number of convolution kernels, w represents the width of the convolution kernel, h represents the height of the convolution kernel, wout represents the width of each channel of the output tensor, and hout represents the height of each channel of the output tensor;
step S212, using the matrix multiplication Yr = WrXr to represent the calculation result of the convolution operation;
step S213, reshaping Yr back into Y to complete the entire reshaping calculation process.
8. The high-performance network acceleration method based on high-order residual quantization of claim 7, characterized in that said step S22 comprises the following sub-steps:
step S221, quantizing Wr:
Wr(i) ≈ αiBi
wherein Wr(i) is row i of Wr, αi represents a real constant, and Bi represents a first-order binary approximation of Wr(i);
step S222, quantizing Xr:
Xr(i) ≈ β1(i)H1(i)
wherein Xr(i) is row i of Xr;
step S223, carrying out the binarization convolution calculation using the quantized Wr and Xr.
CN201810604458.6A 2018-06-12 2018-06-12 High performance network accelerated method based on high-order residual quantization Pending CN108805286A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810604458.6A CN108805286A (en) 2018-06-12 2018-06-12 High performance network accelerated method based on high-order residual quantization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810604458.6A CN108805286A (en) 2018-06-12 2018-06-12 High performance network accelerated method based on high-order residual quantization

Publications (1)

Publication Number Publication Date
CN108805286A true CN108805286A (en) 2018-11-13

Family

ID=64087017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810604458.6A Pending CN108805286A (en) 2018-06-12 2018-06-12 High performance network accelerated method based on high-order residual quantization

Country Status (1)

Country Link
CN (1) CN108805286A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260023A (en) * 2018-11-30 2020-06-09 罗伯特·博世有限公司 Bit interpretation for convolutional neural network input layer
CN111340201A (en) * 2018-12-19 2020-06-26 北京地平线机器人技术研发有限公司 Convolutional neural network accelerator and method for performing convolutional operation thereof
CN111914986A (en) * 2019-05-10 2020-11-10 北京京东尚科信息技术有限公司 Method for determining binary convolution acceleration index and related equipment
WO2021044244A1 (en) * 2019-09-03 2021-03-11 International Business Machines Corporation Machine learning hardware having reduced precision parameter components for efficient parameter update
CN113420788A (en) * 2020-10-12 2021-09-21 黑芝麻智能科技(上海)有限公司 Integer-based fusion convolution layer in convolutional neural network and fusion convolution method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107516129A (en) * 2017-08-01 2017-12-26 北京大学 The depth Web compression method decomposed based on the adaptive Tucker of dimension

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107516129A (en) * 2017-08-01 2017-12-26 北京大学 The depth Web compression method decomposed based on the adaptive Tucker of dimension

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZEFAN LI, BINGBING NI et al.: "Performance Guaranteed Network Acceleration via High-Order Residual Quantization", ICCV 2017 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260023A (en) * 2018-11-30 2020-06-09 罗伯特·博世有限公司 Bit interpretation for convolutional neural network input layer
CN111340201A (en) * 2018-12-19 2020-06-26 北京地平线机器人技术研发有限公司 Convolutional neural network accelerator and method for performing convolutional operation thereof
CN111914986A (en) * 2019-05-10 2020-11-10 北京京东尚科信息技术有限公司 Method for determining binary convolution acceleration index and related equipment
WO2021044244A1 (en) * 2019-09-03 2021-03-11 International Business Machines Corporation Machine learning hardware having reduced precision parameter components for efficient parameter update
GB2600871A (en) * 2019-09-03 2022-05-11 Ibm Machine learning hardware having reduced precision parameter components for efficient parameter update
CN113420788A (en) * 2020-10-12 2021-09-21 黑芝麻智能科技(上海)有限公司 Integer-based fusion convolution layer in convolutional neural network and fusion convolution method

Similar Documents

Publication Publication Date Title
CN108805286A (en) High performance network accelerated method based on high-order residual quantization
US20210089922A1 (en) Joint pruning and quantization scheme for deep neural networks
CN108985317B (en) Image classification method based on separable convolution and attention mechanism
Wu et al. Easyquant: Post-training quantization via scale optimization
US20190228268A1 (en) Method and system for cell image segmentation using multi-stage convolutional neural networks
CN110334580A (en) The equipment fault classification method of changeable weight combination based on integrated increment
CN109146000B (en) Method and device for improving convolutional neural network based on freezing weight
CN114241779B (en) Short-time prediction method, computer and storage medium for urban expressway traffic flow
Nakandala et al. Incremental and approximate inference for faster occlusion-based deep cnn explanations
Aaron et al. Dynamic incremental k-means clustering
CN108805257A (en) A kind of neural network quantization method based on parameter norm
CN113256508A (en) Improved wavelet transform and convolution neural network image denoising method
CN111539444A (en) Gaussian mixture model method for modified mode recognition and statistical modeling
CN111985825A (en) Crystal face quality evaluation method for roller mill orientation instrument
Zhou et al. Online filter weakening and pruning for efficient convnets
CN108805844B (en) Lightweight regression network construction method based on prior filtering
CN111476346A (en) Deep learning network architecture based on Newton conjugate gradient method
Chikin et al. Channel balancing for accurate quantization of winograd convolutions
CN110619311A (en) Data classification method based on EEMD-ICA-SVM
CN113780550A (en) Convolutional neural network pruning method and device for quantizing feature map similarity
CN108305219B (en) Image denoising method based on irrelevant sparse dictionary
US20230153624A1 (en) Learning method and system for object tracking based on hybrid neural network
CN110837853A (en) Rapid classification model construction method
CN110288002A (en) A kind of image classification method based on sparse Orthogonal Neural Network
CN114492786A (en) Visual transform pruning method for alternative direction multipliers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20181113