CN114611665A - Multi-precision hierarchical quantization method and device based on weight oscillation influence degree - Google Patents

Multi-precision hierarchical quantization method and device based on weight oscillation influence degree Download PDF

Info

Publication number
CN114611665A
CN114611665A CN202210217282.5A CN202210217282A CN114611665A CN 114611665 A CN114611665 A CN 114611665A CN 202210217282 A CN202210217282 A CN 202210217282A CN 114611665 A CN114611665 A CN 114611665A
Authority
CN
China
Prior art keywords
quantization
neural network
weight
oscillation
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210217282.5A
Other languages
Chinese (zh)
Inventor
宋萍
刘宏博
郄有田
李一凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202210217282.5A priority Critical patent/CN114611665A/en
Publication of CN114611665A publication Critical patent/CN114611665A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The multi-precision layered quantization method and device based on the weight oscillation influence degree can guarantee the accuracy of the neural network, solve the problems of structural redundancy and complex parameters of the trained neural network, reduce the calculation amount and memory consumption and realize the on-chip operation of the neural network. The method comprises the following steps: performing global feature extraction on the feature map, acquiring a channel activation value and determining a weight oscillation coefficient; adding a weight oscillation value in the trained neural network to calculate the influence on the accuracy of the neural network; sequencing the weights according to the oscillation influence degree, and setting a quantization scale; quantizing the neural network; retraining the unquantized parameters to obtain the accuracy of the neural network; and (5) setting a quantization scale, repeating the steps (4) and (5) until the lowest accuracy threshold of the neural network is reached, and finishing quantization.

Description

Multi-precision hierarchical quantization method and device based on weight oscillation influence degree
Technical Field
The invention belongs to the technical field of neural network model compression, and particularly relates to a multi-precision hierarchical quantization method based on weight oscillation influence degree and a multi-precision hierarchical quantization device based on weight oscillation influence degree.
Background
The parameters in the neural network are numerous, and a large amount of memory and computing resources are required to be occupied during operation, so that the neural network is difficult to deploy on a mobile device with limited resources.
In order to solve the above problems, the compression technology of the neural network is widely focused, mainly including pruning, low rank approximation, knowledge distillation, quantization, and the like. The pruning technology needs to set a threshold value, and removes the connection of the parameter value between the neurons lower than the threshold value to achieve the pruning purpose, but the pruning aiming at the neuron connection can generate a sparse matrix, and the sparse matrix can only realize the compression acceleration effect by a specific bottom library and hardware. The traditional quantization theory adopts uniform quantization threshold values and bit widths for the whole method, and although the neural network has certain robustness, the precision of the quantized neural network is lost, and the execution efficiency and the accuracy of the network are reduced.
Disclosure of Invention
In order to overcome the defects of the prior art, the technical problem to be solved by the invention is to provide a multi-precision hierarchical quantization method based on the weight oscillation influence degree, which can ensure the accuracy of a neural network, solve the problems of structural redundancy and complex parameters of the trained neural network, reduce the operation amount and memory consumption and realize the on-chip operation of the neural network.
The technical scheme of the invention is as follows: the multi-precision hierarchical quantization method based on the weight oscillation influence degree comprises the following steps:
(1) belongs to R through convolution kernel W ∈ RI×w×h×CThe characteristic diagram of the output after operation is X ∈ RW×H×CWherein I represents the number of channels of the input feature diagram, W and H represent the width and height of the convolution kernel respectively, W and H represent the width and height of the feature diagram respectively, and C is the number of output channels, and global feature extraction, channel activation value acquisition and weight oscillation coefficient phi determination are carried out on the feature diagram;
(2) adding a weight oscillation value in the trained neural network to calculate the influence on the accuracy of the neural network;
(3) the weights are sequenced according to the oscillation influence degree eta, and the quantization scale is set to be p1
(4) Quantizing the neural network;
(5) retraining the unquantized parameters to obtain the accuracy of the neural network;
(6) setting the quantization scale to p2And (5) repeating the steps (4) and (5) until the minimum accuracy threshold of the neural network is reached, and finishing the quantification.
According to the method, a branch quantization strategy is adopted, weights are sequenced and quantized in a layered mode according to the weight oscillation influence degree, and the accuracy of a neural network is guaranteed through a method of retraining unquantized weights; firstly, adding a global average pooling layer, a full-link layer and an activation layer to obtain activation values of a feature map, and representing the weight oscillation coefficients of corresponding filters by the activation values of different channels of the feature map; adding a weight oscillation value in the trained neural network to calculate the influence on the accuracy of the neural network; determining quantization parameters by a fine tuning fast training method, quantizing the weights according to different quantization precisions, and connecting the weights with a loss function through a straight-through estimator to obtain an optimal quantization model; therefore, the accuracy of the neural network can be ensured, the problems of structural redundancy and complex parameters of the neural network after training are solved, the calculation amount and the memory consumption are reduced, and the on-chip operation of the neural network is realized.
There is also provided a multi-precision hierarchical quantization apparatus based on a weighted oscillation influence degree, the apparatus including:
the weight oscillation coefficient acquisition module is configured to extract global features aiming at the feature map, acquire a channel activation value and determine a weight oscillation coefficient phi;
an adding module configured to add a weighted oscillation value to the trained neural network to calculate an influence on the accuracy of the neural network;
a ranking module configured to rank the weights according to an oscillation influence degree eta, setting a quantization scale to p1
A quantization module configured to quantize a neural network;
a training module configured to retrain unquantized parameters to obtain an accuracy of the neural network;
a reset parameter module configured to set the quantization scale to p2Repeatedly performAnd the quantization module and the training module finish quantization until reaching the lowest threshold of the accuracy of the neural network.
Drawings
Fig. 1 is a flowchart of a multi-precision hierarchical quantization method based on the influence of weight oscillation according to the present invention.
FIG. 2 is a diagram illustrating fine tune quantization fast training according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
In order to make the description of the present disclosure more complete and complete, the following description is given for illustrative purposes with respect to the embodiments and examples of the present invention; it is not intended to be the only form in which the embodiments of the invention may be practiced or utilized. The embodiments are intended to cover the features of the various embodiments as well as the method steps and sequences for constructing and operating the embodiments. However, other embodiments may be utilized to achieve the same or equivalent functions and step sequences.
The multi-precision hierarchical quantization method based on the weight oscillation influence degree comprises the following steps:
(1) w belongs to R through convolution kernelI×w×h×CThe characteristic diagram of the output after operation is X ∈ RW×H×CWherein I represents the number of channels of an input feature map, W and H represent the width and height of a convolution kernel respectively, W and H represent the width and height of the feature map respectively, and C is the number of output channels, and global feature extraction, channel activation value acquisition and weight oscillation coefficient phi determination are performed on the feature map;
(2) adding a weight oscillation value in the trained neural network to calculate the influence on the accuracy of the neural network;
(3) the weights are sequenced according to the oscillation influence degree eta, and the quantization scale is set to be p1
(4) Quantizing the neural network;
(5) retraining the unquantized parameters to obtain the accuracy of the neural network;
(6) setting the quantization scale to p2And (5) repeating the steps (4) and (5) until the lowest accuracy threshold of the neural network is reached, and finishing the quantification.
According to the method, a branch quantization strategy is adopted, weights are sequenced and quantized in a layered mode according to the weight oscillation influence degree, and the accuracy of a neural network is guaranteed through a method of retraining unquantized weights; firstly, acquiring an activation value of a feature map by adding a global average pooling layer, a full connection layer and an activation layer, and representing a weight oscillation coefficient of a corresponding filter by using the activation values of different channels of the feature map; adding a weight oscillation value in the trained neural network to calculate the influence on the accuracy of the neural network; determining quantization parameters by a fine tuning fast training method, quantizing the weights according to different quantization precisions, and connecting the weights with a loss function through a straight-through estimator to obtain an optimal quantization model; therefore, the accuracy of the neural network can be ensured, the problems of structural redundancy and complex parameters of the neural network after training are solved, the calculation amount and the memory consumption are reduced, and the on-chip operation of the neural network is realized.
Preferably, the step (1) comprises the following substeps:
(1.1) mapping function F by compressionsq() Performing compression mapping extraction on all channels, taking the extraction result as a description mark of the whole channel, using global averaging to obtain an average value of all characteristic parameters in the channel, and obtaining a characteristic diagram X belonging to R after global averagingW×H×CBecomes Z ∈ R1×1×CBy compressing the mapping function Fsq() Is a formula (1)
Figure BDA0003535500930000051
(1.2) obtaining channel activation value
By an excitation function Fex() Calculating a weight value according to the result of the compression mapping, and recalibrating the weight W by using the feature Z extracted in the step (1), wherein an excitation function is a formula (2):
s=Fex(Z,W)=σ(W2δ(W1Z)) (2)
where the activation function σ is ReLU, δ is sigmoid, weight
Figure BDA0003535500930000052
Obtaining a one-dimensional excitation weight W through training and learning to activate all layers, wherein s is an activation value of a channel;
(1.3) determining the weighted oscillation coefficient phi
According to the corresponding relation of the significance of the channel and the filter, the weighted oscillation coefficient phi in the filter is represented by the activation value s of the channel, the oscillation coefficients of different filters are different, and the calculation formula is as follows:
φc=sc (3)
wherein phi iscFor weighted oscillation coefficients of different filters, scLearned activation values for the different channels.
Preferably, in the step (2), the trained neural network parameters are all determined, and the weights in different channels are determined
Figure BDA0003535500930000053
By using
Figure BDA0003535500930000054
Instead, other weight parameters in the neural network are kept unchanged, the network is reconstructed, and the weight oscillation value is as follows:
Figure BDA0003535500930000055
judging the influence degree of the neural network by adding the change of the accuracy of the neural network before and after the weight oscillation value, inputting the test set which is the same as that in the network training, and obtaining the accuracy of the neural network as
Figure BDA0003535500930000056
The degree of weight oscillation influence
Figure BDA0003535500930000057
Comprises the following steps:
Figure BDA0003535500930000061
wherein the original accuracy of the neural network is I.
Preferably, in the step (3), the weights with low influence degree are quantized with high precision, and the weights with high influence degree are subjected to a low-precision quantization or non-quantization strategy, so that different quantization standards are determined for different weights with accuracy as a target.
Preferably, in the step (4),
let r denote the floating-point real number and q denote the quantized fixed-point integer, the conversion formula is as follows:
Figure BDA0003535500930000062
r=S(Q-Z) (7)
wherein, S represents a scaling coefficient between a floating-point real number and a fixed-point integer, Z represents an integer corresponding to 0 in the floating-point real number after quantization, and a calculation formula of S and Z is as follows:
Figure BDA0003535500930000063
Figure BDA0003535500930000064
when the floating-point real number is quantized into the fixed-point integer, the quantized value needs to be cut off, b represents the integer number of bits to be quantized by the floating-point real number, and is represented by Q:
Figure BDA0003535500930000065
the calculation formula of S and Z is then:
Figure BDA0003535500930000066
Figure BDA0003535500930000067
preferably, the step (4) comprises the following substeps:
(4.1) forward propagation process: simulating and quantizing the process in the forward propagation process, quantizing the weight and the activation value, then inversely quantizing the floating point number with the error, and extracting data characteristics by using the floating point number with the error;
(4.2) a back propagation process: the model utilizes the floating point number of inverse quantization back to act on the data to calculate loss function loss (L), the obtained gradient is the gradient of the weight value after analog quantization, a straight-through estimator (STE) is used for approximating the pseudo quantization of the gradient, the weight value before quantization is updated by the gradient, and the floating point model containing quantization error is obtained;
(4.3) calculating a weight final quantization value wi
(4.4) reference value S from the scaling factor by means of an iterative update1、S2、S3And selecting, performing characteristic measurement on the output characteristics obtained by using the quantized algorithm model after acting on input data and the output characteristics obtained by using the original floating point model, taking the search result with the highest characteristic similarity as a final S value, and superposing the result of the previous layer to the next layer during the characteristic measurement so as to ensure that certain correlation exists between the layers.
Preferably, in said step (4.1),
suppose the convolution kernel W ∈ RI×w×h×CHas a quantization scale of p1The number of weights to be quantized is p1xIxw xhxC; selecting the weight with low influence degree for quantization according to the sequence of the influence degree of the weight oscillation in the neural network in the step (3)
Figure BDA0003535500930000071
Wherein: w is afloatFloating point weight, wiWeighting the quantized fixed point integer;
wiinverse quantization to obtain a floating point value w 'with quantization error'f
w′f=S(wi-Z) (14)
W 'of'fIs a floating point value with quantization error.
Preferably, in said step (4.3),
Figure BDA0003535500930000072
it will be understood by those skilled in the art that all or part of the steps in the method of the above embodiments may be implemented by hardware instructions related to a program, the program may be stored in a computer-readable storage medium, and when executed, the program includes the steps of the method of the above embodiments, and the storage medium may be: ROM/RAM, magnetic disks, optical disks, memory cards, and the like. Therefore, in accordance with the method of the present invention, the present invention also includes a multi-precision hierarchical quantization apparatus based on the influence of weighted oscillation, which is generally expressed in the form of functional blocks corresponding to the steps of the method. The device includes:
the weight oscillation coefficient acquisition module is configured to extract global features aiming at the feature map, acquire a channel activation value and determine a weight oscillation coefficient phi;
an adding module configured to add a weighted oscillation value to the trained neural network to calculate an influence on the accuracy of the neural network;
a ranking module configured to rank the weights according to an oscillation influence magnitude η, setting a quantization scale to p1
A quantization module configured to quantize a neural network;
a training module configured to retrain unquantized parameters to obtain an accuracy of the neural network;
a reset parameter module configured to set the quantization scale to p2And repeatedly executing the quantization module and the training module until the lowest threshold of the accuracy of the neural network is reached, and finishing the quantization.
The invention is described in detail below by way of example with reference to the accompanying drawings.
As shown in fig. 1, a multi-precision hierarchical quantization method based on the influence degree of weight oscillation includes the following steps:
step one, determining a weight oscillation coefficient phi;
suppose W is belonged to R through a certain convolution kernelI×w×h×CThe characteristic diagram of the output after operation is X ∈ RW×H×CWherein I represents the number of channels of the input feature map, W and H represent the width and height of the convolution kernel respectively, W and H represent the width and height of the feature map respectively, and C is the number of output channels;
step (1) of global feature extraction
Compression mapping function Fsq() All channels can be compressed, mapped and extracted, the extraction result is used as a description mark of the whole channel, the average value of all characteristic parameters in the channel is obtained by using global averaging, and the characteristic diagram X belongs to R after global averagingW×H×CBecomes Z ∈ R1×1×CBy compressing the mapping function Fsq() Comprises the following steps:
Figure BDA0003535500930000091
step (2) obtaining channel activation value
Excitation function Fex() The weight values can be calculated according to the result of compression mapping, the degree of the model result to each channel is learned, the channels are independent after being calibrated for better fitting nonlinear characteristics, the calibration between the channels is nonlinear, and a sigmoid activation function is needed to realize the conditions. Recalibrating the weight W using the features Z extracted in step (1), the mathematical expression of the excitation function being:
s=Fex(Z,W)=σ(W2δ(W1Z))
Where the activation function σ is ReLU and δ is sigmoid. Weight of
Figure BDA0003535500930000092
A one-dimensional excitation weight W is obtained through training and learning to activate all layers, and s is an activation value of a channel
Step (3) determining a weight oscillation coefficient phi
According to the corresponding relation of the significance of the channel and the filter, the weighted oscillation coefficient phi in the filter is represented by the activation value s of the channel, the oscillation coefficients of different filters are different, and the calculation formula is as follows:
φc=sc
wherein phi iscFor weighted oscillation coefficients of different filters, scLearned activation values for the different channels.
Determining the weight oscillation influence degree eta;
the trained neural network parameters are all determined, and the weights in different channels are determined
Figure BDA0003535500930000093
By using
Figure BDA0003535500930000094
Instead, other weight parameters in the neural network are kept unchanged, the network is reconstructed, and the weight oscillation value is as follows:
Figure BDA0003535500930000095
as can be seen from the formula,
Figure BDA0003535500930000096
both by the weight itself and by the filter in which it is located.
Judging the influence degree and the output of the neural network by the change of the accuracy of the neural network before and after adding the weight oscillation valueEntering the same test set as the training network to obtain the neural network with the accuracy of
Figure BDA0003535500930000097
The degree of weight oscillation influence
Figure BDA0003535500930000101
Comprises the following steps:
Figure BDA0003535500930000102
wherein the original accuracy of the neural network is I.
Thirdly, sorting the weights according to the oscillation influence degree eta, and setting the quantization scale as p1
Aiming at the problem that the execution precision and efficiency of the algorithm are reduced due to the fact that the traditional quantization theory adopts a uniform quantization threshold value and bit width for the whole algorithm, a layered quantization strategy is provided, the weight with low influence degree is quantized with high precision, the weight with high influence degree is quantized with low precision or is not quantized, and different quantization standards are determined for different weights by taking accuracy as a target.
Step four, quantifying the neural network;
let r denote the floating-point real number and q denote the quantized fixed-point integer, the conversion formula is as follows:
Figure BDA0003535500930000103
r=S(Q-Z)
wherein, S represents a scaling coefficient between a floating-point real number and a fixed-point integer, Z represents an integer corresponding to 0 in the floating-point real number after quantization, and a calculation formula of S and Z is as follows:
Figure BDA0003535500930000104
Figure BDA0003535500930000105
when the floating-point real number is quantized into the fixed-point integer, the quantized value needs to be cut off, b represents the integer bit number to be quantized by the floating-point real number, and is represented by Q:
Figure BDA0003535500930000106
the calculation formula of S and Z can be expressed as:
Figure BDA0003535500930000107
Figure BDA0003535500930000108
the parameter values of the model are approximately in Gaussian distribution, high-frequency values with small contribution degree exist at the edges of the numerical values, irrelevant high-frequency details are removed, and the quantization precision can be improved to a certain extent. Therefore, combining the above formula analysis, obtaining an appropriate S value can reduce the loss caused by quantization, and further analyzing to determine an appropriate r in the floating-point real numbermax、rminAnd the quantization bit width b is a key to guarantee quantization performance.
The fine tuning quantization fast training method is shown in fig. 2, and comprises the following specific steps:
step (1), forward propagation process
The process of analog quantization is carried out in the forward propagation process, the weight value and the activation value are quantized and then inversely quantized to return an error floating point number, and data features are extracted by using the error floating point number.
Suppose the convolution kernel W ∈ RI×w×h×CHas a quantization scale of p1The number of weights to be quantized is p1xIxw xhxC; and selecting the weight with low influence degree for quantization according to the sequence of the influence degree of the weight oscillation in the neural network in the third step.
Figure BDA0003535500930000111
Wherein: w is afloatFloating point weight, wiWeighting the quantized fixed point integer;
wiinverse quantization to obtain a floating point value w 'with quantization error'f
w′f=S(wi-Z)
W 'of'fIs a floating point value with quantization error.
Step (2), back propagation process
In the quantization process, due to operations such as truncation and rounding, quantization errors are generated, and the loss function loss (L) is calculated on data by the model through the action of floating point numbers subjected to inverse quantization.
The gradient obtained is the gradient of the weight after the analog quantization, and since the quantized function is a piecewise function, there are cases where the gradient is absent and the gradient is zero, the pseudo-quantization of the gradient is approximated by a pass-through estimator (STE). And updating the weight value before quantization, namely the parameters of the original floating point model by using the gradient to finally obtain the floating point model containing the quantization error.
Step (3) calculating the final weight quantization value
Figure BDA0003535500930000121
Step (4) determination of scaling factor S
By means of iterative updating, from the scaling factor reference value S1、S2、S3And selecting, performing characteristic measurement on the output characteristics obtained by using the quantized algorithm model after acting on input data and the output characteristics obtained by using the original floating point model, taking the search result with the highest characteristic similarity as a final S value, and superposing the results of the previous layer to the next layer during the characteristic measurement, so that certain correlation between the layers is ensured.
Step five, retraining the unquantized parameters to obtain the accuracy of the neural network
Step six, setting the quantization scale as p2And repeating the fourth step and the fifth step until the lowest accuracy threshold of the neural network is reached, and finishing the quantification.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications, equivalent variations and modifications made to the above embodiment according to the technical spirit of the present invention still belong to the protection scope of the technical solution of the present invention.

Claims (9)

1. The multi-precision hierarchical quantization method based on the weight oscillation influence degree is characterized by comprising the following steps: which comprises the following steps:
(1) belongs to R through convolution kernel W ∈ RI×w×h×CThe characteristic diagram of the output after operation is X ∈ RW×H×CWherein I represents the number of channels of the input feature diagram, W and H represent the width and height of the convolution kernel respectively, W and H represent the width and height of the feature diagram respectively, and C is the number of output channels, and global feature extraction, channel activation value acquisition and weight oscillation coefficient phi determination are carried out on the feature diagram;
(2) adding a weight oscillation value in the trained neural network to calculate the influence on the accuracy of the neural network;
(3) the weights are sequenced according to the oscillation influence degree eta, and the quantization scale is set to be p1
(4) Quantizing the neural network;
(5) retraining the unquantized parameters to obtain the accuracy of the neural network;
(6) setting the quantization scale to p2And (5) repeating the steps (4) and (5) until the minimum accuracy threshold of the neural network is reached, and finishing the quantification.
2. The method of claim 1, wherein the method comprises: the step (1) comprises the following sub-steps:
(1.1) mapping function F by compressionsq() All channels are compressed, mapped and extracted, and the extraction result is used as the description of the whole channelMarking, using global average to obtain average value of all characteristic parameters in channel, after global average making characteristic diagram X be element RW×H×CBecomes Z ∈ R1×1×CBy compressing the mapping function Fsq() Is a formula (1)
Figure FDA0003535500920000011
(1.2) obtaining channel activation value
By means of an excitation function Fex() Calculating a weight value according to the result of the compression mapping, and recalibrating the weight W by using the feature Z extracted in the step (1), wherein an excitation function is a formula (2):
s=Fex(Z,W)=σ(W2δ(W1Z)) (2)
where the activation function σ is ReLU, δ is sigmoid, weight
Figure FDA0003535500920000021
Obtaining a one-dimensional excitation weight W through training and learning to activate all layers, wherein s is an activation value of a channel;
(1.3) determining the weighted oscillation coefficient phi
According to the corresponding relation of the significance of the channel and the filter, the weighted oscillation coefficient phi in the filter is represented by the activation value s of the channel, the oscillation coefficients of different filters are different, and the calculation formula is as follows:
φc=sc (3)
wherein phicFor weighted oscillation coefficients of different filters, scLearned activation values for different channels.
3. The method according to claim 2, wherein the method comprises: in the step (2), the trained neural network parameters are all determined, and the weights in different channels are used
Figure FDA0003535500920000022
By using
Figure FDA0003535500920000023
Instead, other weight parameters in the neural network are kept unchanged, the network is reconstructed, and the weight oscillation value is as follows:
Figure FDA0003535500920000024
judging the influence degree of the neural network accuracy by adding the change of the neural network accuracy before and after the weight oscillation value, inputting the test set which is the same as that of the training network, and obtaining the neural network accuracy of
Figure FDA0003535500920000025
The degree of weight oscillation influence
Figure FDA0003535500920000026
Comprises the following steps:
Figure FDA0003535500920000027
wherein the original accuracy of the neural network is I.
4. The method of claim 3, wherein the method comprises: in the step (3), the weight with low influence degree is quantized with high precision, the weight with high influence degree adopts a low-precision quantization or non-quantization strategy, and different quantization standards are determined for different weights by taking accuracy as a target.
5. The method of claim 4, wherein the method comprises: in the step (4), the step (C) is carried out,
let r denote the floating-point real number and q denote the quantized fixed-point integer, the conversion formula is as follows:
Figure FDA0003535500920000031
r=S(Q-Z) (7)
wherein, S represents a scaling coefficient between a floating-point real number and a fixed-point integer, Z represents an integer corresponding to 0 in the floating-point real number after quantization, and a calculation formula of S and Z is as follows:
Figure FDA0003535500920000032
Figure FDA0003535500920000033
when the floating-point real number is quantized into the fixed-point integer, the quantized value needs to be cut off, b represents the integer number of bits to be quantized by the floating-point real number, and is represented by Q:
Figure FDA0003535500920000034
the calculation formula of S and Z is then:
Figure FDA0003535500920000035
Figure FDA0003535500920000036
6. the method of claim 5, wherein the method comprises: the step (4) comprises the following sub-steps:
(4.1) forward propagation process: simulating and quantizing the process in the forward propagation process, quantizing the weight and the activation value, then inversely quantizing the floating point number with the error, and extracting data characteristics by using the floating point number with the error;
(4.2) a back propagation process: the model utilizes the floating point number of inverse quantization back to act on the data to calculate loss function loss (L), the obtained gradient is the gradient of the weight value after analog quantization, a straight-through estimator (STE) is used for approximating the pseudo quantization of the gradient, the weight value before quantization is updated by the gradient, and the floating point model containing quantization error is obtained;
(4.3) calculating a weight final quantization value wi
(4.4) reference value S from the scaling factor by means of an iterative update1、S2、S3And selecting, performing characteristic measurement on the output characteristics obtained by using the quantized algorithm model after acting on input data and the output characteristics obtained by using the original floating point model, taking the search result with the highest characteristic similarity as a final S value, and superposing the result of the previous layer to the next layer during the characteristic measurement so as to ensure that certain correlation exists between the layers.
7. The method of claim 6, wherein: in the step (4.1), the step (C),
suppose the convolution kernel W ∈ RI×w×h×CHas a quantization scale of p1The number of weights to be quantized is p1xIxw xhxC; selecting the weight with low influence degree for quantization according to the sequence of the influence degree of the weight oscillation in the neural network in the step (3)
Figure FDA0003535500920000041
Wherein: w is afloatFloating point weight, wiWeighting the quantized fixed point integer;
wiinverse quantization to obtain a floating point value w 'with quantization error'f
w′f=S(wi-Z) (14)
W 'of'fFloating point values with quantization errors.
8. The method of claim 7, wherein the method comprises: in the step (4.3), the step of,
Figure FDA0003535500920000042
9. the multi-precision layered quantization device based on the weight oscillation influence degree is characterized in that: the device includes:
the weight oscillation coefficient acquisition module is configured to perform global feature extraction on the feature map, acquire a channel activation value and determine a weight oscillation coefficient phi;
an adding module configured to add a weighted oscillation value to the trained neural network to calculate an influence on the accuracy of the neural network;
a ranking module configured to rank the weights according to an oscillation influence degree eta, setting a quantization scale to p1
A quantization module configured to quantize a neural network;
a training module configured to retrain unquantized parameters to obtain an accuracy of the neural network;
a reset parameter module configured to set the quantization scale to p2And repeatedly executing the quantization module and the training module until the lowest threshold of the accuracy of the neural network is reached, and finishing the quantization.
CN202210217282.5A 2022-03-07 2022-03-07 Multi-precision hierarchical quantization method and device based on weight oscillation influence degree Pending CN114611665A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210217282.5A CN114611665A (en) 2022-03-07 2022-03-07 Multi-precision hierarchical quantization method and device based on weight oscillation influence degree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210217282.5A CN114611665A (en) 2022-03-07 2022-03-07 Multi-precision hierarchical quantization method and device based on weight oscillation influence degree

Publications (1)

Publication Number Publication Date
CN114611665A true CN114611665A (en) 2022-06-10

Family

ID=81860663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210217282.5A Pending CN114611665A (en) 2022-03-07 2022-03-07 Multi-precision hierarchical quantization method and device based on weight oscillation influence degree

Country Status (1)

Country Link
CN (1) CN114611665A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115238873A (en) * 2022-09-22 2022-10-25 深圳市友杰智新科技有限公司 Neural network model deployment method and device, and computer equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115238873A (en) * 2022-09-22 2022-10-25 深圳市友杰智新科技有限公司 Neural network model deployment method and device, and computer equipment
CN115238873B (en) * 2022-09-22 2023-04-07 深圳市友杰智新科技有限公司 Neural network model deployment method and device, and computer equipment

Similar Documents

Publication Publication Date Title
CN107688850B (en) Deep neural network compression method
CN107239825B (en) Deep neural network compression method considering load balance
CN111860982A (en) Wind power plant short-term wind power prediction method based on VMD-FCM-GRU
CN112016674A (en) Knowledge distillation-based convolutional neural network quantification method
CN114037844A (en) Global rank perception neural network model compression method based on filter characteristic diagram
CN111723915B (en) Target detection method based on deep convolutional neural network
CN112766484A (en) Floating point neural network model quantization system and method
CN110929798A (en) Image classification method and medium based on structure optimization sparse convolution neural network
CN112733997A (en) Hydrological time series prediction optimization method based on WOA-LSTM-MC
CN114677548A (en) Neural network image classification system and method based on resistive random access memory
CN107292855B (en) Image denoising method combining self-adaptive non-local sample and low rank
CN114611665A (en) Multi-precision hierarchical quantization method and device based on weight oscillation influence degree
CN114971675A (en) Second-hand car price evaluation method based on deep FM model
CN112613604A (en) Neural network quantification method and device
CN112686384A (en) Bit-width-adaptive neural network quantization method and device
CN116956997A (en) LSTM model quantization retraining method, system and equipment for time sequence data processing
CN112308213A (en) Convolutional neural network compression method based on global feature relationship
CN115170902B (en) Training method of image processing model
CN115392441A (en) Method, apparatus, device and medium for on-chip adaptation of quantized neural network model
Chin et al. A high-performance adaptive quantization approach for edge CNN applications
CN113554097B (en) Model quantization method and device, electronic equipment and storage medium
CN114139678A (en) Convolutional neural network quantization method and device, electronic equipment and storage medium
CN113095328A (en) Self-training-based semantic segmentation method guided by Gini index
CN116992944B (en) Image processing method and device based on leavable importance judging standard pruning
Khoram et al. TOCO: A framework for compressing neural network models based on tolerance analysis

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination