CN116524173A - Deep learning network model optimization method based on parameter quantization

Deep learning network model optimization method based on parameter quantization

Info

Publication number
CN116524173A
CN116524173A (application CN202310162619.1A)
Authority
CN
China
Prior art keywords
quantization
layer
network model
parameter
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310162619.1A
Other languages
Chinese (zh)
Inventor
钮赛赛
邵艳明
蔡彬
史庆杰
张晗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Aerospace Control Technology Institute
Original Assignee
Shanghai Aerospace Control Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Aerospace Control Technology Institute filed Critical Shanghai Aerospace Control Technology Institute
Priority to CN202310162619.1A priority Critical patent/CN116524173A/en
Publication of CN116524173A publication Critical patent/CN116524173A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 — Arrangements for image or video recognition or understanding
    • G06V 10/20 — Image preprocessing
    • G06V 10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/40 — Extraction of image or video features
    • G06V 10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 — Local feature extraction by matching or filtering
    • G06V 10/449 — Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 — Biologically inspired filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 — Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/70 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 — Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 — Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 — Arrangements using pattern recognition or machine learning using neural networks
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 — Road transport of goods or passengers
    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 — Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

A deep learning network model optimization method based on parameter quantization addresses the hardware resource limitations, such as memory and power consumption, faced when a deep-learning-based intelligent information processing platform is built on a missile-borne platform, and meets the real-time and lightweight requirements of infrared image target detection and recognition. A lightweight network model based on YOLOv3-tiny is combined with low-bit quantization and channel-level quantization, and the weight parameters are quantized step by step during model retraining.

Description

Deep learning network model optimization method based on parameter quantization
Technical Field
The invention relates to a deep learning network model optimization method based on parameter quantization, and belongs to the technical field of computer vision.
Background
Most current infrared target detection methods based on deep learning achieve high recognition accuracy by building high-performance network models. In a missile-borne environment, however, embedded hardware platforms with limited space, power and other resources struggle to meet the heavy compute and redundant storage demands of deep neural networks, so software and hardware systems suitable for deep learning on missile-borne platforms are studied from two directions: low-power, miniaturized intelligent hardware, and low-complexity optimization of the deep learning network model. For a given, fixed intelligent hardware platform, starting from low-complexity network model optimization can effectively save storage space on the intelligent processor, reduce its computational load, improve its operating efficiency and lower its power consumption.
Low-bit numbers occupy less memory in a computer than higher-bit floating-point numbers. Quantization is a model compression method that replaces the high-precision floating-point representation of convolutional neural network parameters with low-precision numbers. For example, replacing the original 32-bit single-precision floating-point weights with 8-bit integers reduces the storage occupied by the network model to one quarter of the original. This low-precision representation removes, to some extent, the representational redundancy present in the network: the network's features can be expressed well with quantized parameters and do not require excessive precision. In some cases, however, the quantized parameters cannot reach the precision required by the target task, which degrades network accuracy. The task of network quantization is therefore to express the network with as few bits as possible while losing as little accuracy as possible, striking a balance between quantization bit width and accuracy loss.
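For illustration only, a minimal sketch of such 8-bit quantization of a floating-point weight tensor is given below; the function names and the symmetric per-tensor scaling are assumptions, not part of the claimed method:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor 8-bit quantization: float32 weights -> int8 plus one scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from the int8 representation."""
    return q.astype(np.float32) * scale

w = np.random.randn(64, 3, 3, 3).astype(np.float32)   # example convolution weights
q, s = quantize_int8(w)
print(q.nbytes / w.nbytes)                             # 0.25: one quarter of the storage
print(np.abs(w - dequantize(q, s)).max())              # worst-case quantization error
```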
In reference [CN 114170512A], to address the high complexity and slow inference of existing remote-sensing SAR target detection methods, training and test sets split from a public remote-sensing SAR target detection dataset are expanded and augmented; an existing lightweight network is adapted, pruned and quantized with mixed precision, yielding a remote-sensing SAR target detection model that combines network pruning with parameter quantization and improves detection accuracy while saving training cost.
In reference [CN 111767993A], a convolutional neural network INT8 quantization method, system, device and storage medium are provided; full-integer inference of the whole model is achieved by offline nonlinear INT8 quantization of the convolution-layer parameters, inputs and outputs, while the quantization accuracy is also improved.
In reference [Li Gushi. Research on deep neural network compression and acceleration [D]. Beijing University of Posts and Telecommunications, 2020], to address the severe network oscillation caused by quantizing floating-point parameters and activation values to low-bit values in one shot (one-shot quantization), which makes the quantized network hard to converge and low in accuracy, an incremental quantization algorithm along the output-channel dimension is proposed; network fluctuation during quantization is reduced by iteratively quantizing the network parameters and activation values. In each quantization iteration, only the network parameters and activation values corresponding to a subset of output channels are selected and quantized according to a rule, and, to further mitigate fluctuation, the weights and activations quantized in each iteration should be disjoint in the output-channel dimension.
Most existing network model optimization methods adopt a single compression mode (pruning alone or quantization alone), and existing network quantization methods are rarely applied to lightweight infrared target recognition networks. Aiming at the requirements and characteristics of infrared target recognition, the invention combines INT8 quantization, channel-level quantization and other techniques to realize, step by step, a low-complexity optimization method for an infrared target recognition network model based on the YOLOv3-tiny network.
Disclosure of Invention
The technical problem solved by the invention is as follows: aiming at the limitations of hardware resources such as memory and power consumption faced in the prior art when a deep-learning-based intelligent information processing platform is built on a missile-borne platform, a deep learning network model optimization method based on parameter quantization is provided.
The invention solves the above technical problem through the following technical solution:
a deep learning network model optimization method based on parameter quantization comprises the following steps:
constructing a lightweight network model based on YOLOv3-tiny and training it to obtain preliminary floating-point network weight parameters, the lightweight network model comprising convolution layers, batch normalization layers, activation functions, max-pooling layers, an upsampling layer and a routing layer;
carrying out channel-level quantization on the designed lightweight network model;
retraining the obtained preliminary network, and quantizing the network weights in a stepwise manner during retraining.
In the lightweight network model, the convolution layer is used to extract high-dimensional features from the input image; in its operation, w_n denotes the weights of the n-th layer, x_{n-1} the input feature values of the n-th layer, o_n the output of the n-th layer, K the width of the convolution kernel, and C_n the number of channels of the n-th layer's output feature values.
In the batch normalization layer, x and y respectively denote the input and output, μ^(i) and σ^(i) are the mean and variance of the feature map of the i-th channel over a batch, γ^(i) and β^(i) are learnable channel-level parameters of the normalization layer, and ε is used to avoid data overflow.
The ReLU function is employed as the activation function between two convolution layers:
ReLU(x) = max(0, x).
the max-pooling layer reduces the data dimension, reduces the amount of computation, enhances the invariance of image features and enlarges the receptive field; the upsampling layer restores image features to the input dimension to enable target position output; and the routing layer obtains multi-scale fused feature values from the output feature values of two cascaded convolution layers.
The channel-level quantization of the designed lightweight network model is specifically as follows:
different quantization intervals are used to match quantization parameters to the different channels of each layer so as to improve the model accuracy of the lightweight network model; the channel-level quantization of channel j in layer i is specifically:
the distribution interval of the channel's parameters in that layer is recorded by max_ij and min_ij, and the long-tail weights are clipped to obtain the quantization range d_ij and mean m_ij of the parameters; the weights of channel j in the current layer i are recorded as w_ij, and the quantized weights wq_ij and the recovered weights wr_ij are computed according to d_ij, m_ij and the quantization bit number b;
the quantization parameters of every channel of every layer are traversed and matched, completing channel-level quantization.
The stepwise quantization adopted in the retraining process is specifically as follows:
the lightweight network model after channel-level quantization is retrained for a preset number of iteration steps, and the weights of each layer are randomly quantized during retraining so as to eliminate the model's dependence on fixed features, until the lightweight network model converges, wherein:
the model retraining steps are:
during forward inference, a quantization range is selected from the parameter distribution; parameters exceeding the quantization range are clamped to the quantization range, and the full-precision weights are recorded as the basis for updating;
the scaling factor, the quantization range d_ij and the mean m_ij are updated according to the mean absolute error between the full-precision parameters and the quantized parameters;
during error back-propagation, each weight parameter is updated step by step in reverse according to the loss, under the loss function, between the target obtained by forward inference with the quantized parameters and the actual target;
through multiple rounds of forward inference and back-propagation over multi-step iterations, the network model's weight parameters are quantized step by step and the network's inference re-converges.
During model retraining, each convolution layer is quantized; within each convolution layer the order in which weights are quantized is selected at random, achieving stepwise quantization of the weights, and the coexistence of quantized and full-precision parameters improves the learning capacity of the convolution layer.
During model retraining, the batch normalization layer is fused into the convolution layer through a progressive fusion strategy: the batch normalization parameters are transferred to the preceding convolution layer while the mean and variance continue to be updated, and the convolution layer learns from the incoming mean and variance; in the fusion stage, updating of the mean and variance is stopped so as to eliminate the independent batch normalization parameters, completing the fusion of the batch normalization layer into the convolution layer and reducing the difficulty of deploying the deep learning network model on hardware.
During model retraining, the quantization of a convolution layer proceeds specifically as follows:
the input feature A_in, the weights W_conv and the bias B_conv are quantized; the quantized input is convolved with the quantized weights to obtain the quantized convolution output M_q; because M_q and the quantized bias have different quantization ranges, the bias and M_q are added after dequantization, and the sum is passed through the activation function to obtain the final output feature value A_out.
The parameter quantization is specifically as follows:
uniform quantization with a preset quantization bit width k is adopted, in which neighbouring quantization points are equally spaced:
x_q = Q_k(x_r, α)
where x_r is the tensor to be quantized (weights, biases or activation values), α is the scaling factor, q is the integer tensor that takes part in computation in the integer arithmetic unit, x_q is the quantized parameter, Q denotes the quantization function, clip is a truncation function, and round is a rounding function returning the rounded value of a floating-point number;
the scaling factor is used to overcome the long-tail phenomenon in the weight distribution of the convolution layer and to realize quantization correction within the interval.
in the forward inference stage, the parameters of the batch normalization layer are fixed, giving:
y = ξ^(i) · o + η^(i)
where o is the output of the preceding convolution layer, and the quantized convolution operation is:
o = α_a q_a · α_w q_w
where α_a q_a and α_w q_w respectively denote the quantized activation values and weights;
the quantized convolution process after the batch normalization layer and the convolution layer are merged is as follows:
the batch normalization layer is merged with the previously quantized convolution layer, and the merged output is quantized again, where α_β is the channel-level scaling-factor tensor of β, its initial value being the absolute maximum on each channel of β; at this point the scaling factors α of the weights, biases and activation values are floating-point numbers, so fully integer operation cannot yet be realized;
the scaling factors are therefore shift-quantized:
a shift-quantized scaling factor can be applied with bit shifts to the left or right in place of floating-point operations, and the quantized convolution is then computed accordingly.
the gradient calculation in the weight parameter error back propagation process in the retraining process is specifically as follows:
the step-by-step quantization process of the network weight parameters is completed by the parameter quantization calculation method in the retraining process, so that the optimization of the network model is realized.
Compared with the prior art, the invention has the advantages that:
according to the deep learning network model optimization method based on parameter quantization, the number of network model weight parameters required to be directly stored on an AI processor is greatly reduced, the computational power requirements of an algorithm on the processor are reduced, and the realization of a deep network model of an intelligent missile-borne information processing platform can be completed, so that the storage space and the calculation power consumption of the intelligent algorithm in the processor are effectively saved, and the calculation efficiency of a hardware platform is improved. And the power consumption of the hardware platform is reduced.
Drawings
FIG. 1 is a schematic diagram of an optimization flow of a deep learning network model provided by the invention;
FIG. 2 is a diagram of a YOLOv3-tiny data flow provided by the invention;
FIG. 3 is a step-by-step retraining flowchart provided by the invention;
Detailed Description
A deep learning network model optimization method based on parameter quantization addresses the limitations of hardware resources such as memory and power consumption faced when a deep-learning-based intelligent information processing platform is built on a missile-borne platform; a lightweight network model based on YOLOv3-tiny is combined with a channel-level quantization method, and the parameter quantization of the network model is realized in a multi-step quantization manner.
The YOLOv3-tiny lightweight network model is a simplified version of YOLOv3 that requires less memory and computational overhead, making it suitable for deployment on embedded devices. The network model comprises convolution layers, batch normalization layers, activation functions, max-pooling layers, an upsampling layer and a routing layer. The specific flow is as follows:
constructing a lightweight network model based on YOLOv3-tiny;
carrying out channel-level quantization on the lightweight network model;
retraining and quantizing the lightweight network model after channel-level quantization;
carrying out parameter quantization of the trained lightweight network model.
The convolution layer is used to extract high-dimensional features from the input image; in its operation, w_n denotes the weights of the n-th layer, x_{n-1} the input feature values of the n-th layer, o_n the output of the n-th layer, K the width of the convolution kernel, and C_n the number of channels of the n-th layer's output feature values;
in the batch normalization layer, x and y respectively denote the input and output, μ^(i) and σ^(i) are the mean and variance of the feature map of the i-th channel over a batch, γ^(i) and β^(i) are learnable channel-level parameters of the normalization layer, and ε is used to avoid data overflow;
the ReLU function is employed as the activation function between two convolution layers:
ReLU(x) = max(0, x);
the max-pooling layer reduces the data dimension, reduces the amount of computation, enhances the invariance of image features and enlarges the receptive field; the upsampling layer restores image features to the input dimension to enable target position output; and the routing layer obtains multi-scale fused feature values from the output feature values of two cascaded convolution layers;
the channel-level quantization is specifically:
different quantization intervals are used to match quantization parameters to the different channels of each layer so as to improve the model accuracy of the lightweight network model; the channel-level quantization of channel j in layer i is specifically:
the distribution interval of the channel's parameters in that layer is recorded by max_ij and min_ij, and the long-tail weights are clipped to obtain the quantization range d_ij and mean m_ij of the parameters; the weights of channel j in the current layer i are recorded as w_ij, and the quantized weights wq_ij and the recovered weights wr_ij are computed according to d_ij, m_ij and the quantization bit number b;
after the quantization parameters of every channel of every layer have been traversed, channel-level quantization is complete;
the retraining quantization is specifically as follows:
the lightweight network model after channel-level quantization is retrained for a preset number of steps, and the weights of each layer are randomly quantized during retraining so as to eliminate the model's dependence on fixed features, until the lightweight network model converges, wherein:
the model retraining steps are:
a quantization range is selected from the parameter distribution; parameters exceeding the quantization range are clamped to the quantization range, and the full-precision weights are recorded as the basis for updating;
the scaling factor, d_ij and m_ij are updated according to the mean absolute error between the full-precision parameters and the quantized parameters;
during error back-propagation, each weight parameter is updated step by step in reverse according to the loss, under the loss function, between the target obtained by forward inference with the quantized parameters and the actual target;
through multiple rounds of forward inference and back-propagation over multi-step iterations, the network model's weight parameters are quantized step by step and the network's inference re-converges.
During model retraining, each convolution layer is quantized; within each convolution layer the order in which weights are quantized is selected at random, achieving stepwise quantization of the weights, and the coexistence of quantized and full-precision parameters improves the learning capacity of the convolution layer;
the batch normalization layer is fused into the convolution layer through a progressive fusion strategy: the batch normalization parameters are transferred to the preceding convolution layer while the mean and variance continue to be updated, and the convolution layer learns from the incoming mean and variance; in the fusion stage, updating of the mean and variance is stopped so as to eliminate the independent batch normalization parameters, completing the fusion of the batch normalization layer into the convolution layer and reducing the difficulty of deploying the deep learning network model on hardware;
the quantization of a convolution layer during model retraining is specifically as follows:
the input feature A_in, the weights W_conv and the bias B_conv are quantized; the quantized input is convolved with the quantized weights to obtain the quantized convolution output M_q; because M_q and the quantized bias have different quantization ranges, the bias and M_q are added after dequantization, and the sum is passed through the activation function to obtain the final output feature value A_out;
The parameter quantization is specifically as follows:
uniform quantization with a preset quantization bit width k is adopted, in which neighbouring quantization points are equally spaced:
x_q = Q_k(x_r, α)
where x_r is the tensor to be quantized (weights, biases or activation values), α is the scaling factor, q is the integer tensor that takes part in computation in the integer arithmetic unit, and x_q is the quantized parameter used for network training; Q denotes the quantization function, clip is a truncation function, and round is a rounding function returning the rounded value of a floating-point number;
the scaling factor is used to overcome the long-tail phenomenon in the weight distribution of the convolution layer and to realize quantization correction within the interval;
in the forward inference stage, the parameters of the batch normalization layer are fixed, giving:
y = ξ^(i) · o + η^(i)
where o is the output of the preceding convolution layer, and the quantized convolution operation is:
o = α_a q_a · α_w q_w
where α_a q_a and α_w q_w respectively denote the quantized activation values and weights;
the quantized convolution process after the batch normalization layer and the convolution layer are merged is as follows:
the batch normalization layer is merged with the previously quantized convolution layer, and the merged output is quantized again, where α_β is the channel-level scaling-factor tensor of β, its initial value being the absolute maximum on each channel of β; at this point the scaling factors α of the weights, biases and activation values are floating-point numbers, so fully integer operation cannot yet be realized;
the scaling factors are therefore shift-quantized:
a shift-quantized scaling factor can be applied with bit shifts to the left or right in place of floating-point operations, and the quantized convolution is then computed accordingly;
the gradient in the back-propagation of the weight-parameter error during retraining is computed accordingly;
the stepwise quantization of the network weight parameters is completed by the above parameter quantization calculations during retraining, thereby realizing the optimization of the network model.
The following further description of the preferred embodiments is provided in connection with the accompanying drawings of the specification:
In the present embodiment, the overall implementation flow is shown in fig. 1. The low-complexity optimization method of the deep learning network model based on parameter quantization builds on a lightweight network model based on YOLOv3-tiny, and realizes the parameter quantization of the network model by combining a channel-level quantization method with a multi-step quantization scheme.
As shown in fig. 2, the YOLOv3-tiny lightweight network model is a simplified version of YOLOv3 that requires less memory and computational overhead, making it suitable for deployment on embedded devices. The network model comprises convolution layers, batch normalization layers, activation functions, max-pooling layers, an upsampling layer and a routing layer.
The convolution layer is used to extract high-dimensional features from the input image; in its operation, w_n denotes the weights of the n-th layer, x_{n-1} the input feature values of the n-th layer, o_n the output of the n-th layer, K the width of the convolution kernel, and C_n the number of channels of the n-th layer's output feature values.
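The convolution formula itself is not reproduced in this text; a conventional form of the operation, written with the symbols defined above and offered only as a hedged reconstruction, is:

```latex
% Hedged reconstruction (not copied from the patent): output channel c of layer n
o_n^{(c)}(i,j) \;=\; \sum_{c'=1}^{C_{n-1}} \sum_{u=1}^{K} \sum_{v=1}^{K}
    w_n^{(c,\,c',\,u,\,v)}\; x_{n-1}^{(c')}(i+u-1,\; j+v-1)
```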
In the batch normalization layer, x and y respectively denote the input and output, μ^(i) and σ^(i) are the mean and variance of the feature map of the i-th channel over a batch, γ^(i) and β^(i) are two learnable channel-level parameters of the normalization layer, and ε is used to avoid data overflow.
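The batch-normalization formula is likewise omitted; the standard form it presumably takes, written with the symbols just defined and treating σ^(i) as the per-channel variance, is given here as a hedged reconstruction:

```latex
% Hedged reconstruction of the batch normalization transform (not copied from the patent)
y^{(i)} \;=\; \gamma^{(i)}\,\frac{x^{(i)} - \mu^{(i)}}{\sqrt{\sigma^{(i)} + \epsilon}} \;+\; \beta^{(i)}
```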
The ReLU function is used as an activation function between two convolution layers:
ReLU(x) = max(0, x)
the routing layer in YOLO acquires features extracted from the first half of the neural network by concatenating two output feature values of the same size from different convolutional layers.
Channel-level quantization can use a different quantization interval for each channel of each layer, so that the quantization interval better matches the distribution of that channel's parameters. Channel-level quantization thus better preserves the differences between channels, which helps improve model accuracy. The quantization of channel j in convolution layer i can be described as follows: first, the distribution interval of the channel's parameters in that layer is recorded by max_ij and min_ij; then the long-tail weights are clipped according to the parameter distribution, giving the quantization range d_ij and mean m_ij of the parameters; the weights of channel j in convolution layer i are recorded as w_ij, and the quantized weights wq_ij and the recovered weights wr_ij are computed according to d_ij, m_ij and the quantization bit number b.
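A minimal sketch of this per-channel procedure is shown below; the percentile rule used to clip the long tail and the exact quantize/recover formulas are assumptions made for illustration, since the patent text does not spell them out:

```python
import numpy as np

def quantize_channel(w_ij: np.ndarray, b: int = 8, clip_pct: float = 0.999):
    """Channel-level quantization of the weights w_ij of channel j in layer i (sketch)."""
    max_ij, min_ij = float(w_ij.max()), float(w_ij.min())   # recorded distribution interval
    m_ij = float(w_ij.mean())                               # average value m_ij
    lo, hi = np.quantile(w_ij, 1.0 - clip_pct), np.quantile(w_ij, clip_pct)  # clip long tail
    d_ij = float(max(hi - m_ij, m_ij - lo))                 # quantization range d_ij
    levels = 2 ** (b - 1) - 1                               # quantization bit number b
    wq_ij = np.clip(np.round((w_ij - m_ij) / d_ij * levels), -levels, levels).astype(np.int32)
    wr_ij = wq_ij / levels * d_ij + m_ij                    # recovered weights wr_ij
    return wq_ij, wr_ij, d_ij, m_ij, (min_ij, max_ij)

# traverse every output channel of one layer's weight tensor
layer_w = np.random.randn(16, 3, 3, 3)
per_channel = [quantize_channel(layer_w[j].ravel()) for j in range(layer_w.shape[0])]
```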
As shown in FIG. 3, the adopted multi-step quantization scheme decomposes the one-step quantization in model retraining into multiple steps, which ensures stability during training; at the same time, the weights are randomly quantized during training, which strengthens the model's robustness, removes its dependence on fixed features and helps it converge better.
The model retraining process adopted divides the training phase into two main steps. In the first step, the quantization range is selected from the parameter distribution; parameters beyond the quantization range are clamped to it, and the full-precision weights are recorded as the basis for updating. In the second step, d_ij and m_ij are updated according to the mean absolute error between the full-precision parameters and the quantized parameters.
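One such retraining iteration might be sketched as follows; the local grid search that minimises the mean absolute error is an assumption, since the patent does not state the exact update rule:

```python
import numpy as np

def retrain_step(w_full: np.ndarray, d_ij: float, m_ij: float, b: int = 8):
    """Step 1: clamp out-of-range parameters, keep the full-precision copy.
    Step 2: refine (d_ij, m_ij) by minimising the mean absolute error between
    full-precision and quantized parameters (assumed search rule)."""
    levels = 2 ** (b - 1) - 1
    w_clamped = np.clip(w_full, m_ij - d_ij, m_ij + d_ij)        # step 1

    def mae(d, m):
        q = np.clip(np.round((w_clamped - m) / d * levels), -levels, levels)
        return np.abs(w_full - (q / levels * d + m)).mean()

    cand_d = d_ij * np.linspace(0.8, 1.2, 9)                     # step 2 (illustrative search)
    cand_m = m_ij + d_ij * np.linspace(-0.1, 0.1, 9)
    d_ij, m_ij = min(((d, m) for d in cand_d for m in cand_m), key=lambda p: mae(*p))
    return w_clamped, d_ij, m_ij
```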
In the retraining of the convolution layers, the order in which weights are quantized is selected at random within each convolution layer, achieving stepwise quantization of the weights. This markedly reduces the disturbance to the model during training and prevents the model from being driven out of the global minimum. Meanwhile, the coexistence of quantized and full-precision parameters lets the full-precision parameters continue to exercise their learning capacity.
Further, to reduce the difficulty of deploying the model on hardware, the batch normalization structures in the network model are fused into the convolution layers that precede them. With a progressive fusion strategy, the operation is decomposed into two phases. In the learning phase, the batch normalization parameters are transferred to the preceding convolution layer and the mean and variance continue to be updated; the convolution layer learns the distribution of activations from the incoming mean and variance. In the fusion phase, updating of the mean and variance is stopped, thereby eliminating the independent batch normalization parameters.
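For reference, the standard way of folding frozen batch-normalization parameters γ, β, μ and the variance into the preceding convolution's weights and bias (a generic sketch rather than the patent's exact procedure) is:

```python
import numpy as np

def fold_batchnorm(w_conv, b_conv, gamma, beta, mean, var, eps=1e-5):
    """Fold frozen BN statistics into the preceding convolution, per output channel.
    w_conv: (C_out, C_in, K, K); b_conv, gamma, beta, mean, var: (C_out,)."""
    scale = gamma / np.sqrt(var + eps)              # per-channel scale
    w_folded = w_conv * scale[:, None, None, None]  # scale each output channel's weights
    b_folded = (b_conv - mean) * scale + beta       # shift the bias accordingly
    return w_folded, b_folded
```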
Further, in the quantization of a convolution layer, the input feature A_in, the weights W_conv and the bias B_conv are first quantized; the quantized input is convolved with the quantized weights to obtain the quantized convolution output M_q. Because M_q and the quantized bias have different quantization ranges, the bias must be added to M_q after dequantization, and the sum is passed through the activation function to obtain the final output feature value A_out.
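A simplified forward pass of such a quantized layer is sketched below; a dense matrix product stands in for the convolution, and the bit widths and per-tensor scales are illustrative assumptions:

```python
import numpy as np

def quantize(x, bits):
    """Symmetric uniform quantization: returns the integer tensor and its scale."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / levels
    return np.clip(np.round(x / scale), -levels, levels).astype(np.int32), scale

def quantized_layer_forward(a_in, w_conv, b_conv, a_bits=8, w_bits=8):
    q_a, s_a = quantize(a_in, a_bits)       # quantize input feature A_in
    q_w, s_w = quantize(w_conv, w_bits)     # quantize weights W_conv
    q_b, s_b = quantize(b_conv, a_bits)     # quantize bias B_conv
    m_q = q_a @ q_w.T                       # integer accumulation -> M_q
    # M_q and the quantized bias have different quantization ranges:
    # dequantize both before adding, then apply the activation function
    a_out = np.maximum(m_q * (s_a * s_w) + q_b * s_b, 0.0)
    return a_out                            # final output feature value A_out

a = np.random.randn(4, 16); w = np.random.randn(32, 16); b = np.random.randn(32)
print(quantized_layer_forward(a, w, b).shape)   # (4, 32)
```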
Further, the parameter quantization of the network model adopts uniform quantization, in which neighbouring quantization points are equally spaced. Given a quantization bit width k, the quantization process can be expressed as:
x_q = Q_k(x_r, α)
where x_r is the tensor to be quantized, which may be a weight, bias or activation value; α is the scaling factor; q is the integer tensor that takes part in computation in the integer arithmetic unit; and x_q is the quantized parameter used for network training. The scaling factor α is critical for low-bit quantization. Q denotes the quantization function, clip is a truncation function, and round is a rounding function that returns the rounded value of a floating-point number.
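The definition of Q_k is not reproduced above; a common form of a uniform symmetric k-bit quantizer that matches these symbols, given purely as a hedged reconstruction, is:

```latex
% Hedged reconstruction (not copied from the patent)
q   \;=\; \mathrm{round}\!\Big(\mathrm{clip}(x_r,\,-\alpha,\,\alpha)\,\frac{2^{k-1}-1}{\alpha}\Big),
\qquad
x_q \;=\; Q_k(x_r,\alpha) \;=\; \frac{\alpha}{2^{k-1}-1}\, q
```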
Further, to address the difficulty of choosing the scaling factor α caused by the long-tail phenomenon in the weight distribution of convolutional neural networks, a learnable scaling factor α is introduced to realize a clamping function with a variable interval, and the quantization process is modified accordingly.
So that the scaling factor α can be updated during neural network training, the gradient of α is computed in the back-propagation process.
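Neither the modified quantizer nor the gradient formula is reproduced in this text. The form they commonly take in learnable-clipping schemes such as LSQ/PACT-style quantization-aware training, offered only as a hedged reconstruction, is:

```latex
% Hedged reconstruction (not copied from the patent); s = \alpha/(2^{k-1}-1) is the step size
x_q \;=\; s\,\mathrm{round}\!\big(\mathrm{clip}(x_r/s,\; -(2^{k-1}-1),\; 2^{k-1}-1)\big),
\qquad
\frac{\partial x_q}{\partial \alpha} \;\approx\;
\begin{cases}
(x_q - x_r)/\alpha, & |x_r| < \alpha\\[2pt]
\mathrm{sign}(x_r), & |x_r| \ge \alpha
\end{cases}
```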
further, in the forward reasoning stage, parameters of the batch normalization layer are fixed, and the following formula is obtained:
y=ξ (i) o+η (i)
where o is the output of the previous layer of convolution layer, the quantized convolution operation can be represented by:
o=α a q a α w q w
wherein alpha is a q a And alpha w q w Representing the quantized activation value and weight, respectively.
Further, the batch normalization layer and the convolution layer are merged, and the merged quantized convolution proceeds as follows: after merging the batch normalization layer with the previously quantized convolution layer, the merged output is quantized again. Low-bit integer arithmetic is adopted to simplify the subsequent convolution computation. In this operation, α_β is the channel-level scaling-factor tensor of β, and its initial value is the absolute maximum on each channel of β. At this point the scaling factors α of the weights, biases and activation values are still floating-point numbers, so fully integer operation cannot yet be realized.
Further, the scaling factors are shift-quantized:
a shift-quantized scaling factor can be applied with bit shift left or bit shift right operations in place of floating-point operations, and the quantized convolution is then computed accordingly.
The final convolution operation contains only integer multiply-add operations on the weight tensor and the activation tensor plus bit-shift operations for the scaling factors, with no floating-point operations at all. In this embodiment, two bits are used for the quantized weights, and eight bits are used for both the biases and the activation values. The quantized convolution operation reduces the required memory usage and bandwidth overhead and improves resource utilization.
Although the present invention has been described in terms of the preferred embodiments, it is not limited to those embodiments. Any person skilled in the art may make possible variations and modifications to the technical solution of the invention using the methods and technical content disclosed above without departing from the spirit and scope of the invention; therefore, any simple modification, equivalent variation or modification of the above embodiments made according to the technical substance of the invention falls within the protection scope of the technical solution of the invention.
What is not described in detail in the present specification belongs to the known technology of those skilled in the art.

Claims (13)

1. The deep learning network model optimization method based on parameter quantization is characterized by comprising the following steps of:
constructing a lightweight network model based on YOLOv3-tiny and training it to obtain preliminary floating-point network weight parameters, the lightweight network model comprising convolution layers, batch normalization layers, activation functions, max-pooling layers, an upsampling layer and a routing layer;
carrying out channel-level quantization on the designed lightweight network model;
retraining the obtained preliminary network, and quantizing the network weights in a stepwise manner during retraining.
2. The deep learning network model optimization method based on parameter quantization according to claim 1, wherein:
in the lightweight network model, the convolution layer is used to extract high-dimensional features from the input image; in its operation, w_n denotes the weights of the n-th layer, x_{n-1} the input feature values of the n-th layer, o_n the output of the n-th layer, K the width of the convolution kernel, and C_n the number of channels of the n-th layer's output feature values.
3. The deep learning network model optimization method based on parameter quantization according to claim 2, wherein:
in the batch normalization layer, x and y respectively denote the input and output, μ^(i) and σ^(i) are the mean and variance of the feature map of the i-th channel over a batch, γ^(i) and β^(i) are learnable channel-level parameters of the normalization layer, and ε is used to avoid data overflow.
4. A method for optimizing a deep learning network model based on parameter quantization according to claim 3, wherein:
the ReLU function is employed as the activation function between two convolution layers:
ReLU(x) = max(0, x).
5. the deep learning network model optimization method based on parameter quantization according to claim 4, wherein:
the max-pooling layer reduces the data dimension, reduces the amount of computation, enhances the invariance of image features and enlarges the receptive field; the upsampling layer restores image features to the input dimension to enable target position output; and the routing layer obtains multi-scale fused feature values from the output feature values of two cascaded convolution layers.
6. The deep learning network model optimization method based on parameter quantization according to claim 5, wherein:
the channel-level quantization of the designed lightweight network model is specifically as follows:
different quantization intervals are used to match quantization parameters to the different channels of each layer so as to improve the model accuracy of the lightweight network model; the channel-level quantization of channel j in layer i is specifically:
the distribution interval of the channel's parameters in that layer is recorded by max_ij and min_ij, and the long-tail weights are clipped to obtain the quantization range d_ij and mean m_ij of the parameters; the weights of channel j in the current layer i are recorded as w_ij, and the quantized weights wq_ij and the recovered weights wr_ij are computed according to d_ij, m_ij and the quantization bit number b;
the quantization parameters of every channel of every layer are traversed and matched, completing channel-level quantization.
7. The deep learning network model optimization method based on parameter quantization according to claim 6, wherein:
the stepwise quantization adopted in the retraining process is specifically as follows:
the lightweight network model after channel-level quantization is retrained for a preset number of iteration steps, and the weights of each layer are randomly quantized during retraining so as to eliminate the model's dependence on fixed features, until the lightweight network model converges, wherein:
the model retraining steps are:
during forward inference, a quantization range is selected from the parameter distribution; parameters exceeding the quantization range are clamped to the quantization range, and the full-precision weights are recorded as the basis for updating;
the scaling factor, the quantization range d_ij and the mean m_ij are updated according to the mean absolute error between the full-precision parameters and the quantized parameters;
during error back-propagation, each weight parameter is updated step by step in reverse according to the loss, under the loss function, between the target obtained by forward inference with the quantized parameters and the actual target;
through multiple rounds of forward inference and back-propagation over multi-step iterations, the network model's weight parameters are quantized step by step and the network's inference re-converges.
8. The method for optimizing the deep learning network model based on parameter quantization according to claim 7, wherein the method comprises the following steps:
during model retraining, each convolution layer is quantized; within each convolution layer the order in which weights are quantized is selected at random, achieving stepwise quantization of the weights, and the coexistence of quantized and full-precision parameters improves the learning capacity of the convolution layer.
9. The deep learning network model optimization method based on parameter quantization according to claim 8, wherein:
during model retraining, the batch normalization layer is fused into the convolution layer through a progressive fusion strategy: the batch normalization parameters are transferred to the preceding convolution layer while the mean and variance continue to be updated, and the convolution layer learns from the incoming mean and variance; in the fusion stage, updating of the mean and variance is stopped so as to eliminate the independent batch normalization parameters, completing the fusion of the batch normalization layer into the convolution layer and reducing the difficulty of deploying the deep learning network model on hardware.
10. The deep learning network model optimization method based on parameter quantization according to claim 8, wherein:
during model retraining, the quantization of a convolution layer is specifically as follows:
the input feature A_in, the weights W_conv and the bias B_conv are quantized; the quantized input is convolved with the quantized weights to obtain the quantized convolution output M_q; because M_q and the quantized bias have different quantization ranges, the bias and M_q are added after dequantization, and the sum is passed through the activation function to obtain the final output feature value A_out.
11. The deep learning network model optimization method based on parameter quantization according to claim 10, wherein:
the parameter quantization is specifically as follows:
uniform quantization with a preset quantization bit width k is adopted, in which neighbouring quantization points are equally spaced:
x_q = Q_k(x_r, α)
where x_r is the tensor to be quantized (weights, biases or activation values), α is the scaling factor, q is the integer tensor that takes part in computation in the integer arithmetic unit, x_q is the quantized parameter, Q denotes the quantization function, clip is a truncation function, and round is a rounding function returning the rounded value of a floating-point number;
the scaling factor is used to overcome the long-tail phenomenon in the weight distribution of the convolution layer and to realize quantization correction within the interval.
12. The method for optimizing a deep learning network model based on parameter quantization of claim 11, wherein the method comprises the steps of:
in the forward inference stage, the parameters of the batch normalization layer are fixed, giving:
y = ξ^(i) · o + η^(i)
where o is the output of the preceding convolution layer, and the quantized convolution operation is:
o = α_a q_a · α_w q_w
where α_a q_a and α_w q_w respectively denote the quantized activation values and weights;
the quantized convolution process after the batch normalization layer and the convolution layer are merged is as follows:
the batch normalization layer is merged with the previously quantized convolution layer, and the merged output is quantized again, where α_β is the channel-level scaling-factor tensor of β, its initial value being the absolute maximum on each channel of β; at this point the scaling factors α of the weights, biases and activation values are floating-point numbers, so fully integer operation cannot yet be realized;
the scaling factors are therefore shift-quantized:
a shift-quantized scaling factor can be applied with bit shifts to the left or right in place of floating-point operations, and the quantized convolution is then computed accordingly.
13. The method for optimizing the deep learning network model based on parameter quantization according to claim 7, wherein the method comprises the following steps:
the gradient in the back-propagation of the weight-parameter error during retraining is computed accordingly;
the stepwise quantization of the network weight parameters is completed by the above parameter quantization calculations during retraining, thereby realizing the optimization of the network model.
CN202310162619.1A 2023-02-24 2023-02-24 Deep learning network model optimization method based on parameter quantization Pending CN116524173A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310162619.1A CN116524173A (en) 2023-02-24 2023-02-24 Deep learning network model optimization method based on parameter quantization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310162619.1A CN116524173A (en) 2023-02-24 2023-02-24 Deep learning network model optimization method based on parameter quantization

Publications (1)

Publication Number Publication Date
CN116524173A true CN116524173A (en) 2023-08-01

Family

ID=87390996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310162619.1A Pending CN116524173A (en) 2023-02-24 2023-02-24 Deep learning network model optimization method based on parameter quantization

Country Status (1)

Country Link
CN (1) CN116524173A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117392613A (en) * 2023-12-07 2024-01-12 武汉纺织大学 Power operation safety monitoring method based on lightweight network
CN117392613B (en) * 2023-12-07 2024-03-08 武汉纺织大学 Power operation safety monitoring method based on lightweight network

Similar Documents

Publication Publication Date Title
CN111079781B (en) Lightweight convolutional neural network image recognition method based on low rank and sparse decomposition
CN108337000B (en) Automatic method for conversion to lower precision data formats
CN110555450B (en) Face recognition neural network adjusting method and device
CN110555508B (en) Artificial neural network adjusting method and device
CN110288086B (en) Winograd-based configurable convolution array accelerator structure
TW201918939A (en) Method and apparatus for learning low-precision neural network
CN110659725B (en) Neural network model compression and acceleration method, data processing method and device
CN111612147A (en) Quantization method of deep convolutional network
Meng et al. Two-bit networks for deep learning on resource-constrained embedded devices
CN109978135B (en) Quantization-based neural network compression method and system
Roy et al. Pruning filters while training for efficiently optimizing deep learning networks
Nazari et al. Tot-net: An endeavor toward optimizing ternary neural networks
CN111696149A (en) Quantization method for stereo matching algorithm based on CNN
CN112633477A (en) Quantitative neural network acceleration method based on field programmable array
CN116524173A (en) Deep learning network model optimization method based on parameter quantization
CN110874627B (en) Data processing method, data processing device and computer readable medium
KR20190130443A (en) Method and apparatus for quantization of neural network
CN113595993A (en) Vehicle-mounted sensing equipment joint learning method for model structure optimization under edge calculation
Liu et al. Computation-performance optimization of convolutional neural networks with redundant kernel removal
CN112686384A (en) Bit-width-adaptive neural network quantization method and device
Nazari et al. Multi-level binarized lstm in eeg classification for wearable devices
CN114970853A (en) Cross-range quantization convolutional neural network compression method
CN114943335A (en) Layer-by-layer optimization method of ternary neural network
Qi et al. Learning low resource consumption cnn through pruning and quantization
CN117151178A (en) FPGA-oriented CNN customized network quantification acceleration method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination