WO2023165139A1 - Model quantization method, apparatus, device, storage medium and program product - Google Patents

Model quantization method, apparatus, device, storage medium and program product

Info

Publication number
WO2023165139A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
substructure
quantization
model
activation
Application number
PCT/CN2022/125817
Other languages
English (en)
French (fr)
Inventor
魏秀颖
龚睿昊
李雨杭
刘祥龙
余锋伟
Original Assignee
上海商汤智能科技有限公司
Application filed by 上海商汤智能科技有限公司 (Shanghai SenseTime Intelligent Technology Co., Ltd.)
Publication of WO2023165139A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/02 Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present disclosure relates to, but is not limited to, the field of information technology, and in particular to a model quantization method, apparatus, device, storage medium, and computer program product.
  • Model quantization converts the weights and activation values in a neural network from their original floating-point type to low bit-width integers (such as 8-bit, 4-bit, 3-bit, or 2-bit). After quantization, the storage space required by the quantized neural network model is reduced, and computation changes from the original floating-point operations to lower-cost operations on low bit-width integer data.
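  • As a concrete illustration (a minimal sketch, not taken from the disclosure), uniform symmetric quantization maps a float tensor onto a signed integer grid and back; the function name and step-size rule below are illustrative choices:

```python
import torch

def fake_quantize(x: torch.Tensor, n_bits: int = 4):
    """Quantize a float tensor to signed n_bits integers and dequantize
    back. The step size is derived from the tensor's max magnitude; real
    quantizers calibrate it more carefully."""
    qmax = 2 ** (n_bits - 1) - 1          # e.g. 7 for 4-bit signed
    step = x.abs().max() / qmax           # quantization step size
    q = torch.clamp(torch.round(x / step), -qmax - 1, qmax)
    return q * step, q.to(torch.int8), step

w = torch.randn(64, 64)                   # stand-in for a weight tensor
w_dq, w_int, step = fake_quantize(w, n_bits=4)
print((w - w_dq).abs().mean())            # average quantization error
```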
  • However, the quantized model suffers from insufficient precision; especially when the model is quantized to a low bit width (such as 3 bits or 2 bits), the accuracy of the model drops significantly and cannot meet application requirements.
  • The embodiments of the present disclosure provide a model quantization method, apparatus, device, storage medium, and computer program product.
  • An embodiment of the present disclosure provides a model quantization method, the method comprising:
  • an embodiment of the present disclosure provides a model quantization device, the device comprising:
  • the first acquisition part is configured to acquire first output data of at least one first network substructure in the first network model; wherein each of the first output data is obtained by processing the calibration data set using the first network model;
  • an embodiment of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, some or all of the steps in the above method are implemented.
  • each first output data is obtained by using the first network model to process the calibration data set;
  • the adjusted second network model is determined as the third network model.
  • FIG. 1 is a schematic diagram of the implementation flow of a model quantization method provided by an embodiment of the present disclosure;
  • FIG. 3 is a schematic diagram of the implementation flow of a model quantization method provided by an embodiment of the present disclosure;
  • FIG. 4 is a schematic diagram of the implementation flow of a model quantization method provided by an embodiment of the present disclosure;
  • FIG. 5B is a schematic diagram of an implementation of adjusting the rounding method adopted for quantizing the weight values in the k-th block structure, provided by an embodiment of the present disclosure;
  • FIG. 7 is a schematic diagram of a hardware entity of a computer device provided by an embodiment of the present disclosure.
  • the first network model can be any suitable neural network model to be quantized, and can be a full-precision neural network model.
  • The first network model may be a neural network model with 32-bit floating-point parameters or 16-bit floating-point parameters; of course, the embodiments of the present disclosure do not limit the floating-point format of the first network model.
  • the first network model may adopt any suitable neural network structure, including but not limited to one or more of ResNet-18, ResNet-50, MobileNetV2, EfficientNet-Lite, RegNet, BERT, and the like.
  • the first network model may be realized based on a convolutional neural network or a transformer network (Transformer), which is not limited here.
  • The structure of a neural network model can be divided, at different granularities, into stages, blocks, and processing layers.
  • Each neural network model can include at least one stage, and each stage can include at least one block.
  • Each block may include at least one processing layer, where a processing layer may be, for example, an input layer, a convolutional layer, a pooling layer, a downsampling layer, a rectified linear unit, a fully connected layer, a batch normalization layer, and the like.
  • The first network model may include at least one first network substructure, and a first network substructure may be a single processing layer, a block structure including at least two processing layers, or a stage structure including at least two block structures.
  • those skilled in the art may determine at least one first network substructure in the first network model with an appropriate granularity according to actual conditions, which is not limited in this embodiment of the present disclosure.
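  • For illustration only, the following sketch shows how substructures could be collected at stage, block, or processing-layer granularity from a ResNet-style model whose top-level nn.Sequential children are stages; the helper and its structural assumptions are hypothetical:

```python
import torch.nn as nn

def collect_substructures(model: nn.Module, granularity: str = "block"):
    """Illustrative helper (not from the disclosure) that picks first
    network substructures from a ResNet-style model, where each
    nn.Sequential child is a stage made of blocks."""
    if granularity == "stage":
        return [m for m in model.children() if isinstance(m, nn.Sequential)]
    if granularity == "block":
        return [blk for stage in model.children()
                if isinstance(stage, nn.Sequential) for blk in stage]
    # granularity == "layer": every leaf processing layer
    return [m for m in model.modules() if not list(m.children())]
```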
  • the calibration data set may include at least one piece of image data, point cloud data, or voice data, etc., and the embodiment of the present disclosure does not limit the type of data in the calibration data set.
  • the calibration data set may be preset or sampled from a specific data set, which is not limited here.
  • the calibration data set may be determined according to a task to be performed, and the task to be performed may include but not limited to at least one of a classification task, an object detection task, and the like.
  • For example, for a classification task, the calibration data set may be the ImageNet data set or the like; for a detection task, the calibration data set may be the MS COCO data set or the like.
  • output data (referred to as first output data) of at least one first network substructure in the first network model can be obtained.
  • first output data may be predetermined, or may be obtained by processing the calibration data set in real time, which is not limited here.
  • Step S102, using the second network model to process the calibration data set based on the activation quantization identifier of at least one second network substructure in the second network model, to obtain second output data of each second network substructure; wherein the second network model is obtained by quantizing the first network model, and the activation quantization identifier of each second network substructure indicates whether the activation values of that second network substructure are quantized.
  • the second network model is obtained after quantizing the first network model.
  • In practice, any suitable model quantization algorithm may be used to quantize the first network model to obtain the second network model.
  • pre-training may be performed on the first network model, and an appropriate quantization operation may be performed on the pre-trained first network model to obtain the second network model.
  • quantization-aware training may be performed on the first network model, and the obtained trained first network model is the quantized second network model.
  • a large quantization loss may exist in the second network model obtained after quantizing the first network model, which can be further optimized to reduce the quantization loss.
  • Here, the structure of the second network model is the same as that of the first network model; that is, each first network substructure in the first network model has a corresponding second network substructure of the same structure in the second network model.
  • the difference is that the parameters in the second network model have been quantized to a preset bit width.
  • For example, the second network model may be a model obtained by quantizing the first network model with a bit width of 1 bit or 2 bits, a model obtained by quantizing it with a bit width of 4 bits, a model obtained by quantizing it with a bit width of 8 bits, and so on.
  • In practice, the parameters of different second network substructures in the second network model can be quantized with different bit widths or with the same bit width; within the same second network substructure, the quantization of weight values and the quantization of activation values may use the same bit width or different bit widths, which is not limited in the embodiments of the present disclosure.
  • The bit width used to quantize each first network substructure in the first network model may be determined based on at least one of the task to be performed, the deployment device on which the third network model is to be deployed, and the like.
  • For example, the bit width used to quantize each first network substructure may be determined according to hardware information of the deployment device, such as at least one of storage capacity, computing capacity, hardware type, and power consumption, so that the third network model better meets the requirements for deployment on that device.
  • For another example, the bit width used to quantize each first network substructure may be determined according to the target task requirements of the task to be performed, such as at least one of task type, task time consumption, task processing accuracy, and task processing speed, so that the third network model better meets the requirements of the target task.
  • In the embodiments of the present disclosure, an activation quantization identifier can be used to represent whether the activation values of a second network substructure are to be quantized.
  • The activation quantization identifier of each second network substructure may be a first identifier, indicating that the activation values of the second network substructure are quantized, or a second identifier, indicating that the activation values of the second network substructure are not quantized.
  • the activation quantization identifier of each second network substructure can be determined in a random manner. For example, based on the set quantization probability, the activation quantization identifier of each second network substructure may be determined as the first identifier or the second identifier.
  • In some embodiments, the activation quantization identifier of each second network substructure may be assigned in advance according to a specific quantization setting rule. For example, the second network substructures in the second network model can be numbered in order; the activation quantization identifier of each odd-numbered second network substructure is determined as the first identifier, and that of each even-numbered second network substructure is determined as the second identifier, as sketched below.
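  • A minimal sketch of the odd/even rule just described, with True standing for the first identifier and False for the second (this encoding is an illustrative choice):

```python
def assign_flags_alternating(num_substructures: int) -> dict:
    """Odd-numbered substructures get the first identifier (True =
    quantize activations), even-numbered ones the second (False)."""
    return {i: i % 2 == 1 for i in range(1, num_substructures + 1)}

# e.g. {1: True, 2: False, 3: True, 4: False}
print(assign_flags_alternating(4))
```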
  • output data (referred to as second output data) of at least one second network substructure in the second network model can be obtained.
  • Step S103, for each first network substructure, adjusting the parameters of the corresponding second network substructure based on the first output data of the first network substructure and the second output data of the second network substructure corresponding to the first network substructure in the second network model.
  • Here, for each first network substructure in the first network model, at least one adjustment may be performed on the parameters of the corresponding second network substructure based on the first output data of the first network substructure and the second output data of that second network substructure.
  • The adjustable parameters in the second network substructure may include, but are not limited to, at least one of: the quantization parameters of each weight value in the second network substructure (such as quantization step size, preset precision of the quantization scale, quantization symmetry, quantization bit width, and quantization granularity); the quantization parameters of each activation value (likewise, such as quantization step size, preset precision of the quantization scale, quantization symmetry, quantization bit width, and quantization granularity); the rounding method used to quantize the weight values in the second network substructure (such as rounding up or rounding down); and the quantization function used to quantize the weight values in the second network substructure.
  • In practice, the parameters of each second network substructure may be adjusted using an appropriate parameter optimization algorithm chosen according to the actual situation, such as gradient descent or simulated annealing, which is not limited in the embodiments of the present disclosure; one possible form of such an adjustment is sketched below.
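  • As one hedged example of such an optimization, the sketch below learns a single quantization parameter (the weight step size) by gradient descent on a reconstruction loss, assuming a linear substructure and a straight-through estimator; the function names and the Adam/MSE choices are illustrative, not from the disclosure:

```python
import torch
import torch.nn.functional as F

def fake_quant(x, step, n_bits=4):
    """Quantize-dequantize with a straight-through estimator so the
    step size stays learnable."""
    qmax = 2 ** (n_bits - 1) - 1
    y = x / step
    y_q = y + (y.round() - y).detach()     # round forward, identity backward
    return torch.clamp(y_q, -qmax - 1, qmax) * step

def tune_weight_step(layer: torch.nn.Linear, calib_x, fp_out,
                     iters=200, lr=1e-3):
    """Learn the weight step size so that the quantized substructure's
    output matches the full-precision output on calibration data."""
    step = torch.tensor(layer.weight.abs().max().item() / 7,
                        requires_grad=True)
    opt = torch.optim.Adam([step], lr=lr)
    for _ in range(iters):
        out = F.linear(calib_x, fake_quant(layer.weight, step), layer.bias)
        loss = F.mse_loss(out, fp_out)
        opt.zero_grad(); loss.backward(); opt.step()
    return step.detach()
```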
  • Step S104 if it is determined that the preset condition is satisfied, the adjusted second network model is determined as the third network model.
  • The preset condition may include, but is not limited to, at least one of the following: the number of times the parameters of each second network substructure have been adjusted reaches a set threshold; the loss value between each first output data and the corresponding second output data is less than a set first loss threshold; the total loss value between all first output data and the corresponding second output data is less than a set second loss threshold; and the like.
  • In this way, the parameters of each second network substructure in the second network model can be adjusted at least once, and when it is determined that the adjustment of the parameters of each second network substructure in the second network model satisfies the preset condition, the adjusted second network model is determined as the third network model.
  • In a case where the preset condition is not satisfied, the adjusted second network model can be used to process the calibration data set again based on the activation quantization identifier of at least one second network substructure in the second network model, to obtain new second output data of each second network substructure; and for each first network substructure, the parameters of the corresponding second network substructure are adjusted again based on the first output data of the first network substructure and the new second output data of the second network substructure corresponding to the first network substructure in the second network model.
  • each of the second network substructures is a processing layer.
  • In this way, the parameters in the second network model can be adjusted at the granularity of individual processing layers, so that the model can learn, at processing-layer granularity, to reduce the influence of activation-value quantization.
  • Thus, the precision of parameter adjustment in the second network model can be improved, which further improves the accuracy of the quantized model.
  • In some embodiments, determining that the preset condition is satisfied in the above step S104 may include at least one of the following steps S111 and S112:
  • Step S111, when it is determined, based on each of the first output data and each of the second output data, that the loss value of each second network substructure satisfies a preset loss constraint, determining that the preset condition is satisfied.
  • Here, the loss value of each second network substructure may be the loss value between the second output data of the second network substructure in the adjusted second network model and the first output data of the corresponding first network substructure.
  • the loss constraint may be a preset target for adjusting the second network model, and may include a constraint on the loss value of each second network substructure in the adjusted second network model.
  • For example, the loss constraint may include, but is not limited to, at least one of the following: the loss value of each second network substructure is less than a set first loss threshold; the sum of the loss values of all second network substructures in the second network model is less than a set second loss threshold; and the like.
  • Step S112, in a case where the number of adjustments to each of the second network substructures satisfies a preset count constraint, determining that the preset condition is satisfied.
  • Here, the count constraint may be a preset goal for adjusting the second network model, and may include a constraint on the number of times each second network substructure is adjusted.
  • For example, the count constraint may include, but is not limited to, at least one of the following: the number of adjustments to each second network substructure reaches a set first count threshold; the maximum number of adjustments over all second network substructures reaches a set second count threshold; the average number of adjustments over all second network substructures reaches a set third count threshold; and the like. A toy check combining such conditions is sketched below.
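  • For illustration (all thresholds are placeholders, not values from the disclosure):

```python
def preset_condition_met(losses, counts, loss_thr=1e-3,
                         total_thr=1e-2, max_count=20000):
    """Illustrative check combining the constraints above."""
    return (all(l < loss_thr for l in losses)          # per-substructure loss
            or sum(losses) < total_thr                 # total loss
            or all(c >= max_count for c in counts))    # adjustment counts
```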
  • the above method may also include:
  • Step S121, quantizing each first network substructure in the first network model according to at least one bit width to obtain the second network model; wherein the second network model includes at least one second network substructure corresponding to the at least one first network substructure, and each second network substructure includes one of the following: a stage structure, a block structure, or a processing layer.
  • the at least one bit width may include but not limited to at least one of 1 bit width, 2 bit width, 3 bit width, 4 bit width, 8 bit width and the like.
  • In this way, adjusting the parameters of second network substructures in a second network model quantized with a 1-bit, 2-bit, 3-bit, 4-bit and/or 8-bit width can be supported, so that the quantized model achieves higher accuracy; this can meet the model quantization requirements of various tasks and deployment devices and improves the universality of model quantization applications.
  • In particular, adjusting the parameters of second network substructures quantized with extremely low bit widths such as 1 bit, 2 bits, and 3 bits can be supported, so that the accuracy of models quantized at extremely low bit widths is effectively improved; thus the accuracy of the model can be improved while the storage and computing resources required to deploy the quantized model are reduced.
  • the above method may also include the following steps S131 to S133:
  • Step S131 sampling at least one candidate sample from the set candidate data set.
  • the candidate data set may be preset, including at least one sample used to calibrate the second network model.
  • any suitable sampling manner may be used to sample at least one candidate sample from the candidate data set, which is not limited in this embodiment of the present disclosure.
  • For example, a set number of candidate samples may be randomly sampled from the candidate data set; at least one candidate sample may be uniformly sampled from the candidate data set; or at least one candidate sample may be selected from the candidate data set according to set filtering conditions.
  • Step S132, performing data augmentation on each of the candidate samples to obtain at least one target sample.
  • Here, any suitable data augmentation may be applied to a candidate sample to obtain the corresponding target sample.
  • For example, the data augmentation may include, but is not limited to, at least one of random flipping, random cropping, adding perturbation, and the like.
  • In practice, the same data augmentation may be applied to different candidate samples, or different data augmentations may be applied to different candidate samples, which is not limited in this embodiment of the present disclosure.
  • Step S133 Obtain the calibration data set based on the at least one target sample.
  • some or all of the obtained target samples can be added to the calibration dataset.
  • In the above manner, at least one candidate sample is sampled from the set candidate data set, data augmentation is performed on each candidate sample to obtain at least one target sample, and the calibration data set is obtained based on the at least one target sample, as sketched below.
  • In this way, the diversity of samples in the calibration data set can be increased, which further enhances the ability of the second network model to learn to reduce the influence of activation-value quantization and further improves the flatness of the model under quantization perturbation.
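  • A sketch of this sampling-and-augmentation pipeline, assuming a torchvision-style dataset of (image, label) pairs with tensor images of sufficient size; the sample count and transforms are illustrative:

```python
import random
import torch
import torchvision.transforms as T

augment = T.Compose([
    T.RandomHorizontalFlip(),      # random flipping
    T.RandomCrop(224, padding=4),  # random cropping
])

def build_calibration_set(candidate_dataset, num_samples=1024):
    """Sample candidates, augment each one, and stack the target
    samples into a calibration batch."""
    idx = random.sample(range(len(candidate_dataset)), num_samples)
    targets = [augment(candidate_dataset[i][0]) for i in idx]
    return torch.stack(targets)
```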
  • An embodiment of the present disclosure provides a model quantization method, which can be executed by a processor of a computer device. As shown in Figure 2, the method includes the following steps S201 to S205:
  • Step S201 acquiring first output data of at least one first network substructure in the first network model; wherein each of the first output data is obtained by processing a calibration data set using the first network model.
  • the above-mentioned step S201 corresponds to the above-mentioned step S101, and the specific implementation manner of the above-mentioned step S101 can be referred to for implementation.
  • Step S202 determining the calibration data set as input data of the first second network substructure in the second network model.
  • the calibration data set can be input into the second network model, and the calibration data set is the input data of the first second network substructure in the second network model.
  • Step S203, for each second network substructure in the second network model, using the second network substructure to process the input data of the second network substructure based on the weight values and the activation quantization identifier of the second network substructure, to obtain second output data of the second network substructure, and using the second output data as the input data of the next second network substructure.
  • Here, the second network model is obtained by quantizing the first network model, and the activation quantization identifier of each second network substructure indicates whether to quantize the activation values of that second network substructure.
  • the second output data of each second network substructure in the second network model can be used as the input data of the next second network substructure.
  • By inputting the input data of each second network substructure into that second network substructure, the input data can be processed based on the weight values and the activation quantization identifier of the second network substructure, to obtain the second output data of the second network substructure.
  • Here, each second network substructure includes at least one processing layer; the input data of the second network substructure serves as the input data of the first processing layer in the substructure, the output data of each processing layer serves as the input data of the next processing layer, and the output data of the last processing layer is the second output data of the second network substructure.
  • For each processing layer, the input data of the processing layer can be processed based on the weight values of the processing layer to obtain the activation values of the processing layer. In a case where the activation quantization identifier of the second network substructure is the first identifier, which represents quantizing the activation values of the second network substructure, the activation values of the processing layer can be quantized based on the set quantization parameters, and the quantized activation values are determined as the output data of the processing layer; in a case where the activation quantization identifier of the second network substructure is the second identifier, which represents not quantizing the activation values of the second network substructure, the activation values can be directly determined as the output data of the processing layer.
  • Step S204, for each first network substructure, adjusting the parameters of the corresponding second network substructure based on the first output data of the first network substructure and the second output data of the second network substructure corresponding to the first network substructure in the second network model.
  • Step S205 if it is determined that the preset condition is satisfied, the adjusted second network model is determined as the third network model.
  • the above-mentioned steps S204 to S205 correspond to the above-mentioned steps S103 to S104 respectively, and the specific implementation manners of the above-mentioned steps S103 to S104 can be referred to for implementation.
  • In the above manner, the calibration data set is determined as the input data of the first second network substructure in the second network model, and each second network substructure processes its input data based on its weight values and activation quantization identifier to obtain its second output data, which is then used as the input data of the next second network substructure. In this way, the second output data of each second network substructure can be obtained quickly and accurately, as sketched below.
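  • A minimal sketch of this chained forward pass; `act_quant` stands in for quantization with the second quantization parameter, and the flag encoding is an illustrative choice:

```python
import torch

@torch.no_grad()
def forward_second_model(substructures, act_flags, calib_batch, act_quant):
    """Feed the calibration batch through the chain of second network
    substructures; each output (quantized or not, per the substructure's
    activation quantization identifier) becomes the next input."""
    second_outputs, x = [], calib_batch
    for sub, quantize_act in zip(substructures, act_flags):
        x = sub(x)               # weights inside are already quantized
        if quantize_act:         # first identifier
            x = act_quant(x)     # second quantization parameter applies here
        second_outputs.append(x)
    return second_outputs
```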
  • In some embodiments, the processing of the input data of the second network substructure based on the weight values and the activation quantization identifier of the second network substructure in step S203 to obtain the second output data of the second network substructure may include the following steps S211 to S212:
  • Step S211, quantizing the weight values of the second network substructure based on the rounding method for the weight values in the second network substructure and a first quantization parameter, to obtain quantized weight values.
  • the rounding manner of the weight value may include, but not limited to, rounding up, rounding down, or rounding.
  • Here, the first quantization parameter may be a quantization parameter used to quantize the weight values in the second network substructure, including but not limited to at least one of quantization step size, preset precision of the quantization scale, quantization symmetry, quantization bit width, quantization granularity, and the like.
  • For example, the second network model can be used to process the calibration data set in advance to obtain the output data of the second network model, and the first quantization parameter used to quantize the weight values in the second network substructure can be obtained by computing statistics over the output data.
  • In some embodiments, the first quantization parameter used to quantize the weight values in the second network substructure may be an adjustable parameter of the second network substructure, and may be learned during the process of adjusting the parameters of the second network substructure.
  • Step S212, processing the input data of the second network substructure based on the quantized weight values and the activation quantization identifier of the second network substructure, to obtain the second output data of the second network substructure.
  • For each processing layer, the input data of the processing layer can be processed based on the quantized weight values of the processing layer to obtain the activation values of the processing layer. In a case where the activation quantization identifier of the second network substructure is the first identifier, which represents quantizing the activation values of the second network substructure, the activation values of the processing layer can be quantized based on the set quantization parameters, and the quantized activation values are determined as the output data of the processing layer; in a case where the activation quantization identifier of the second network substructure is the second identifier, which represents not quantizing the activation values of the second network substructure, the activation values can be directly determined as the output data of the processing layer.
  • In the above manner, the weight values in each second network substructure can be quantized using the rounding method for the weight values and the first quantization parameter, and whether to quantize the activation values in each second network substructure is determined based on its activation quantization identifier; on this basis, the input data of each second network substructure can be processed to obtain the second output data of each second network substructure. The weight-quantization step is sketched below.
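  • A sketch of the weight-quantization step with a selectable rounding method, reducing the first quantization parameter to a single step size for brevity:

```python
import torch

def quantize_weights(w, step, n_bits=4, rounding="nearest"):
    """Quantize weights with a chosen rounding method and a first
    quantization parameter (here just the step size)."""
    qmax = 2 ** (n_bits - 1) - 1
    round_fn = {"nearest": torch.round,
                "up": torch.ceil,       # rounding up
                "down": torch.floor}[rounding]
    return torch.clamp(round_fn(w / step), -qmax - 1, qmax) * step
```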
  • In some embodiments, in a case where the activation quantization identifier of the second network substructure is the first identifier, the activation values in the second network substructure are quantized based on a second quantization parameter during the processing of the input data of the second network substructure.
  • Here, the second quantization parameter may be a quantization parameter used to quantize the activation values in the second network substructure, including but not limited to at least one of quantization step size, preset precision of the quantization scale, quantization symmetry, quantization bit width, quantization granularity, and the like.
  • For example, the second network model can be used to process the calibration data set in advance to obtain the output data of the second network model, and the second quantization parameter used to quantize the activation values of the second network substructure can be obtained by computing statistics over the output data.
  • In some embodiments, the second quantization parameter used to quantize the activation values in the second network substructure may be an adjustable parameter of the second network substructure, and may be learned during the process of adjusting the parameters of the second network substructure.
  • In some embodiments, in a case where the activation quantization identifier of the second network substructure is the second identifier, the activation values in the second network substructure are not quantized during the processing of the input data of the second network substructure.
  • In this way, the activation quantization identifier of at least one second network substructure in the second network model can be set to the second identifier, so that the activation values of that second network substructure are not quantized. This improves the diversity of the influence that activation-value quantization exerts on the individual second network substructures, further enhances the ability of the second network model to learn to reduce the impact of activation-value quantization, and further improves the flatness of the model under quantization perturbation, so that the adjusted second network model is flat from a general perspective, thereby improving the accuracy of the model.
  • An embodiment of the present disclosure provides a model quantization method, which can be executed by a processor of a computer device. As shown in Figure 3, the method includes the following steps S301 to S305:
  • Step S301 acquiring first output data of at least one first network substructure in the first network model; wherein each of the first output data is obtained by processing a calibration data set using the first network model.
  • the above-mentioned step S301 corresponds to the above-mentioned step S101, and the specific implementation manner of the above-mentioned step S101 can be referred to for implementation.
  • Step S302 for each second network substructure in the second network model, randomly assign values to activation quantization identifiers of the second network substructures based on set probability distribution parameters.
  • Here, the probability distribution parameter may be any suitable parameter that can characterize the likelihood of at least one value of the activation quantization identifier of the second network substructure, and may include, but is not limited to, at least one of the probability of assigning the activation quantization identifier of the second network substructure as the first identifier, the probability of assigning it as the second identifier, and the like.
  • the activation quantization identifier of each second network substructure can be randomly assigned as the first identifier or the second identifier.
  • the probability distribution parameters may be preset, and those skilled in the art may set appropriate probability distribution parameters according to actual conditions, which is not limited in the embodiments of the present disclosure.
  • Step S303, using the second network model to process the calibration data set based on the activation quantization identifier of at least one second network substructure in the second network model, to obtain second output data of each second network substructure; wherein the second network model is obtained by quantizing the first network model, and the activation quantization identifier of each second network substructure indicates whether the activation values of that second network substructure are quantized.
  • Step S304, for each first network substructure, adjusting the parameters of the corresponding second network substructure based on the first output data of the first network substructure and the second output data of the second network substructure corresponding to the first network substructure in the second network model.
  • Step S305 if it is determined that the preset condition is met, the adjusted second network model is determined as the third network model.
  • the above-mentioned steps S303 to S305 correspond to the above-mentioned steps S102 to S104 respectively, and the specific implementation manners of the above-mentioned steps S102 to S104 can be referred to for implementation.
  • the random assignment of the activation quantization identifier of the second network substructure based on the set probability distribution parameters described in step S302 may include at least one of steps S311 and S312:
  • Step S311, in a case where the probability distribution parameter includes a quantization probability, randomly assigning a value to the activation quantization identifier of the second network substructure based on the quantization probability; wherein the quantization probability characterizes the probability that the activation quantization identifier of the second network substructure is assigned the first identifier, and the first identifier represents quantizing the activation values of the corresponding second network substructure.
  • the activation quantization identifier of the second network substructure may be randomly assigned as the first identifier or the second identifier.
  • the quantization probability may be 0, 0.25, 0.5, 0.75 or 1, etc.
  • In this way, the activation quantization identifier of the second network substructure can be randomly assigned as the first identifier or the second identifier, where the probability of assigning the identifier as the first identifier is the quantization probability p, and the probability of assigning it as the second identifier is 1-p.
  • Step S312, in a case where the probability distribution parameter includes a quantization-deactivation probability, randomly assigning a value to the activation quantization identifier of the second network substructure based on the quantization-deactivation probability; wherein the quantization-deactivation probability characterizes the probability that the activation quantization identifier of the second network substructure is assigned the second identifier, and the second identifier represents not quantizing the activation values of the corresponding second network substructure.
  • In this way, the activation quantization identifier of the second network substructure can likewise be randomly assigned as the first identifier or the second identifier.
  • Those skilled in the art can preset an appropriate quantization-deactivation probability according to the actual situation; for example, the quantization-deactivation probability can be 0, 0.25, 0.5, 0.75 or 1, etc.
  • Here, the probability of assigning the activation quantization identifier of the second network substructure as the first identifier is 1-q, and the probability of assigning it as the second identifier is the quantization-deactivation probability q.
  • The same quantization probability and/or quantization-deactivation probability can be set for all second network substructures in the second network model, or different quantization probabilities and/or quantization-deactivation probabilities can be set separately for each second network substructure, which is not limited by the embodiments of the present disclosure.
  • In the above manner, the activation quantization identifier of each second network substructure in the second network model is randomly assigned, so that each identifier is randomly assigned as the first identifier or the second identifier according to the probability distribution parameter, as sketched below.
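  • A minimal sketch of this random assignment, where each identifier is drawn independently from a Bernoulli distribution with quantization probability p (or 1-q when only the quantization-deactivation probability q is given):

```python
import torch

def assign_flags_random(num_substructures, p=None, q=None):
    """Draw each activation quantization identifier independently.
    p is the quantization probability (first identifier); q is the
    quantization-deactivation probability, with p = 1 - q."""
    if p is None:
        p = 1.0 - q
    return torch.bernoulli(torch.full((num_substructures,), p)).bool()

print(assign_flags_random(8, p=0.5))   # e.g. tensor([True, False, ...])
```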
  • An embodiment of the present disclosure provides a model quantization method, which can be executed by a processor of a computer device. As shown in Figure 4, the method includes the following steps S401 to S404:
  • Step S401 acquiring first output data of at least one first network substructure in the first network model; wherein each of the first output data is obtained by processing a calibration data set using the first network model.
  • Step S402, using the second network model to process the calibration data set based on the activation quantization identifier of at least one second network substructure in the second network model, to obtain second output data of each second network substructure; wherein the second network model is obtained by quantizing the first network model, and the activation quantization identifier of each second network substructure indicates whether the activation values of that second network substructure are quantized.
  • the above-mentioned steps S401 to S402 correspond to the above-mentioned steps S101 to S102 respectively, and the specific implementation manners of the above-mentioned steps S101 to S102 can be referred to for implementation.
  • Step S403, for each first network substructure, determining the loss value of the corresponding second network substructure based on the first output data of the first network substructure and the second output data of the second network substructure corresponding to the first network substructure in the second network model, and adjusting the parameters of the second network substructure based on the loss value.
  • Here, the loss value of each second network substructure may be the loss value between the second output data of the second network substructure and the first output data of the corresponding first network substructure.
  • In some embodiments, the calibration data set includes at least one target sample; the first output data may include first sub-data obtained by processing each target sample, and the second output data may include second sub-data obtained by processing each target sample. For each target sample, the loss value of the second network substructure for processing that target sample can be determined based on the similarity between the first sub-data and the second sub-data corresponding to the target sample; the loss value of the second network substructure can then be determined based on the mean square error over the loss values for processing the individual target samples.
  • Based on the loss value, the parameters of the second network substructure may be adjusted; a sketch of the loss computation follows.
  • In some embodiments, the parameters of the second network substructure may be adjusted until it is determined that the change in the loss value of the second network substructure converges.
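  • One plausible reading of this loss computation, sketched below; taking the per-sample similarity as mean squared error is an assumption, not a statement from the disclosure:

```python
import torch
import torch.nn.functional as F

def substructure_loss(first_subdata, second_subdata):
    """Per-target-sample losses from the (dis)similarity of first and
    second sub-data, averaged into the substructure's loss value."""
    per_sample = torch.stack([F.mse_loss(s2, s1)
                              for s1, s2 in zip(first_subdata,
                                                second_subdata)])
    return per_sample.mean()
```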
  • Step S404 if it is determined that the preset condition is met, the adjusted second network model is determined as the third network model.
  • the above-mentioned step S404 corresponds to the above-mentioned step S104, and the specific implementation manner of the above-mentioned step S104 can be referred to for implementation.
  • In the above manner, the loss value of each second network substructure is determined based on the first output data and the second output data, and the parameters of the second network substructure are adjusted based on the loss value, so that the adjustment maintains the consistency of the outputs between the second network substructure and the corresponding first network substructure, which can effectively improve the accuracy of the quantized model.
  • In some embodiments, the parameters of the second network substructure include the rounding method used to quantize the weight values in the second network substructure; adjusting the parameters of the second network substructure based on the loss value in the above step S403 includes:
  • Step S411, updating, based on the loss value, the rounding method used to quantize the weight values in the second network substructure, so that in the process of processing the calibration data set, the quantization function corresponding to the updated rounding method is used to quantize the weight values in the second network substructure.
  • Here, for each rounding method, a corresponding quantization function can be determined, and the quantization function can be used to quantize the weight values in the second network substructure to a specific bit width.
  • In practice, an appropriate optimization algorithm, such as gradient descent or simulated annealing, can be chosen according to the actual situation to update the rounding method used to quantize the weight values in the second network substructure, which is not limited in the embodiments of the present disclosure.
  • In the above manner, the rounding method used to quantize the weight values in the second network substructure is updated based on the loss value.
  • In this way, fine-tuning of the quantized second network model can be realized by updating the rounding method for weight-value quantization, so that the accuracy of the adjusted second network model (i.e., the third network model) can be further improved.
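  • The disclosure only states that the rounding method is updated based on the loss value; the sketch below shows one way this is commonly done in learned-rounding (AdaRound-style) approaches, where a per-weight logit chooses between rounding down and up and is trained against the block reconstruction loss:

```python
import torch

class LearnedRounding(torch.nn.Module):
    """AdaRound-style sketch: per-weight logits decide between rounding
    down and up, optimized against a block reconstruction loss."""
    def __init__(self, w, step, n_bits=4):
        super().__init__()
        self.register_buffer("w", w)
        self.step, self.qmax = step, 2 ** (n_bits - 1) - 1
        self.alpha = torch.nn.Parameter(torch.zeros_like(w))  # rounding logits

    def forward(self):
        h = torch.sigmoid(self.alpha)             # soft 0/1 rounding choice
        q = torch.floor(self.w / self.step) + h   # floor + learned offset
        return torch.clamp(q, -self.qmax - 1, self.qmax) * self.step
```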
  • Model quantization methods in the related art are mainly divided into two categories: quantization-aware training and post-training quantization.
  • Quantization-aware training requires a complete model training process, demanding a large amount of sample data and substantial GPU computing power for end-to-end training; post-training quantization, by contrast, can obtain a quantized model based on a small amount of sample data and a small amount of GPU computing power.
  • The post-training quantization scheme in the related art rounds and quantizes the weight values in a pre-trained full-precision network model, and learns the rounding method used to quantize the weight values (e.g., rounding up or rounding down) by modeling the quantization of the weight values as noise and constructing a quantization optimization objective.
  • However, the post-training quantization scheme in the related art performs poorly for low-bit-width quantization such as 2-bit and 3-bit quantization, and for detection tasks that are more complicated than classification tasks; the accuracy of the resulting quantized model is not high enough.
  • In the related art, the quantized weight values are fine-tuned per block structure/processing layer (i.e., weight fine-tuning), the quantization parameters for the activation values (such as the step size) are determined after weight fine-tuning, and the quantization of activation values is not considered during weight fine-tuning. That is to say, in the related art, weight fine-tuning is independent of activation-value quantization.
  • As a result, regardless of how the activation values are later quantized, the weight values obtained in the quantized model are the same.
  • In view of this, an embodiment of the present disclosure provides a model quantization method that introduces the quantization of activation values during weight fine-tuning, which can effectively improve the performance of the quantized model and make the quantized model flatter in some directions (that is, the quantization loss of the quantized model changes relatively little under quantization perturbation of the weights).
  • Moreover, the quantization of activation values during weight fine-tuning can be randomly deactivated; that is, whether the activation values in the model to be adjusted (corresponding to the second network model in the preceding embodiments) are quantized during weight fine-tuning is determined randomly. This improves the diversity of the influence of activation-value quantization on each second network substructure, so that the quantized and adjusted model (corresponding to the third network model in the preceding embodiments) is flat from a general perspective, thereby bringing a precision improvement.
  • Here, at least one processing layer of the full-precision network model to be quantized (corresponding to the first network model) is regarded as a block structure (for example, a bottleneck layer is regarded as a block structure), and the rounding method adopted for the quantized weight values in each block structure is adjusted in order to adjust the quantized weights.
  • Scheme 2: as shown in FIG. 5B, before adjusting the rounding method adopted for the quantized weight values in the k-th block structure, the activation values in the first block structure through the k-th block structure are quantized; here K is a positive integer greater than 1, and k is greater than 1 and does not exceed K.
  • Here, the second network model can be obtained by quantizing the first network model, and schemes 1 to 3 above are used to adjust the rounding method for quantizing the weight values of each block structure in the second network model; the performance of the corresponding third network models obtained is shown in Table 1.
  • Table 1: accuracy of the third network model obtained under scheme 1, scheme 2, and scheme 3.

    First network model | Scheme 1 | Scheme 2 | Scheme 3
    ResNet-18           |    18.88 |    45.74 |    48.07
    ResNet-50           |     4.34 |    46.98 |    49.07
    MobileNetV2         |     5.83 |    50.71 |    51.20
    RegNet-600MF        |    42.77 |    60.94 |    62.07
    MnasNet             |    26.62 |    58.79 |    60.19
  • It can be seen from Table 1 that the accuracy of the third network model obtained with scheme 2 and scheme 3 far exceeds that obtained with scheme 1: considering the quantization of activation values during weight fine-tuning enables the fine-tuning to learn to reduce the impact of activation-value quantization, which brings a large performance improvement.
  • Moreover, the accuracy of the third network model obtained with scheme 3 is higher than that obtained with scheme 2, which shows that partially introducing activation-value quantization during weight fine-tuning is better than fully introducing it.
  • In some embodiments, the final loss objective can be analyzed by simultaneously modeling the quantization of weight values and activation values as noise.
  • In some embodiments, the quantization noise can be represented by $\hat{a} - a$, where $\hat{a}$ is the quantized parameter and $a$ is the full-precision parameter. However, the range of the noise expressed in this way is affected by the parameter range or the quantization step size.
  • Therefore, the quantization of an activation value can be modeled so that the relationship between the activation quantization noise $u$, the quantized activation value $\hat{a}$, and the full-precision activation value $a$ is expressed in multiplicative form: $\hat{a} = a \cdot (1 + u)$, which is formally equivalent to the additive form.
  • On this basis, the optimization objective of model quantization can be expressed as the following Formula 1:

    $$\min_{\hat{w}} \; \mathbb{E}_{x \sim D_c}\left[ L\big(\hat{w},\, x,\, 1 + u(x)\big) - L\big(w,\, x,\, 1\big) \right] \tag{1}$$

  • where $x$ is the input data of the block structure to be weight fine-tuned in the second network model, sampled from the calibration data set $D_c$; $u(x)$ represents the activation quantization noise introduced by quantizing the activation values when the input data is $x$; $w$ is the full-precision weight value in the block structure; $\hat{w}$ is the quantized weight value; and $L(w, x, 1)$ is the output data of the corresponding block structure in the full-precision network model corresponding to the second network model.
  • That is, the optimization goal is to optimize the rounding method used to quantize the weight values of each block structure in the second network model, so as to find a $\hat{w}$ that minimizes Formula 1.
  • Compared with schemes that ignore activation quantization, this optimization objective adds a term related to the quantization of the activation values.
  • In some embodiments, the activation quantization noise $u(x)$ can be transferred onto the weight quantization noise; with the transferred weight quantization noise expressed as $1 + v(x)$, Formula 1 can be converted into the form shown in the following Formula 2:

    $$\min_{\hat{w}} \; \mathbb{E}_{x \sim D_c}\left[ L\big(\hat{w}(1 + v(x)),\, x,\, 1\big) - L\big(w,\, x,\, 1\big) \right] \tag{2}$$

    which can be decomposed into the following Formula 3:

    $$\mathbb{E}_{x}\left[ L\big(\hat{w},\, x,\, 1\big) - L\big(w,\, x,\, 1\big) \right] + \mathbb{E}_{x}\left[ L\big(\hat{w}(1 + v(x)),\, x,\, 1\big) - L\big(\hat{w},\, x,\, 1\big) \right] \tag{3}$$

  • In Formula 3, the first term (3-1) is the optimization objective of schemes that do not consider activation-value quantization in the weight fine-tuning stage, and the second term (3-2) is the new term introduced by the quantization of activation values.
  • By optimizing term (3-2), the flatness of the quantized weight values can be improved; that is, the change in network loss caused by the quantized weight values under the perturbation $1 + v(x)$ is relatively small. In other words, on the input data $x$, for the activation quantization noise $u(x)$ there is a corresponding weight quantization noise $v(x)$, and the quantized model is flatter under the perturbation of the weight quantization noise $v(x)$.
  • Both scheme 2 and scheme 3 above introduce the quantization of activation values in the weight fine-tuning stage, so that term (3-2) in Formula 3 is additionally optimized, yielding a flatter quantized model.
  • For the optimization objective of model quantization expressed in the form of Formula 3 above, since post-training quantization (i.e., offline quantization) is very sensitive to the calibration data, in practical application scenarios it is not only necessary to obtain a flatter quantized model on the calibration data set, but even more necessary to make the loss of the optimization objective as small as possible on the test data set.
  • In some embodiments, the directions of model flatness can be diversified by diversifying the weight quantization noise $v(x)$, so as to achieve the model flatness required on the test data set; in this way, the performance of the quantized model on the test data set is also better.
  • In some embodiments, the weight quantization noise $v(x)$ can be modified by randomly discarding part of the activation quantization noise $u(x)$, thereby introducing different weight quantization noises $v(x)$ in the weight fine-tuning stage.
  • In this way, the diversity of the weight quantization noise $v(x)$ can be effectively increased, which further improves the performance of the quantized model on the test data set, makes the quantized model flat from a general perspective, and allows offline quantization to be further extended to 2-bit-width quantization.
  • Experiments show that, when a 4-bit quantization bit width is used, the model quantization method provided by the embodiments of the present disclosure can achieve a 3% accuracy improvement on the ImageNet classification task; for some lightweight network models more suitable for mobile terminals, it can even achieve a 51.49% accuracy improvement when a 2-bit quantization bit width is used. On the MS COCO data set, for detection tasks, the precision of the two-stage Faster RCNN model quantized with a 4-bit bit width is similar to that of the full-precision floating-point model; for the single-stage RetinaNet, when the weight quantization bit width is 2 bits and the activation quantization bit width is 4 bits, a gain of 6.5 mean average precision (mAP) can even be reached.
  • The model quantization method provided by the embodiments of the present disclosure can be applied in various scenarios, such as deployment scenarios of large models, model application scenarios on edge devices, classification task scenarios, and detection task scenarios, and can be adapted to different tasks, different models, and different quantization bit widths.
  • Model quantization based on the method provided by the embodiments of the present disclosure can, on the one hand, enable the quantized model to meet inference-speed requirements with higher precision and, on the other hand, enable the quantized model to meet low-power-consumption requirements.
  • FIG. 6 is a schematic diagram of the composition and structure of a model quantization device provided by an embodiment of the present disclosure.
  • the model quantization device 600 includes: a first acquisition part 610, a processing part 620, an adjustment part 630 and a first determination part 640, in which:
  • the first acquisition part 610 is configured to acquire first output data of at least one first network substructure in the first network model; wherein each of the first output data is obtained by processing a calibration data set using the first network model;
  • the processing part 620 is configured to use the second network model to process the calibration data set based on the activation quantization identification of at least one second network substructure in the second network model to obtain each of the second network The second output data of the substructure; wherein, the second network model is obtained after quantizing the first network model, and the activation quantization identification of each second network substructure indicates whether the second The activation value of the network substructure is quantified;
  • the adjustment part 630 is configured to, for each of the first network substructures, based on the first output data of the first network substructure, and the output data corresponding to the first network substructure in the second network model The second output data of the second network substructure adjusts the parameters of the second network substructure;
  • the first determining part 640 is configured to determine the adjusted second network model as the third network model when it is determined that the preset condition is satisfied.
  • the processing part is further configured to: determine the calibration dataset as input data for a first second network substructure in the second network model; for the second network model For each second network substructure in , use the second network substructure to process the input data of the second network substructure based on the weight value and activation quantization identifier of the second network substructure, to obtain second output data of the second network substructure, and use the second output data as input data of the next second network substructure.
  • the processing part is further configured to: quantize the weight values of the second network substructure based on the rounding manner of the weight values in the second network substructure and the first quantization parameter, Obtaining a quantized weight value; based on the quantized weight value and the activation quantization identifier of the second network substructure, processing the input data of the second network substructure to obtain the second network substructure The second output data of .
  • the second network substructure in the case that the activation quantization flag of the second network substructure is the first flag, during the process of processing the input data of the second network substructure, the second network substructure The activation value in the structure is quantized based on the second quantization parameter; and/or, in the case where the activation quantization identification of the second network substructure is the second identification, in the case of the second network substructure During the processing of the input data, the activation values in the second network substructure are not quantized.
  • the apparatus further includes: a second acquisition part configured to, for each second network substructure in the second network model, based on a set probability distribution parameter, to the second The activation quantification identification of the network substructure is randomly assigned.
  • the second acquiring part is further configured to: if the probability distribution parameter includes a quantization probability, based on the quantization probability, randomly assign a value to the activation quantization identifier of the second network substructure,
  • the quantization probability represents the probability that the activation quantization identifier of the second network substructure is assigned a first identifier, and the first identifier represents the quantification of the activation value of the corresponding second network substructure; and/or, in the In the case where the probability distribution parameter includes a quantized inactivation probability, based on the quantized inactivation probability, the quantized activation identifier of the second network substructure is randomly assigned; wherein the quantified inactivation probability represents the second The activation quantification identifier of the network substructure is assigned a probability of the second identifier, and the second identifier indicates that the activation value of the corresponding second network substructure is not quantified.
  • the adjusting part is further configured to: determine a loss value of the second network substructure based on the first output data and the second output data; Adjust the parameters of the second network substructure.
  • the parameters of the second network substructure include the rounding method used to quantize the weight values in the second network substructure; the adjustment part is further configured to: based on the loss Value, update the rounding method used to quantize the weight value in the second network substructure, so that in the process of processing the calibration data set, use the quantization function corresponding to the rounding method to The weight values in the second network substructure are quantized.
  • the first determining part is further configured as at least one of the following: based on each of the first output data and each of the second output data, determine each of the second network sub- When the loss value of the structure satisfies the preset loss constraint, it is determined that the preset condition is met; when the number of adjustments to each second network substructure meets the preset number of constraints, it is determined that the preset condition is met. the preset conditions.
  • the apparatus further includes: a quantization part configured to quantize each first network substructure in the first network model according to at least one bit width to obtain the second network model; wherein, the second network model includes at least one second network substructure respectively corresponding to each of the first network substructures, and each of the second network substructures includes one of the following: stage structure, Block structure, processing layers.
  • a "part" may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course, it may also be a unit, and it may also be a module or a non-module of.
  • the above model quantification method is implemented in the form of software function modules and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
  • a software product which is stored in a storage medium and includes several instructions to make a
  • a computer device which may be a personal computer, a server, or a network device, etc.
  • the aforementioned storage medium includes: various media that can store program codes such as U disk, mobile hard disk, read-only memory (Read Only Memory, ROM), magnetic disk or optical disk.
  • embodiments of the present disclosure are not limited to any specific combination of hardware and software.
  • An embodiment of the present disclosure provides a computer device including a memory and a processor, the memory storing a computer program runnable on the processor, where the processor implements the steps of the above method when executing the program.
  • An embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the above method are implemented.
  • The computer-readable storage medium may be transitory or non-transitory.
  • An embodiment of the present disclosure provides a computer program product.
  • The computer program product includes a non-transitory computer-readable storage medium storing a computer program; when the computer program is read and executed by a computer, some or all of the steps of the above method are implemented.
  • The computer program product can be realized by means of hardware, software, or a combination thereof.
  • In some embodiments, the computer program product is embodied as a computer storage medium, and in some embodiments as a software product, such as a software development kit (SDK).
  • FIG. 7 is a schematic diagram of a hardware entity of a computer device in an embodiment of the present disclosure.
  • the hardware entity of the computer device 700 includes: a processor 701, a communication interface 702, and a memory 703, wherein:
  • Processor 701 generally controls the overall operation of computer device 700 .
  • the communication interface 702 enables the computer device to communicate with other terminals or servers through the network.
  • The memory 703 is configured to store instructions and applications executable by the processor 701 and can also cache data to be processed or already processed by the processor 701 and the modules in the computer device 700 (for example, image data, audio data, voice communication data, and video communication data); it can be implemented by flash memory (FLASH) or random access memory (RAM). Data can be transferred among the processor 701, the communication interface 702, and the memory 703 through a bus 704.
  • The disclosed devices and methods may be implemented in other manners.
  • The device embodiments described above are only illustrative.
  • The division of the units is only a division by logical function; other division manners are possible in actual implementation.
  • The couplings, direct couplings, or communication connections between the components shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between devices or units may be electrical, mechanical, or in other forms.
  • The units described above as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units, and some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of this embodiment.
  • The functional units in the embodiments of the present disclosure may all be integrated into one processing unit, each unit may exist separately as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in the form of hardware or in the form of hardware plus software functional units.
  • If the above integrated units of the present disclosure are realized in the form of software function modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. On this understanding, the computer software product is stored in a storage medium and includes several instructions causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the methods described in the various embodiments of the present disclosure.
  • The aforementioned storage medium includes various media capable of storing program code, such as removable storage devices, ROMs, magnetic disks, or optical disks.
  • The embodiments of the present disclosure disclose a model quantization method, apparatus, device, storage medium, and computer program product, where the method includes: acquiring first output data of at least one first network substructure in a first network model after the first network model processes a calibration data set; using a second network model obtained by quantizing the first network model to process the calibration data set, based on the activation quantization identifier of at least one second network substructure in the second network model, to obtain second output data of each second network substructure; for each first network substructure, adjusting the parameters of the second network substructure corresponding to the first network substructure, based on the first output data of the first network substructure and the second output data of that second network substructure; and, when it is determined that a preset condition is satisfied, determining the adjusted second network model as a third network model.
  • In this way, the accuracy of the quantized model can be improved, and its flatness under quantization perturbation can be increased, so that models quantized with different bit widths all achieve high accuracy, satisfying model quantization requirements for different tasks and different deployment devices and broadening the applicability of model quantization.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Embodiments of the present disclosure disclose a model quantization method, apparatus, device, storage medium, and computer program product, wherein the method includes: acquiring first output data of at least one first network substructure in a first network model after the first network model processes a calibration data set; using a second network model obtained by quantizing the first network model to process the calibration data set, based on the activation quantization identifier of at least one second network substructure in the second network model, to obtain second output data of each second network substructure; for each first network substructure, adjusting the parameters of the second network substructure corresponding to the first network substructure, based on the first output data of the first network substructure and the second output data of that second network substructure; and, when it is determined that a preset condition is satisfied, determining the adjusted second network model as a third network model.

Description

Model quantization method, apparatus, device, storage medium and program product
CROSS-REFERENCE TO RELATED APPLICATIONS
The present disclosure is based on, and claims priority to, Chinese patent application No. 202210208524.4, filed on March 4, 2022 and entitled "Model quantization method, apparatus, device, storage medium and program product", the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to, but is not limited to, the field of information technology, and in particular to a model quantization method, apparatus, device, storage medium and computer program product.
BACKGROUND
Modern deep learning pursues higher performance at the cost of greater memory and compute. Although large models can be trained in the cloud, limited computational resources (including latency, energy, and memory consumption) make it very difficult to deploy models directly on edge devices. Inference of deep models can be accelerated by techniques such as model quantization, pruning, distillation, lightweight network design, and weight-matrix factorization; among these, model quantization quantizes the weights and activation values of a neural network from their original floating-point types to low-bit-width integers (e.g., 8-bit, 4-bit, 3-bit, or 2-bit). The quantized neural network model requires less storage, and its computation changes from the original floating-point operations to cheaper low-bit-width integer arithmetic.
In model quantization schemes in the related art, the quantized model suffers from insufficient accuracy; in particular, when a low bit width (e.g., 3 bits or 2 bits) is used for model quantization, the accuracy of the model drops sharply and cannot meet application requirements.
SUMMARY
In view of this, embodiments of the present disclosure provide a model quantization method, apparatus, device, storage medium and computer program product.
The technical solutions of the embodiments of the present disclosure are implemented as follows:
In one aspect, an embodiment of the present disclosure provides a model quantization method, the method including:
acquiring first output data of at least one first network substructure in a first network model, where each piece of first output data is obtained by processing a calibration data set with the first network model;
processing the calibration data set with a second network model, based on an activation quantization identifier of at least one second network substructure in the second network model, to obtain second output data of each second network substructure, where the second network model is obtained by quantizing the first network model, and the activation quantization identifier of each second network substructure indicates whether the activation values of the second network substructure are quantized;
for each first network substructure, adjusting the parameters of the second network substructure corresponding to the first network substructure in the second network model, based on the first output data of the first network substructure and the second output data of the corresponding second network substructure; and
when it is determined that a preset condition is satisfied, determining the adjusted second network model as a third network model.
In another aspect, an embodiment of the present disclosure provides a model quantization apparatus, the apparatus including:
a first acquisition part configured to acquire first output data of at least one first network substructure in a first network model, where each piece of first output data is obtained by processing a calibration data set with the first network model;
a processing part configured to process the calibration data set with a second network model, based on an activation quantization identifier of at least one second network substructure in the second network model, to obtain second output data of each second network substructure, where the second network model is obtained by quantizing the first network model, and the activation quantization identifier of each second network substructure indicates whether the activation values of the second network substructure are quantized;
an adjustment part configured to, for each first network substructure, adjust the parameters of the second network substructure corresponding to the first network substructure, based on the first output data of the first network substructure and the second output data of the corresponding second network substructure in the second network model; and
a first determination part configured to determine the adjusted second network model as a third network model when it is determined that a preset condition is satisfied.
In yet another aspect, an embodiment of the present disclosure provides a computer device including a memory and a processor, the memory storing a computer program runnable on the processor, where the processor implements some or all of the steps of the above method when executing the program.
In yet another aspect, an embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements some or all of the steps of the above method.
In yet another aspect, an embodiment of the present disclosure provides a computer program including computer-readable code, where, when the computer-readable code runs in a computer device, a processor in the computer device executes some or all of the steps of the above method.
In yet another aspect, an embodiment of the present disclosure provides a computer program product including a non-transitory computer-readable storage medium storing a computer program, where the computer program, when read and executed by a computer, implements some or all of the steps of the above method.
In the embodiments of the present disclosure, first output data of at least one first network substructure in a first network model is acquired, each piece of first output data being obtained by processing a calibration data set with the first network model; a second network model obtained by quantizing the first network model processes the calibration data set, based on the activation quantization identifier of at least one second network substructure in the second network model, to obtain second output data of each second network substructure, the activation quantization identifier of each second network substructure indicating whether the activation values of that substructure are quantized; for each first network substructure, the parameters of the corresponding second network substructure are adjusted based on the first output data of the first network substructure and the second output data of the corresponding second network substructure; and, when a preset condition is determined to be satisfied, the adjusted second network model is determined as a third network model. Because whether the activation values of each second network substructure are quantized is determined by its activation quantization identifier while the parameters of the substructures of the quantized second network model are adjusted, the second network model learns during the adjustment to reduce the impact of activation-value quantization. This improves the accuracy of the quantized model and its flatness under quantization perturbation, so that models quantized with different bit widths all achieve high accuracy, satisfying model quantization requirements for different tasks and different deployment devices and broadening the applicability of model quantization.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic flowchart of a model quantization method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a model quantization method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of a model quantization method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of a model quantization method provided by an embodiment of the present disclosure;
FIG. 5A is a schematic diagram of adjusting the rounding manner used to quantize the weight values in the k-th block structure, provided by an embodiment of the present disclosure;
FIG. 5B is a schematic diagram of adjusting the rounding manner used to quantize the weight values in the k-th block structure, provided by an embodiment of the present disclosure;
FIG. 5C is a schematic diagram of adjusting the rounding manner used to quantize the weight values in the k-th block structure, provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of the composition and structure of a model quantization apparatus provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a hardware entity of a computer device provided by an embodiment of the present disclosure.
DETAILED DESCRIPTION
To make the objectives, technical solutions, and advantages of the present disclosure clearer, the technical solutions of the present disclosure are further elaborated below with reference to the drawings and embodiments. The described embodiments should not be regarded as limiting the present disclosure; all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present disclosure.
In the following description, "some embodiments" describes subsets of all possible embodiments; "some embodiments" may be the same subset or different subsets of all possible embodiments and may be combined with one another where no conflict arises.
In the following description, the terms "first/second/third" merely distinguish similar objects and do not imply a particular ordering of the objects; where permitted, the specific order or sequence may be interchanged so that the embodiments of the present disclosure described here can be implemented in orders other than those illustrated or described.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the art to which the present disclosure belongs. The terms used herein are only for describing the present disclosure and are not intended to limit it.
An embodiment of the present disclosure provides a model quantization method that can be executed by a processor of a computer device, where the computer device may be a server, a laptop, a tablet, a desktop computer, a smart TV, a set-top box, a mobile device (e.g., a mobile phone, a portable video player, a personal digital assistant, a dedicated messaging device, or a portable game device), or any other device with data processing capability. FIG. 1 is a schematic flowchart of a model quantization method provided by an embodiment of the present disclosure; as shown in FIG. 1, the method includes the following steps S101 to S104:
Step S101: acquire first output data of at least one first network substructure in a first network model, where each piece of first output data is obtained by processing a calibration data set with the first network model.
Here, the first network model may be any suitable neural network model to be quantized and may be a full-precision neural network model; for example, it may be a neural network model with 32-bit or 16-bit floating-point parameters, the number of floating-point bits not being limited in the embodiments of the present disclosure. In implementation, the first network model may adopt any suitable neural network structure, including but not limited to one or more of ResNet-18, ResNet-50, MobileNetV2, EfficientNet-Lite, RegNet, BERT, and the like, and may be implemented based on a convolutional neural network or a Transformer, without limitation here.
By granularity, the structure of a neural network model may comprise stages, blocks, and processing layers: each model may include at least one stage, each stage at least one block, and each block at least one processing layer, where a processing layer may be, for example, an input layer, a convolutional layer, a pooling layer, a down-sampling layer, a rectified linear unit, a fully connected layer, or a batch normalization layer. In the embodiments of the present disclosure, the first network model may include at least one first network substructure, which may be a single processing layer, a block structure including at least two processing layers, or a stage structure including at least two block structures. In implementation, a person skilled in the art may determine the at least one first network substructure at a suitable granularity according to the actual situation, which is not limited in the embodiments of the present disclosure.
The calibration data set may include at least one of image data, point-cloud data, speech data, and the like; the data type is not limited in the embodiments of the present disclosure. In implementation, the calibration data set may be preset or sampled from a specific data set, without limitation here. In some implementations, the calibration data set may be determined according to the task to be performed, which may include but is not limited to at least one of a classification task and an object detection task; for example, for an image classification task the calibration data set may be the ImageNet data set, and for an object detection task it may be the MS COCO data set.
Processing the calibration data set with the first network model yields the output data (denoted first output data) of at least one first network substructure in the first network model. In implementation, the first output data of each first network substructure may be predetermined or obtained by processing the calibration data set in real time, without limitation here.
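As an illustration of collecting the first output data, the following is a minimal sketch that registers forward hooks on a PyTorch model and records the output of every top-level child module over the calibration set. It is only an example under stated assumptions: the substructures are taken to be the model's top-level children, the loader is assumed to yield input tensors, and the name calib_loader is hypothetical rather than taken from the present disclosure.

    import torch

    def collect_first_outputs(fp_model, calib_loader, device="cpu"):
        """Run the full-precision model over the calibration set and record
        the output of every substructure (here: every top-level child)."""
        outputs = {name: [] for name, _ in fp_model.named_children()}
        hooks = []
        for name, module in fp_model.named_children():
            hooks.append(module.register_forward_hook(
                lambda m, inp, out, key=name: outputs[key].append(out.detach().cpu())))
        fp_model.eval().to(device)
        with torch.no_grad():
            for batch in calib_loader:
                fp_model(batch.to(device))
        for h in hooks:
            h.remove()
        # Concatenate the per-substructure outputs over all calibration batches.
        return {k: torch.cat(v) for k, v in outputs.items()}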
Step S102: process the calibration data set with a second network model, based on the activation quantization identifier of at least one second network substructure in the second network model, to obtain second output data of each second network substructure, where the second network model is obtained by quantizing the first network model, and the activation quantization identifier of each second network substructure indicates whether the activation values of that substructure are quantized.
Here, the second network model is obtained by quantizing the first network model with any suitable model quantization algorithm. For example, the first network model may be pre-trained and a suitable quantization operation applied to the pre-trained model to obtain the second network model; or quantization-aware training may be applied to the first network model, the trained model being the quantized second network model. The second network model obtained by quantizing the first network model may contain a large quantization loss, which can be reduced by further optimization.
The second network model has the same structure as the first network model: for every first network substructure of the first network model there is a corresponding, identical second network substructure in the second network model. The difference is that the parameters of the second network model have been quantized to preset bit widths; for example, the second network model may be a model quantized from the first network model at a 1-bit or 2-bit width, at a 4-bit width, at an 8-bit width, and so on. Different second network substructures may be quantized at different or identical bit widths, and within one substructure the quantization of weight values and of activation values may use the same or different bit widths, which is not limited in the embodiments of the present disclosure. In some implementations, the bit width used to quantize each first network substructure may be determined according to at least one of the task to be performed and the deployment device on which the third network model is to be deployed. For example, the bit width may be determined from the hardware information of the deployment device (such as at least one of storage resources, computing resources, hardware type, and power consumption), so that the third network model better satisfies the deployment requirements of that device; or it may be determined from the target task requirements (such as at least one of task type, time consumption, processing accuracy, and processing speed), so that the third network model better satisfies those requirements.
While the second network model processes the calibration data set, for each second network substructure an activation quantization identifier indicates whether the activation values of that substructure are quantized. The identifier of each substructure may be a first identifier, indicating that its activation values are quantized, or a second identifier, indicating that they are not. During processing, the activation values of substructures whose identifier is the first identifier are quantized based on a set quantization parameter, while the activation values of substructures whose identifier is the second identifier are not quantized. In implementation, the identifier of each substructure may be preset or determined dynamically while the second network model processes the calibration data set, which is not limited in the embodiments of the present disclosure; across multiple passes over the calibration data set, the identifiers of the substructures may be the same or different.
In some implementations, the activation quantization identifier of each second network substructure may be determined randomly while the second network model processes the calibration data set; for example, based on a set quantization probability, each identifier may be set to the first identifier or the second identifier.
In some implementations, the identifiers may instead be assigned in advance according to a specific rule; for example, the second network substructures may be ordered, with odd-numbered substructures assigned the first identifier and even-numbered substructures the second identifier.
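The two assignment strategies just described can be sketched as follows. This is only an illustration: the flag names FIRST and SECOND are hypothetical stand-ins for the first and second identifiers and do not come from the original text.

    import random

    FIRST, SECOND = True, False   # quantize activations / keep activations full-precision

    def assign_flags_randomly(num_substructures, quant_prob=0.5):
        """Random assignment: FIRST with probability quant_prob, else SECOND."""
        return [FIRST if random.random() < quant_prob else SECOND
                for _ in range(num_substructures)]

    def assign_flags_by_rule(num_substructures):
        """Deterministic rule: odd-numbered substructures (1-based) get FIRST,
        even-numbered substructures get SECOND."""
        return [FIRST if (i + 1) % 2 == 1 else SECOND
                for i in range(num_substructures)]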
Processing the calibration data set with the second network model yields the output data (denoted second output data) of at least one second network substructure in the second network model.
Step S103: for each first network substructure, adjust the parameters of the second network substructure corresponding to the first network substructure in the second network model, based on the first output data of the first network substructure and the second output data of that second network substructure.
Here, for each first network substructure in the first network model, the parameters of the corresponding second network substructure may be adjusted at least once based on the output data of the first network substructure and the second output data of the corresponding second network substructure. Adjusting the parameters at least once reduces the loss between the second output data of each second network substructure and the first output data of the corresponding first network substructure, thereby reducing the quantization loss of the second network model.
The adjustable parameters of a second network substructure may include, but are not limited to, at least one of: the quantization parameters of the weight values (such as quantization step size, preset precision of the quantization scale, quantization symmetry, quantization bit width, and quantization granularity), the quantization parameters of the activation values, the rounding manner used to quantize the weight values (such as rounding up or rounding down), and the quantization function used to quantize the weight values in the substructure.
In implementation, a suitable parameter optimization algorithm, such as gradient descent or simulated annealing, may be used to adjust the parameters of each second network substructure according to the actual situation, which is not limited in the embodiments of the present disclosure.
Step S104: when it is determined that a preset condition is satisfied, determine the adjusted second network model as a third network model.
Here, the preset condition may include, but is not limited to, at least one of: the number of parameter adjustments made to the second network substructures reaching a set count threshold; the loss between each piece of first output data and the corresponding second output data being below a set first loss threshold; and the total loss between the first output data and the corresponding second output data being below a set second loss threshold.
In some implementations, the parameters of the second network substructures of the second network model may be adjusted at least once, and when the adjustments made to the substructures are determined to satisfy the preset condition, the adjusted second network model is determined as the third network model.
In some implementations, if the preset condition is not satisfied, the adjusted second network model may be used to process the calibration data set again, based on the activation quantization identifiers of its at least one second network substructure, to obtain new second output data of each substructure; then, for each first network substructure, the parameters of the corresponding second network substructure are adjusted again based on the first output data of the first network substructure and the new second output data.
In the embodiments of the present disclosure, because whether the activation values of each second network substructure are quantized is determined by its activation quantization identifier while the parameters of the substructures of the quantized second network model are adjusted, the second network model learns during the adjustment to reduce the impact of activation-value quantization. This improves the accuracy of the quantized model and its flatness under quantization perturbation, so that models quantized with different bit widths all achieve high accuracy, satisfying model quantization requirements for different tasks and deployment devices and broadening the applicability of model quantization.
In some implementations, each second network substructure is a processing layer, so that the parameters of the second network model can be adjusted at processing-layer granularity and the impact of activation-value quantization is learned at that granularity, improving the accuracy of the parameter adjustment and further improving the accuracy of the quantized model.
In some embodiments, determining that the preset condition is satisfied in step S104 may include at least one of the following steps S111 and S112:
Step S111: when, based on each piece of first output data and each piece of second output data, the loss value of every second network substructure is determined to satisfy a preset loss constraint, determine that the preset condition is satisfied.
Here, the loss value of each second network substructure may be the loss between the second output data of that substructure in the adjusted second network model and the first output data of the corresponding first network substructure. The loss constraint may be a preset target for adjusting the second network model and may constrain the loss values of the adjusted substructures; for example, it may include, but is not limited to, at least one of: every substructure's loss value being below a set first loss threshold, and the sum of the substructures' loss values being below a set second loss threshold.
Step S112: when the number of adjustments made to each second network substructure satisfies a preset count constraint, determine that the preset condition is satisfied.
Here, the count constraint may likewise be a preset adjustment target and may constrain the number of adjustments made to each substructure; for example, it may include, but is not limited to, at least one of: the number of adjustments to every substructure reaching a set first count threshold, the maximum number of adjustments across the substructures reaching a set second count threshold, and the average number of adjustments reaching a set third count threshold.
In implementation, a person skilled in the art may set a suitable loss constraint or count constraint according to the actual situation, which is not limited in the embodiments of the present disclosure.
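For illustration, the two kinds of preset condition may be checked as in the sketch below; the thresholds are placeholders to be chosen per task, not values fixed by the present disclosure.

    def preset_condition_met(losses, adjust_counts,
                             loss_threshold=None, max_adjustments=None):
        """Check the loss constraint (every substructure's loss below a
        threshold) and/or the count constraint (every substructure adjusted
        a sufficient number of times)."""
        loss_ok = (loss_threshold is not None
                   and all(l < loss_threshold for l in losses))
        count_ok = (max_adjustments is not None
                    and all(c >= max_adjustments for c in adjust_counts))
        return loss_ok or count_ok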
In some embodiments, the above method may further include:
Step S121: quantize each first network substructure in the first network model according to at least one bit width, to obtain the second network model, where the second network model includes at least one second network substructure corresponding to each first network substructure, and each second network substructure includes one of the following: a stage structure, a block structure, or a processing layer.
In this way, the parameters of the second network model can be adjusted at different granularities, so that the impact of activation-value quantization is learned at different granularities, improving the flexibility of the parameter adjustment.
In some implementations, the at least one bit width may include, but is not limited to, at least one of 1-bit, 2-bit, 3-bit, 4-bit, and 8-bit widths.
In the above embodiments, adjustment is supported for the parameters of second network substructures in second network models quantized at 1-bit, 2-bit, 3-bit, 4-bit and/or 8-bit widths, so that the quantized model achieves high accuracy, satisfying quantization requirements for many tasks and many deployment devices and broadening the applicability of model quantization. In particular, adjustment is supported for extremely low bit widths such as 1, 2, or 3 bits, effectively improving the accuracy of models quantized at such widths while reducing the storage and computing resources needed to deploy the quantized model.
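A minimal sketch of a uniform fake-quantizer parameterized by bit width is given below; the max-absolute-value scale used here is one common calibration choice and is an assumption of this sketch, not the quantizer mandated by the embodiments. The per-channel branch assumes a tensor of at least two dimensions.

    import torch

    def uniform_quantize(t, bits=4, per_channel=False):
        """Fake-quantize a tensor to a signed integer grid of the given
        bit width and return the dequantized result."""
        qmax = 2 ** (bits - 1) - 1
        if per_channel:
            scale = t.abs().amax(dim=tuple(range(1, t.dim())), keepdim=True) / qmax
        else:
            scale = t.abs().max() / qmax
        scale = scale.clamp(min=1e-8)            # avoid division by zero
        q = torch.clamp(torch.round(t / scale), -qmax - 1, qmax)
        return q * scale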
In some embodiments, the above method may further include the following steps S131 to S133:
Step S131: sample at least one candidate sample from a set candidate data set.
Here, the candidate data set may be preset and includes at least one sample used to calibrate the second network model. In implementation, any suitable sampling manner may be used to sample at least one candidate sample from the candidate data set, which is not limited in the embodiments of the present disclosure; for example, a set number of candidate samples may be sampled randomly, at least one candidate sample may be sampled uniformly, or at least one candidate sample may be selected from the candidate data set according to set filtering conditions.
Step S132: apply data augmentation to each candidate sample to obtain at least one target sample.
Here, for each candidate sample, any suitable data augmentation may be applied to obtain a corresponding target sample; for example, the data augmentation may include, but is not limited to, at least one of random flipping, random cropping, and adding perturbations.
In implementation, the same data augmentation may be applied to different candidate samples, or different augmentations may be applied, which is not limited in the embodiments of the present disclosure.
Step S133: obtain the calibration data set based on the at least one target sample.
Here, some or all of the obtained target samples may be added to the calibration data set.
In the above embodiments, at least one candidate sample is sampled from a set candidate data set, data augmentation is applied to each candidate sample to obtain at least one target sample, and the calibration data set is obtained from the target samples. Because the data in the calibration data set are obtained by augmenting the candidate samples, the diversity of the samples in the calibration data set increases, which further strengthens the second network model's ability to learn to reduce the impact of activation-value quantization and further improves the model's flatness under quantization perturbation.
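A possible construction of such a calibration set is sketched below, assuming a torchvision-style image data set with a mutable transform attribute; the sample count and the particular augmentations are illustrative only.

    import random
    import torchvision.transforms as T
    from torch.utils.data import Subset

    def build_calibration_set(candidate_dataset, num_samples=1024):
        """Sample candidates and attach data augmentation (random crop and
        flip), following steps S131 to S133."""
        indices = random.sample(range(len(candidate_dataset)), num_samples)
        augment = T.Compose([
            T.RandomResizedCrop(224),
            T.RandomHorizontalFlip(),
            T.ToTensor(),
        ])
        candidate_dataset.transform = augment   # assumes a torchvision-style dataset
        return Subset(candidate_dataset, indices)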
An embodiment of the present disclosure provides a model quantization method executable by a processor of a computer device. As shown in FIG. 2, the method includes the following steps S201 to S205:
Step S201: acquire first output data of at least one first network substructure in a first network model, where each piece of first output data is obtained by processing a calibration data set with the first network model.
Here, step S201 corresponds to step S101 above and may be implemented with reference to the specific implementation of step S101.
Step S202: determine the calibration data set as the input data of the first second network substructure in a second network model.
Here, the calibration data set may be input into the second network model as the input data of the first second network substructure in that model.
Step S203: for each second network substructure in the second network model, process the input data of the second network substructure with the second network substructure, based on the weight values and activation quantization identifier of the second network substructure, to obtain the second output data of the second network substructure, and use the second output data as the input data of the next second network substructure.
Here, the second network model is obtained by quantizing the first network model, and the activation quantization identifier of each second network substructure indicates whether the activation values of that substructure are quantized.
In implementation, the second output data of each second network substructure in the second network model may serve as the input data of the next second network substructure.
By feeding each substructure's input data into that substructure, the input data can be processed based on the substructure's weight values and activation quantization identifier to obtain the substructure's second output data.
In some implementations, each second network substructure includes at least one processing layer; the substructure's input data is the input of its first processing layer, each layer's output serves as the next layer's input, and the last layer's output is the substructure's second output data. For each processing layer of the substructure, the layer's input data may be processed based on the layer's weight values to obtain the layer's activation values; if the substructure's activation quantization identifier is the first identifier, indicating that its activation values are quantized, the layer's activation values may be quantized based on a set quantization parameter and the quantized activation values taken as the layer's output; if the identifier is the second identifier, indicating no quantization, the activation values themselves are taken as the layer's output.
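A minimal sketch of this conditional forward pass is given below; the wrapper and quantizer names are hypothetical, and the boolean flag plays the role of the activation quantization identifier described above.

    import torch.nn as nn

    class QuantWrapper(nn.Module):
        """Wrap one second network substructure: run it (with its already
        quantized weights), then quantize its activations only when the
        activation quantization flag is set."""
        def __init__(self, block, act_quantizer):
            super().__init__()
            self.block = block
            self.act_quantizer = act_quantizer   # e.g. uniform_quantize above
            self.quantize_activations = True     # the activation quantization flag

        def forward(self, x):
            out = self.block(x)
            if self.quantize_activations:        # first identifier: quantize
                out = self.act_quantizer(out)
            return out                           # second identifier: pass through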
Step S204: for each first network substructure, adjust the parameters of the second network substructure corresponding to the first network substructure in the second network model, based on the first output data of the first network substructure and the second output data of that second network substructure.
Step S205: when it is determined that a preset condition is satisfied, determine the adjusted second network model as a third network model.
Here, steps S204 and S205 correspond to steps S103 and S104 above, respectively, and may be implemented with reference to their specific implementations.
In the embodiments of the present disclosure, the calibration data set is determined as the input data of the first second network substructure in the second network model, and, for each second network substructure, its input data is processed with that substructure, based on its weight values and activation quantization identifier, to obtain its second output data, which then serves as the next substructure's input data. In this way, the second output data of every second network substructure can be obtained quickly and accurately.
In some embodiments, processing the input data of the second network substructure based on its weight values and activation quantization identifier in step S203 to obtain its second output data may include the following steps S211 and S212:
Step S211: quantize the weight values of the second network substructure based on the rounding manner of the weight values in the substructure and a first quantization parameter, obtaining quantized weight values.
Here, the rounding manner of the weight values may include, but is not limited to, rounding up, rounding down, or rounding to the nearest.
The first quantization parameter may be a quantization parameter used to quantize the weight values of the second network substructure, including but not limited to at least one of quantization step size, preset precision of the quantization scale, quantization symmetry, quantization bit width, and quantization granularity. In some implementations, the second network model may first process the calibration data set to obtain its output data, and statistics of that output data yield the first quantization parameter used to quantize the substructure's weight values. In some implementations, the first quantization parameter may itself be an adjustable parameter of the substructure and may be learned while the substructure's parameters are adjusted.
Step S212: process the input data of the second network substructure based on the quantized weight values and the substructure's activation quantization identifier, to obtain the second output data of the second network substructure.
In some implementations, for each processing layer in the substructure, the layer's input data may be processed based on the layer's quantized weight values to obtain the layer's activation values; if the substructure's activation quantization identifier is the first identifier, indicating quantization of its activation values, the layer's activation values may be quantized based on a set quantization parameter and the quantized values taken as the layer's output; if the identifier is the second identifier, indicating no quantization, the activation values themselves are taken as the layer's output.
In the above embodiments, the weight values of each second network substructure can be quantized via the rounding manner of the weight values and the first quantization parameter, whether the activation values of each substructure are quantized is determined by its activation quantization identifier, and on this basis each substructure's input data is processed to obtain its second output data.
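For illustration, a weight quantizer with an explicit rounding manner and a step size (the first quantization parameter) might look like the following sketch; range clamping is omitted for brevity.

    import torch

    def quantize_weights(w, step, rounding="nearest"):
        """Quantize weights onto a grid of the given step size, using the
        named rounding manner (up, down, or nearest)."""
        if rounding == "up":
            q = torch.ceil(w / step)
        elif rounding == "down":
            q = torch.floor(w / step)
        else:
            q = torch.round(w / step)
        return q * step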
In some embodiments, when the activation quantization identifier of the second network substructure is the first identifier, the activation values in the substructure are quantized based on a second quantization parameter while the substructure's input data is processed. Here, the second quantization parameter may be a quantization parameter used to quantize the activation values of the substructure, including but not limited to at least one of quantization step size, preset precision of the quantization scale, quantization symmetry, quantization bit width, and quantization granularity. In some implementations, the second network model may first process the calibration data set to obtain its output data, and statistics of that output data yield the second quantization parameter; in some implementations, the second quantization parameter may itself be an adjustable parameter of the substructure and may be learned while the substructure's parameters are adjusted.
In some embodiments, when the activation quantization identifier of the second network substructure is the second identifier, the activation values in the substructure are not quantized while the substructure's input data is processed. By setting the identifier of at least one second network substructure to the second identifier, the activation values of those substructures are not quantized while the calibration data are processed, which increases the diversity of the effect that activation quantization has on the substructures; this further strengthens the second network model's ability to learn to reduce the impact of activation-value quantization and further improves the model's flatness under quantization perturbation, so that the adjusted second network model is flat from a general perspective, bringing an improvement in model accuracy.
An embodiment of the present disclosure provides a model quantization method executable by a processor of a computer device. As shown in FIG. 3, the method includes the following steps S301 to S305:
Step S301: acquire first output data of at least one first network substructure in a first network model, where each piece of first output data is obtained by processing a calibration data set with the first network model.
Here, step S301 corresponds to step S101 above and may be implemented with reference to the specific implementation of step S101.
Step S302: for each second network substructure in a second network model, randomly assign a value to the activation quantization identifier of the second network substructure based on a set probability distribution parameter.
Here, the probability distribution parameter may be any suitable parameter characterizing the likelihood of at least one value of the substructure's activation quantization identifier, including but not limited to at least one of the probability that the identifier is assigned the first identifier and the probability that it is assigned the second identifier. Based on this probability distribution parameter, each substructure's identifier may be randomly assigned the first identifier or the second identifier.
In implementation, the probability distribution parameter may be preset, and a person skilled in the art may set a suitable value according to the actual situation, which is not limited in the embodiments of the present disclosure.
Step S303: process the calibration data set with the second network model, based on the activation quantization identifier of at least one second network substructure in the second network model, to obtain second output data of each substructure, where the second network model is obtained by quantizing the first network model and the activation quantization identifier of each substructure indicates whether that substructure's activation values are quantized.
Step S304: for each first network substructure, adjust the parameters of the second network substructure corresponding to the first network substructure in the second network model, based on the first output data of the first network substructure and the second output data of that substructure.
Step S305: when it is determined that a preset condition is satisfied, determine the adjusted second network model as a third network model.
Here, steps S303 to S305 correspond to steps S102 to S104 above, respectively, and may be implemented with reference to their specific implementations.
In some embodiments, the random assignment based on the set probability distribution parameter described in step S302 may include at least one of the following steps S311 and S312:
Step S311: if the probability distribution parameter includes a quantization probability, randomly assign a value to the substructure's activation quantization identifier based on the quantization probability, where the quantization probability represents the probability that the identifier is assigned the first identifier, the first identifier indicating that the corresponding substructure's activation values are quantized.
Here, according to the quantization probability, the substructure's identifier may be randomly assigned the first identifier or the second identifier. In implementation, a person skilled in the art may preset a suitable quantization probability according to the actual situation; for example, it may be 0, 0.25, 0.5, 0.75, or 1.
For example, with a quantization probability p, the substructure's identifier may be randomly assigned the first identifier or the second identifier, where the probability of being assigned the first identifier is p and the probability of being assigned the second identifier is 1 - p.
Step S312: if the probability distribution parameter includes a quantization inactivation probability, randomly assign a value to the substructure's activation quantization identifier based on that probability, where the quantization inactivation probability represents the probability that the identifier is assigned the second identifier, the second identifier indicating that the corresponding substructure's activation values are not quantized.
Here, according to the quantization inactivation probability, the identifier may be randomly assigned the first identifier or the second identifier. In implementation, a suitable quantization inactivation probability may be preset according to the actual situation; for example, it may be 0, 0.25, 0.5, 0.75, or 1.
For example, with a quantization inactivation probability q, the substructure's identifier may be randomly assigned the first identifier or the second identifier, where the probability of being assigned the first identifier is 1 - q and the probability of being assigned the second identifier is q.
It should be noted that the same quantization probability and/or quantization inactivation probability may be set for all second network substructures in the second network model, or different values may be set for different substructures, which is not limited in the embodiments of the present disclosure.
In the above embodiments, the activation quantization identifier of each second network substructure is randomly assigned the first identifier or the second identifier according to the set probability distribution parameter. This increases the randomness of activation-value quantization across the substructures, i.e., it increases the diversity of the effect that activation quantization has on the quantization loss, which further strengthens the second network model's ability to learn to reduce the impact of activation-value quantization, further improves the model's flatness under quantization perturbation, and brings a further improvement in model accuracy.
An embodiment of the present disclosure provides a model quantization method executable by a processor of a computer device. As shown in FIG. 4, the method includes the following steps S401 to S404:
Step S401: acquire first output data of at least one first network substructure in a first network model, where each piece of first output data is obtained by processing a calibration data set with the first network model.
Step S402: process the calibration data set with a second network model, based on the activation quantization identifier of at least one second network substructure in the second network model, to obtain second output data of each substructure, where the second network model is obtained by quantizing the first network model and the activation quantization identifier of each substructure indicates whether that substructure's activation values are quantized.
Here, steps S401 and S402 correspond to steps S101 and S102 above, respectively, and may be implemented with reference to their specific implementations.
Step S403: for each first network substructure, determine a loss value of the second network substructure corresponding to the first network substructure in the second network model, based on the first output data of the first network substructure and the second output data of that substructure, and adjust the substructure's parameters based on the loss value.
Here, the loss value of each second network substructure may be the loss between the second output data of that substructure in the adjusted second network model and the first output data of the corresponding first network substructure, obtained by comparing the two.
In some implementations, the calibration data set includes at least one target sample; the first output data may include first sub-data obtained by processing each target sample, and the second output data may include second sub-data obtained by processing each target sample. For each target sample, the loss of the substructure on that sample may be determined based on the similarity between the corresponding first sub-data and second sub-data, and the substructure's loss value may then be determined based on the mean squared error over the substructure's per-sample losses.
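A sketch of such a block-reconstruction loss follows; it simply takes the mean squared error between the cached full-precision outputs (first output data) and the freshly computed quantized outputs (second output data), averaged over the calibration samples.

    import torch.nn.functional as F

    def block_loss(first_outputs, second_outputs):
        """MSE between the full-precision substructure outputs and the
        quantized substructure outputs over the calibration samples."""
        return F.mse_loss(second_outputs, first_outputs)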
In some implementations, the substructure's parameters may be adjusted when its loss value is below a set third loss threshold.
In some implementations, the parameters may be adjusted when the change of the substructure's loss value is determined to have converged.
Step S404: when it is determined that a preset condition is satisfied, determine the adjusted second network model as a third network model.
Here, step S404 corresponds to step S104 above and may be implemented with reference to the specific implementation of step S104.
In the embodiments of the present disclosure, for each first network substructure, the loss value of the corresponding second network substructure is determined based on the first output data of the first network substructure and the second output data of that substructure, and the substructure's parameters are adjusted based on that loss value. In this way, adjusting the substructure's parameters maintains the consistency between the outputs of the second network substructure and the corresponding first network substructure, effectively improving the accuracy of the quantized model.
In some embodiments, the parameters of the second network substructure include the rounding manner used to quantize the weight values in the substructure, and adjusting the parameters based on the loss value in step S403 includes:
Step S411: based on the loss value, update the rounding manner used to quantize the weight values in the second network substructure, so that, while the calibration data set is processed, the weight values in the substructure are quantized with the quantization function corresponding to that rounding manner.
Here, the corresponding quantization function can be determined from the rounding manner used to quantize the substructure's weight values; while the calibration data set is processed, that quantization function can quantize the substructure's weight values to a specific bit width. In implementation, a suitable optimization algorithm, such as gradient descent or simulated annealing, may be used to update the rounding manner according to the actual situation, which is not limited in the embodiments of the present disclosure.
In the above embodiments, the rounding manner used to quantize the substructure's weight values is updated based on the substructure's loss value. Updating the rounding manner of the weight quantization fine-tunes the quantized second network model and thereby further improves the accuracy of the adjusted second network model (i.e., the third network model).
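One way to make the rounding manner learnable is in the style of learnable-rounding methods such as AdaRound, where a per-weight continuous variable decides between rounding down and rounding up and is optimized against the block loss. The sketch below is such a realization under that assumption; it is one possible form, not the only one the disclosure covers.

    import torch

    class LearnableRounding(torch.nn.Module):
        """Choose floor vs. ceil per weight via a learnable logit."""
        def __init__(self, w, step):
            super().__init__()
            self.step = step
            self.w_floor = torch.floor(w / step)
            # Initialize the logits so that sigmoid(alpha) matches the
            # fractional remainder of each weight.
            rest = (w / step - self.w_floor).clamp(1e-4, 1 - 1e-4)
            self.alpha = torch.nn.Parameter(torch.log(rest / (1 - rest)))

        def forward(self):
            h = torch.sigmoid(self.alpha)          # soft rounding in [0, 1]
            if not self.training:                  # hard 0/1 rounding at inference
                h = (h >= 0.5).float()
            return (self.w_floor + h) * self.step  # quantized weight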
Model quantization methods in the related art fall into two main classes: quantization-aware training and post-training quantization. Quantization-aware training requires a complete model training pipeline, with large amounts of sample data and large amounts of GPU compute for end-to-end training, whereas post-training quantization can obtain a quantized model from a small amount of sample data and a small amount of GPU compute. A post-training quantization scheme in the related art rounds and quantizes the weight values of a pre-trained full-precision network model, models the weight quantization as noise, and constructs a quantization optimization objective to learn the rounding manner (such as rounding up or rounding down) used to quantize the weight values. However, such schemes in the related art do not perform well for quantization at low bit widths such as 2 or 3 bits, or for detection tasks, which are more complex than classification, and the resulting quantized models are not accurate enough.
In the course of implementing the present disclosure, the inventors found through research that the post-training quantization schemes in the related art model only the quantization of the weight values as noise and fine-tune the quantized weight values by adjusting the rounding manner used to quantize them and reconstructing the output of each block structure/processing layer (i.e., weight fine-tuning), while the quantization parameters of the activation values (such as the step size) are determined after weight fine-tuning; quantization of the activation values is not considered during weight fine-tuning. In other words, in the related art, weight fine-tuning and activation-value quantization are independent; for example, whether the activation values of the full-precision network model are quantized to a 2-bit width or to a 3-bit width, the weight values of the resulting quantized model are the same.
In view of this, embodiments of the present disclosure provide a model quantization method that introduces activation-value quantization during weight fine-tuning, which can effectively improve the performance of the quantized model and make the quantized model flatter in some directions (i.e., the change in quantization loss of the quantized model under weight quantization perturbation is relatively small). In some embodiments, the activation-value quantization during weight fine-tuning may be randomly deactivated, i.e., whether the activation values of the model to be adjusted (corresponding to the second network model in the above embodiments) are quantized during weight fine-tuning is determined randomly; this increases the diversity of the effect that activation quantization has on the substructures, so that the quantized and adjusted model (corresponding to the third network model in the above embodiments) is flat from a general perspective, bringing an improvement in accuracy.
The influence of activation-value quantization on weight fine-tuning, and the manner of quantizing the activation values during weight fine-tuning, in the model quantization method provided by the embodiments of the present disclosure are explained separately below.
1) Influence of activation-value quantization on weight fine-tuning:
To study the influence of activation-value quantization on weight fine-tuning, the inventors conducted a preliminary experiment on the ImageNet data set. In the experiment, at least one processing layer of the full-precision network model to be quantized (corresponding to the first network model) was treated as a block structure (for example, a bottleneck layer as one block), and the rounding manner used to quantize the weight values of each block was adjusted in order to adjust the quantized weights.
Taking a first network model with K block structures as an example, before the rounding manner used to quantize the weight values of the k-th block was adjusted, it was considered whether the activation values of blocks 1 through k were quantized during model inference, and the accuracy of the quantized and adjusted models obtained under the following three schemes was compared:
Scheme 1: as shown in FIG. 5A, before the rounding manner used to quantize the weight values of the k-th block is adjusted, the activation values of blocks 1 through k are not quantized;
Scheme 2: as shown in FIG. 5B, before the rounding manner used to quantize the weight values of the k-th block is adjusted, the activation values of blocks 1 through k are all quantized;
Scheme 3: as shown in FIG. 5C, before the rounding manner used to quantize the weight values of the k-th block is adjusted, the activation values of blocks 1 through k-1 are all quantized, while the activation values of the k-th block are not quantized.
Here, K is a positive integer greater than 1, and k is greater than 1 and does not exceed K.
In the experiment, the second network model was obtained by quantizing the first network model, and schemes 1 to 3 above were used to adjust the rounding manner used to quantize the weight values of each block structure of the second network model. The performance of the resulting third network models is shown in Table 1. For the first network models using ResNet-18 and ResNet-50, both weight values and activation values were quantized at a 2-bit width; for the first network models using RegNet-600MF, MobileNetV2 and MnasNet, both were quantized at a 3-bit width.

Table 1. Accuracy of the third network models obtained with schemes 1 to 3

    Network structure | Scheme 1 | Scheme 2 | Scheme 3
    ResNet-18         |   18.88  |   45.74  |   48.07
    ResNet-50         |    4.34  |   46.98  |   49.07
    MobileNetV2       |    5.83  |   50.71  |   51.20
    RegNet-600MF      |   42.77  |   60.94  |   62.07
    MnasNet           |   26.62  |   58.79  |   60.19

In the above experiment, for extremely low-bit quantization (for example, quantizing both weight values and activation values at a 2- or 3-bit width), the accuracy of the third network models obtained with schemes 2 and 3 far exceeds that obtained with scheme 1. It can be seen that considering activation-value quantization during weight fine-tuning allows the fine-tuning of the weights to learn to reduce the impact of activation quantization, bringing a huge performance improvement. Moreover, the accuracy obtained with scheme 3 is higher than that obtained with scheme 2, showing that partially introducing activation-value quantization during weight fine-tuning works better than fully introducing it.
To further explore how activation quantization affects weight fine-tuning, the final loss objective can be analyzed by simultaneously modeling the quantization of the weight values and the quantization of the activation values as noise.
Quantization noise is usually expressed as $\epsilon = \hat{a} - a$, where $\hat{a}$ is the quantized parameter and $a$ is the full-precision parameter; however, the range of noise expressed this way is affected by the parameter range or the quantization step size. To remove this effect, the activation quantization noise $u$, into which the quantization of the activation values is modeled, can relate the quantized activation value $\hat{a}$ and the full-precision activation value $a$ in multiplicative form, $\hat{a} = u \cdot a$, which is formally equivalent to the additive form. The optimization objective of model quantization can then be written as formula 1:

$\min_{\hat{w}} \; \mathbb{E}_{x \sim D_c}\left[ \left\| L\big(\hat{w}, x, u(x)\big) - L(w, x, 1) \right\|_F^2 \right]$    (1)

where $x$ is the input data of a block structure of the second network model whose weights are to be fine-tuned, sampled from the calibration data set $D_c$; $u(x)$ denotes the activation quantization noise introduced by quantizing the activation values when the input data is $x$; $w$ is the full-precision weight value of the block and $\hat{w}$ the quantized weight value of the block; $L(w, x, 1)$ is the output data of the block in the full-precision network model corresponding to the second network model; $L(\hat{w}, x, u(x))$ is the output data of the second network model after the activation values are quantized (i.e., after the activation quantization noise is introduced); and $\|\cdot\|_F^2$ denotes the mean squared error function. This optimization objective requires optimizing the rounding manner used to quantize the weight values of the block structures of the second network model, so as to find a $\hat{w}$ that minimizes $\mathbb{E}\big[\| L(\hat{w}, x, u(x)) - L(w, x, 1)\|_F^2\big]$.
Compared with schemes that do not consider activation-value quantization in the weight fine-tuning stage, this optimization objective adds a term related to activation quantization. For ease of analysis, the activation quantization noise $u(x)$ is here transferred onto the weight quantization noise; the transferred weight quantization noise can be written as $1 + v(x)$, and $\mathbb{E}\big[\| L(\hat{w}, x, u(x)) - L(w, x, 1)\|_F^2\big]$ can be converted into the form shown in formula 2:

$\mathbb{E}_{x \sim D_c}\left[ \left\| L\big(\hat{w}(1 + v(x)), x, 1\big) - L(w, x, 1) \right\|_F^2 \right]$    (2)

A further formal transformation converts formula 2 into the form shown in formula 3:

$\underbrace{\mathbb{E}\left[ \left\| L(\hat{w}, x, 1) - L(w, x, 1) \right\|_F^2 \right]}_{(3\text{-}1)} \;+\; \underbrace{\mathbb{E}\left[ \left\| L\big(\hat{w}(1 + v(x)), x, 1\big) - L(\hat{w}, x, 1) \right\|_F^2 \right]}_{(3\text{-}2)}$    (3)
Here, term (3-1) is the optimization objective of the schemes that do not consider activation-value quantization in the weight fine-tuning stage, and term (3-2) is the new term brought by introducing activation-value quantization. Optimizing term (3-2) improves the flatness of the quantized weight values, i.e., the change in network loss caused by the quantized weight values under the perturbation $1 + v(x)$ is relatively small. In other words, on input data $x$, for the activation quantization noise $u(x)$ there exists a corresponding weight quantization noise $v(x)$ such that the quantized model is flatter under the perturbation of $v(x)$. For example, schemes 2 and 3 above both introduce activation-value quantization in the weight fine-tuning stage, so term (3-2) in formula 3 can additionally be optimized, and a flatter quantized model is obtained.
For the optimization objective of model quantization expressed in the form of formula 3 above, post-training quantization (i.e., offline quantization) is very sensitive to the calibration data; in practical application scenarios it is therefore not enough to obtain a flatter quantized model on the calibration data set, and the loss of the optimization objective must also be kept as small as possible on the test data set. In implementation, the directions in which the model is flat can be diversified by diversifying the weight quantization noise $v(x)$, so as to meet the flatness required on the test data set, thereby also improving the quantized model's performance on the test data set.
Inspired by the experimental results of schemes 1 to 3 above: in scheme 3 the activation values of some block structures are not quantized, i.e., part of the activation quantization noise $u(x)$ is set to 0, which slightly changes the weight quantization noise $v(x)$, so that the flatness of the quantized model is not restricted to the calibration data set. Accordingly, in some implementations, the weight quantization noise $v(x)$ can be modified by randomly discarding part of the activation quantization noise $u(x)$, thereby introducing different weight quantization noise $v(x)$ in the weight fine-tuning stage. In particular, randomly discarding part of $u(x)$ at every forward pass (i.e., randomly determining whether the activation values of each block structure are quantized during weight fine-tuning) effectively increases the diversity of $v(x)$, which further improves the performance of the quantized model on the test data set, makes the quantized model flat from a general perspective, and further extends offline quantization to 2-bit-wide quantization.
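A minimal sketch of randomly dropping part of the activation quantization noise at each forward pass is given below. Element-wise dropping is one possible granularity and is an assumption of this sketch; the embodiments above also describe a coarser per-substructure flag.

    import torch

    def drop_activation_quant(x_fp, act_quantizer, drop_prob=0.5):
        """Randomly drop part of the activation quantization noise u(x):
        each element keeps its full-precision value with probability
        drop_prob and its quantized value otherwise, re-sampled at every
        forward pass."""
        x_q = act_quantizer(x_fp)
        keep_fp = (torch.rand_like(x_fp) < drop_prob).float()
        return keep_fp * x_fp + (1.0 - keep_fp) * x_q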
Compared with model quantization methods in the related art, the model quantization method provided by the embodiments of the present disclosure achieves a 3% accuracy improvement on the ImageNet classification task with a 4-bit quantization bit width; for lightweight network models better suited to mobile terminals, the improvement can even reach 51.49% with a 2-bit quantization bit width. On the MS COCO data set, for detection tasks, the two-stage Faster RCNN model quantized at a 4-bit bit width reaches accuracy similar to that of the full-precision floating-point model, and the single-stage RetinaNet, with a 2-bit weight quantization bit width and a 4-bit activation quantization bit width, can even gain 6.5 mean average precision (mAP).
The model quantization method provided by the embodiments of the present disclosure can be applied in various scenarios, such as deployment of large models, model applications on edge devices, classification tasks, and detection tasks, and is applicable to different tasks, different models, and different quantization bit widths. Quantizing a model with this method allows the quantized model, on the one hand, to meet inference-speed requirements while retaining high accuracy, and on the other hand, to meet low-power-consumption requirements.
FIG. 6 is a schematic diagram of the composition and structure of a model quantization apparatus provided by an embodiment of the present disclosure. As shown in FIG. 6, the model quantization apparatus 600 includes a first acquisition part 610, a processing part 620, an adjustment part 630 and a first determination part 640, in which:
the first acquisition part 610 is configured to acquire first output data of at least one first network substructure in a first network model, where each piece of first output data is obtained by processing a calibration data set with the first network model;
the processing part 620 is configured to process the calibration data set with a second network model, based on the activation quantization identifier of at least one second network substructure in the second network model, to obtain second output data of each second network substructure, where the second network model is obtained by quantizing the first network model, and the activation quantization identifier of each second network substructure indicates whether the activation values of that substructure are quantized;
the adjustment part 630 is configured to, for each first network substructure, adjust the parameters of the second network substructure corresponding to the first network substructure, based on the first output data of the first network substructure and the second output data of that substructure in the second network model;
the first determination part 640 is configured to determine the adjusted second network model as a third network model when it is determined that a preset condition is satisfied.
In some embodiments, the processing part is further configured to: determine the calibration data set as the input data of the first second network substructure in the second network model; and, for each second network substructure in the second network model, process the input data of the substructure with the substructure, based on its weight values and activation quantization identifier, to obtain its second output data, and use that second output data as the input data of the next second network substructure.
In some embodiments, the processing part is further configured to: quantize the weight values of the second network substructure based on the rounding manner of the weight values in the substructure and a first quantization parameter, obtaining quantized weight values; and process the input data of the substructure based on the quantized weight values and the substructure's activation quantization identifier, to obtain its second output data.
In some embodiments, when the activation quantization identifier of the second network substructure is a first identifier, the activation values in the substructure are quantized based on a second quantization parameter while its input data is processed; and/or, when the identifier is a second identifier, the activation values in the substructure are not quantized while its input data is processed.
In some embodiments, the apparatus further includes a second acquisition part configured to, for each second network substructure in the second network model, randomly assign a value to the substructure's activation quantization identifier based on a set probability distribution parameter.
In some embodiments, the second acquisition part is further configured to: if the probability distribution parameter includes a quantization probability, randomly assign a value to the substructure's activation quantization identifier based on the quantization probability, where the quantization probability represents the probability that the identifier is assigned a first identifier, the first identifier indicating that the corresponding substructure's activation values are quantized; and/or, if the probability distribution parameter includes a quantization inactivation probability, randomly assign a value to the identifier based on that probability, where the quantization inactivation probability represents the probability that the identifier is assigned a second identifier, the second identifier indicating that the corresponding substructure's activation values are not quantized.
In some embodiments, the adjustment part is further configured to: determine a loss value of the second network substructure based on the first output data and the second output data, and adjust the substructure's parameters based on the loss value.
In some embodiments, the parameters of the second network substructure include the rounding manner used to quantize the weight values in the substructure; the adjustment part is further configured to update, based on the loss value, the rounding manner used to quantize those weight values, so that, while the calibration data set is processed, the weight values in the substructure are quantized with the quantization function corresponding to that rounding manner.
In some embodiments, the first determination part is further configured for at least one of the following: determining that the preset condition is satisfied when, based on each piece of first output data and each piece of second output data, the loss value of every second network substructure is determined to satisfy a preset loss constraint; and determining that the preset condition is satisfied when the number of adjustments made to every second network substructure satisfies a preset count constraint.
In some embodiments, the apparatus further includes a quantization part configured to quantize each first network substructure in the first network model according to at least one bit width, to obtain the second network model, where the second network model includes at least one second network substructure corresponding to each first network substructure, and each second network substructure includes one of the following: a stage structure, a block structure, or a processing layer.
The above description of the apparatus embodiments is similar to the description of the method embodiments and has beneficial effects similar to those of the method embodiments. For technical details not disclosed in the apparatus embodiments of the present disclosure, refer to the description of the method embodiments of the present disclosure.
It should be noted that, in the embodiments of the present disclosure and other embodiments, a "part" may be part of a circuit, part of a processor, part of a program or software, and so on; it may of course also be a unit, and it may be modular or non-modular.
In the embodiments of the present disclosure, if the above model quantization method is implemented in the form of software function modules and sold or used as an independent product, it may also be stored in a computer-readable storage medium. On this understanding, the technical solutions of the embodiments of the present disclosure, in essence or in the part contributing to the related art, can be embodied in the form of a software product; the software product is stored in a storage medium and includes several instructions causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk. Thus, the embodiments of the present disclosure are not limited to any specific combination of hardware and software.
An embodiment of the present disclosure provides a computer device including a memory and a processor, the memory storing a computer program runnable on the processor, where the processor implements the steps of the above method when executing the program.
An embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the above method. The computer-readable storage medium may be transitory or non-transitory.
An embodiment of the present disclosure provides a computer program product including a non-transitory computer-readable storage medium storing a computer program; when read and executed by a computer, the computer program implements some or all of the steps of the above method. The computer program product may be realized by hardware, software, or a combination thereof; in some embodiments it is embodied as a computer storage medium, and in some embodiments as a software product, such as a software development kit (SDK).
It should be pointed out here that the above descriptions of the storage medium, computer program product, and device embodiments are similar to the description of the method embodiments and have similar beneficial effects; for technical details not disclosed therein, refer to the description of the method embodiments of the present disclosure.
It should be noted that FIG. 7 is a schematic diagram of a hardware entity of a computer device in an embodiment of the present disclosure. As shown in FIG. 7, the hardware entity of the computer device 700 includes a processor 701, a communication interface 702, and a memory 703, in which:
the processor 701 generally controls the overall operation of the computer device 700;
the communication interface 702 enables the computer device to communicate with other terminals or servers through a network;
the memory 703 is configured to store instructions and applications executable by the processor 701 and can also cache data to be processed or already processed by the processor 701 and the modules in the computer device 700 (for example, image data, audio data, voice communication data, and video communication data); it can be implemented by flash memory (FLASH) or random access memory (RAM). Data can be transferred among the processor 701, the communication interface 702, and the memory 703 through a bus 704.
It should be understood that references throughout the specification to "one embodiment" or "an embodiment" mean that a particular feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present disclosure; thus, occurrences of "in one embodiment" or "in an embodiment" throughout the specification do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present disclosure, the sequence numbers of the above processes do not imply an order of execution; the order of execution should be determined by their functions and internal logic and should not limit the implementation of the embodiments of the present disclosure in any way. The above embodiment numbers are for description only and do not represent the superiority or inferiority of the embodiments.
It should be noted that, herein, the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or apparatus that includes that element.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed devices and methods may be implemented in other manners. The device embodiments described above are merely illustrative; for example, the division of the units is only a division by logical function, and other division manners are possible in actual implementation, e.g., multiple units or components may be combined, or integrated into another system, or some features may be ignored or not executed. In addition, the couplings, direct couplings, or communication connections between the components shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between devices or units may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units, and some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may all be integrated into one processing unit, each unit may exist separately as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in the form of hardware or in the form of hardware plus software functional units.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments can be completed by hardware related to program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as removable storage devices, read-only memories (ROM), magnetic disks, or optical disks.
Alternatively, if the above integrated units of the present disclosure are realized in the form of software function modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. On this understanding, the technical solutions of the present disclosure, in essence or in the part contributing to the related art, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the various embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as removable storage devices, ROMs, magnetic disks, or optical disks.
The above are merely implementations of the present disclosure, but the protection scope of the present disclosure is not limited thereto; any changes or substitutions readily conceivable by a person skilled in the art within the technical scope disclosed by the present disclosure shall be covered by the protection scope of the present disclosure.
INDUSTRIAL APPLICABILITY
The embodiments of the present disclosure disclose a model quantization method, apparatus, device, storage medium, and computer program product, wherein the method includes: acquiring first output data of at least one first network substructure in a first network model after the first network model processes a calibration data set; using a second network model obtained by quantizing the first network model to process the calibration data set, based on the activation quantization identifier of at least one second network substructure in the second network model, to obtain second output data of each second network substructure; for each first network substructure, adjusting the parameters of the second network substructure corresponding to the first network substructure, based on the first output data of the first network substructure and the second output data of that second network substructure; and, when it is determined that a preset condition is satisfied, determining the adjusted second network model as a third network model. According to the embodiments of the present disclosure, the accuracy of the quantized model can be improved and its flatness under quantization perturbation increased, so that models quantized with different bit widths all achieve high accuracy, satisfying model quantization requirements for different tasks and different deployment devices and broadening the applicability of model quantization.

Claims (14)

  1. A model quantization method, the method comprising:
    acquiring first output data of at least one first network substructure in a first network model, wherein each piece of the first output data is obtained by processing a calibration data set with the first network model;
    processing the calibration data set with a second network model, based on an activation quantization identifier of at least one second network substructure in the second network model, to obtain second output data of each second network substructure, wherein the second network model is obtained by quantizing the first network model, and the activation quantization identifier of each second network substructure indicates whether activation values of the second network substructure are quantized;
    for each first network substructure, adjusting parameters of the second network substructure corresponding to the first network substructure in the second network model, based on the first output data of the first network substructure and the second output data of the corresponding second network substructure; and
    when it is determined that a preset condition is satisfied, determining the adjusted second network model as a third network model.
  2. The method according to claim 1, wherein processing the calibration data set with the second network model, based on the activation quantization identifier of at least one second network substructure in the second network model, to obtain the second output data of each second network substructure comprises:
    determining the calibration data set as input data of the first second network substructure in the second network model; and
    for each second network substructure in the second network model, processing the input data of the second network substructure with the second network substructure, based on weight values and the activation quantization identifier of the second network substructure, to obtain the second output data of the second network substructure, and using the second output data as input data of the next second network substructure.
  3. The method according to claim 2, wherein processing the input data of the second network substructure based on the weight values and the activation quantization identifier of the second network substructure to obtain the second output data of the second network substructure comprises:
    quantizing the weight values of the second network substructure based on a rounding manner of the weight values in the second network substructure and a first quantization parameter, to obtain quantized weight values; and
    processing the input data of the second network substructure based on the quantized weight values and the activation quantization identifier of the second network substructure, to obtain the second output data of the second network substructure.
  4. The method according to claim 3, wherein
    when the activation quantization identifier of the second network substructure is a first identifier, the activation values in the second network substructure are quantized based on a second quantization parameter while the input data of the second network substructure is processed;
    and/or, when the activation quantization identifier of the second network substructure is a second identifier, the activation values in the second network substructure are not quantized while the input data of the second network substructure is processed.
  5. The method according to any one of claims 1 to 4, wherein the method further comprises:
    for each second network substructure in the second network model, randomly assigning a value to the activation quantization identifier of the second network substructure based on a set probability distribution parameter.
  6. The method according to claim 5, wherein randomly assigning a value to the activation quantization identifier of the second network substructure based on the set probability distribution parameter comprises at least one of the following:
    when the probability distribution parameter includes a quantization probability, randomly assigning a value to the activation quantization identifier of the second network substructure based on the quantization probability, wherein the quantization probability represents a probability that the activation quantization identifier of the second network substructure is assigned a first identifier, and the first identifier indicates that the activation values of the corresponding second network substructure are quantized; and
    when the probability distribution parameter includes a quantization inactivation probability, randomly assigning a value to the activation quantization identifier of the second network substructure based on the quantization inactivation probability, wherein the quantization inactivation probability represents a probability that the activation quantization identifier of the second network substructure is assigned a second identifier, and the second identifier indicates that the activation values of the corresponding second network substructure are not quantized.
  7. The method according to any one of claims 1 to 6, wherein adjusting the parameters of the second network substructure based on the first output data of the first network substructure and the second output data of the second network substructure corresponding to the first network substructure in the second network model comprises:
    determining a loss value of the second network substructure based on the first output data and the second output data; and
    adjusting the parameters of the second network substructure based on the loss value.
  8. The method according to claim 7, wherein the parameters of the second network substructure include a rounding manner used to quantize the weight values in the second network substructure; and
    adjusting the parameters of the second network substructure based on the loss value comprises:
    updating, based on the loss value, the rounding manner used to quantize the weight values in the second network substructure, so that, while the calibration data set is processed, the weight values in the second network substructure are quantized with a quantization function corresponding to the rounding manner.
  9. The method according to any one of claims 1 to 8, wherein determining that the preset condition is satisfied comprises at least one of the following:
    when, based on each piece of the first output data and each piece of the second output data, a loss value of each second network substructure is determined to satisfy a preset loss constraint, determining that the preset condition is satisfied; and
    when the number of adjustments made to each second network substructure satisfies a preset count constraint, determining that the preset condition is satisfied.
  10. The method according to any one of claims 1 to 9, wherein the method further comprises:
    quantizing each first network substructure in the first network model according to at least one bit width, to obtain the second network model, wherein the second network model includes at least one second network substructure respectively corresponding to each first network substructure, and each second network substructure includes one of the following: a stage structure, a block structure, or a processing layer.
  11. A model quantization apparatus, comprising:
    a first acquisition part configured to acquire first output data of at least one first network substructure in a first network model, wherein each piece of the first output data is obtained by processing a calibration data set with the first network model;
    a processing part configured to process the calibration data set with a second network model, based on an activation quantization identifier of at least one second network substructure in the second network model, to obtain second output data of each second network substructure, wherein the second network model is obtained by quantizing the first network model, and the activation quantization identifier of each second network substructure indicates whether activation values of the second network substructure are quantized;
    an adjustment part configured to, for each first network substructure, adjust parameters of the second network substructure corresponding to the first network substructure, based on the first output data of the first network substructure and the second output data of the corresponding second network substructure in the second network model; and
    a first determination part configured to determine the adjusted second network model as a third network model when it is determined that a preset condition is satisfied.
  12. A computer device, comprising a memory and a processor, the memory storing a computer program runnable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 10 when executing the program.
  13. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 10.
  14. A computer program product, comprising a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when read and executed by a computer, implements the steps of the method according to any one of claims 1 to 10.
PCT/CN2022/125817 WO2023165139A1 (zh) 2022-03-04 2022-10-18 Model quantization method, apparatus, device, storage medium and program product

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210208524.4 2022-03-04
CN202210208524.4A CN114580281A (zh) 2022-03-04 2022-06-03 Model quantization method, apparatus, device, storage medium and program product

Publications (1)

Publication Number Publication Date
WO2023165139A1 true WO2023165139A1 (zh) 2023-09-07

Family

ID=81778969

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/125817 WO2023165139A1 (zh) 2022-03-04 2022-10-18 模型量化方法、装置、设备、存储介质及程序产品

Country Status (2)

Country Link
CN (1) CN114580281A (zh)
WO (1) WO2023165139A1 (zh)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114580281A (zh) * 2022-03-04 2022-06-03 北京市商汤科技开发有限公司 Model quantization method, apparatus, device, storage medium and program product
CN115238893B (zh) * 2022-09-23 2023-01-17 北京航空航天大学 Neural network model quantization method and apparatus for natural language processing
CN115829035B (zh) * 2022-12-29 2023-12-08 苏州市欧冶半导体有限公司 Distributed quantization method, system and terminal device


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488985A (zh) * 2020-04-08 2020-08-04 华南理工大学 Deep neural network model compression training method, apparatus, device, and medium
CN112580805A (zh) * 2020-12-25 2021-03-30 三星(中国)半导体有限公司 Neural network model quantization method and apparatus for quantizing a neural network model
CN113780551A (zh) * 2021-09-03 2021-12-10 北京市商汤科技开发有限公司 Model quantization method, apparatus, device, storage medium and computer program product
CN114118384A (zh) * 2021-12-09 2022-03-01 安谋科技(中国)有限公司 Neural network model quantization method, readable medium and electronic device
CN114580281A (zh) * 2022-03-04 2022-06-03 北京市商汤科技开发有限公司 Model quantization method, apparatus, device, storage medium and program product

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117077740A (zh) * 2023-09-25 2023-11-17 荣耀终端有限公司 Model quantization method and device
CN117077740B (zh) * 2023-09-25 2024-03-12 荣耀终端有限公司 Model quantization method and device

Also Published As

Publication number Publication date
CN114580281A (zh) 2022-06-03


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22929568

Country of ref document: EP

Kind code of ref document: A1