WO2023050707A1 - Network model quantization method and apparatus, and computer device and storage medium - Google Patents


Info

Publication number
WO2023050707A1
WO2023050707A1, PCT/CN2022/078256, CN2022078256W
Authority
WO
WIPO (PCT)
Prior art keywords
network model
initial
model
result
preprocessing
Prior art date
Application number
PCT/CN2022/078256
Other languages
French (fr)
Chinese (zh)
Inventor
梁玲燕
董刚
赵雅倩
温东超
Original Assignee
苏州浪潮智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2023050707A1 publication Critical patent/WO2023050707A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • The present application relates to the field of artificial intelligence, and in particular to a network model quantization method and apparatus, a computer device, and a storage medium.
  • Neural network models are usually deployed on terminal or edge devices, which generally have limited computing power, memory, and power budgets. How to shrink a large deep neural network model, while preserving its accuracy, so that it can actually be deployed on such devices has therefore become an urgent problem.
  • Model compression methods such as quantization and pruning are usually used to reduce the size of large deep neural network models.
  • The embodiments of the present application provide a network model quantization method, apparatus, computer device, and storage medium to address the problem that shrinking a large deep neural network model through model compression such as quantization and pruning leaves the resulting deep neural network model with relatively low accuracy.
  • An embodiment of the present application provides a network model quantization method. The method includes: obtaining the network model to be processed, the network model to be processed being a pre-trained full-precision network model; quantizing the weight parameters and the activation outputs of the network model to be processed according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameters of the activation outputs, and constructing the initial network model based on the initial weight parameters and the initial quantization parameters of the activation outputs; obtaining a first calibration network model whose accuracy is higher than that of the initial network model, and adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model; and obtaining a second calibration network model whose accuracy is higher than that of the initial network model, and adjusting the initial quantization parameters of the activation outputs of the first preprocessing model based on the second calibration network model to obtain a target network model.
  • In this method, the pre-trained full-precision network model is first obtained and used as the network model to be processed; the weight parameters and activation outputs of the network model to be processed are then quantized according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameters of the activation outputs, and the initial network model is constructed based on them. Because the weights and activation outputs of the network model to be processed have been quantized, the initial network model constructed from the initial weight parameters and the initial quantization parameters of the activation outputs is much smaller than the network model to be processed, which ensures that the initial network model can run on terminal and edge devices.
  • The initial weight parameters of the initial network model can then be adjusted based on the first calibration network model, whose accuracy is higher than that of the initial network model, to obtain the first preprocessing model, so that the accuracy of the weight parameters of the first preprocessing model is guaranteed and the accuracy of the first preprocessing model is improved.
  • The initial quantization parameters of the activation outputs of the first preprocessing model can likewise be adjusted based on the second calibration network model to obtain the target network model.
  • As a result, the weight parameters and the activation output range of the target network model are more accurate, which further improves the accuracy of the target network model and solves the problem that shrinking a large deep neural network model through quantization, pruning, and other model compression methods lowers the accuracy of the reduced model.
  • Adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model includes: using a knowledge-distillation-based learning method to adjust the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
  • The knowledge-distillation-based learning method uses the first calibration network model as a large teacher network to guide the learning of the small quantized initial network model, so that better model parameters are obtained; the initial weight parameters of the initial network model are adjusted according to the first calibration network model to obtain the first preprocessing model. The accuracy of the weight parameters of the first preprocessing model is thereby guaranteed, and the accuracy of the first preprocessing model is improved.
  • Using the knowledge-distillation-based learning method to adjust the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model includes: obtaining a first training image set, the first training image set carrying hard labels; inputting the first training image set into the initial network model and outputting a first result; inputting the first training image set into the first calibration network model and outputting a second result; and adjusting the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model.
  • The first training image set, which carries hard labels, is input into the initial network model and the first calibration network model, producing the first result and the second result respectively. Using the relationship between the first result and the second result, and between the first result and the hard labels, the initial weight parameters of the initial network model are adjusted so that the first result output by the initial network model moves closer to the second result and the hard labels, which ensures that the first preprocessing model obtained after the weight parameter adjustment has improved accuracy.
  • Adjusting the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model includes: generating a first loss function based on the first result and the hard labels; generating a second loss function based on the first result and the second result; generating a first target loss function from the first loss function and the second loss function; and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  • The first loss function is generated from the first result output by the initial network model and the hard labels of the first training image set, and the second loss function is generated from the first result output by the initial network model and the second result output by the first calibration network model.
  • The first loss function represents the gap between the first result and the hard labels, and the second loss function represents the gap between the first result and the second result. The first target loss function generated from them therefore characterizes both gaps.
  • Adjusting the initial weight parameters of the initial network model based on the first target loss function thus yields a first preprocessing model with improved accuracy.
  • Adjusting the initial quantization parameters of the activation outputs of the first preprocessing model based on the second calibration network model to obtain the target network model includes: using a knowledge-distillation-based learning method to adjust the activation quantization threshold of the first preprocessing model according to the second calibration network model; and adjusting the initial quantization parameters of the activation outputs of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
  • The knowledge-distillation-based learning method uses the second calibration network model as a large teacher network to guide the learning of the small quantized first preprocessing model, so that better model parameters are obtained.
  • Adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model ensures the accuracy of the adjusted activation quantization threshold.
  • Adjusting the initial quantization parameters of the activation outputs of the first preprocessing model according to the adjusted threshold to obtain the target network model further ensures the accuracy of those quantization parameters, thereby improving the accuracy of the resulting target network model.
  • Using the knowledge-distillation-based learning method to adjust the activation quantization threshold of the first preprocessing model according to the second calibration network model includes: obtaining a second training image set; inputting the second training image set into the first preprocessing model and outputting a third result; inputting the second training image set into the second calibration network model and outputting a fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
  • The second training image set is input into the first preprocessing model and the second calibration network model, producing the third result and the fourth result respectively, and the activation quantization threshold of the first preprocessing model is adjusted based on them, which ensures the accuracy of the adjusted activation quantization threshold and, in turn, the accuracy of the resulting target network model.
  • Adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result includes: generating a second target loss function based on the third result and the fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function.
  • Adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function ensures the accuracy of the adjusted threshold and, consequently, the accuracy of the activation-output quantization parameters calculated from it, thereby improving the accuracy of the target network model.
  • An embodiment of the present application provides a network model quantization apparatus, which includes:
  • a quantization processing module, configured to obtain the network model to be processed, the network model to be processed being a pre-trained full-precision network model, quantize the weight parameters and the activation outputs of the network model to be processed according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameters of the activation outputs, and construct the initial network model based on the initial weight parameters and the initial quantization parameters of the activation outputs;
  • a first adjustment module, configured to obtain the first calibration network model, whose accuracy is higher than that of the initial network model, and adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model;
  • a second adjustment module, configured to obtain the second calibration network model, whose accuracy is higher than that of the initial network model, and adjust the initial quantization parameters of the activation outputs of the first preprocessing model based on the second calibration network model to obtain the target network model.
  • The above first adjustment module is specifically configured to use a knowledge-distillation-based learning method to adjust the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
  • The above first adjustment module is specifically configured to: obtain the first training image set, the first training image set carrying hard labels; input the first training image set into the initial network model and output the first result; input the first training image set into the first calibration network model and output the second result; and adjust the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model.
  • The above first adjustment module is specifically configured to: generate the first loss function based on the first result and the hard labels; generate the second loss function based on the first result and the second result; generate the first target loss function from the first loss function and the second loss function; and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  • The above second adjustment module includes:
  • a first adjustment unit, configured to use the knowledge-distillation-based learning method to adjust the activation quantization threshold of the first preprocessing model according to the second calibration network model;
  • a second adjustment unit, configured to adjust the initial quantization parameters of the activation outputs of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
  • The above first adjustment unit is specifically configured to: acquire the second training image set; input the second training image set into the first preprocessing model and output the third result; input the second training image set into the second calibration network model and output the fourth result; and adjust the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
  • The above first adjustment unit is specifically configured to: generate the second target loss function based on the third result and the fourth result; and adjust the activation quantization threshold of the first preprocessing model based on the second target loss function.
  • An embodiment of the present application provides an electronic device/mobile terminal/server, including a memory and a processor that are communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions so as to perform the network model quantization method in the first aspect or any implementation manner of the first aspect.
  • An embodiment of the present application provides a computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to execute the network model quantization method in the first aspect or any implementation manner of the first aspect.
  • An embodiment of the present application provides a computer program product, which includes a computer program stored on a computer-readable storage medium; the computer program includes program instructions that, when executed by a computer, cause the computer to execute the network model quantization method in the first aspect or any implementation manner of the first aspect.
  • Fig. 1 shows a flowchart of the steps of the network model quantization method in one embodiment;
  • Fig. 2a shows a schematic diagram of unsaturated mapping in PTQ model quantization in the network model quantization method in one embodiment;
  • Fig. 2b shows a schematic diagram of saturated mapping in PTQ model quantization in the network model quantization method in one embodiment;
  • Fig. 3 shows a flowchart of the steps of the network model quantization method in another embodiment;
  • Fig. 4 shows a schematic diagram of the process of adjusting the initial weight parameters of the initial network model in the network model quantization method in one embodiment;
  • Fig. 5 shows a flowchart of the steps of the network model quantization method in another embodiment;
  • Fig. 6 shows a flowchart of the steps of the network model quantization method in another embodiment;
  • Fig. 7 shows a flowchart of the steps of the network model quantization method in another embodiment;
  • Fig. 8 shows a schematic diagram of the process of adjusting the activation output threshold of the first preprocessing model in the network model quantization method in another embodiment;
  • Fig. 9 shows a flowchart of the steps of the network model quantization method in another embodiment;
  • Fig. 10 shows a flowchart of the steps of the network model quantization method in another embodiment;
  • Fig. 11 shows a schematic flowchart of the network model quantization method in another embodiment;
  • Fig. 12 shows a structural block diagram of the network model quantization apparatus in an embodiment;
  • Fig. 13 shows a structural block diagram of the network model quantization apparatus in an embodiment;
  • Fig. 14 shows an internal structure diagram of the computer device in an embodiment in which the computer device is a server;
  • Fig. 15 shows an internal structure diagram of the computer device in an embodiment in which the computer device is a terminal.
  • The network model quantization method provided by the embodiments of the present application can be executed by a network model quantization apparatus, and the apparatus can be implemented as a computer device through software, hardware, or a combination of software and hardware.
  • The computer device may be a server or a terminal.
  • The server in the embodiments of the present application may be a single server or a server cluster composed of multiple servers.
  • The terminal in the embodiments of the present application may be a smartphone, a personal computer, a tablet computer, a wearable device, or another intelligent hardware device such as an intelligent robot.
  • In the following, the method is described by taking a computer device as the execution subject as an example.
  • A network model quantization method is provided; the method is described by taking its application to a computer device as an example, and includes the following steps:
  • Step 101: obtain the network model to be processed, the network model to be processed being a pre-trained full-precision network model; quantize the weight parameters and the activation outputs of the network model to be processed according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameters of the activation outputs; and construct the initial network model based on the initial weight parameters and the initial quantization parameters of the activation outputs.
  • the computer device can use the first target image training set to train the neural network model, and obtain the network model to be processed.
  • the network model to be processed is a pre-trained full-precision network model.
  • the network model to be processed can be used to process tasks such as image recognition, image detection, and image classification.
  • the embodiment of the present application does not specifically limit the application scenarios of the network model to be processed.
  • the computer device may also receive a network model to be processed sent by other devices or a network model to be processed input by a user.
  • the embodiment of the present application does not specifically limit the manner in which the computer device acquires the network model to be processed.
  • The computer device quantizes the weight parameters and the activation outputs of the network model to be processed according to the quantization requirements, obtains the initial weight parameters and the initial quantization parameters of the activation outputs, and builds the initial network model based on them.
  • The quantization requirement may be input by the user through an input component of the computer device.
  • The quantization requirement can be changed according to the actual situation; it specifies the bit-width requirements of the weight parameters and the activation outputs.
  • For example, the quantization requirement may be to reduce the size of the network model to be processed by a factor of four by converting its weight parameters and activation outputs from float32 to int8 (a float32 value occupies 4 bytes while an int8 value occupies 1 byte).
  • The embodiment of the present application does not specifically limit the quantization requirement.
  • The accuracy of the initial network model is much lower than that of the network model to be processed, and its size is also much smaller than that of the network model to be processed.
  • The computer device may use a post-training quantization (PTQ) method or a training-aware quantization (TAQ) method to quantize the weight parameters and the activation outputs of the network model to be processed.
  • the embodiment of the present application does not specifically limit the method of separately quantizing the weight parameter and the activation output of the network model to be processed.
  • The following explanation uses the PTQ method to quantize the weight parameters and the activation outputs of the network model to be processed.
  • The central idea of the PTQ quantization method is to calculate a quantization threshold T and, according to T, determine the mapping between the weights of the network model to be processed and the weights of the initial network model, and between the activation outputs of the network model to be processed and the activation outputs of the initial network model.
  • The mapping includes saturated mapping and unsaturated mapping.
  • In some cases, the unsaturated mapping shown in Fig. 2a is used, in which the quantization threshold T is equal to the maximum value.
  • In other cases, a saturated mapping is generally used, as shown in Fig. 2b.
  • The quantization threshold T in saturated mapping can be searched for using relative entropy (KL divergence) or the mean-square-error method. The criterion for finding the quantization threshold T is to find a threshold such that, after the original values are clipped at it, the difference from the original values is still the smallest.
  • The part exceeding the threshold T is clipped, as shown in the second case of formula (1).
  • In formula (1), s is the quantization mapping scale factor, x is the original value, q(x, T) is the value of x after quantization and de-quantization, n is the bit width to be quantized, and T is the quantization threshold.
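  • Formula (1) itself is not reproduced in this text. The following Python/NumPy sketch shows a typical saturated-mapping quantize/de-quantize operation consistent with the variable definitions above, together with a simple mean-square-error search for the threshold T; the function and variable names are illustrative, and the exact form of formula (1) in the original application may differ.

```python
import numpy as np

def quantize_dequantize(x, T, n=8):
    """Saturated-mapping quantize/de-quantize sketch.

    x: original floating-point values, T: quantization threshold,
    n: target bit width. Values with |x| > T are clipped to the
    threshold before being mapped onto the n-bit integer grid.
    """
    s = T / (2 ** (n - 1) - 1)        # quantization mapping scale factor
    x_clipped = np.clip(x, -T, T)     # the part exceeding T is clipped
    q = np.round(x_clipped / s)       # integer code on the n-bit grid
    return q * s                      # de-quantized approximation of x

def search_threshold_mse(x, candidates):
    """Pick T by minimizing the mean-square error between x and q(x, T),
    one of the search criteria mentioned above."""
    errors = [np.mean((x - quantize_dequantize(x, T)) ** 2) for T in candidates]
    return candidates[int(np.argmin(errors))]

# Illustrative usage on random data
x = np.random.randn(1000).astype(np.float32)
T = search_threshold_mse(x, candidates=np.linspace(0.5, 4.0, 50))
x_q = quantize_dequantize(x, T)
```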
  • Step 102: obtain the first calibration network model, the accuracy of which is higher than that of the initial network model, and adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model.
  • That the accuracy of the first calibration network model is higher than that of the initial network model means at least one of the following: the performance accuracy of the first calibration network model is higher than that of the initial network model, or the bit-width precision of the parameters of the first calibration network model is higher than that of the parameters of the initial network model.
  • the computer device may use the second target image training set to train the neural network model to obtain the first calibration network model.
  • the precision of the first calibration network model is higher than the precision of the initial network model.
  • the first calibration network model can be used for image recognition, image detection and image classification task processing. The embodiment of the present application does not specifically limit the application scenario of the first calibration network model.
  • the computer device may also receive the first calibration network model sent by other devices or the first calibration network model input by the user.
  • The embodiment of this application does not specifically limit the manner in which the computer device obtains the first calibration network model.
  • the computer device may adjust the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
  • the computer device may also compare the output result of the first calibration network model with the output result of the initial network model, adjust the initial weight parameters of the initial network model according to the comparison result, and obtain the first preprocessing model.
  • In step 101, after the full-precision network is converted to the low-precision initial network model, the drop in model performance accuracy generally comes from two sources: the change of the weight parameters and the selection of the activation thresholds. In post-training quantization, all weight parameters are usually truncated with the same approximation method, but that method may not suit every weight parameter, which implicitly introduces noise and degrades the feature extraction ability of the network model.
  • the first calibration network model is used to correct the initial weight parameters of the initial network model to reduce errors generated in the above process.
  • Step 103: obtain the second calibration network model, the accuracy of which is higher than that of the first preprocessing model, and adjust the initial quantization parameters of the activation outputs of the first preprocessing model based on the second calibration network model to obtain the target network model.
  • That the accuracy of the second calibration network model is higher than that of the first preprocessing model means at least one of the following: the performance accuracy of the second calibration network model is higher than that of the first preprocessing model, or the bit-width precision of the parameters of the second calibration network model is higher than that of the parameters of the first preprocessing model.
  • the computer device may use the third target image training set to train the neural network model, and obtain the second calibration network model.
  • the accuracy of the second calibration network model is higher than the accuracy of the first preprocessing model.
  • the second calibration network model can be used for image recognition, image detection and image classification task processing. The embodiment of the present application does not specifically limit the application scenario of the second calibration network model.
  • the computer device can also receive the second calibration network model sent by other devices or receive the second calibration network model input by the user.
  • The embodiment of this application does not specifically limit the manner in which the computer device obtains the second calibration network model.
  • the second calibration network model may be the same pre-trained full-precision network model as the network model to be processed, or may be a different pre-trained full-precision network model.
  • the computer device may adjust the initial quantization parameters of the activation output of the first preprocessing model according to the second calibration network model to obtain the target network model.
  • To further adjust the initial activation threshold, the computer device can also compare the output results of the second calibration network model with the output results of the first preprocessing model, and adjust the initial quantization parameters of the activation outputs of the first preprocessing model according to the comparison results to obtain the target network model, thereby further reducing the loss incurred when converting the full-precision model to a low-precision model and improving the accuracy of the model.
  • The initial network model is constructed based on the initial weight parameters and the initial quantization parameters of the activation outputs. Because the weight parameters and the activation outputs of the network model to be processed have been quantized, the initial network model constructed from them is much smaller than the network model to be processed, which ensures that the initial network model can run on terminal and edge devices.
  • The initial weight parameters of the initial network model can be adjusted based on the first calibration network model, whose accuracy is higher than that of the initial network model, to obtain the first preprocessing model, so that the accuracy of the weight parameters of the first preprocessing model is guaranteed and the accuracy of the first preprocessing model is improved.
  • The initial quantization parameters of the activation outputs of the first preprocessing model may then be adjusted based on the second calibration network model, whose accuracy is higher than that of the first preprocessing model, to obtain the target network model.
  • As a result, the weight parameters and the activation outputs of the target network model are more accurate, which further improves the accuracy of the target network model and solves the problem that shrinking a large deep neural network model through model compression such as quantization and pruning seriously reduces the accuracy of the model.
  • the "adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model" in the above step 102 may include the following content:
  • the initial weight parameters of the initial network model are adjusted according to the first calibration network model to obtain the first preprocessing model.
  • the computer device can use the knowledge distillation learning method to compare the feature vectors output by each layer of the network in the first calibration network model with the feature vectors output by each layer of the network in the initial network model, and then according to the comparison results, and the first Calibrate the weight parameters corresponding to each layer of network in the network model, and adjust the initial weight parameters in the initial network model.
  • The knowledge-distillation-based learning method uses the first calibration network model as a large teacher network to guide the learning of the small quantized initial network model, so that better weight parameters are obtained; the initial weight parameters of the initial network model are adjusted according to the first calibration network model to obtain the first preprocessing model. The accuracy of the weight parameters of the first preprocessing model is thereby guaranteed, and the accuracy of the first preprocessing model is improved.
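  • As an illustration of the layer-wise feature comparison mentioned above, the following minimal sketch scores the gap between per-layer feature vectors of the quantized student and the teacher; the patent does not specify the comparison metric, so the mean-square error and the helper names used here are assumptions.

```python
import torch
import torch.nn.functional as F

def layerwise_feature_loss(student_features, teacher_features):
    """Accumulate the gap between the feature vectors output by corresponding
    layers of the quantized student (initial network model) and the teacher
    (first calibration network model). MSE is an illustrative metric."""
    loss = torch.zeros(())
    for feat_s, feat_t in zip(student_features, teacher_features):
        loss = loss + F.mse_loss(feat_s, feat_t.detach())  # teacher features are fixed targets
    return loss

# Hypothetical usage with per-layer features of matching shapes:
student_features = [torch.randn(4, 64), torch.randn(4, 128)]
teacher_features = [torch.randn(4, 64), torch.randn(4, 128)]
print(layerwise_feature_loss(student_features, teacher_features))
```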
  • the above-mentioned "learning method based on knowledge distillation, adjusting the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model” can be Include the following steps:
  • Step 301 acquire a first training image set.
  • the first training image set has hard labels.
  • the hard labels are labels corresponding to each image in the first training image set.
  • A hard label represents the label of the target object in the corresponding image.
  • the computer device may receive the first training image set sent by other devices, and may receive the first training image set input by the user.
  • the hard labels attached to the first training image set may be marked manually, or may be marked by a computer device based on a neural network model.
  • the embodiment of the present application does not specifically limit the manner of labeling the hard tags of the first training image set.
  • the first training image set includes multiple first training images.
  • Step 302 input the first training image set into the initial network model, and output the first result.
  • the computer device inputs the first training image set into the initial network model, and the initial network model performs feature extraction on the first training image set, and outputs a first result based on the extracted features.
  • Step 303 input the first training image set into the first calibration network model, and output the second result.
  • the computer device inputs the first training image set into the first calibration network model, the first calibration network model performs feature extraction on the first training image set, and outputs a second result based on the extracted features.
  • Step 304 based on the hard label, the first result and the second result, adjust the initial weight parameters of the initial network model to obtain a first preprocessing model.
  • The computer device compares the first result output by the initial network model with the hard labels carried by the first training image set, and also compares the first result with the second result output by the first calibration network model. The computer device then adjusts the initial weight parameters of the initial network model according to the comparison results to obtain the first preprocessing model.
  • the image X can be an image in the first training image set
  • the teacher network is the first calibration network model
  • W_T is the weight parameter of the teacher network
  • the student network is the initial network model
  • W_S is the initial weight parameter of the student network.
  • the image X is input to the teacher network, and the teacher network outputs the second result, namely P_T.
  • the image X is input to the student network, and the student network outputs the first result, namely P_S.
  • the computer device adjusts initial weight parameters of the initial network model based on P_T, P_S and label Y to obtain a first preprocessing model.
  • The first training image set, which carries hard labels, is input into the initial network model and the first calibration network model, producing the first result and the second result respectively. Using the relationship between the first result and the second result, and between the first result and the hard labels, the initial weight parameters of the initial network model are adjusted so that the first result output by the initial network model moves closer to the second result and the hard labels, which ensures that the first preprocessing model obtained after the weight parameter adjustment has improved accuracy.
  • Step 304, "adjust the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model", may include the following steps:
  • Step 501 generate a first loss function based on the first result and the hard label.
  • the computer device generates the first loss function based on the first result output by the initial network model and the hard label corresponding to the first training image set.
  • the first loss function represents the loss function of the initial network model during the training process.
  • the first loss function may be represented by H(Y, P_S), where Y represents the hard label corresponding to the first training image set, and P_S represents the first result output by the initial network model.
  • Step 502 generating a second loss function based on the first result and the second result.
  • the computer device generates the second loss function based on the first result output by the initial network model and the second result output by the first calibration network model.
  • the second loss function represents the initial network model as the loss function of the student network in the process of imitating the first calibration network model.
  • the second loss function may be represented by H(P_T, P_S), where P_T represents the second result output by the first calibration network model, and P_S represents the first result output by the initial network model.
  • Step 503 using the first loss function and the second loss function to generate a first target loss function, and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain a first preprocessing model.
  • The computer device may add the first loss function and the second loss function to generate the first target loss function, and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  • The computer device may also multiply the first loss function by a first weighting coefficient α and the second loss function by a second weighting coefficient β, add the two weighted loss functions to obtain the first target loss function, and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model; that is, the first target loss function can take the form α·H(Y, P_S) + β·H(P_T, P_S).
  • The computer device can adjust the proportion of each loss function in the training process by adjusting the values of α and β.
  • The embodiment of the present application does not specifically limit the values of α and β.
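  • A minimal sketch of the weighted first target loss described above is given below, assuming the first loss H(Y, P_S) is a cross-entropy against the hard labels and the second loss H(P_T, P_S) is a KL divergence between the softened teacher and student outputs; these concrete choices, and the helper names, are illustrative assumptions rather than the exact formulation of the patent.

```python
import torch
import torch.nn.functional as F

def first_target_loss(logits_student, logits_teacher, hard_labels, alpha=1.0, beta=1.0):
    """Sketch of the first target loss: alpha * H(Y, P_S) + beta * H(P_T, P_S).

    logits_student: outputs of the initial network model (first result P_S),
    logits_teacher: outputs of the first calibration network model (second result P_T),
    hard_labels:    class indices Y carried by the first training image set.
    """
    # First loss: gap between the first result and the hard labels.
    loss_hard = F.cross_entropy(logits_student, hard_labels)
    # Second loss: gap between the first result and the second result,
    # expressed here as a KL divergence on the softened outputs.
    loss_soft = F.kl_div(F.log_softmax(logits_student, dim=1),
                         F.softmax(logits_teacher.detach(), dim=1),
                         reduction="batchmean")
    return alpha * loss_hard + beta * loss_soft

# Hypothetical one-step update of the student's weights (teacher stays fixed):
# loss = first_target_loss(student(x), teacher(x), y)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```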
  • The first loss function is generated from the first result output by the initial network model and the hard labels of the first training image set, and the second loss function is generated from the first result output by the initial network model and the second result output by the first calibration network model.
  • The first loss function represents the gap between the first result and the hard labels, and the second loss function represents the gap between the first result and the second result. The first target loss function generated from the two loss functions therefore characterizes both the gap between the first result and the hard labels and the gap between the first result and the second result.
  • The initial weight parameters of the initial network model are adjusted based on the first target loss function to obtain the first preprocessing model, thus improving the accuracy of the first preprocessing model.
  • step 103 "adjust the initial quantization parameters of the activation output of the first preprocessing model based on the second calibration network model to obtain the target network model” , can include the following steps:
  • Step 601 based on the learning method of knowledge distillation, the activation quantization threshold of the first preprocessing model is adjusted according to the second calibration network model.
  • the computer device can use a knowledge distillation learning method to compare the feature vectors output by each layer of the network in the second calibration network model with the feature vectors output by each layer of the network in the first preprocessing model, and then according to the comparison results, and The activation quantization threshold corresponding to each network layer in the first calibration network model is adjusted to the activation quantization threshold in the first preprocessing model.
  • Step 602 Adjust the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain a target network model.
  • The computer device can adjust the initial quantization parameters of the activation outputs of the first preprocessing model according to the correspondence between the adjusted activation quantization threshold and the quantization parameters of the activation outputs, and obtain the target network model from the adjusted quantization parameters of the activation outputs.
  • The knowledge-distillation-based learning method uses the second calibration network model as a large teacher network to guide the learning of the small quantized first preprocessing model, so that better model parameters are obtained.
  • Adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model ensures the accuracy of the adjusted activation quantization threshold.
  • Adjusting the initial quantization parameters of the activation outputs of the first preprocessing model according to the adjusted threshold to obtain the target network model further ensures the accuracy of those quantization parameters, thereby improving the accuracy of the resulting target network model.
  • The "knowledge-distillation-based learning method for adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model" in step 601 may include the following steps:
  • Step 701 acquire a second training image set.
  • the computer device may receive the second training image set sent by other devices, and may receive the second training image set input by the user.
  • The second training image set may consist of unlabeled images or labeled images, and the embodiment of the present application does not specifically limit the second training image set.
  • the second training image set can be the same as the first training image set, or can be different from the first training image set.
  • the second training image set may include multiple second training images.
  • Step 702 input the second training image set into the first preprocessing model, and output the third result.
  • the computer device inputs the second training image set into the first preprocessing model, and the first preprocessing model performs feature extraction on the second training image set, and outputs a third result based on the extracted features.
  • Step 703 input the second training image set into the second calibration network model, and output the fourth result.
  • the computer device inputs the second training image set into the second calibration network model, and the second calibration network model performs feature extraction on the second training image set, and outputs a fourth result based on the extracted features.
  • Step 704 Adjust the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
  • the computer device compares the third result output by the first preprocessing model with the fourth result output by the second calibration network model.
  • the computer device adjusts the activation quantization threshold of the first preprocessing model according to the comparison result.
  • the image X may be an image in the second training image set
  • the full-precision teacher network is the second calibration network model
  • the low-precision student network is the first preprocessing model.
  • The computer device inputs the image X into the full-precision teacher network, and the full-precision teacher network outputs the fourth result, namely P_T in Fig. 8.
  • The computer device inputs the image X into the low-precision student network, and the low-precision student network outputs the third result, namely P_S in Fig. 8.
  • The computer device adjusts the activation quantization threshold of the first preprocessing model based on P_T and P_S.
  • The second training image set is input into the first preprocessing model and the second calibration network model, producing the third result and the fourth result respectively, and the activation quantization threshold of the first preprocessing model is adjusted based on the third result and the fourth result. This guarantees the accuracy of the adjusted activation quantization threshold and further ensures the accuracy of the first preprocessing model.
  • the "adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result" in the above step 704 may include the following steps:
  • Step 901 generate a second target loss function based on the third result and the fourth result.
  • the computer device generates the second target loss function based on the third result output by the first preprocessing model and the fourth result output by the second calibration network model.
  • The smaller the value of the second target loss function, the more it indicates that, under the same network structure, the first preprocessing model still has predictive ability similar to that of the second calibration network model after being quantized with the threshold T.
  • Here, T represents the activation quantization threshold of the first preprocessing model and X represents the images in the second training image set.
  • Step 902: adjust the activation quantization threshold of the first preprocessing model based on the second target loss function.
  • The computer device adjusts the activation quantization threshold of the first preprocessing model based on the function value calculated from the second target loss function.
  • Adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function ensures the accuracy of the adjusted threshold and, consequently, the accuracy of the activation-output quantization parameters calculated from it, thereby improving the accuracy of the target network model.
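  • As an illustration, the following sketch searches for an activation quantization threshold T that minimizes a second target loss between the third result (quantized first preprocessing model) and the fourth result (second calibration network model); the grid search, the KL-divergence loss, and the callable names are assumptions for illustration only, since the patent only requires that the threshold be adjusted based on the second target loss function.

```python
import torch
import torch.nn.functional as F

def second_target_loss(logits_student, logits_teacher):
    """Sketch of the second target loss: divergence between the third result
    (first preprocessing model) and the fourth result (second calibration
    network model). KL divergence is an illustrative choice here."""
    return F.kl_div(F.log_softmax(logits_student, dim=1),
                    F.softmax(logits_teacher, dim=1),
                    reduction="batchmean")

def search_activation_threshold(student_forward, teacher_forward, images, candidates):
    """Pick the activation quantization threshold T whose quantized student output
    stays closest to the teacher output on the second training image set.

    student_forward(images, T) and teacher_forward(images) are assumed callables;
    a grid search over candidate thresholds is used purely for illustration."""
    best_T, best_loss = None, float("inf")
    with torch.no_grad():
        logits_teacher = teacher_forward(images)
        for T in candidates:
            loss = second_target_loss(student_forward(images, T), logits_teacher).item()
            if loss < best_loss:
                best_T, best_loss = T, loss
    return best_T
```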
  • The computer device can also treat the initial network model and the first preprocessing model as the same model, collectively referred to as the initial network model in the embodiments of this application.
  • the training process of the initial network model can include the following:
  • The computer device first adjusts the initial weight parameters of the initial network model according to the first target loss function, and then, based on the adjusted weight parameters, adjusts the activation quantization threshold of the initial network model according to the second target loss function. After one round of adjustment, the weight parameters and the activation quantization threshold of the initial network model may still be unsatisfactory.
  • In that case, the computer device continues to adjust the initial weight parameters of the initial network model according to the first target loss function and then, based on the adjusted weight parameters, adjusts the activation quantization threshold of the initial network model according to the second target loss function.
  • The computer device thus cyclically adjusts the initial weight parameters and the activation quantization threshold of the initial network model. After multiple iterations of training, the training of the initial network model is completed and the target network model is generated, which ensures the accuracy of the target network model, as sketched below.
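  • A minimal sketch of this alternating procedure is given below; the callables and the fixed round count are hypothetical placeholders, since the patent only requires that the two adjustments alternate over multiple iterations.

```python
def quantize_network(adjust_weights, adjust_threshold, num_rounds=10):
    """Cyclic adjustment sketch: alternate between tuning the weight parameters
    against the first target loss and tuning the activation quantization threshold
    against the second target loss, for a fixed number of training rounds.

    adjust_weights() and adjust_threshold() are hypothetical callables wrapping
    the two adjustment steps described above."""
    for _ in range(num_rounds):
        adjust_weights()     # adjust weights using the first calibration network model
        adjust_threshold()   # adjust the activation threshold using the second calibration network model

# Hypothetical usage with no-op placeholders:
# quantize_network(lambda: None, lambda: None, num_rounds=3)
```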
  • The embodiment of the present application provides an overall flow of the network model quantization method, as shown in Fig. 10; the method includes:
  • Step 1001: obtain the network model to be processed, the network model to be processed being a pre-trained full-precision network model; quantize the weight parameters and the activation outputs of the network model to be processed according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameters of the activation outputs; and construct the initial network model based on the initial weight parameters and the initial quantization parameters of the activation outputs.
  • Step 1002: acquire the first training image set.
  • Step 1003: input the first training image set into the initial network model and output the first result.
  • Step 1004: acquire the first calibration network model, input the first training image set into the first calibration network model, and output the second result.
  • Step 1005: generate the first loss function based on the first result and the hard labels.
  • Step 1006: generate the second loss function based on the first result and the second result.
  • Step 1007: use the first loss function and the second loss function to generate the first target loss function, and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  • Step 1008: acquire the second training image set.
  • Step 1009: input the second training image set into the first preprocessing model and output the third result.
  • Step 1010: acquire the second calibration network model, input the second training image set into the second calibration network model, and output the fourth result.
  • Step 1011: generate the second target loss function based on the third result and the fourth result.
  • Step 1012: adjust the activation quantization threshold of the first preprocessing model based on the second target loss function.
  • Step 1013: adjust the initial quantization parameters of the activation outputs of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
  • The above network model quantization method may be as shown in Fig. 11, and includes the following steps:
  • Parameter initialization of the low-precision network: based on the pre-trained full-precision student network, the post-training quantization (PTQ) method is used to initialize the low-precision student network, initially determining the low-precision weight values and the activation quantization range values of the student network that need to be quantized.
  • Network structure deployment: based on the quantized network model parameters, the model structure is deployed on the actual hardware platform to perform the corresponding task processing, such as image classification/detection/recognition tasks or natural language processing tasks.
  • The steps in the flowcharts of Figs. 9-10 may include multiple sub-steps or stages; these sub-steps or stages are not necessarily executed at the same moment but may be executed at different times, and their execution order is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
  • The network model quantization apparatus 1200 includes a quantization processing module 1210, a first adjustment module 1220, and a second adjustment module 1230, wherein:
  • The quantization processing module 1210 is configured to obtain the network model to be processed, the network model to be processed being a pre-trained full-precision network model, quantize the weight parameters and the activation outputs of the network model to be processed according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameters of the activation outputs, and construct the initial network model based on the initial weight parameters and the initial quantization parameters of the activation outputs.
  • The first adjustment module 1220 is configured to obtain the first calibration network model, whose accuracy is higher than that of the initial network model, and adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model.
  • The second adjustment module 1230 is configured to obtain the second calibration network model, whose accuracy is higher than that of the initial network model, and adjust the initial quantization parameters of the activation outputs of the first preprocessing model based on the second calibration network model to obtain the target network model.
  • The above first adjustment module 1220 is specifically configured to use a knowledge-distillation-based learning method to adjust the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
  • The above first adjustment module 1220 is specifically configured to: obtain the first training image set, the first training image set carrying hard labels; input the first training image set into the initial network model and output the first result; input the first training image set into the first calibration network model and output the second result; and adjust the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model.
  • The above first adjustment module 1220 is specifically configured to: generate the first loss function based on the first result and the hard labels; generate the second loss function based on the first result and the second result; generate the first target loss function from the first loss function and the second loss function; and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  • the above-mentioned second adjustment module 1230 includes: a first adjustment unit 1231 and a second adjustment unit 1232, wherein:
  • the first adjustment unit 1231 is configured to use a learning method based on knowledge distillation and adjust the activation quantization threshold of the first preprocessing model according to the second calibration network model;
  • the second adjustment unit 1232 is configured to adjust the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
  • the above-mentioned first adjustment unit 1231 is specifically configured to: acquire the second training image set; input the second training image set into the first preprocessing model, and output the third result; input the second training image set into the second calibration network model, and output the fourth result; and adjust the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
  • the above-mentioned first adjustment unit 1231 is specifically configured to: generate a second target loss function based on the third result and the fourth result; and adjust the activation quantization threshold of the first preprocessing model based on the second target loss function.
  • Each module in the above-mentioned apparatus for network model quantization can be fully or partially realized by software, hardware, or a combination thereof.
  • the above-mentioned modules can be embedded in or independent of the processor in the computer device in the form of hardware, and can also be stored in the memory of the computer device in the form of software, so that the processor can invoke and execute the corresponding operations of the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a terminal, and its internal structure may be as shown in FIG. 14 .
  • the computer device includes a processor, a memory, a communication interface, a display screen and an input device connected through a system bus. Wherein, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and computer programs.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the communication interface of the computer device is used to communicate with an external terminal in a wired or wireless manner, and the wireless manner can be realized through WIFI, an operator network, NFC (Near Field Communication) or other technologies.
  • when the computer program is executed by the processor, a network model quantization method is implemented.
  • the display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen.
  • the input device of the computer device may be a touch layer covering the display screen, or a button, a trackball or a touchpad provided on the casing of the computer device, and may also be an external keyboard, touchpad or mouse.
  • Figure 14 is only a block diagram of a partial structure related to the solution of this application, and does not constitute a limitation on the computer equipment on which the solution of this application is applied.
  • the specific computer device may include more or fewer components than shown in the figure, or combine some components, or have a different arrangement of components.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 15 .
  • the computer device includes a processor, memory and a network interface connected by a system bus. Wherein, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer programs and databases.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer device is used to store network model quantification data.
  • the network interface of the computer device is used to communicate with an external terminal via a network connection. When the computer program is executed by a processor, a network model quantification method is implemented.
  • Figure 15 is only a block diagram of a partial structure related to the solution of this application, and does not constitute a limitation on the computer equipment on which the solution of this application is applied.
  • the specific computer device may include more or fewer components than shown in the figure, or combine some components, or have a different arrangement of components.
  • a computer device including a memory and a processor.
  • a computer program is stored in the memory.
  • when the processor executes the computer program, the following steps are implemented: acquiring a network model to be processed, where the network model to be processed is a pre-trained full-precision network model; performing quantization processing on the weight parameters and the activation output of the network model to be processed respectively according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameter of the activation output, and constructing the initial network model based on the initial weight parameters and the initial quantization parameter of the activation output; obtaining a first calibration network model, where the accuracy of the first calibration network model is higher than the accuracy of the initial network model, and adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model; and obtaining a second calibration network model, where the accuracy of the second calibration network model is higher than the accuracy of the first preprocessing model, and adjusting the initial quantization parameter of the activation output of the first preprocessing model based on the second calibration network model to obtain the target network model.
  • when the processor executes the computer program, the following steps are further implemented: using a learning method based on knowledge distillation, adjusting the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
  • when the processor executes the computer program, the following steps are also implemented: obtaining the first training image set, where the first training image set has hard labels; inputting the first training image set to the initial network model, and outputting a first result; inputting the first training image set to the first calibration network model, and outputting a second result; and adjusting the initial weight parameters of the initial network model based on the hard labels, the first result and the second result to obtain the first preprocessing model.
  • when the processor executes the computer program, the following steps are also implemented: generating a first loss function based on the first result and the hard labels; generating a second loss function based on the first result and the second result; using the first loss function and the second loss function to generate a first target loss function; and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  • when the processor executes the computer program, the following steps are also implemented: using a learning method based on knowledge distillation, adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model; and adjusting the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
  • when the processor executes the computer program, the following steps are also implemented: acquiring the second training image set; inputting the second training image set into the first preprocessing model, and outputting a third result; inputting the second training image set into the second calibration network model, and outputting a fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
  • when the processor executes the computer program, the following steps are also implemented: generating a second target loss function based on the third result and the fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function.
  • a computer-readable storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, the following steps are implemented: obtaining a network model to be processed, where the network model to be processed is a pre-trained full-precision network model; according to the quantization requirements, quantizing the weight parameters and the activation output of the network model to be processed respectively to obtain the initial weight parameters and the initial quantization parameters of the activation output; constructing the initial network model based on the initial weight parameters and the initial quantization parameters of the activation output; obtaining a first calibration network model, where the accuracy of the first calibration network model is higher than the accuracy of the initial network model, and adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model; and obtaining a second calibration network model, where the precision of the second calibration network model is higher than the precision of the first preprocessing model, and adjusting the initial quantization parameter of the activation output of the first preprocessing model based on the second calibration network model to obtain the target network model.
  • the following steps are further implemented: using a learning method based on knowledge distillation, adjusting the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
  • the following steps are also implemented: obtaining the first training image set, where the first training image set has hard labels; inputting the first training image set to the initial network model, and outputting a first result; inputting the first training image set to the first calibration network model, and outputting a second result; and adjusting the initial weight parameters of the initial network model based on the hard labels, the first result and the second result to obtain the first preprocessing model.
  • the following steps are further implemented: generating a first loss function based on the first result and the hard labels; generating a second loss function based on the first result and the second result; using the first loss function and the second loss function to generate a first target loss function; and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  • the following steps are also implemented: using a learning method based on knowledge distillation, adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model; and, according to the adjusted activation quantization threshold, adjusting the initial quantization parameter of the activation output of the first preprocessing model to obtain the target network model.
  • the following steps are also implemented: acquiring the second training image set; inputting the second training image set into the first preprocessing model, and outputting a third result; inputting the second training image set into the second calibration network model, and outputting a fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
  • the following steps are further implemented: generating a second target loss function based on the third result and the fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function.
  • the storage medium can be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a flash memory (Flash Memory), a hard disk (Hard Disk Drive, HDD) or a solid-state drive (Solid-State Drive, SSD), etc.; the storage medium may also include a combination of the above-mentioned types of memory.

Abstract

Disclosed in the present application are a network model quantization method and apparatus, and a computer device and a storage medium, which are applicable to the technical field of artificial intelligence. The network model quantization method comprises: acquiring a network model to be processed, and, according to quantization requirements, separately performing quantization processing on a weight parameter and an activation output of the network model to be processed, so as to obtain an initial weight parameter and an initial quantization parameter of the activation output, and constructing an initial network model; acquiring a first calibration network model, and adjusting the initial weight parameter of the initial network model on the basis of the first calibration network model, so as to obtain a first pre-processed model; and acquiring a second calibration network model, and adjusting an initial quantization parameter of an activation output of the first pre-processed model on the basis of the second calibration network model, so as to obtain a target network model. By using the method, the problem that the precision of a deep neural network model is reduced when a large deep neural network model is shrunk by means of model compression such as quantization and cropping can be solved.

Description

Network model quantization method, device, computer equipment and storage medium
This application claims the priority of the Chinese patent application submitted to the China Patent Office on September 28, 2021, with the application number 202111139349.X and the invention title "Network Model Quantization Method, Device, Computer Equipment, and Storage Medium", the entire content of which is incorporated in this application by reference.
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a network model quantization method, device, computer equipment and storage medium.
Background
With the continuous development of artificial intelligence technology, the application of artificial intelligence technology is becoming more and more extensive. In the field of artificial intelligence technology, deep learning is one of the more typical technologies. The essence of deep learning is the artificial neural network, and a neural network with many layers is called a deep neural network. At present, although the capabilities of deep neural network models in image classification, detection and other tasks are close to or surpass those of humans, in actual deployment there are still problems such as large models and high computational complexity, which lead to high hardware costs. In practical applications, in order to reduce hardware costs, neural network models are usually deployed on terminal devices or edge devices; these devices generally have low computing power, and their memory and power consumption are also limited. Therefore, how to shrink a large deep neural network model while keeping its accuracy unchanged, so that the deep neural network model can be truly deployed on the terminal, has become an urgent problem to be solved.
In the prior art, model compression methods such as quantization and cropping are usually used to reduce the size of the deep neural network model, thereby shrinking the large deep neural network model.
However, in the above prior art, in the process of shrinking a large deep neural network model by means of model compression such as quantization and cropping, the accuracy of the deep neural network model is seriously reduced, so that the precision of the shrunken deep neural network model is low, which affects the application of the shrunken deep neural network model.
Summary of the Invention
In view of this, the embodiments of the present application provide a network model quantization method, device, computer equipment and storage medium, to solve the problem that the accuracy of a deep neural network model is relatively low after a large deep neural network model is shrunk by means of model compression such as quantization and cropping.
According to a first aspect, an embodiment of the present application provides a network model quantization method, the method including: obtaining a network model to be processed, where the network model to be processed is a pre-trained full-precision network model, quantizing the weight parameters and the activation output of the network model to be processed respectively according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameter of the activation output, and constructing an initial network model based on the initial weight parameters and the initial quantization parameter of the activation output; obtaining a first calibration network model, where the accuracy of the first calibration network model is higher than that of the initial network model, and adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model; and obtaining a second calibration network model, where the accuracy of the second calibration network model is higher than the accuracy of the initial network model, and adjusting the initial quantization parameter of the activation output of the first preprocessing model based on the second calibration network model to obtain a target network model.
In this embodiment, the pre-trained full-precision network model is first obtained and used as the network model to be processed; then the weight parameters and the activation output of the network model to be processed are quantized respectively according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameter of the activation output, and the initial network model is constructed based on the initial weight parameters and the initial quantization parameter of the activation output. Since the weights and the activation output of the network model to be processed are quantized, the size of the initial network model constructed based on the initial weight parameters and the initial quantization parameter of the activation output is much smaller than that of the network model to be processed, which ensures that the initial network model can run on some terminal devices and edge devices. In addition, because the accuracy of the initial network model obtained after quantization is low, the initial weight parameters of the initial network model can be adjusted based on the first calibration network model, whose model accuracy is higher than that of the initial network model, to obtain the first preprocessing model, so that the accuracy of the weight parameters of the first preprocessing model can be guaranteed and the accuracy of the first preprocessing model improved. Furthermore, the initial quantization parameter of the activation output of the first preprocessing model can be adjusted based on the second calibration network model, whose model accuracy is higher than that of the first preprocessing model, to obtain the target network model. As a result, the target network model is not only smaller in size, but its weight parameters and activation output range are also more accurate, which further improves the accuracy of the target network model and solves the problem that the accuracy of a deep neural network model becomes low when a large deep neural network model is shrunk by model compression such as quantization and cropping.
With reference to the first aspect, in a first implementation of the first aspect, adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model includes: using a learning method based on knowledge distillation, adjusting the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
In this embodiment, since the accuracy of the first calibration network model is higher than that of the initial network model, the learning method based on knowledge distillation uses the first calibration network model as the large teacher network model to guide the learning of the small quantized initial network model so as to obtain better model parameters, and the initial weight parameters of the initial network model are adjusted according to the first calibration network model to obtain the first preprocessing model. In this way, the accuracy of the weight parameters of the obtained first preprocessing model can be guaranteed, and the accuracy of the first preprocessing model can be improved.
With reference to the first implementation of the first aspect, in a second implementation of the first aspect, using a learning method based on knowledge distillation and adjusting the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model includes: obtaining a first training image set, where the first training image set has hard labels; inputting the first training image set to the initial network model, and outputting a first result; inputting the first training image set to the first calibration network model, and outputting a second result; and adjusting the initial weight parameters of the initial network model based on the hard labels, the first result and the second result to obtain the first preprocessing model.
In this embodiment, the first training image set with hard labels is input to the initial network model and the first calibration network model respectively, and the first result and the second result are output respectively; the initial weight parameters of the initial network model are adjusted using the relationship between the first result and the second result, as well as between the first result and the hard labels, so that the first result output by the initial network model can be closer to the second result and the hard labels, thereby ensuring that the accuracy of the first preprocessing model obtained after the weight parameter adjustment is improved.
With reference to the second implementation of the first aspect, in a third implementation of the first aspect, adjusting the initial weight parameters of the initial network model based on the hard labels, the first result and the second result to obtain the first preprocessing model includes: generating a first loss function based on the first result and the hard labels; generating a second loss function based on the first result and the second result; and using the first loss function and the second loss function to generate a first target loss function, and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
In this embodiment, the first loss function is generated based on the first result output by the initial network model and the hard labels of the first training image set, and the second loss function is generated based on the first result output by the initial network model and the second result output by the first calibration network model. The first loss function can be used to represent the gap between the first result and the hard labels, and the second loss function can be used to represent the gap between the first result and the second result. Therefore, the first target loss function generated from the first loss function and the second loss function can characterize both the gap between the first result and the hard labels and the gap between the first result and the second result. The initial weight parameters of the initial network model are adjusted based on the first target loss function to obtain the first preprocessing model, thereby improving the accuracy of the first preprocessing model.
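As a concrete illustration only, the sketch below shows one way such a combined objective could be written: a hard-label loss plus a distillation loss, mixed with a weighting factor. The cross-entropy term, the temperature-softened KL divergence and the names introduced here (`first_target_loss`, `alpha`, `temperature`) are common knowledge-distillation choices assumed for this example; they are not details prescribed by the application.

```python
import torch
import torch.nn.functional as F

def first_target_loss(student_logits, teacher_logits, hard_labels,
                      alpha=0.5, temperature=4.0):
    """Combine two losses into one objective for tuning the student's weights.

    first loss  : cross-entropy between the student output (first result) and the hard labels
    second loss : KL divergence between softened student and teacher outputs (first and second results)
    """
    loss_hard = F.cross_entropy(student_logits, hard_labels)
    loss_soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * loss_hard + (1.0 - alpha) * loss_soft

# Example: one optimization step on stand-in logits.
student_logits = torch.randn(8, 10, requires_grad=True)  # stand-in for the quantized initial network output
teacher_logits = torch.randn(8, 10)                      # stand-in for the first calibration network output
labels = torch.randint(0, 10, (8,))
loss = first_target_loss(student_logits, teacher_logits, labels)
loss.backward()
```

In this sketch, minimizing the combined loss pulls the first result toward both the hard labels and the second result, which is the behavior described above.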
With reference to the first aspect, in a fourth implementation of the first aspect, adjusting the initial quantization parameter of the activation output of the first preprocessing model based on the second calibration network model to obtain the target network model includes: using a learning method based on knowledge distillation, adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model; and adjusting the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
In this implementation, since the accuracy of the second calibration network model is higher than that of the first preprocessing model, the learning method based on knowledge distillation uses the second calibration network model as the large teacher network model to guide the learning of the small quantized first preprocessing model so as to obtain better model parameters. Adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model can ensure the accuracy of the adjusted activation quantization threshold. Further, adjusting the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model can further ensure the accuracy of the adjusted initial quantization parameter of the activation output of the first preprocessing model, thereby improving the accuracy of the obtained target network model.
With reference to the fourth implementation of the first aspect, in a fifth implementation of the first aspect, using a learning method based on knowledge distillation and adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model includes: acquiring a second training image set; inputting the second training image set to the first preprocessing model, and outputting a third result; inputting the second training image set to the second calibration network model, and outputting a fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
In this embodiment, the second training image set is input to the first preprocessing model and the second calibration network model respectively, the third result and the fourth result are output, and the activation quantization threshold of the first preprocessing model is adjusted based on the third result and the fourth result, so that the accuracy of the adjusted activation quantization threshold can be guaranteed and the accuracy of the obtained target network model further ensured.
With reference to the fifth implementation of the first aspect, in a sixth implementation of the first aspect, adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result includes: generating a second target loss function based on the third result and the fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function.
In this embodiment, the second target loss function is generated based on the third result and the fourth result, and the smaller the value of the second target loss function, the smaller the gap between the third result and the fourth result. Therefore, adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function can ensure the accuracy of the adjusted activation quantization threshold, and further ensure the accuracy of the quantization parameter of the activation output calculated based on the adjusted activation quantization threshold, thereby improving the accuracy of the target network model.
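As an illustrative sketch of how the activation quantization threshold could be tuned against the second calibration network, the code below exposes the threshold as a learnable parameter and optimizes it with a distillation-style loss. The straight-through estimator, the KL-based `second_target_loss` and all names are assumptions made for this example, not the application's prescribed implementation.

```python
import torch
import torch.nn.functional as F

class LearnableActivationQuantizer(torch.nn.Module):
    """Fake-quantizes activations with a learnable clipping threshold T.
    A straight-through estimator lets the gradient of the distillation loss
    reach T, so the threshold can be tuned against the calibration network."""

    def __init__(self, init_T, n_bits=8):
        super().__init__()
        self.T = torch.nn.Parameter(torch.tensor(float(init_T)))
        self.q_max = 2 ** (n_bits - 1) - 1

    def forward(self, x):
        s = self.T / self.q_max
        x_clipped = torch.maximum(torch.minimum(x, self.T), -self.T)
        q = torch.round(x_clipped / s) * s
        # forward pass uses the rounded value, backward sees the differentiable clipped value
        return x_clipped + (q - x_clipped).detach()

def second_target_loss(student_logits, teacher_logits, temperature=4.0):
    """Distillation-only objective between the first preprocessing model (third result)
    and the second calibration network (fourth result); no hard labels are used here."""
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

# Example: tune one quantizer's threshold with the distillation loss.
quant = LearnableActivationQuantizer(init_T=6.0)
optimizer = torch.optim.Adam([quant.T], lr=1e-3)
acts = torch.randn(8, 10) * 3.0
student_logits = quant(acts)   # stand-in for the quantized model's output
teacher_logits = acts          # stand-in for the calibration network's output
loss = second_target_loss(student_logits, teacher_logits)
loss.backward()
optimizer.step()
```

Minimizing this loss shrinks the gap between the third and fourth results, which is the criterion stated above for a well-chosen threshold.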
According to a second aspect, an embodiment of the present application provides a network model quantization apparatus, the apparatus including:
a quantization processing module, used to obtain a network model to be processed, where the network model to be processed is a pre-trained full-precision network model, to quantize the weight parameters and the activation output of the network model to be processed respectively according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameter of the activation output, and to construct an initial network model based on the initial weight parameters and the initial quantization parameter of the activation output;
a first adjustment module, used to obtain a first calibration network model, where the accuracy of the first calibration network model is higher than the accuracy of the initial network model, and to adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model;
a second adjustment module, used to obtain a second calibration network model, where the accuracy of the second calibration network model is higher than the accuracy of the initial network model, and to adjust the initial quantization parameter of the activation output of the first preprocessing model based on the second calibration network model to obtain a target network model.
With reference to the first aspect, in the first implementation of the first aspect, the above first adjustment module is specifically configured to use a learning method based on knowledge distillation and adjust the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
With reference to the first implementation of the first aspect, in the second implementation of the first aspect, the above first adjustment module is specifically configured to obtain the first training image set, where the first training image set has hard labels; input the first training image set to the initial network model, and output a first result; input the first training image set to the first calibration network model, and output a second result; and adjust the initial weight parameters of the initial network model based on the hard labels, the first result and the second result to obtain the first preprocessing model.
With reference to the second implementation of the first aspect, in the third implementation of the first aspect, the above first adjustment module is specifically configured to generate a first loss function based on the first result and the hard labels; generate a second loss function based on the first result and the second result; and use the first loss function and the second loss function to generate a first target loss function, and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
With reference to the first aspect, in the fourth implementation of the first aspect, the above second adjustment module includes:
a first adjustment unit, used to adopt a learning method based on knowledge distillation and adjust the activation quantization threshold of the first preprocessing model according to the second calibration network model;
a second adjustment unit, used to adjust the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
With reference to the fourth implementation of the first aspect, in the fifth implementation of the first aspect, the above first adjustment unit is specifically configured to: acquire the second training image set; input the second training image set into the first preprocessing model, and output the third result; input the second training image set into the second calibration network model, and output the fourth result; and adjust the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
With reference to the fifth implementation of the first aspect, in the sixth implementation of the first aspect, the above first adjustment unit is specifically configured to: generate a second target loss function based on the third result and the fourth result; and adjust the activation quantization threshold of the first preprocessing model based on the second target loss function.
According to a third aspect, an embodiment of the present application provides an electronic device/mobile terminal/server, including: a memory and a processor, where the memory and the processor are communicatively connected to each other, the memory stores computer instructions, and the processor executes the computer instructions so as to perform the network model quantization method in the first aspect or any implementation of the first aspect.
According to a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to perform the network model quantization method in the first aspect or any implementation of the first aspect.
According to a fifth aspect, an embodiment of the present application provides a computer program product, the computer program product including a computer program stored on a computer-readable storage medium, where the computer program includes program instructions, and when the program instructions are executed by a computer, the computer performs the network model quantization method in the first aspect or any implementation of the first aspect.
Description of the Drawings
The features and advantages of the present application will be more clearly understood by referring to the accompanying drawings, which are schematic and should not be construed as limiting the application in any way. In the accompanying drawings:
Fig. 1 shows a flow chart of the steps of the network model quantization method in one embodiment;
Fig. 2a shows a schematic diagram of unsaturated mapping in PTQ model quantization in the network model quantization method in one embodiment;
Fig. 2b shows a schematic diagram of saturated mapping in PTQ model quantization in the network model quantization method in one embodiment;
Fig. 3 shows a flow chart of the steps of the network model quantization method in another embodiment;
Fig. 4 shows a schematic diagram of the process of adjusting the weight parameters of the initial network model in the network model quantization method in one embodiment;
Fig. 5 shows a flow chart of the steps of the network model quantization method in another embodiment;
Fig. 6 shows a flow chart of the steps of the network model quantization method in another embodiment;
Fig. 7 shows a flow chart of the steps of the network model quantization method in another embodiment;
Fig. 8 shows a schematic diagram of the process of adjusting the activation output threshold of the first preprocessing model in the network model quantization method in another embodiment;
Fig. 9 shows a flow chart of the steps of the network model quantization method in another embodiment;
Fig. 10 shows a flow chart of the steps of the network model quantization method in another embodiment;
Fig. 11 shows a schematic flowchart of the network model quantization method in another embodiment;
Fig. 12 shows a structural block diagram of the network model quantization device in one embodiment;
Fig. 13 shows a structural block diagram of the network model quantization device in one embodiment;
Fig. 14 shows an internal structure diagram of the computer device of one embodiment when the computer device is a server;
Fig. 15 shows an internal structure diagram of the computer device of one embodiment when the computer device is a terminal.
Detailed Description
In order to make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are a part of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without making creative efforts belong to the scope of protection of this application.
It should be noted that the network model quantization method provided by the embodiments of the present application can be executed by a network model quantization apparatus, and the network model quantization apparatus can be implemented as part or all of a computer device through software, hardware, or a combination of software and hardware. The computer device may be a server or a terminal; the server in the embodiments of the present application may be a single server or a server cluster composed of multiple servers, and the terminal in the embodiments of the present application may be a smartphone, a personal computer, a tablet computer, a wearable device, an intelligent robot or another intelligent hardware device. In the following method embodiments, the execution subject being a computer device is taken as an example for illustration.
In one embodiment of the present application, as shown in Figure 1, a network model quantization method is provided. Taking the application of the method to a computer device as an example for illustration, the method includes the following steps:
Step 101: obtain a network model to be processed, where the network model to be processed is a pre-trained full-precision network model; quantize the weight parameters and the activation output of the network model to be processed respectively according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameter of the activation output; and construct an initial network model based on the initial weight parameters and the initial quantization parameter of the activation output.
Specifically, the computer device can use the first target image training set to train a neural network model and obtain the network model to be processed. The network model to be processed is a pre-trained full-precision network model. The network model to be processed can be used to process tasks such as image recognition, image detection and image classification. The embodiment of the present application does not specifically limit the application scenarios of the network model to be processed.
Optionally, the computer device may also receive a network model to be processed sent by another device, or receive a network model to be processed input by a user. The embodiment of the present application does not specifically limit the manner in which the computer device acquires the network model to be processed.
In the embodiment of the present application, the computer device quantizes the weight parameters and the activation output of the network model to be processed respectively according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameter of the activation output, and constructs the initial network model based on the initial weight parameters and the initial quantization parameter of the activation output. The quantization requirements may be input to the computer device by the user through the input component of the computer device, and can be changed according to the actual situation. The quantization requirements can represent the bit-width requirements of the weight parameters and the activation output. For example, the quantization requirement may be to reduce the size of the network model to be processed by 4 times, converting the weight parameters and the activation output of the network model to be processed from float32 to int8. The embodiment of the present application does not specifically limit the quantization requirements. The accuracy of the initial network model is much lower than that of the network model to be processed, and the size of the initial network model is also much smaller than the size of the network model to be processed.
In the embodiment of the present application, the computer device can use the post-training quantization method (Post-Training Quantization, PTQ) or the training-aware quantization method (Training-Aware Quantization, TAQ) to quantize the weight parameters and the activation output of the network model to be processed respectively. The embodiment of the present application does not specifically limit the method of separately quantizing the weight parameters and the activation output of the network model to be processed.
In order to better understand the network model quantization method of the embodiment of the present application, the following explains it by taking the use of the PTQ method to quantize the weight parameters and the activation output of the network model to be processed respectively as an example.
The central idea of the PTQ quantization method is to calculate the quantization threshold T and, according to the quantization threshold T, determine the mapping relationship between the weights of the network model to be processed and the weights of the initial network model, as well as the mapping relationship between the activation output of the network model to be processed and the activation output of the initial network model.
Take converting the weights and the activation output of the network model to be processed from float32 to int8 as an example. The mapping relationship between the weight parameters of the network model to be processed and the weight parameters of the initial network model, and the mapping relationship between the activation output of the network model to be processed and the activation output of the initial network model, include two types: saturated mapping and unsaturated mapping. Generally, when weights are quantized, the unsaturated mapping shown in Figure 2a is used, and the quantization threshold T is then equal to the maximum value. When the activation output is quantized, a saturated mapping is generally used, as shown in Figure 2b. The quantization threshold T of the saturated mapping can be searched for by the relative entropy divergence method or the mean square error method. The criterion for finding the quantization threshold T is to find a threshold such that, after the original values are clipped based on this threshold, the difference from the original values is still the smallest.
In the saturated quantization process, the part exceeding the threshold T needs to be clipped, as shown in the second item of formula (1). Clipping means, for example, that if T = 5 and the original values contain 6, which is greater than 5, then 6 is also forced to 5.
Formula (1) can be written as:
s = T / (2^(n-1) - 1)                                  (first item of formula (1))
q(x, T) = s · round( clip(x, -T, T) / s )              (second item of formula (1))
where s is the quantization mapping scale factor, x is the original value, q(x, T) represents the value of x after quantization and de-quantization, n is the bit width to be quantized, and T is the quantization threshold. For example, if x is the original float32 number, the converted int8 number is q_x, with q_x = x/s; n is the bit width to be quantized, such as 8-bit, 4-bit, 2-bit or 1-bit, and when n = 8-bit, s = T/127, as shown in the first item of formula (1). round(·) denotes rounding, which can be round-to-nearest, round-up or round-down.
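For illustration, a minimal sketch of this saturated quantize-dequantize operation and of one possible threshold search is given below; it assumes symmetric per-tensor quantization, and the function names (`quantize_dequantize`, `find_threshold_mse`) and the linear candidate sweep are choices made for this example rather than details taken from the application.

```python
import numpy as np

def quantize_dequantize(x, T, n_bits=8):
    """Saturated quantize-dequantize q(x, T): values beyond +/-T are clipped,
    the remainder is mapped to the signed integer grid and mapped back."""
    q_max = 2 ** (n_bits - 1) - 1       # 127 when n_bits = 8
    s = T / q_max                       # scale factor, s = T / 127 for 8-bit
    x_clipped = np.clip(x, -T, T)       # clipping of values exceeding the threshold
    return np.round(x_clipped / s) * s  # round to the grid, then de-quantize

def find_threshold_mse(x, n_bits=8, num_candidates=100):
    """Search a threshold T that minimizes the mean squared error between the
    original tensor and its quantize-dequantized version (one possible criterion)."""
    max_abs = float(np.abs(x).max())
    best_T, best_err = max_abs, float("inf")
    for T in np.linspace(max_abs / num_candidates, max_abs, num_candidates):
        err = float(np.mean((x - quantize_dequantize(x, T, n_bits)) ** 2))
        if err < best_err:
            best_T, best_err = T, err
    return best_T

# Example: calibrate an activation threshold on simulated calibration data.
acts = np.random.randn(10000).astype(np.float32) * 2.0
T = find_threshold_mse(acts)
print(T, np.mean((acts - quantize_dequantize(acts, T)) ** 2))
```

A relative entropy (KL divergence) criterion could be substituted for the mean squared error in the same search loop.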
Step 102: obtain a first calibration network model, where the accuracy of the first calibration network model is higher than that of the initial network model, and adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model.
Here, the accuracy of the first calibration network model being higher than the accuracy of the initial network model can mean at least one of the following: the performance accuracy of the first calibration network model is higher than the performance accuracy of the initial network model, and the bandwidth precision of the parameters of the first calibration network model is higher than the bandwidth precision of the parameters of the initial network model.
Specifically, the computer device can use the second target image training set to train a neural network model and obtain the first calibration network model. The accuracy of the first calibration network model is higher than the accuracy of the initial network model. The first calibration network model can be used for image recognition, image detection and image classification tasks. The embodiment of the present application does not specifically limit the application scenario of the first calibration network model.
As an optional implementation, the computer device can also receive the first calibration network model sent by another device, or receive the first calibration network model input by the user. The embodiment of this application does not specifically limit the manner in which the computer device obtains the first calibration network model.
Further, the computer device can adjust the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
As an optional implementation, the computer device can also compare the output result of the first calibration network model with the output result of the initial network model, and adjust the initial weight parameters of the initial network model according to the comparison result to obtain the first preprocessing model. In step 101, after the full-precision network is converted into the low-precision initial network model, the main reasons for the decrease in model performance accuracy generally come from two parts: the change of the weight parameters and the selection of the activation threshold. In the post-training quantization process, all weight parameters are usually truncated with the same approximation method, but the same approximation method may not be suitable for all weight parameters, so this implicitly introduces noise and affects the feature extraction ability of the network model. In this step, by comparing the output result of the first calibration network model with the output result of the initial network model, the first calibration network model is used to correct the initial weight parameters of the initial network model, reducing the error generated in the above process.
Step 103: Obtain a second calibration network model, where the accuracy of the second calibration network model is higher than that of the first preprocessing model, and adjust the initial quantization parameter of the activation output of the first preprocessing model based on the second calibration network model to obtain a target network model.
Here, the accuracy of the second calibration network model being higher than that of the first preprocessing model may mean at least one of the following: the performance accuracy of the second calibration network model is higher than the performance accuracy of the first preprocessing model, and the bit-width precision of the parameters of the second calibration network model is higher than the bit-width precision of the parameters of the first preprocessing model.
Specifically, the computer device may train a neural network model with a third target image training set to obtain the second calibration network model, where the accuracy of the second calibration network model is higher than that of the first preprocessing model. The second calibration network model may be used for image recognition, image detection, and image classification tasks. The embodiments of the present application do not specifically limit the application scenario of the second calibration network model.
As an optional implementation, the computer device may also receive the second calibration network model sent by another device, or receive a second calibration network model input by a user. The embodiments of the present application do not specifically limit the manner in which the computer device obtains the second calibration network model. The second calibration network model may be the same pre-trained full-precision network model as the network model to be processed, or a different pre-trained full-precision network model.
As an implementation, the computer device may adjust the initial quantization parameter of the activation output of the first preprocessing model according to the second calibration network model to obtain the target network model. To improve the accuracy of the low-precision initial network model, in addition to adjusting the initial weight parameters in step 102, step 103 further adjusts the initial activation thresholds. The computer device may also compare the output of the second calibration network model with the output of the first preprocessing model, and adjust the initial quantization parameter of the activation output of the first preprocessing model according to the comparison result to obtain the target network model, thereby further reducing the loss caused by converting the full-precision model into a low-precision model and improving the accuracy of the model.
In this embodiment, a pre-trained full-precision network model is first obtained and used as the network model to be processed; the weight parameters and the activation output of the network model to be processed are then quantized separately according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameter of the activation output, and an initial network model is constructed based on the initial weight parameters and the initial quantization parameter of the activation output. Because the weight parameters and the activation output of the network model to be processed are quantized, the size of the initial network model constructed from them is much smaller than that of the network model to be processed, which ensures that the initial network model can run on terminal devices and edge devices. In addition, because the accuracy of the initial network model obtained after quantization is low, the initial weight parameters of the initial network model can be adjusted based on the first calibration network model, whose accuracy is higher than that of the initial network model, to obtain the first preprocessing model, which ensures the accuracy of the weight parameters of the first preprocessing model and thus improves its accuracy. Furthermore, the initial quantization parameter of the activation output of the first preprocessing model can be adjusted based on the second calibration network model, whose accuracy is higher than that of the first preprocessing model, to obtain the target network model. As a result, the target network model is not only small in size but also has accurate weight parameters and activation outputs, which further improves the accuracy of the target network model and solves the problem that shrinking a large deep neural network model through model compression such as quantization and pruning severely reduces the accuracy of the deep neural network model.
In an optional embodiment of the present application, "adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model" in step 102 may include the following content:
Based on a knowledge distillation learning method, the initial weight parameters of the initial network model are adjusted according to the first calibration network model to obtain the first preprocessing model.
Here, knowledge distillation refers to the idea of model compression: a larger, already trained network is used, step by step, to teach a smaller network exactly what to do. "Soft labels" refer to the feature vectors output by the large network after each convolutional layer. The small network is then trained to learn the exact behavior of the large network by trying to replicate the output of the large network at every layer, not only the final loss.
Specifically, the computer device may use the knowledge distillation learning method to compare the feature vector output by each network layer of the first calibration network model with the feature vector output by the corresponding layer of the initial network model, and then adjust the initial weight parameters of the initial network model according to the comparison result and the weight parameters corresponding to each network layer of the first calibration network model.
Compared with the embodiment shown in FIG. 1, in this embodiment, because the accuracy of the first calibration network model is higher than that of the initial network model, the knowledge-distillation-based learning method uses the first calibration network model as the large teacher network model to guide the small quantized initial network model so that it learns better weight parameters; the initial weight parameters of the initial network model are adjusted according to the first calibration network model to obtain the first preprocessing model. This ensures the accuracy of the weight parameters of the obtained first preprocessing model and improves the accuracy of the first preprocessing model.
In an optional embodiment of the present application, as shown in FIG. 3, "adjusting the initial weight parameters of the initial network model according to the first calibration network model based on the knowledge distillation learning method to obtain the first preprocessing model" may include the following steps:
Step 301: Acquire a first training image set.
Here, the first training image set carries hard labels, a hard label being the label corresponding to each image in the first training image set. For example, assuming that both the initial network model and the first calibration network model are used to identify a target object in each image of the first training image set, the hard label indicates that the target object in each image of the first training image set has been annotated.
Specifically, the computer device may receive the first training image set sent by another device, or receive a first training image set input by a user. The hard labels of the first training image set may be annotated manually or by a computer device based on a neural network model; the embodiments of the present application do not specifically limit the manner in which the hard labels of the first training image set are annotated. The first training image set includes a plurality of first training images.
Step 302: Input the first training image set into the initial network model and output a first result.
Specifically, the computer device inputs the first training image set into the initial network model; the initial network model performs feature extraction on the first training image set and outputs the first result based on the extracted features.
Step 303: Input the first training image set into the first calibration network model and output a second result.
Specifically, the computer device inputs the first training image set into the first calibration network model; the first calibration network model performs feature extraction on the first training image set and outputs the second result based on the extracted features.
Step 304: Adjust the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model.
Specifically, the computer device compares the first result output by the initial network model with the hard labels of the first training image set, and compares the first result output by the initial network model with the second result output by the first calibration network model. The computer device adjusts the initial weight parameters of the initial network model according to the comparison results to obtain the first preprocessing model.
For example, as shown in FIG. 4, image X may be an image in the first training image set, the teacher network is the first calibration network model, and W_T denotes the weight parameters of the teacher network. The student network is the initial network model, and W_S denotes the initial weight parameters of the student network. Image X is input into the teacher network, which outputs the second result, namely P_T; image X is input into the student network, which outputs the first result, namely P_S. Based on P_T, P_S, and the label Y, the computer device adjusts the initial weight parameters of the initial network model to obtain the first preprocessing model.
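As an illustration of the data flow in FIG. 4, the following sketch (PyTorch is assumed as the framework; `teacher` and `student` are placeholder modules standing for the first calibration network model and the quantized initial network model) shows how P_T and P_S could be produced for a batch X, with gradients kept only for the student whose weights W_S are to be adjusted:

```python
import torch

@torch.no_grad()
def teacher_forward(teacher: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """P_T: the teacher (first calibration network model) only provides guidance,
    so no gradients are tracked for it."""
    teacher.eval()
    return teacher(x)

def student_forward(student: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """P_S: the quantized student (initial network model) keeps gradients so that
    its initial weight parameters W_S can be adjusted."""
    return student(x)

# x: a batch of images from the first training image set, y: their hard labels
# p_t = teacher_forward(teacher, x)
# p_s = student_forward(student, x)
```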
Compared with the foregoing embodiment, in this embodiment, the first training image set with hard labels is input into the initial network model and into the first calibration network model, which output the first result and the second result respectively; the relationships between the first result and the second result and between the first result and the hard labels are then used to adjust the initial weight parameters of the initial network model, so that the first result output by the initial network model is driven closer to the second result and to the hard labels, thereby ensuring that the accuracy of the first preprocessing model obtained after the weight parameters are adjusted is improved.
In an optional embodiment of the present application, as shown in FIG. 5, "adjusting the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model" in step 304 may include the following steps:
Step 501: Generate a first loss function based on the first result and the hard labels.
Specifically, the computer device generates the first loss function based on the first result output by the initial network model and the hard labels corresponding to the first training image set. The first loss function represents the loss function of the initial network model during training. Optionally, the first loss function may be denoted as H(Y, P_S), where Y denotes the hard labels corresponding to the first training image set and P_S denotes the first result output by the initial network model.
Step 502: Generate a second loss function based on the first result and the second result.
Specifically, the computer device generates the second loss function based on the first result output by the initial network model and the second result output by the first calibration network model. The second loss function represents the loss incurred by the initial network model, acting as the student network, while imitating the first calibration network model. Optionally, the second loss function may be denoted as H(P_T, P_S), where P_T denotes the second result output by the first calibration network model and P_S denotes the first result output by the initial network model.
Step 503: Generate a first target loss function from the first loss function and the second loss function, and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
Optionally, the computer device may add the first loss function and the second loss function to generate the first target loss function, and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
For example, the first target loss function may be L_w(x, W_S) = H(Y, P_S) + H(P_T, P_S), where P_T denotes the second result output by the first calibration network model, P_S denotes the first result output by the initial network model, Y denotes the hard labels corresponding to the first training image set, W_S denotes the initial weight parameters of the initial network model, and x may be an image in the first training image set.
Optionally, the computer device may also multiply the first loss function by a first weight parameter, multiply the second loss function by a second weight parameter, add the weighted first loss function and the weighted second loss function to obtain the first target loss function, and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
For example, the first target loss function may be L_w(x, W_S) = αH(Y, P_S) + βH(P_T, P_S), where P_T denotes the second result output by the first calibration network model, P_S denotes the first result output by the initial network model, Y denotes the hard labels corresponding to the first training image set, W_S denotes the initial weight parameters of the initial network model, x may be an image in the first training image set, α is the first weight parameter, and β is the second weight parameter. The computer device can adjust the proportion of each loss function in the training process by adjusting the values of α and β; the embodiments of the present application do not specifically limit the values of α and β.
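The following sketch shows one way the weighted first target loss could be computed and used to update W_S. PyTorch is assumed as the framework, H(Y, P_S) is realized as a cross-entropy against the hard labels, H(P_T, P_S) as a KL divergence between softmax outputs, and α = β = 0.5 is only a placeholder; none of these concrete choices are fixed by the embodiment:

```python
import torch
import torch.nn.functional as F

def first_target_loss(p_s: torch.Tensor, p_t: torch.Tensor, y: torch.Tensor,
                      alpha: float = 0.5, beta: float = 0.5) -> torch.Tensor:
    """L_w(x, W_S) = alpha * H(Y, P_S) + beta * H(P_T, P_S).

    p_s, p_t: logits of the student (initial network model) and the teacher
              (first calibration network model) for the same batch x.
    y:        hard labels of the first training image set.
    """
    hard_loss = F.cross_entropy(p_s, y)                    # H(Y, P_S)
    soft_loss = F.kl_div(F.log_softmax(p_s, dim=1),        # H(P_T, P_S)
                         F.softmax(p_t, dim=1),
                         reduction="batchmean")
    return alpha * hard_loss + beta * soft_loss

# One weight-adjustment step for the student:
# loss = first_target_loss(student(x), teacher(x).detach(), y)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```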
Compared with the embodiment of FIG. 3, in this embodiment the first loss function is generated based on the first result output by the initial network model and the hard labels of the first training image set, and the second loss function is generated based on the first result output by the initial network model and the second result output by the first calibration network model. The first loss function characterizes the gap between the first result and the hard labels, and the second loss function characterizes the gap between the first result and the second result. Therefore, the first target loss function generated from the first loss function and the second loss function characterizes both the gap between the first result and the hard labels and the gap between the first result and the second result. Adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model thus improves the accuracy of the first preprocessing model.
In an optional embodiment of the present application, as shown in FIG. 6, "adjusting the initial quantization parameter of the activation output of the first preprocessing model based on the second calibration network model to obtain the target network model" in step 103 may include the following steps:
Step 601: Based on the knowledge distillation learning method, adjust the activation quantization threshold of the first preprocessing model according to the second calibration network model.
Here, knowledge distillation refers to the idea of model compression: a larger, already trained network is used, step by step, to teach a smaller network exactly what to do. "Soft labels" refer to the feature vectors output by the large network after each convolutional layer. The small network is then trained to learn the exact behavior of the large network by trying to replicate the output of the large network at every layer, not only the final loss.
Specifically, the computer device may use the knowledge distillation learning method to compare the feature vector output by each network layer of the second calibration network model with the feature vector output by the corresponding layer of the first preprocessing model, and then adjust the activation quantization threshold corresponding to each network layer of the first preprocessing model according to the comparison result.
Step 602: Adjust the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
Specifically, after the computer device has finished adjusting the activation quantization threshold, it may adjust the initial quantization parameter of the activation output of the first preprocessing model according to the correspondence between the adjusted activation quantization threshold and the initial quantization parameter of the activation output, and obtain the target network model from the adjusted quantization parameter of the activation output.
In this implementation, because the accuracy of the second calibration network model is higher than that of the first preprocessing model, the knowledge-distillation-based learning method uses the second calibration network model as the large teacher network model to guide the small quantized first preprocessing model so that it learns better model parameters. Adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model ensures the accuracy of the adjusted activation quantization threshold. Further, adjusting the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model further ensures the accuracy of the adjusted quantization parameter of the activation output, thereby improving the accuracy of the obtained target network model.
In an optional embodiment of the present application, as shown in FIG. 7, "adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model based on the knowledge distillation learning method" in step 601 may include the following steps:
Step 701: Acquire a second training image set.
Specifically, the computer device may receive the second training image set sent by another device, or receive a second training image set input by a user. The second training image set may consist of unlabeled images or labeled images; the embodiments of the present application do not specifically limit the second training image set. In addition, the second training image set may be the same as or different from the first training image set. The second training image set may include a plurality of second training images.
Step 702: Input the second training image set into the first preprocessing model and output a third result.
Specifically, the computer device inputs the second training image set into the first preprocessing model; the first preprocessing model performs feature extraction on the second training image set and outputs the third result based on the extracted features.
Step 703: Input the second training image set into the second calibration network model and output a fourth result.
Specifically, the computer device inputs the second training image set into the second calibration network model; the second calibration network model performs feature extraction on the second training image set and outputs the fourth result based on the extracted features.
Step 704: Adjust the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
Specifically, the computer device compares the third result output by the first preprocessing model with the fourth result output by the second calibration network model, and adjusts the activation quantization threshold of the first preprocessing model according to the comparison result.
For example, as shown in FIG. 8, image X may be an image in the second training image set, the full-precision teacher network is the second calibration network model, and the low-precision student network is the first preprocessing model. The computer device inputs image X into the full-precision teacher network, which outputs the fourth result, namely P_T in FIG. 8; the computer device inputs image X into the low-precision student network, which outputs the third result, namely P_S in FIG. 8. Based on P_T and P_S, the computer device adjusts the activation quantization threshold of the first preprocessing model.
In this embodiment, the second training image set is input into the first preprocessing model and the second calibration network model respectively to output the third result and the fourth result, and the activation quantization threshold of the first preprocessing model is adjusted based on the third result and the fourth result, which ensures the accuracy of the adjusted activation quantization threshold and further ensures the accuracy of the first preprocessing model.
In an optional embodiment of the present application, as shown in FIG. 9, "adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result" in step 704 may include the following steps:
Step 901: Generate a second target loss function based on the third result and the fourth result.
Specifically, the computer device generates the second target loss function based on the third result output by the first preprocessing model and the fourth result output by the second calibration network model. The smaller the value of the second target loss function, the better it indicates that, under the same network structure, the first preprocessing model quantized with the threshold T still has a prediction capability close to that of the second calibration network model.
For example, the second target loss function may be L_A(x, T) = H(P_T, P_S), where P_T denotes the fourth result output by the second calibration network model, P_S denotes the third result output by the first preprocessing model, T denotes the activation quantization threshold of the first preprocessing model, and x denotes an image in the second training image set.
Step 902: Adjust the activation quantization threshold of the first preprocessing model based on the second target loss function.
Specifically, the computer device adjusts the activation quantization threshold of the first preprocessing model based on the function value computed from the second target loss function, where a symmetric uniform quantization model is adopted in the embodiments of the present application.
In this embodiment, the second target loss function is generated based on the third result and the fourth result; the smaller the value of the second target loss function, the smaller the gap between the third result and the fourth result. Therefore, adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function ensures the accuracy of the adjusted activation quantization threshold, and further ensures the accuracy of the quantization parameter of the activation output computed from the adjusted activation quantization threshold, thereby improving the accuracy of the target network model.
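The following sketch illustrates how a symmetric uniform activation quantizer with a learnable threshold T could be driven by the second target loss L_A(x, T) = H(P_T, P_S). PyTorch, a straight-through estimator for the rounding, a KL divergence for H, and one threshold per quantizer module are all assumptions made for illustration, not choices fixed by the embodiment:

```python
import torch
import torch.nn.functional as F

class SymmetricActQuant(torch.nn.Module):
    """Symmetric uniform quantization of an activation with a learnable threshold T.

    Values are clipped to [-T, T] and mapped onto 2^b - 1 uniform levels; a
    straight-through estimator lets gradients of the second target loss flow
    back into T while the rounding itself is treated as the identity.
    """
    def __init__(self, init_threshold: float, num_bits: int = 8):
        super().__init__()
        self.T = torch.nn.Parameter(torch.tensor(float(init_threshold)))
        self.qmax = 2 ** (num_bits - 1) - 1

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        scale = self.T / self.qmax
        a_clipped = torch.minimum(torch.maximum(a, -self.T), self.T)
        a_q = torch.round(a_clipped / scale) * scale           # uniform levels
        return a_clipped + (a_q - a_clipped).detach()          # straight-through

# Threshold-adjustment step (weights of the first preprocessing model stay fixed):
# p_s = student(x)                   # student uses SymmetricActQuant modules
# p_t = teacher(x).detach()          # second calibration network model
# loss_a = F.kl_div(F.log_softmax(p_s, dim=1), F.softmax(p_t, dim=1),
#                   reduction="batchmean")
# loss_a.backward(); threshold_optimizer.step(); threshold_optimizer.zero_grad()
```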
Based on the content of the foregoing embodiments, in an optional embodiment of the present application, the computer device may also treat the initial network model and the first preprocessing model as the same model, collectively referred to as the initial network model in the embodiments of the present application. The training process of the initial network model may include the following:
The computer device first adjusts the initial weight parameters of the initial network model according to the first target loss function, and then, based on the adjusted initial weight parameters, adjusts the activation quantization threshold of the initial network model according to the second target loss function. If, after one round of adjustment, neither the weight parameters nor the activation quantization threshold of the initial network model is satisfactory, the computer device continues to adjust the initial weight parameters of the initial network model according to the first target loss function and then, based on the adjusted initial weight parameters, adjusts the activation quantization threshold of the initial network model according to the second target loss function. The computer device adjusts the initial weight parameters and the activation quantization threshold of the initial network model in this alternating loop; after multiple iterations of training, the training of the initial network model is completed and the target network model is generated, thereby ensuring the accuracy of the target network model.
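The alternating procedure described above could be organized as in the following sketch. It reuses `first_target_loss` from the earlier sketch, realizes the second target loss again as a KL divergence, and assumes that `weight_optimizer` holds the low-precision weight parameters while `threshold_optimizer` holds only the activation quantization thresholds; the loaders, teachers, and number of rounds are placeholders:

```python
import torch
import torch.nn.functional as F

def second_target_loss(p_s: torch.Tensor, p_t: torch.Tensor) -> torch.Tensor:
    """L_A = H(P_T, P_S), realized here as a KL divergence (an assumed choice)."""
    return F.kl_div(F.log_softmax(p_s, dim=1), F.softmax(p_t, dim=1),
                    reduction="batchmean")

def alternating_training(student, teacher1, teacher2, loader1, loader2,
                         weight_optimizer, threshold_optimizer, num_rounds=10):
    """Alternately refine the weights (first target loss, teacher1) and the
    activation quantization thresholds (second target loss, teacher2)."""
    for _ in range(num_rounds):
        # Phase 1: adjust the low-precision weight parameters.
        for x, y in loader1:              # first training image set with hard labels
            loss_w = first_target_loss(student(x), teacher1(x).detach(), y)
            weight_optimizer.zero_grad()
            loss_w.backward()
            weight_optimizer.step()

        # Phase 2: keep the weights fixed and adjust the activation quantization
        # thresholds (only threshold_optimizer.step() is called, so the weights
        # are not updated in this phase).
        for x in loader2:                 # second training image set
            loss_a = second_target_loss(student(x), teacher2(x).detach())
            threshold_optimizer.zero_grad()
            loss_a.backward()
            threshold_optimizer.step()
    return student                        # the trained model becomes the target network model
```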
To better illustrate the network model quantization method provided in the embodiments of the present application, the embodiments of the present application provide an overall flow of the network model quantization method, as shown in FIG. 10. The method includes:
Step 1001: Obtain the network model to be processed, which is a pre-trained full-precision network model; quantize the weight parameters and the activation output of the network model to be processed separately according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameter of the activation output; and construct an initial network model based on the initial weight parameters and the initial quantization parameter of the activation output.
Step 1002: Acquire a first training image set.
Step 1003: Input the first training image set into the initial network model and output a first result.
Step 1004: Obtain a first calibration network model, input the first training image set into the first calibration network model, and output a second result.
Step 1005: Generate a first loss function based on the first result and the hard labels.
Step 1006: Generate a second loss function based on the first result and the second result.
Step 1007: Generate a first target loss function from the first loss function and the second loss function, and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain a first preprocessing model.
Step 1008: Acquire a second training image set.
Step 1009: Input the second training image set into the first preprocessing model and output a third result.
Step 1010: Obtain a second calibration network model, input the second training image set into the second calibration network model, and output a fourth result.
Step 1011: Generate a second target loss function based on the third result and the fourth result.
Step 1012: Adjust the activation quantization threshold of the first preprocessing model based on the second target loss function.
Step 1013: Adjust the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
In an optional embodiment of the present application, the above network model quantization method may, as shown in FIG. 11, include the following steps:
(1) Parameter initialization of the low-precision network: based on the pre-trained full-precision student network, the student network is initialized at low precision using a post-training quantization (PTQ) method, preliminarily determining the low-precision weight values and the activation quantization range values of the student network to be quantized.
(2) Under the guidance of full-precision teacher network 1, the low-precision weight parameters of the student network are learned and adjusted.
(3) Under the guidance of full-precision teacher network 2, the low-precision weight parameters of the student network are fixed, and the activation quantization thresholds of the student network are learned and adjusted.
(4) Network structure deployment: based on the quantized network model parameters, the model structure is deployed on an actual hardware platform to perform the corresponding tasks, such as image classification/detection/recognition tasks or natural language processing tasks.
It should be understood that although the steps in the flowcharts of FIG. 1, FIG. 3, FIGS. 5-7, and FIGS. 9-10 are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 1, FIG. 3, FIGS. 5-7, and FIGS. 9-10 may include multiple sub-steps or stages, which are not necessarily executed at the same time but may be executed at different times; the execution order of these sub-steps or stages is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Correspondingly, referring to FIG. 12, an embodiment of the present application provides a network model quantization apparatus 1200, which includes a quantization processing module 1210, a first adjustment module 1220, and a second adjustment module 1230, wherein:
The quantization processing module 1210 is configured to obtain a network model to be processed, which is a pre-trained full-precision network model, quantize the weight parameters and the activation output of the network model to be processed separately according to the quantization requirements to obtain initial weight parameters and an initial quantization parameter of the activation output, and construct an initial network model based on the initial weight parameters and the initial quantization parameter of the activation output.
The first adjustment module 1220 is configured to obtain a first calibration network model whose accuracy is higher than that of the initial network model, and adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model.
The second adjustment module 1230 is configured to obtain a second calibration network model whose accuracy is higher than that of the initial network model, and adjust the initial quantization parameter of the activation output of the first preprocessing model based on the second calibration network model to obtain a target network model.
In an embodiment of the present application, the first adjustment module 1220 is specifically configured to adjust the initial weight parameters of the initial network model according to the first calibration network model based on a knowledge distillation learning method to obtain the first preprocessing model.
In an embodiment of the present application, the first adjustment module 1220 is specifically configured to: acquire a first training image set carrying hard labels; input the first training image set into the initial network model and output a first result; input the first training image set into the first calibration network model and output a second result; and adjust the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model.
In an embodiment of the present application, the first adjustment module 1220 is specifically configured to: generate a first loss function based on the first result and the hard labels; generate a second loss function based on the first result and the second result; generate a first target loss function from the first loss function and the second loss function; and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
Correspondingly, referring to FIG. 13, in an embodiment of the present application, the second adjustment module 1230 includes a first adjustment unit 1231 and a second adjustment unit 1232, wherein:
The first adjustment unit 1231 is configured to adjust the activation quantization threshold of the first preprocessing model according to the second calibration network model based on the knowledge distillation learning method;
The second adjustment unit 1232 is configured to adjust the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
In an embodiment of the present application, the first adjustment unit 1231 is specifically configured to: acquire a second training image set; input the second training image set into the first preprocessing model and output a third result; input the second training image set into the second calibration network model and output a fourth result; and adjust the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
In an embodiment of the present application, the first adjustment unit 1231 is specifically configured to: generate a second target loss function based on the third result and the fourth result; and adjust the activation quantization threshold of the first preprocessing model based on the second target loss function.
For the specific limitations and beneficial effects of the network model quantization apparatus, reference may be made to the limitations on the network model quantization method above, which are not repeated here. Each module of the above network model quantization apparatus may be implemented wholly or partly in software, hardware, or a combination thereof. The above modules may be embedded in or independent of a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In an embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in FIG. 14. The computer device includes a processor, a memory, a communication interface, a display screen, and an input apparatus connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication may be implemented through WIFI, an operator network, NFC (near field communication), or other technologies. When the computer program is executed by the processor, a network model quantization method is implemented. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input apparatus of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art can understand that the structure shown in FIG. 14 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. A specific computer device may include more or fewer components than those shown in the figure, combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 15. The computer device includes a processor, a memory, and a network interface connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store network model quantization data. The network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer program is executed by the processor, a network model quantization method is implemented.
Those skilled in the art can understand that the structure shown in FIG. 15 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. A specific computer device may include more or fewer components than those shown in the figure, combine certain components, or have a different arrangement of components.
In an embodiment of the present application, a computer device is provided, including a memory and a processor, where a computer program is stored in the memory, and the processor implements the following steps when executing the computer program: obtaining a network model to be processed, the network model to be processed being a pre-trained full-precision network model, quantizing the weight parameters and the activation output of the network model to be processed separately according to the quantization requirements to obtain initial weight parameters and an initial quantization parameter of the activation output, and constructing an initial network model based on the initial weight parameters and the initial quantization parameter of the activation output; obtaining a first calibration network model, the accuracy of the first calibration network model being higher than that of the initial network model, and adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model; and obtaining a second calibration network model, the accuracy of the second calibration network model being higher than that of the first preprocessing model, and adjusting the initial quantization parameter of the activation output of the first preprocessing model based on the second calibration network model to obtain a target network model.
In an embodiment of the present application, the processor further implements the following steps when executing the computer program: adjusting the initial weight parameters of the initial network model according to the first calibration network model based on a knowledge distillation learning method to obtain the first preprocessing model.
In an embodiment of the present application, the processor further implements the following steps when executing the computer program: acquiring a first training image set carrying hard labels; inputting the first training image set into the initial network model and outputting a first result; inputting the first training image set into the first calibration network model and outputting a second result; and adjusting the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model.
In an embodiment of the present application, the processor further implements the following steps when executing the computer program: generating a first loss function based on the first result and the hard labels; generating a second loss function based on the first result and the second result; generating a first target loss function from the first loss function and the second loss function; and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
In an embodiment of the present application, the processor further implements the following steps when executing the computer program: adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model based on the knowledge distillation learning method; and adjusting the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
In an embodiment of the present application, the processor further implements the following steps when executing the computer program: acquiring a second training image set; inputting the second training image set into the first preprocessing model and outputting a third result; inputting the second training image set into the second calibration network model and outputting a fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
In an embodiment of the present application, the processor further implements the following steps when executing the computer program: generating a second target loss function based on the third result and the fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function.
在本申请一个实施例中,提供了一种计算机可读存储介质,其上存储有计算机程序,计算机程序被处理器执行时实现以下步骤:获取待处理网络模型,待处理网络模型为预训练好的全精度网络模型,根据量化需求对待处理网络模型的权重参数和激活输出分别进行量化处理,得到初始权重参数和激活输出的初始量化参数,基于初始权重参数以及激活输出的初始量化参数,构建初始网络模型;获取第一校准网络模型,第一校准网络模型的精度高于初始网络模型的精度,基于第一校准网络模型对初始网络模型的初始权重参数进行调整,得到第一预处理模型;获取第二校准网络模型,第二校准网络模型的精度高于第一预处理模型的精度,基于第二校准网络模型对第一预处理模型的激活输出的初始量化参数进行调整,得到目标网络模型。In one embodiment of the present application, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the following steps are implemented: obtaining a network model to be processed, the network model to be processed is pre-trained According to the quantization requirements, the weight parameters and activation output of the network model to be processed are respectively quantized to obtain the initial weight parameters and the initial quantization parameters of the activation output. Based on the initial weight parameters and the initial quantization parameters of the activation output, the initial Network model; obtain a first calibration network model, the accuracy of the first calibration network model is higher than the accuracy of the initial network model, and adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model; obtain The second calibration network model, the precision of the second calibration network model is higher than the precision of the first preprocessing model, and the initial quantization parameter of the activation output of the first preprocessing model is adjusted based on the second calibration network model to obtain the target network model.
In one embodiment of the present application, when the computer program is executed by the processor, the following step is also implemented: adjusting the initial weight parameters of the initial network model according to the first calibration network model by means of a knowledge-distillation-based learning method to obtain the first preprocessing model.
In one embodiment of the present application, when the computer program is executed by the processor, the following steps are also implemented: acquiring a first training image set, where the first training image set carries hard labels; inputting the first training image set into the initial network model and outputting a first result; inputting the first training image set into the first calibration network model and outputting a second result; and adjusting the initial weight parameters of the initial network model based on the hard labels, the first result and the second result to obtain the first preprocessing model.
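For this first-stage step, a minimal training-loop sketch is given below. It reuses the `first_target_loss` sketch above, treats the first calibration network model as a frozen teacher, and updates the initial weight parameters of the quantized student. The optimizer, hyperparameters, and the assumption that the loader yields `(images, hard_labels)` pairs are illustrative choices, not the claimed implementation.

```python
import torch

def adjust_initial_weights(initial_model, calib_model_1, loader, epochs=1, lr=1e-4):
    """Stage-one fine-tuning loop; relies on the first_target_loss sketch above."""
    optimizer = torch.optim.SGD(initial_model.parameters(), lr=lr, momentum=0.9)
    calib_model_1.eval()                                  # teacher is kept frozen
    for _ in range(epochs):
        for images, hard_labels in loader:                # first training image set with hard labels
            with torch.no_grad():
                second_result = calib_model_1(images)     # teacher output
            first_result = initial_model(images)          # student output
            loss = first_target_loss(first_result, second_result, hard_labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return initial_model                                  # now serves as the first preprocessing model
```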
In one embodiment of the present application, when the computer program is executed by the processor, the following steps are also implemented: generating a first loss function based on the first result and the hard labels; generating a second loss function based on the first result and the second result; generating a first target loss function from the first loss function and the second loss function; and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
In one embodiment of the present application, when the computer program is executed by the processor, the following steps are also implemented: adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model by means of a knowledge-distillation-based learning method; and adjusting the initial quantization parameters of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
In one embodiment of the present application, when the computer program is executed by the processor, the following steps are also implemented: acquiring a second training image set; inputting the second training image set into the first preprocessing model and outputting a third result; inputting the second training image set into the second calibration network model and outputting a fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
In one embodiment of the present application, when the computer program is executed by the processor, the following steps are also implemented: generating a second target loss function based on the third result and the fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function.
Those skilled in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also include a combination of the above types of memory.
Although the embodiments of the present application have been described with reference to the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the present application, and all such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

  1. A network model quantization method, characterized in that the method comprises:
    acquiring a network model to be processed, wherein the network model to be processed is a pre-trained full-precision network model, quantizing weight parameters and an activation output of the network model to be processed respectively according to a quantization requirement to obtain initial weight parameters and initial quantization parameters of the activation output, and constructing an initial network model based on the initial weight parameters and the initial quantization parameters of the activation output;
    acquiring a first calibration network model, wherein the accuracy of the first calibration network model is higher than the accuracy of the initial network model, and adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model; and
    acquiring a second calibration network model, wherein the accuracy of the second calibration network model is higher than the accuracy of the first preprocessing model, and adjusting the initial quantization parameters of the activation output of the first preprocessing model based on the second calibration network model to obtain a target network model.
  2. The method according to claim 1, wherein the adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model comprises:
    adjusting the initial weight parameters of the initial network model according to the first calibration network model by means of a knowledge-distillation-based learning method to obtain the first preprocessing model.
  3. The method according to claim 2, wherein the adjusting the initial weight parameters of the initial network model according to the first calibration network model by means of the knowledge-distillation-based learning method to obtain the first preprocessing model comprises:
    acquiring a first training image set, wherein the first training image set carries hard labels;
    inputting the first training image set into the initial network model and outputting a first result;
    inputting the first training image set into the first calibration network model and outputting a second result; and
    adjusting the initial weight parameters of the initial network model based on the hard labels, the first result and the second result to obtain the first preprocessing model.
  4. The method according to claim 3, wherein the adjusting the initial weight parameters of the initial network model based on the hard labels, the first result and the second result to obtain the first preprocessing model comprises:
    generating a first loss function based on the first result and the hard labels;
    generating a second loss function based on the first result and the second result; and
    generating a first target loss function from the first loss function and the second loss function, and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  5. The method according to claim 1, wherein the adjusting the initial quantization parameters of the activation output of the first preprocessing model based on the second calibration network model to obtain the target network model comprises:
    adjusting an activation quantization threshold of the first preprocessing model according to the second calibration network model by means of a knowledge-distillation-based learning method; and
    adjusting the initial quantization parameters of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
  6. The method according to claim 5, wherein the adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model by means of the knowledge-distillation-based learning method comprises:
    acquiring a second training image set;
    inputting the second training image set into the first preprocessing model and outputting a third result;
    inputting the second training image set into the second calibration network model and outputting a fourth result; and
    adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
  7. The method according to claim 6, wherein the adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result comprises:
    generating a second target loss function based on the third result and the fourth result; and
    adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function.
  8. A network model quantization apparatus, characterized in that the apparatus comprises:
    a quantization processing module, configured to acquire a network model to be processed, wherein the network model to be processed is a pre-trained full-precision network model, quantize weight parameters and an activation output of the network model to be processed respectively according to a quantization requirement to obtain initial weight parameters and initial quantization parameters of the activation output, and construct an initial network model based on the initial weight parameters and the initial quantization parameters of the activation output;
    a first adjustment module, configured to acquire a first calibration network model, wherein the accuracy of the first calibration network model is higher than the accuracy of the initial network model, and adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model; and
    a second adjustment module, configured to acquire a second calibration network model, wherein the accuracy of the second calibration network model is higher than the accuracy of the initial network model, and adjust the initial quantization parameters of the activation output of the first preprocessing model based on the second calibration network model to obtain a target network model.
  9. A computer device, characterized by comprising a memory and a processor, wherein the memory and the processor are communicatively connected to each other, the memory stores computer instructions, and the processor executes the computer instructions so as to perform the network model quantization method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions, and the computer instructions are used to cause a computer to perform the network model quantization method according to any one of claims 1 to 7.
PCT/CN2022/078256 2021-09-28 2022-02-28 Network model quantization method and apparatus, and computer device and storage medium WO2023050707A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111139349.XA CN113610232B (en) 2021-09-28 2021-09-28 Network model quantization method and device, computer equipment and storage medium
CN202111139349.X 2021-09-28

Publications (1)

Publication Number Publication Date
WO2023050707A1 true WO2023050707A1 (en) 2023-04-06

Family

ID=78343259

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/078256 WO2023050707A1 (en) 2021-09-28 2022-02-28 Network model quantization method and apparatus, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN113610232B (en)
WO (1) WO2023050707A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610232B (en) * 2021-09-28 2022-02-22 苏州浪潮智能科技有限公司 Network model quantization method and device, computer equipment and storage medium
CN115570228B (en) * 2022-11-22 2023-03-17 苏芯物联技术(南京)有限公司 Intelligent feedback control method and system for welding pipeline gas supply
CN117689044A (en) * 2024-02-01 2024-03-12 厦门大学 Quantification method suitable for vision self-attention model

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190164057A1 (en) * 2019-01-30 2019-05-30 Intel Corporation Mapping and quantification of influence of neural network features for explainable artificial intelligence
US20210142177A1 (en) * 2019-11-13 2021-05-13 Nvidia Corporation Synthesizing data for training one or more neural networks
CN111753761B (en) * 2020-06-28 2024-04-09 北京百度网讯科技有限公司 Model generation method, device, electronic equipment and storage medium
CN112200296B (en) * 2020-07-31 2024-04-05 星宸科技股份有限公司 Network model quantization method and device, storage medium and electronic equipment
CN112308019B (en) * 2020-11-19 2021-08-17 中国人民解放军国防科技大学 SAR ship target detection method based on network pruning and knowledge distillation
CN113011581B (en) * 2021-02-23 2023-04-07 北京三快在线科技有限公司 Neural network model compression method and device, electronic equipment and readable storage medium
CN112988975A (en) * 2021-04-09 2021-06-18 北京语言大学 Viewpoint mining method based on ALBERT and knowledge distillation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276451A (en) * 2019-06-28 2019-09-24 南京大学 One kind being based on the normalized deep neural network compression method of weight
CN110443165A (en) * 2019-07-23 2019-11-12 北京迈格威科技有限公司 Neural network quantization method, image-recognizing method, device and computer equipment
CN112016674A (en) * 2020-07-29 2020-12-01 魔门塔(苏州)科技有限公司 Knowledge distillation-based convolutional neural network quantification method
CN112508169A (en) * 2020-11-13 2021-03-16 华为技术有限公司 Knowledge distillation method and system
CN113610232A (en) * 2021-09-28 2021-11-05 苏州浪潮智能科技有限公司 Network model quantization method and device, computer equipment and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116579407A (en) * 2023-05-19 2023-08-11 北京百度网讯科技有限公司 Compression method, training method, processing method and device of neural network model
CN116579407B (en) * 2023-05-19 2024-02-13 北京百度网讯科技有限公司 Compression method, training method, processing method and device of neural network model
CN116542344A (en) * 2023-07-05 2023-08-04 浙江大华技术股份有限公司 Model automatic deployment method, platform and system
CN116721399A (en) * 2023-07-26 2023-09-08 之江实验室 Point cloud target detection method and device for quantitative perception training
CN116721399B (en) * 2023-07-26 2023-11-14 之江实验室 Point cloud target detection method and device for quantitative perception training
CN117077740A (en) * 2023-09-25 2023-11-17 荣耀终端有限公司 Model quantization method and device
CN117077740B (en) * 2023-09-25 2024-03-12 荣耀终端有限公司 Model quantization method and device

Also Published As

Publication number Publication date
CN113610232B (en) 2022-02-22
CN113610232A (en) 2021-11-05

Similar Documents

Publication Publication Date Title
WO2023050707A1 (en) Network model quantization method and apparatus, and computer device and storage medium
US10991074B2 (en) Transforming source domain images into target domain images
US20230376771A1 (en) Training machine learning models by determining update rules using neural networks
US20210201147A1 (en) Model training method, machine translation method, computer device, and storage medium
US11657254B2 (en) Computation method and device used in a convolutional neural network
US10789734B2 (en) Method and device for data quantization
CN110880036B (en) Neural network compression method, device, computer equipment and storage medium
CN112106081A (en) Application development platform and software development suite for providing comprehensive machine learning service
TWI767000B (en) Method and computer storage medium of generating waveform
US20190340492A1 (en) Design flow for quantized neural networks
US20230042221A1 (en) Modifying digital images utilizing a language guided image editing model
TWI744724B (en) Method of processing convolution neural network
JP2022169743A (en) Information extraction method and device, electronic equipment, and storage medium
KR102508860B1 (en) Method, device, electronic equipment and medium for identifying key point positions in images
WO2022021834A1 (en) Neural network model determination method and apparatus, and electronic device, and medium, and product
WO2016142285A1 (en) Method and apparatus for image search using sparsifying analysis operators
JP2023547010A (en) Model training methods, equipment, and electronics based on knowledge distillation
WO2023020456A1 (en) Network model quantification method and apparatus, device, and storage medium
Huai et al. Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
JP6467893B2 (en) Information processing system, information processing method, and program
US20220044109A1 (en) Quantization-aware training of quantized neural networks
CN117315758A (en) Facial expression detection method and device, electronic equipment and storage medium
US20230046088A1 (en) Method for training student network and method for recognizing image
US10530387B1 (en) Estimating an optimal ordering for data compression
US20240062057A1 (en) Regularizing targets in model distillation utilizing past state knowledge to improve teacher-student machine learning models

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22874110

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE