WO2023050707A1 - Network model quantization method and apparatus, computer device, and storage medium - Google Patents


Info

Publication number
WO2023050707A1
Authority
WO
WIPO (PCT)
Prior art keywords
network model
initial
model
result
preprocessing
Prior art date
Application number
PCT/CN2022/078256
Other languages
English (en)
Chinese (zh)
Inventor
梁玲燕
董刚
赵雅倩
温东超
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2023050707A1

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks

Definitions

  • The present application relates to the field of artificial intelligence, and in particular to a network model quantization method and apparatus, a computer device, and a storage medium.
  • Neural network models are usually deployed on terminal devices or edge devices. These devices generally have low computing power and limited memory and power budgets. How to shrink a large deep neural network model while ensuring its accuracy, so that the model can actually be deployed on a terminal, has therefore become an urgent problem to be solved.
  • In the related art, model compression methods such as quantization and pruning are usually used to reduce the size of large deep neural network models.
  • The embodiments of the present application provide a network model quantization method, apparatus, computer device, and storage medium to solve the problem that shrinking large deep neural network models through model compression such as quantization and pruning leaves the compressed models with relatively low accuracy.
  • The embodiments of the present application provide a network model quantization method. The method includes: obtaining a network model to be processed, where the network model to be processed is a pre-trained full-precision network model; quantizing the weight parameters and the activation output of the network model to be processed respectively according to quantization requirements to obtain initial weight parameters and initial quantization parameters of the activation output, and constructing an initial network model based on the initial weight parameters and the initial quantization parameters of the activation output; obtaining a first calibration network model, where the accuracy of the first calibration network model is higher than that of the initial network model, and adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model; and obtaining a second calibration network model, where the accuracy of the second calibration network model is higher than that of the initial network model, and adjusting the initial quantization parameters of the activation output of the first preprocessing model based on the second calibration network model to obtain a target network model.
  • First, the pre-trained full-precision network model is obtained and used as the network model to be processed. Then, the weight parameters and the activation output of the network model to be processed are quantized according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameters of the activation output, and the initial network model is constructed based on them. Since the weight parameters and the activation output of the network model to be processed are quantized, the size of the initial network model constructed based on the initial weight parameters and the initial quantization parameters of the activation output is much smaller than that of the network model to be processed, which ensures that the initial network model can run on terminal devices and edge devices.
  • The initial weight parameters of the initial network model can be adjusted based on the first calibration network model, whose model accuracy is higher than that of the initial network model, to obtain the first preprocessing model. This guarantees the accuracy of the weight parameters of the first preprocessing model and thereby improves the accuracy of the first preprocessing model.
  • the initial quantization parameters of the activation output of the first preprocessing model can be adjusted to obtain the target network model.
  • As a result, the weight parameters and the activation output range of the target network model are more accurate, which further improves the accuracy of the target network model and solves the problem that shrinking large deep neural network models through quantization, pruning, and other model compression methods lowers the accuracy of the reduced models.
  • In some embodiments, adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model includes: based on a learning method of knowledge distillation, adjusting the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
  • The learning method based on knowledge distillation uses the first calibration network model as a large teacher network model to guide the learning of the small quantized initial network model, so as to obtain better model parameters: the initial weight parameters of the initial network model are adjusted according to the first calibration network model to obtain the first preprocessing model. The accuracy of the weight parameters of the obtained first preprocessing model can therefore be guaranteed, and the accuracy of the first preprocessing model can be improved.
  • In some embodiments, the knowledge distillation-based learning method of adjusting the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model includes: obtaining a first training image set, where the first training image set has hard labels; inputting the first training image set into the initial network model and outputting a first result; inputting the first training image set into the first calibration network model and outputting a second result; and adjusting the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model.
  • The first training image set with hard labels is input into the initial network model and the first calibration network model respectively, and the first result and the second result are output respectively. Using the relationship between the first result and the second result, as well as the relationship between the first result and the hard labels, the initial weight parameters of the initial network model are adjusted so that the first result output by the initial network model moves closer to the second result and the hard labels, which ensures the accuracy of the first preprocessing model obtained after the weight parameter adjustment.
  • In some embodiments, adjusting the initial weight parameters of the initial network model to obtain the first preprocessing model includes: generating a first loss function based on the first result and the hard labels; generating a second loss function based on the first result and the second result; generating a first target loss function using the first loss function and the second loss function; and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  • The first loss function is generated based on the first result output by the initial network model and the hard labels of the first training image set, and the second loss function is generated based on the first result output by the initial network model and the second result output by the first calibration network model.
  • The first loss function can be used to represent the gap between the first result and the hard labels, and the second loss function can be used to represent the gap between the first result and the second result. Therefore, the first target loss function generated from the first loss function and the second loss function can characterize both the gap between the first result and the hard labels and the gap between the first result and the second result.
  • The initial weight parameters of the initial network model are adjusted based on the first target loss function to obtain the first preprocessing model, thereby improving the accuracy of the first preprocessing model.
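As a hedged sketch of how such a first target loss function could look, the example below combines a cross-entropy term against the hard labels with a temperature-softened KL-divergence distillation term. The weighting factor `alpha`, the temperature, and the exact form of the distillation term are assumptions for illustration, since the text does not fix them:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def first_loss(student_logits, labels):
    # First loss: gap between the first result (student output on the
    # first training image set) and the hard labels (cross-entropy).
    p = softmax(student_logits)
    return -float(np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))

def second_loss(student_logits, teacher_logits, temperature=4.0):
    # Second loss: gap between the first result (student) and the second
    # result (teacher), as KL divergence of temperature-softened outputs.
    p_t = softmax(teacher_logits / temperature)
    p_s = softmax(student_logits / temperature)
    return float(np.mean(np.sum(
        p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)))

def first_target_loss(student_logits, teacher_logits, labels, alpha=0.5):
    # First target loss: weighted combination of both gaps, which is
    # minimized when adjusting the initial weight parameters.
    return (alpha * first_loss(student_logits, labels)
            + (1 - alpha) * second_loss(student_logits, teacher_logits))
```

Minimizing this combined loss pulls the first result toward the hard labels and toward the teacher's second result at the same time, which is exactly the adjustment criterion described above.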
  • In some embodiments, adjusting the initial quantization parameters of the activation output of the first preprocessing model based on the second calibration network model to obtain the target network model includes: based on a learning method of knowledge distillation, adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model; and adjusting the initial quantization parameters of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
  • The learning method based on knowledge distillation uses the second calibration network model as a large teacher network model to guide the learning of the small quantized first preprocessing model, so as to obtain better model parameters.
  • Adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model can ensure the accuracy of the adjusted activation quantization threshold.
  • The initial quantization parameters of the activation output of the first preprocessing model are then adjusted to obtain the target network model, which further ensures the accuracy of the adjusted initial quantization parameters of the activation output of the first preprocessing model and thereby improves the accuracy of the obtained target network model.
  • In some embodiments, the knowledge distillation-based learning method of adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model includes: acquiring a second training image set; inputting the second training image set into the first preprocessing model and outputting a third result; inputting the second training image set into the second calibration network model and outputting a fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
  • The second training image set is input into the first preprocessing model and the second calibration network model respectively, and the third result and the fourth result are output. The activation quantization threshold of the first preprocessing model is then adjusted based on the third result and the fourth result, which ensures the accuracy of the adjusted activation quantization threshold and further ensures the accuracy of the obtained target network model.
  • In some embodiments, adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result includes: generating a second target loss function based on the third result and the fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function.
  • Adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function ensures the accuracy of the adjusted activation quantization threshold, and further ensures the accuracy of the quantization parameters of the activation output calculated from the adjusted activation quantization threshold, thereby improving the accuracy of the target network model.
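A minimal sketch of this threshold adjustment, using a toy one-layer student and teacher that share the same weights. The mean-square-error form of the second target loss and the multiplicative local search over T are assumptions for illustration, not the embodiment's prescribed method:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((16, 8)).astype(np.float32)   # shared layer weights
X = rng.standard_normal((64, 16)).astype(np.float32)  # stand-in for the second training image set

def fake_quant(a, T, n=8):
    # Saturated-mapping quantize-dequantize of activations at threshold T.
    qmax = 2 ** (n - 1) - 1
    s = T / qmax
    return np.clip(np.round(a / s), -qmax, qmax) * s

def student_forward(X, T):
    # Third result: first preprocessing model with quantized activations.
    return fake_quant(np.maximum(X @ W, 0.0), T)

def teacher_forward(X):
    # Fourth result: second calibration network model at full precision.
    return np.maximum(X @ W, 0.0)

def second_target_loss(third, fourth):
    # Assumed form of the second target loss: MSE between the two results.
    return float(np.mean((third - fourth) ** 2))

# Adjust the activation quantization threshold by a greedy local search
# on the second target loss, starting from the maximum activation.
T = float(np.abs(np.maximum(X @ W, 0.0)).max())
fourth = teacher_forward(X)
for _ in range(20):
    best = min((T * 0.9, T, T * 1.1),
               key=lambda t: second_target_loss(student_forward(X, t), fourth))
    if best == T:
        break
    T = best
```

The search only ever accepts a candidate threshold when the second target loss does not increase, so the final T is never worse than the initial maximum-activation threshold.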
  • an embodiment of the present application provides a network model quantification device, which includes:
  • The quantization processing module is configured to obtain the network model to be processed, where the network model to be processed is a pre-trained full-precision network model; quantize the weight parameters and the activation output of the network model to be processed respectively according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameters of the activation output; and construct the initial network model based on the initial weight parameters and the initial quantization parameters of the activation output.
  • The first adjustment module is configured to obtain the first calibration network model, where the accuracy of the first calibration network model is higher than the accuracy of the initial network model, and to adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model.
  • The second adjustment module is configured to obtain the second calibration network model, where the accuracy of the second calibration network model is higher than the accuracy of the initial network model, and to adjust the initial quantization parameters of the activation output of the first preprocessing model based on the second calibration network model to obtain the target network model.
  • The above first adjustment module is specifically configured to adjust, based on a learning method of knowledge distillation, the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
  • The above first adjustment module is specifically configured to: obtain the first training image set, where the first training image set has hard labels; input the first training image set into the initial network model and output the first result; input the first training image set into the first calibration network model and output the second result; and adjust the initial weight parameters of the initial network model based on the hard labels, the first result, and the second result to obtain the first preprocessing model.
  • The above first adjustment module is specifically configured to: generate a first loss function based on the first result and the hard labels; generate a second loss function based on the first result and the second result; generate a first target loss function using the first loss function and the second loss function; and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  • the above-mentioned second adjustment module includes:
  • The first adjustment unit is configured to adjust, based on a learning method of knowledge distillation, the activation quantization threshold of the first preprocessing model according to the second calibration network model.
  • the second adjustment unit is configured to adjust the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
  • The above first adjustment unit is specifically configured to: acquire the second training image set; input the second training image set into the first preprocessing model and output the third result; input the second training image set into the second calibration network model and output the fourth result; and adjust the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
  • The above first adjustment unit is specifically configured to: generate a second target loss function based on the third result and the fourth result, and adjust the activation quantization threshold of the first preprocessing model based on the second target loss function.
  • An embodiment of the present application provides an electronic device/mobile terminal/server, including a memory and a processor that are communicatively connected to each other, where computer instructions are stored in the memory, and the processor executes the computer instructions so as to execute the network model quantization method in the first aspect or any implementation manner of the first aspect.
  • The embodiments of the present application provide a computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to execute the network model quantization method in the first aspect or any implementation manner of the first aspect.
  • An embodiment of the present application provides a computer program product, where the computer program product includes a computer program stored on a computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer executes the network model quantization method in the first aspect or any implementation manner of the first aspect.
  • Fig. 1 shows a flow chart of the steps of the network model quantization method in one embodiment;
  • Fig. 2a shows a schematic diagram of unsaturated mapping in PTQ model quantization in the network model quantization method in one embodiment;
  • Fig. 2b shows a schematic diagram of saturated mapping in PTQ model quantization in the network model quantization method in one embodiment;
  • Fig. 3 shows a flow chart of the steps of the network model quantization method in another embodiment;
  • Fig. 4 shows a schematic diagram of the process of adjusting the initial network model weight parameters in the network model quantization method in one embodiment;
  • Fig. 5 shows a flow chart of the steps of the network model quantization method in another embodiment;
  • Fig. 6 shows a flow chart of the steps of the network model quantization method in another embodiment;
  • Fig. 7 shows a flow chart of the steps of the network model quantization method in another embodiment;
  • Fig. 8 shows a schematic diagram of the process of adjusting the activation output threshold of the first preprocessing model in the network model quantization method in another embodiment;
  • Fig. 9 shows a flow chart of the steps of the network model quantization method in another embodiment;
  • Fig. 10 shows a flow chart of the steps of the network model quantization method in another embodiment;
  • Fig. 11 shows a schematic flowchart of the network model quantization method in another embodiment;
  • Fig. 12 shows a structural block diagram of a network model quantization device in one embodiment;
  • Fig. 13 shows a structural block diagram of a network model quantization device in one embodiment;
  • Fig. 14 shows an internal structure diagram of one embodiment when the computer device is a server;
  • Fig. 15 shows an internal structure diagram of one embodiment when the computer device is a terminal.
  • the network model quantification method provided by the embodiment of the present application can be executed by a network model quantification device, and the network model quantification device can be implemented as a computer device through software, hardware, or a combination of software and hardware.
  • the computer device may be a server or a terminal
  • the server in the embodiment of the present application may be a single server, or may be a server cluster composed of multiple servers
  • The terminal in the embodiments of the present application may be a smartphone, a personal computer, a tablet computer, a wearable device, or another intelligent hardware device such as an intelligent robot.
  • the execution subject is a computer device as an example for illustration.
  • In one embodiment, a network model quantization method is provided. The method is described by taking its application to a computer device as an example, and includes the following steps:
  • Step 101: obtain the network model to be processed, where the network model to be processed is a pre-trained full-precision network model; quantize the weight parameters and the activation output of the network model to be processed according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameters of the activation output; and construct the initial network model based on the initial weight parameters and the initial quantization parameters of the activation output.
  • the computer device can use the first target image training set to train the neural network model, and obtain the network model to be processed.
  • the network model to be processed is a pre-trained full-precision network model.
  • the network model to be processed can be used to process tasks such as image recognition, image detection, and image classification.
  • the embodiment of the present application does not specifically limit the application scenarios of the network model to be processed.
  • the computer device may also receive a network model to be processed sent by other devices or a network model to be processed input by a user.
  • the embodiment of the present application does not specifically limit the manner in which the computer device acquires the network model to be processed.
  • The computer device quantizes the weight parameters and the activation output of the network model to be processed according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameters of the activation output, and constructs the initial network model based on them.
  • the quantitative requirement may be input by the user to the computer device based on the input component of the computer device.
  • The quantization requirements can be changed according to the actual situation, and can represent the bit-width requirements of the weight parameters and the activation output.
  • For example, the quantization requirement may be to reduce the size of the network model to be processed by a factor of 4 by converting the weight parameters and the activation output of the network model to be processed from float32 to int8.
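As a concrete illustration of this kind of requirement, the sketch below (a minimal example, not taken from the embodiment; the symmetric scaling scheme and tensor shape are assumptions) converts a float32 weight tensor to int8 and checks the resulting 4x size reduction:

```python
import numpy as np

# Hypothetical float32 weight tensor of one layer (shape is arbitrary).
weights_fp32 = np.random.randn(256, 256).astype(np.float32)

# Symmetric linear quantization to int8: scale by the maximum absolute
# value so that the largest weight maps to +/-127.
scale = float(np.abs(weights_fp32).max()) / 127.0
weights_int8 = np.clip(np.round(weights_fp32 / scale), -127, 127).astype(np.int8)

# float32 stores 4 bytes per value and int8 stores 1 byte,
# so the quantized tensor is 4 times smaller.
ratio = weights_fp32.nbytes // weights_int8.nbytes  # 4
```

Dequantizing with `weights_int8 * scale` recovers the weights up to a rounding error of at most half the scale.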
  • the embodiment of the present application does not specifically limit the quantitative requirement.
  • The accuracy of the initial network model is lower than that of the network model to be processed, and the size of the initial network model is much smaller than the size of the network model to be processed.
  • The computer device can use post-training quantization (PTQ) or training-aware quantization (TAQ) to quantize the weight parameters and the activation output of the network model to be processed respectively.
  • the embodiment of the present application does not specifically limit the method of separately quantizing the weight parameter and the activation output of the network model to be processed.
  • The following uses the PTQ method to quantize the weight parameters and the activation output of the network model to be processed respectively as an illustration.
  • The central idea of the PTQ quantization method is to calculate a quantization threshold T, and to determine, according to T, the mapping relationship between the weights of the network model to be processed and the weights of the initial network model, as well as the mapping relationship between the activation output of the network model to be processed and the activation output of the initial network model.
  • the mapping relationship between the activation outputs includes saturated mapping and unsaturated mapping.
  • For the weight parameters, the unsaturated mapping shown in Fig. 2a is used, in which the quantization threshold T is equal to the maximum value. For the activation output, the saturated mapping shown in Fig. 2b is generally used.
  • The quantization threshold T in saturated mapping can be searched for using relative entropy (KL divergence) or a mean square error method. The criterion for finding the quantization threshold T is to find the threshold at which, after the original values are clipped, the difference from the original values is still smallest.
  • The part exceeding the threshold T needs to be clipped, as shown in the second term of formula (1), where: s is the quantization mapping scale factor; x is the original value; q(x, T) represents the value of x after quantization and inverse quantization; n is the quantization bit width; and T is the quantization threshold.
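Since formula (1) itself is not reproduced above, the following sketch shows one common concrete form of this quantize-dequantize with clipping, together with a mean-square-error threshold search; the candidate grid and the MSE criterion are assumptions for illustration (relative entropy could be used instead):

```python
import numpy as np

def quantize_dequantize(x, T, n=8):
    # Saturated mapping: values whose magnitude exceeds the threshold T
    # are clipped, the rest are mapped onto the signed n-bit integer grid
    # and then mapped back (quantization followed by inverse quantization).
    qmax = 2 ** (n - 1) - 1          # e.g. 127 for 8-bit quantization
    s = T / qmax                     # quantization mapping scale factor
    q = np.clip(np.round(x / s), -qmax, qmax)
    return q * s

def search_threshold_mse(x, n=8, num_candidates=100):
    # Search for the threshold T at which the clipped, quantized values
    # differ least (in mean square error) from the original values.
    max_abs = float(np.abs(x).max())
    candidates = np.linspace(max_abs / num_candidates, max_abs, num_candidates)
    errors = [np.mean((x - quantize_dequantize(x, T, n)) ** 2) for T in candidates]
    return float(candidates[int(np.argmin(errors))])

rng = np.random.default_rng(0)
acts = rng.standard_normal(10000).astype(np.float32)  # stand-in activations
T = search_threshold_mse(acts)
```

For a long-tailed activation distribution, the threshold found this way typically lies below the maximum absolute value, trading a small clipping error for a finer quantization grid.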
  • Step 102 Obtain a first calibration network model, the accuracy of the first calibration network model is higher than that of the initial network model, and adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model.
  • The accuracy of the first calibration network model being higher than the accuracy of the initial network model can mean at least one of the following: the performance accuracy of the first calibration network model is higher than the performance accuracy of the initial network model, and the bit-width precision of the parameters of the first calibration network model is higher than the bit-width precision of the parameters of the initial network model.
  • the computer device may use the second target image training set to train the neural network model to obtain the first calibration network model.
  • the precision of the first calibration network model is higher than the precision of the initial network model.
  • the first calibration network model can be used for image recognition, image detection and image classification task processing. The embodiment of the present application does not specifically limit the application scenario of the first calibration network model.
  • the computer device may also receive the first calibration network model sent by other devices or the first calibration network model input by the user.
  • The embodiment of the present application does not specifically limit the manner in which the computer device obtains the first calibration network model.
  • the computer device may adjust the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
  • the computer device may also compare the output result of the first calibration network model with the output result of the initial network model, adjust the initial weight parameters of the initial network model according to the comparison result, and obtain the first preprocessing model.
  • As in step 101, after the full-precision network is converted into a low-precision initial network model, the decrease in model performance accuracy generally comes from two sources: the change of the weight parameters and the selection of the activation thresholds. In post-training quantization, all weight parameters are usually truncated using the same approximation method, but the same approximation may not be suitable for every weight parameter, which effectively introduces noise and affects the feature extraction ability of the network model.
  • the first calibration network model is used to correct the initial weight parameters of the initial network model to reduce errors generated in the above process.
  • Step 103: obtain the second calibration network model, where the accuracy of the second calibration network model is higher than the accuracy of the first preprocessing model; adjust the initial quantization parameters of the activation output of the first preprocessing model based on the second calibration network model to obtain the target network model.
  • The accuracy of the second calibration network model being higher than the accuracy of the first preprocessing model can mean at least one of the following: the performance accuracy of the second calibration network model is higher than the performance accuracy of the first preprocessing model, and the bit-width precision of the parameters of the second calibration network model is higher than the bit-width precision of the parameters of the first preprocessing model.
  • the computer device may use the third target image training set to train the neural network model, and obtain the second calibration network model.
  • the accuracy of the second calibration network model is higher than the accuracy of the first preprocessing model.
  • the second calibration network model can be used for image recognition, image detection and image classification task processing. The embodiment of the present application does not specifically limit the application scenario of the second calibration network model.
  • the computer device can also receive the second calibration network model sent by other devices or receive the second calibration network model input by the user.
  • The embodiment of the present application does not specifically limit the manner in which the computer device obtains the second calibration network model.
  • the second calibration network model may be the same pre-trained full-precision network model as the network model to be processed, or may be a different pre-trained full-precision network model.
  • the computer device may adjust the initial quantization parameters of the activation output of the first preprocessing model according to the second calibration network model to obtain the target network model.
  • To further adjust the initial activation thresholds, the computer device can also compare the output results of the second calibration network model with the output results of the first preprocessing model, and adjust the initial quantization parameters of the activation output of the first preprocessing model according to the comparison results to obtain the target network model, thereby further reducing the loss introduced by converting the full-precision model into a low-precision model and improving the accuracy of the model.
  • Based on the initial weight parameters and the initial quantization parameters of the activation output, the initial network model is constructed. Since the weight parameters and the activation output of the network model to be processed are quantized, the size of the initial network model constructed based on the initial weight parameters and the initial quantization parameters of the activation output is much smaller than that of the network model to be processed, which ensures that the initial network model can run on terminal devices and edge devices.
• the initial weight parameters of the initial network model can be adjusted based on the first calibration network model, whose accuracy is higher than that of the initial network model, to obtain the first preprocessing model, so that the accuracy of the weight parameters of the first preprocessing model can be guaranteed, thereby improving the accuracy of the first preprocessing model.
  • the initial quantization parameters of the activation output of the first preprocessing model may be adjusted based on the second calibration network model whose accuracy is higher than that of the first preprocessing model, to obtain the target network model.
• the weight parameters and activation outputs of the target network model are therefore more accurate, which further improves the accuracy of the target network model and addresses the problem that shrinking large deep neural network models through model compression such as quantization and pruning seriously reduces the accuracy of the deep neural network model.
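The quantization of weight parameters described above can be illustrated with a small sketch. The symmetric per-tensor int8 scheme below (scale = max|w| / 127) is an assumption chosen for illustration; the application itself does not fix a particular quantization formula.

```python
import numpy as np

def quantize_symmetric_int8(w):
    """Quantize a full-precision tensor to int8 with a per-tensor scale.

    Assumed symmetric scheme: scale = max|w| / 127, so values map onto
    the signed 8-bit grid [-127, 127].
    """
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate full-precision tensor from the int8 values."""
    return q.astype(np.float32) * scale

# Example: quantizing one weight tensor of the network model to be processed.
w = np.array([0.5, -1.27, 0.03, 1.0], dtype=np.float32)
q, scale = quantize_symmetric_int8(w)
w_hat = dequantize(q, scale)  # close to w, up to quantization error
```

The reconstruction error per element is bounded by half a scale step, which is why the initial network model is much smaller than the full-precision model while remaining approximately equivalent.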
  • the "adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model" in the above step 102 may include the following content:
  • the initial weight parameters of the initial network model are adjusted according to the first calibration network model to obtain the first preprocessing model.
• the computer device can use the knowledge distillation learning method to compare the feature vectors output by each network layer of the first calibration network model with those output by the corresponding layers of the initial network model, and then adjust the initial weight parameters of the initial network model according to the comparison results and the weight parameters of the corresponding layers of the first calibration network model.
• the learning method based on knowledge distillation uses the first calibration network model as a large teacher network model to guide the learning of the small quantized initial network model, so as to obtain better weight parameters: the initial weight parameters of the initial network model are adjusted according to the first calibration network model to obtain the first preprocessing model. Therefore, the accuracy of the weight parameters of the obtained first preprocessing model can be guaranteed, and the accuracy of the first preprocessing model can be improved.
• the above-mentioned "learning method based on knowledge distillation, adjusting the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model" may include the following steps:
  • Step 301 acquire a first training image set.
  • the first training image set has hard labels.
  • the hard labels are labels corresponding to each image in the first training image set.
• the hard label can represent the annotation of the target object in the corresponding image.
  • the computer device may receive the first training image set sent by other devices, and may receive the first training image set input by the user.
  • the hard labels attached to the first training image set may be marked manually, or may be marked by a computer device based on a neural network model.
  • the embodiment of the present application does not specifically limit the manner of labeling the hard tags of the first training image set.
  • the first training image set includes multiple first training images.
  • Step 302 input the first training image set into the initial network model, and output the first result.
  • the computer device inputs the first training image set into the initial network model, and the initial network model performs feature extraction on the first training image set, and outputs a first result based on the extracted features.
  • Step 303 input the first training image set into the first calibration network model, and output the second result.
  • the computer device inputs the first training image set into the first calibration network model, the first calibration network model performs feature extraction on the first training image set, and outputs a second result based on the extracted features.
  • Step 304 based on the hard label, the first result and the second result, adjust the initial weight parameters of the initial network model to obtain a first preprocessing model.
• the computer device compares the first result output by the initial network model with the hard labels carried by the first training image set, and compares the first result output by the initial network model with the second result output by the first calibration network model. The computer device then adjusts the initial weight parameters of the initial network model according to the comparison results to obtain the first preprocessing model.
  • the image X can be an image in the first training image set
  • the teacher network is the first calibration network model
  • W_T is the weight parameter of the teacher network
  • the student network is the initial network model
  • W_S is the initial weight parameter of the student network.
  • the image X is input to the teacher network, and the teacher network outputs the second result, namely P_T.
  • the image X is input to the student network, and the student network outputs the first result, namely P_S.
  • the computer device adjusts initial weight parameters of the initial network model based on P_T, P_S and label Y to obtain a first preprocessing model.
• the first training image set with hard labels is input into the initial network model and the first calibration network model respectively, which output the first result and the second result respectively. The relationships between the first result and the second result, and between the first result and the hard label, are used to adjust the initial weight parameters of the initial network model, so that the first result output by the initial network model can be closer to the second result and the hard label, thereby ensuring the accuracy of the first preprocessing model obtained after the weight parameters are adjusted.
• step 304 "based on the hard label, the first result and the second result, adjust the initial weight parameters of the initial network model to obtain the first preprocessing model" may include the following steps:
  • Step 501 generate a first loss function based on the first result and the hard label.
  • the computer device generates the first loss function based on the first result output by the initial network model and the hard label corresponding to the first training image set.
  • the first loss function represents the loss function of the initial network model during the training process.
  • the first loss function may be represented by H(Y, P_S), where Y represents the hard label corresponding to the first training image set, and P_S represents the first result output by the initial network model.
  • Step 502 generating a second loss function based on the first result and the second result.
  • the computer device generates the second loss function based on the first result output by the initial network model and the second result output by the first calibration network model.
  • the second loss function represents the initial network model as the loss function of the student network in the process of imitating the first calibration network model.
  • the second loss function may be represented by H(P_T, P_S), where P_T represents the second result output by the first calibration network model, and P_S represents the first result output by the initial network model.
  • Step 503 using the first loss function and the second loss function to generate a first target loss function, and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain a first preprocessing model.
  • the computer device may add the first loss function and the second loss function to generate the first target loss function, and adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model .
• the computer device may also multiply the first loss function by a first weight parameter α, multiply the second loss function by a second weight parameter β, and add the two weighted loss functions to obtain the first target loss function, then adjust the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
• the computer device can adjust the proportion of each loss function in the training process by adjusting the values of α and β.
• the embodiment of the present application does not specifically limit the values of α and β.
• the first loss function is generated based on the first result output by the initial network model and the hard label of the first training image set, and the second loss function is generated based on the first result output by the initial network model and the second result output by the first calibration network model.
  • the first loss function can be used to represent the gap between the first result and the hard label
• the second loss function can be used to represent the gap between the first result and the second result. Therefore, using the first loss function and the second loss function, the generated first target loss function can characterize both the gap between the first result and the hard label and the gap between the first result and the second result.
• the initial weight parameters of the initial network model are adjusted based on the first target loss function to obtain the first preprocessing model, thus improving the accuracy of the first preprocessing model.
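Steps 501 to 503 can be sketched as follows. The softmax/cross-entropy form of H and the default weighting α = β = 0.5 are illustrative assumptions; the application leaves the concrete form of the loss and the values of the weight parameters open.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(p, q):
    """H(p, q): expected negative log-likelihood of q under reference p."""
    return float(-(p * np.log(q + 1e-12)).sum(axis=-1).mean())

def first_target_loss(y_onehot, p_s_logits, p_t_logits, alpha=0.5, beta=0.5):
    """L = alpha * H(Y, P_S) + beta * H(P_T, P_S), as in step 503.

    alpha/beta weight the hard-label term against the distillation term;
    their concrete values are not fixed by the application.
    """
    p_s = softmax(p_s_logits)  # first result P_S of the student (initial network model)
    p_t = softmax(p_t_logits)  # second result P_T of the teacher (first calibration model)
    return alpha * cross_entropy(y_onehot, p_s) + beta * cross_entropy(p_t, p_s)

# One training image, three classes: hard label Y, student logits, teacher logits.
y = np.array([[0.0, 1.0, 0.0]])
p_s_logits = np.array([[0.2, 2.0, -1.0]])
p_t_logits = np.array([[0.1, 2.5, -0.5]])
loss = first_target_loss(y, p_s_logits, p_t_logits)
```

Minimizing this combined loss pulls the first result toward both the hard label and the teacher's second result, which is exactly the adjustment direction described above.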
  • step 103 "adjust the initial quantization parameters of the activation output of the first preprocessing model based on the second calibration network model to obtain the target network model” , can include the following steps:
  • Step 601 based on the learning method of knowledge distillation, the activation quantization threshold of the first preprocessing model is adjusted according to the second calibration network model.
• the computer device can use a knowledge distillation learning method to compare the feature vectors output by each network layer of the second calibration network model with those output by the corresponding layers of the first preprocessing model, and then adjust the activation quantization threshold corresponding to each network layer of the first preprocessing model according to the comparison results.
  • Step 602 Adjust the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain a target network model.
• the computer device can adjust the initial quantization parameter of the activation output of the first preprocessing model according to the correspondence between the adjusted activation quantization threshold and the quantization parameter of the activation output, and obtain the target network model from the adjusted quantization parameter of the activation output.
• the learning method based on knowledge distillation adopts the second calibration network model as the large teacher network model to guide the learning of the small quantized first preprocessing model, so as to obtain better model parameters.
  • Adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model can ensure the accuracy of the adjusted activation quantization threshold.
• the initial quantization parameter of the activation output of the first preprocessing model is adjusted to obtain the target network model, which can further ensure the accuracy of the adjusted quantization parameter of the activation output, thereby improving the accuracy of the obtained target network model.
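A minimal sketch of how an adjusted activation quantization threshold T might be mapped to the quantization parameter of the activation output (steps 601-602). The symmetric correspondence scale = T / (2^(bits-1) − 1) and the clip-then-round ("fake quantization") form are assumptions for illustration, not a mapping prescribed by the application.

```python
import numpy as np

def activation_scale_from_threshold(t, bits=8):
    """Map an activation quantization threshold T to a quantization scale.

    Assumed symmetric scheme: values in [-T, T] are mapped onto the
    signed integer grid, so scale = T / (2**(bits-1) - 1).
    """
    return t / (2 ** (bits - 1) - 1)

def quantize_activation(x, t, bits=8):
    """Clip activations to the threshold, then quantize with the derived scale."""
    scale = activation_scale_from_threshold(t, bits)
    q = np.round(np.clip(x, -t, t) / scale)
    return q * scale  # dequantized ("fake-quantized") activation

x = np.array([0.3, 5.0, -2.0], dtype=np.float32)
x_q = quantize_activation(x, t=2.54)  # adjusted threshold T = 2.54 (illustrative)
```

Note the trade-off the threshold controls: a larger T clips fewer outliers but coarsens the grid, while a smaller T quantizes the bulk of activations more finely at the cost of clipping error.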
  • the "knowledge distillation-based learning method, adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model" in step 601 may include The following steps:
  • Step 701 acquire a second training image set.
  • the computer device may receive the second training image set sent by other devices, and may receive the second training image set input by the user.
• the second training image set may be unlabeled images or labeled images, and the embodiment of the present application does not specifically limit the second training image set.
  • the second training image set can be the same as the first training image set, or can be different from the first training image set.
  • the second training image set may include multiple second training images.
  • Step 702 input the second training image set into the first preprocessing model, and output the third result.
  • the computer device inputs the second training image set into the first preprocessing model, and the first preprocessing model performs feature extraction on the second training image set, and outputs a third result based on the extracted features.
  • Step 703 input the second training image set into the second calibration network model, and output the fourth result.
  • the computer device inputs the second training image set into the second calibration network model, and the second calibration network model performs feature extraction on the second training image set, and outputs a fourth result based on the extracted features.
  • Step 704 Adjust the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
  • the computer device compares the third result output by the first preprocessing model with the fourth result output by the second calibration network model.
  • the computer device adjusts the activation quantization threshold of the first preprocessing model according to the comparison result.
  • the image X may be an image in the second training image set
  • the full-precision teacher network is the second calibration network model
  • the low-precision student network is the first preprocessing model.
• the computer device inputs the image X into the full-precision teacher network, and the full-precision teacher network outputs the fourth result, namely P_T in Fig. 8.
• the computer device inputs the image X into the low-precision student network, and the low-precision student network outputs the third result, namely P_S in Fig. 8.
  • the computer device adjusts the activation quantization threshold of the first preprocessing model based on P_T, P_S.
• the second training image set is input into the first preprocessing model and the second calibration network model respectively, which output the third result and the fourth result. Based on the third result and the fourth result, the activation quantization threshold of the first preprocessing model is adjusted, so that the accuracy of the adjusted activation quantization threshold can be guaranteed, and the accuracy of the first preprocessing model can be further ensured.
  • the "adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result" in the above step 704 may include the following steps:
  • Step 901 generate a second target loss function based on the third result and the fourth result.
  • the computer device generates the second target loss function based on the third result output by the first preprocessing model and the fourth result output by the second calibration network model.
• the smaller the value of the second objective loss function, the more it indicates that, under the same network structure, the first preprocessing model still has a predictive ability similar to that of the second calibration network model after being quantized with the threshold T.
  • T represents the activation quantization threshold of the first pre-processed model and X represents the images in the second training image set.
  • Step 902 Adjust the activation quantization threshold of the first preprocessing model based on the second objective loss function.
• the computer device adjusts the activation quantization threshold of the first preprocessing model based on the function value calculated by the second objective loss function.
• adjusting the activation quantization threshold of the first preprocessing model based on the second objective loss function can ensure the accuracy of the adjusted activation quantization threshold, and further ensure the accuracy of the quantization parameter of the activation output calculated from the adjusted activation quantization threshold, thereby improving the accuracy of the target network model.
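Steps 901 and 902 can be sketched as follows. The L2 distance between the fourth result P_T and the quantized third result, and the grid search over candidate thresholds, are illustrative assumptions; the application only requires that a second objective loss be generated from the two results and that T be adjusted based on it.

```python
import numpy as np

def fake_quantize(x, t, bits=8):
    """Clip to [-t, t] and round onto the integer grid implied by t."""
    scale = t / (2 ** (bits - 1) - 1)
    return np.round(np.clip(x, -t, t) / scale) * scale

def second_target_loss(p_t, activations, t):
    """Distance between the teacher output P_T and the student output after
    quantizing its activations with threshold T. The concrete metric (L2 here)
    is an assumption; the application only requires a loss over P_T and P_S."""
    p_s = fake_quantize(activations, t)
    return float(np.mean((p_t - p_s) ** 2))

def adjust_threshold(p_t, activations, candidates):
    """Pick the threshold minimizing the second target loss (step 902),
    sketched as a simple grid search over candidate values of T."""
    losses = [second_target_loss(p_t, activations, t) for t in candidates]
    return candidates[int(np.argmin(losses))]

# Teacher (full-precision) outputs vs. student pre-quantization activations.
acts = np.array([0.1, 0.5, 3.9, -0.7])
p_t = acts.copy()  # idealized case: teacher matches the unquantized student
best_t = adjust_threshold(p_t, acts, candidates=[0.5, 1.0, 2.0, 4.0])
```

Here thresholds that clip the large activation 3.9 incur a heavy loss, so the search settles on the candidate that covers the full activation range.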
• the computer device can also set the initial network model and the first preprocessing model to be the same model, collectively referred to as the initial network model in the embodiments of this application.
  • the training process of the initial network model can include the following:
• the computer device first adjusts the initial weight parameters of the initial network model according to the first target loss function, and then, based on the adjusted weight parameters, adjusts the activation quantization threshold of the initial network model according to the second target loss function. After one round of adjustment, both the weight parameters and the activation quantization threshold of the initial network model may still be unsatisfactory.
• in that case, the computer device continues to adjust the initial weight parameters of the initial network model according to the first target loss function, and again adjusts the activation quantization threshold of the initial network model based on the adjusted weight parameters and the second target loss function.
• the computer device cyclically adjusts the initial weight parameters and activation quantization thresholds of the initial network model. After multiple iterations of training, the training of the initial network model is finally completed and the target network model is generated, thereby ensuring the accuracy of the target network model.
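The alternating adjustment loop described above can be sketched as follows. The two update rules passed in here are toy placeholders that simply converge to illustrative fixed points; in the application they would be the distillation step on the first target loss function and the threshold adjustment on the second.

```python
def train_quantized_model(init_w, init_t, adjust_weights, adjust_threshold, iters=10):
    """Alternating adjustment loop: each iteration first updates the weight
    parameters (driven by the first target loss), then the activation
    quantization threshold (driven by the second target loss). The concrete
    update functions are supplied by the caller; here they are placeholders."""
    w, t = init_w, init_t
    for _ in range(iters):
        w = adjust_weights(w, t)    # e.g. a distillation step on the first target loss
        t = adjust_threshold(w, t)  # e.g. threshold search on the second target loss
    return w, t

# Toy updates that move w toward 1.0 and t toward |w| (illustrative only).
w, t = train_quantized_model(
    init_w=0.0, init_t=4.0,
    adjust_weights=lambda w, t: w + 0.5 * (1.0 - w),
    adjust_threshold=lambda w, t: 0.5 * (t + abs(w)),
    iters=20,
)
```

The point of the structure is the ordering: each threshold adjustment sees the freshly updated weights, and each weight update sees the current threshold, so the two parameter sets co-adapt over the iterations.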
  • the embodiment of the present application provides an overall flow of the network model quantification method, as shown in FIG. 10 , the method includes:
  • Step 1001 obtain the network model to be processed, the network model to be processed is a pre-trained full-precision network model, quantify the weight parameters and activation output of the network model to be processed according to the quantification requirements, and obtain the initial weight parameter and activation output of the initial Quantization parameters, based on the initial weight parameters and the initial quantization parameters of the activation output, construct the initial network model.
  • Step 1002 acquire a first training image set.
  • Step 1003 input the first training image set into the initial network model, and output the first result.
  • Step 1004 acquire the first calibration network model, input the first training image set into the first calibration network model, and output the second result.
  • Step 1005 generating a first loss function based on the first result and the hard label.
  • Step 1006 generating a second loss function based on the first result and the second result.
  • Step 1007 using the first loss function and the second loss function to generate a first target loss function, and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain a first preprocessing model.
  • Step 1008 acquiring a second training image set.
  • Step 1009 input the second training image set into the first preprocessing model, and output the third result.
  • Step 1010 acquire a second calibration network model, input the second training image set into the second calibration network model, and output a fourth result.
  • Step 1011 generate a second target loss function based on the third result and the fourth result.
  • Step 1012 Adjust the activation quantization threshold of the first preprocessing model based on the second objective loss function.
  • Step 1013 according to the adjusted activation quantization threshold, adjust the initial quantization parameter of the activation output of the first preprocessing model to obtain the target network model.
  • the above network model quantification method may be shown in Figure 11, including the following steps:
• Parameter initialization of the low-precision network: based on the pre-trained full-precision student network, the post-training quantization (PTQ) method is used to initialize the low-precision student network, and the low-precision weight values of the student network and the initial quantization range values of the activations to be quantized are initially determined.
• Network structure deployment: based on the quantized network model parameters, the model structure is deployed on the actual hardware platform to perform corresponding task processing, such as image classification/detection/recognition tasks, or natural language processing tasks.
• the steps in Figs. 9-10 may include multiple steps or multiple stages, and these steps or stages are not necessarily executed at the same time, but may be executed at different times; the execution order of these steps or stages is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least a part of the steps or stages in other steps.
  • the network model quantization device 1200 includes: a quantization processing module 1210, a first adjustment module 1220, and a second adjustment module 1230, wherein:
  • the quantization processing module 1210 is used to obtain the network model to be processed.
• the network model to be processed is a pre-trained full-precision network model; the weight parameters and activation outputs of the network model to be processed are respectively quantized according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameters of the activation output, and the initial network model is constructed based on the initial weight parameters and the initial quantization parameters of the activation output.
• the first adjustment module 1220 is configured to obtain a first calibration network model, the accuracy of the first calibration network model being higher than that of the initial network model, and to adjust the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model.
• the second adjustment module 1230 is configured to obtain a second calibration network model, the accuracy of the second calibration network model being higher than that of the first preprocessing model, and to adjust the initial quantization parameters of the activation output of the first preprocessing model based on the second calibration network model to obtain the target network model.
• the above-mentioned first adjustment module 1220 is specifically configured to use a knowledge-distillation-based learning method to adjust the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
  • the above-mentioned first adjustment module 1220 is specifically used to obtain the first training image set, the first training image set has hard labels; input the first training image set to the initial network model, and output the first Result; the first training image set is input to the first calibration network model, and the second result is output; based on the hard label, the first result and the second result, the initial weight parameters of the initial network model are adjusted to obtain the first preprocessing model.
  • the above-mentioned first adjustment module 1220 is specifically configured to generate the first loss function based on the first result and the hard label; generate the second loss function based on the first result and the second result; use the first loss function and a second loss function to generate a first objective loss function, and adjust the initial weight parameters of the initial network model based on the first objective loss function to obtain a first preprocessing model.
  • the above-mentioned second adjustment module 1230 includes: a first adjustment unit 1231 and a second adjustment unit 1232, wherein:
• the first adjustment unit 1231 is configured to use a knowledge-distillation-based learning method to adjust the activation quantization threshold of the first preprocessing model according to the second calibration network model;
  • the second adjustment unit 1232 is configured to adjust the initial quantization parameter of the activation output of the first preprocessing model according to the adjusted activation quantization threshold to obtain the target network model.
• the above-mentioned first adjustment unit 1231 is specifically configured to: acquire the second training image set; input the second training image set into the first preprocessing model, and output the third result; input the second training image set into the second calibration network model, and output the fourth result; and adjust the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
• the above-mentioned first adjustment unit 1231 is specifically configured to: generate a second target loss function based on the third result and the fourth result; and adjust the activation quantization threshold of the first preprocessing model based on the second target loss function.
  • Each module in the above-mentioned apparatus for network model quantification can be fully or partially realized by software, hardware and a combination thereof.
  • the above-mentioned modules can be embedded in or independent of the processor in the computer device in the form of hardware, and can also be stored in the memory of the computer device in the form of software, so that the processor can invoke and execute the corresponding operations of the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a terminal, and its internal structure may be as shown in FIG. 14 .
  • the computer device includes a processor, a memory, a communication interface, a display screen and an input device connected through a system bus. Wherein, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and computer programs.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the communication interface of the computer device is used to communicate with an external terminal in a wired or wireless manner, and the wireless manner can be realized through WIFI, an operator network, NFC (Near Field Communication) or other technologies.
• when the computer program is executed by the processor, a network model quantification method is implemented.
  • the display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen
• the input device of the computer device may be a touch layer covering the display screen, or a button, trackball or touchpad provided on the casing of the computer device, and may also be an external keyboard, touchpad or mouse.
• Figure 14 is only a block diagram of a partial structure related to the solution of this application, and does not constitute a limitation on the computer device to which the solution of this application is applied. The specific computer device may include more or fewer components than shown in the figure, or combine some components, or have a different arrangement of components.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 15 .
  • the computer device includes a processor, memory and a network interface connected by a system bus. Wherein, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer programs and databases.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer device is used to store network model quantification data.
  • the network interface of the computer device is used to communicate with an external terminal via a network connection. When the computer program is executed by a processor, a network model quantification method is implemented.
• Figure 15 is only a block diagram of a partial structure related to the solution of this application, and does not constitute a limitation on the computer device to which the solution of this application is applied. The specific computer device may include more or fewer components than shown in the figure, or combine some components, or have a different arrangement of components.
  • a computer device including a memory and a processor.
  • a computer program is stored in the memory.
• when the processor executes the computer program, the following steps are implemented: acquiring a network model to be processed, where the network model to be processed is a pre-trained full-precision network model; quantizing the weight parameters and activation outputs of the network model to be processed according to the quantization requirements to obtain the initial weight parameters and the initial quantization parameters of the activation output; constructing the initial network model based on the initial weight parameters and the initial quantization parameters of the activation output; obtaining the first calibration network model, the accuracy of the first calibration network model being higher than that of the initial network model, and adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain the first preprocessing model; and obtaining the second calibration network model, the accuracy of the second calibration network model being higher than that of the first preprocessing model, and adjusting the initial quantization parameters of the activation output of the first preprocessing model based on the second calibration network model to obtain the target network model.
• when the processor executes the computer program, the following steps are further implemented: using a learning method based on knowledge distillation, adjusting the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
• when the processor executes the computer program, the following steps are also implemented: obtaining the first training image set, the first training image set having hard labels; inputting the first training image set into the initial network model, and outputting the first result; inputting the first training image set into the first calibration network model, and outputting the second result; and adjusting the initial weight parameters of the initial network model based on the hard label, the first result and the second result to obtain the first preprocessing model.
  • When the processor executes the computer program, the following steps are also implemented: generating a first loss function based on the first result and the hard labels; generating a second loss function based on the first result and the second result; generating a first target loss function using the first loss function and the second loss function; and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  • When the processor executes the computer program, the following steps are also implemented: using a learning method based on knowledge distillation, adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model; and, according to the adjusted activation quantization threshold, adjusting the initial quantization parameters of the activation output of the first preprocessing model to obtain the target network model.
  • When the processor executes the computer program, the following steps are also implemented: acquiring a second training image set; inputting the second training image set into the first preprocessing model and outputting a third result; inputting the second training image set into the second calibration network model and outputting a fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
  • When the processor executes the computer program, the following steps are also implemented: generating a second target loss function based on the third result and the fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function.
  • A computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the following steps are implemented: obtaining a network model to be processed, where the network model to be processed is a pre-trained full-precision network model; quantizing the weight parameters and the activation output of the network model to be processed according to the quantization requirements, to obtain the initial weight parameters and the initial quantization parameters of the activation output; constructing an initial network model based on the initial weight parameters and the initial quantization parameters of the activation output; obtaining a first calibration network model, where the accuracy of the first calibration network model is higher than that of the initial network model, and adjusting the initial weight parameters of the initial network model based on the first calibration network model to obtain a first preprocessing model; and obtaining a second calibration network model, where the accuracy of the second calibration network model is higher than that of the first preprocessing model, and adjusting the initial quantization parameters of the activation output of the first preprocessing model based on the second calibration network model to obtain a target network model.
  • When the computer program is executed by a processor, the following steps are further implemented: using a learning method based on knowledge distillation, adjusting the initial weight parameters of the initial network model according to the first calibration network model to obtain the first preprocessing model.
  • When the computer program is executed by a processor, the following steps are also implemented: obtaining a first training image set, where the first training image set has hard labels; inputting the first training image set into the initial network model and outputting a first result; inputting the first training image set into the first calibration network model and outputting a second result; and adjusting the initial weight parameters of the initial network model based on the hard labels, the first result and the second result, to obtain the first preprocessing model.
  • When the computer program is executed by a processor, the following steps are further implemented: generating a first loss function based on the first result and the hard labels; generating a second loss function based on the first result and the second result; generating a first target loss function using the first loss function and the second loss function; and adjusting the initial weight parameters of the initial network model based on the first target loss function to obtain the first preprocessing model.
  • When the computer program is executed by a processor, the following steps are also implemented: using a learning method based on knowledge distillation, adjusting the activation quantization threshold of the first preprocessing model according to the second calibration network model; and, according to the adjusted activation quantization threshold, adjusting the initial quantization parameters of the activation output of the first preprocessing model to obtain the target network model.
  • When the computer program is executed by a processor, the following steps are also implemented: acquiring a second training image set; inputting the second training image set into the first preprocessing model and outputting a third result; inputting the second training image set into the second calibration network model and outputting a fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the third result and the fourth result.
  • When the computer program is executed by a processor, the following steps are further implemented: generating a second target loss function based on the third result and the fourth result; and adjusting the activation quantization threshold of the first preprocessing model based on the second target loss function.
  • The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a flash memory (Flash Memory), a hard disk drive (Hard Disk Drive, HDD) or a solid-state drive (Solid-State Drive, SSD), etc.; the storage medium may also include a combination of the above types of memory.
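The first step recited above, quantizing the full-precision weights and activation outputs to obtain initial weight parameters and initial quantization parameters, can be sketched as follows. This is a minimal illustration assuming symmetric uniform int8 quantization; the function names and the particular scheme are assumptions, as the publication does not fix a concrete formula:

```python
import numpy as np

def quantize_tensor(x, num_bits=8):
    """Symmetric uniform quantization (one possible scheme).
    Returns integer values and the scale, which together play the role
    of the 'initial quantization parameters' in the text above."""
    qmax = 2 ** (num_bits - 1) - 1                      # 127 for int8
    max_abs = np.max(np.abs(x))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map integer values back to (approximate) real values."""
    return q.astype(np.float32) * scale

# Full-precision weights of one layer of the pre-trained model (toy values)
w = np.array([0.51, -1.27, 0.003, 0.98], dtype=np.float32)
q, scale = quantize_tensor(w)
w_hat = dequantize(q, scale)   # initial (quantized) weight parameters
```

The quantization error per element is bounded by half the scale, which is why the later calibration stages are needed to recover accuracy.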
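The first target loss described above combines a hard-label loss on the first result with a distillation loss between the first and second results. The sketch below assumes cross-entropy for the first loss, KL divergence for the second, and a weighted sum with hypothetical weight `alpha`; the publication does not specify these choices:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(logits, label):
    # first loss: quantized (student) model prediction vs. hard label
    return float(-np.log(softmax(logits)[label] + 1e-12))

def kl_divergence(student_logits, teacher_logits):
    # second loss: student output vs. calibration (teacher) model output
    p, q = softmax(teacher_logits), softmax(student_logits)
    return float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))

def first_target_loss(student_logits, teacher_logits, label, alpha=0.5):
    # weighted combination; the weighting scheme is an assumption
    l1 = cross_entropy(student_logits, label)
    l2 = kl_divergence(student_logits, teacher_logits)
    return alpha * l1 + (1.0 - alpha) * l2
```

Minimizing this combined objective with respect to the initial weight parameters yields the first preprocessing model.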
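The activation-threshold adjustment described above (second target loss from the third and fourth results, then updating the activation quantization threshold) can be sketched as a search for the clipping threshold that best matches the calibration model's activations. The MSE objective, the unsigned 8-bit scheme, and the grid search are illustrative assumptions rather than the claimed procedure:

```python
import numpy as np

def fake_quant_act(a, threshold, num_bits=8):
    """Clip activations at `threshold`, then uniformly quantize/dequantize
    (unsigned range, e.g. for post-ReLU activations)."""
    qmax = 2 ** num_bits - 1
    a_clip = np.clip(a, 0.0, threshold)
    scale = threshold / qmax
    return np.round(a_clip / scale) * scale

def second_target_loss(student_act, teacher_act):
    # second target loss: discrepancy between third and fourth results
    return float(np.mean((student_act - teacher_act) ** 2))

def adjust_threshold(act, teacher_act, candidates):
    # pick the activation quantization threshold minimizing the loss
    losses = [second_target_loss(fake_quant_act(act, t), teacher_act)
              for t in candidates]
    return candidates[int(np.argmin(losses))]

acts = np.array([0.1, 0.5, 2.0, 6.0], dtype=np.float32)  # third result (toy)
teacher = acts.copy()                                    # fourth result (toy)
best_t = adjust_threshold(acts, teacher, [1.0, 2.0, 4.0, 6.0, 8.0])
```

A too-small threshold clips large activations (large loss); a too-large one wastes quantization levels, so an intermediate threshold wins.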

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present application discloses a network model quantization method and apparatus, as well as a computer device and a storage medium, which are applicable to the technical field of artificial intelligence. The network model quantization method comprises: acquiring a network model to be processed and, according to quantization requirements, separately performing quantization processing on a weight parameter and an activation output of the network model to be processed so as to obtain an initial weight parameter and an initial quantization parameter of the activation output, then constructing an initial network model; acquiring a first calibration network model, then adjusting the initial weight parameter of the initial network model on the basis of the first calibration network model so as to obtain a first preprocessing model; and acquiring a second calibration network model, then adjusting an initial quantization parameter of an activation output of the first preprocessing model on the basis of the second calibration network model so as to obtain a target network model. The method makes it possible to solve the problem of reduced accuracy in a large deep neural network model caused by shrinking the model by means of model compression such as quantization and pruning.
PCT/CN2022/078256 2021-09-28 2022-02-28 Procédé et appareil de quantification de modèle de réseau, et dispositif informatique et support de stockage WO2023050707A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111139349.XA CN113610232B (zh) 2021-09-28 2021-09-28 网络模型量化方法、装置、计算机设备以及存储介质
CN202111139349.X 2021-09-28

Publications (1)

Publication Number Publication Date
WO2023050707A1 (fr)

Family

ID=78343259

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/078256 WO2023050707A1 (fr) 2021-09-28 2022-02-28 Procédé et appareil de quantification de modèle de réseau, et dispositif informatique et support de stockage

Country Status (2)

Country Link
CN (1) CN113610232B (fr)
WO (1) WO2023050707A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116542344A (zh) * 2023-07-05 2023-08-04 浙江大华技术股份有限公司 一种模型自动化部署方法、平台和系统
CN116579407A (zh) * 2023-05-19 2023-08-11 北京百度网讯科技有限公司 神经网络模型的压缩方法、训练方法、处理方法和装置
CN116721399A (zh) * 2023-07-26 2023-09-08 之江实验室 一种量化感知训练的点云目标检测方法及装置
CN117077740A (zh) * 2023-09-25 2023-11-17 荣耀终端有限公司 模型量化方法和设备

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610232B (zh) * 2021-09-28 2022-02-22 苏州浪潮智能科技有限公司 网络模型量化方法、装置、计算机设备以及存储介质
CN115570228B (zh) * 2022-11-22 2023-03-17 苏芯物联技术(南京)有限公司 一种焊接管道供气智能反馈控制方法与系统
CN117689044A (zh) * 2024-02-01 2024-03-12 厦门大学 一种适用于视觉自注意力模型的量化方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276451A (zh) * 2019-06-28 2019-09-24 南京大学 一种基于权重归一化的深度神经网络压缩方法
CN110443165A (zh) * 2019-07-23 2019-11-12 北京迈格威科技有限公司 神经网络量化方法、图像识别方法、装置和计算机设备
CN112016674A (zh) * 2020-07-29 2020-12-01 魔门塔(苏州)科技有限公司 一种基于知识蒸馏的卷积神经网络的量化方法
CN112508169A (zh) * 2020-11-13 2021-03-16 华为技术有限公司 知识蒸馏方法和系统
CN113610232A (zh) * 2021-09-28 2021-11-05 苏州浪潮智能科技有限公司 网络模型量化方法、装置、计算机设备以及存储介质

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190164057A1 (en) * 2019-01-30 2019-05-30 Intel Corporation Mapping and quantification of influence of neural network features for explainable artificial intelligence
US20210142177A1 (en) * 2019-11-13 2021-05-13 Nvidia Corporation Synthesizing data for training one or more neural networks
CN111753761B (zh) * 2020-06-28 2024-04-09 北京百度网讯科技有限公司 模型生成方法、装置、电子设备及存储介质
CN112200296B (zh) * 2020-07-31 2024-04-05 星宸科技股份有限公司 网络模型量化方法、装置、存储介质及电子设备
CN112308019B (zh) * 2020-11-19 2021-08-17 中国人民解放军国防科技大学 基于网络剪枝和知识蒸馏的sar舰船目标检测方法
CN113011581B (zh) * 2021-02-23 2023-04-07 北京三快在线科技有限公司 神经网络模型压缩方法、装置、电子设备及可读存储介质
CN112988975A (zh) * 2021-04-09 2021-06-18 北京语言大学 一种基于albert和知识蒸馏的观点挖掘方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276451A (zh) * 2019-06-28 2019-09-24 南京大学 一种基于权重归一化的深度神经网络压缩方法
CN110443165A (zh) * 2019-07-23 2019-11-12 北京迈格威科技有限公司 神经网络量化方法、图像识别方法、装置和计算机设备
CN112016674A (zh) * 2020-07-29 2020-12-01 魔门塔(苏州)科技有限公司 一种基于知识蒸馏的卷积神经网络的量化方法
CN112508169A (zh) * 2020-11-13 2021-03-16 华为技术有限公司 知识蒸馏方法和系统
CN113610232A (zh) * 2021-09-28 2021-11-05 苏州浪潮智能科技有限公司 网络模型量化方法、装置、计算机设备以及存储介质

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116579407A (zh) * 2023-05-19 2023-08-11 北京百度网讯科技有限公司 神经网络模型的压缩方法、训练方法、处理方法和装置
CN116579407B (zh) * 2023-05-19 2024-02-13 北京百度网讯科技有限公司 神经网络模型的压缩方法、训练方法、处理方法和装置
CN116542344A (zh) * 2023-07-05 2023-08-04 浙江大华技术股份有限公司 一种模型自动化部署方法、平台和系统
CN116721399A (zh) * 2023-07-26 2023-09-08 之江实验室 一种量化感知训练的点云目标检测方法及装置
CN116721399B (zh) * 2023-07-26 2023-11-14 之江实验室 一种量化感知训练的点云目标检测方法及装置
CN117077740A (zh) * 2023-09-25 2023-11-17 荣耀终端有限公司 模型量化方法和设备
CN117077740B (zh) * 2023-09-25 2024-03-12 荣耀终端有限公司 模型量化方法和设备

Also Published As

Publication number Publication date
CN113610232A (zh) 2021-11-05
CN113610232B (zh) 2022-02-22

Similar Documents

Publication Publication Date Title
WO2023050707A1 (fr) Procédé et appareil de quantification de modèle de réseau, et dispositif informatique et support de stockage
US10991074B2 (en) Transforming source domain images into target domain images
US20230376771A1 (en) Training machine learning models by determining update rules using neural networks
US20210201147A1 (en) Model training method, machine translation method, computer device, and storage medium
US11657254B2 (en) Computation method and device used in a convolutional neural network
CN110880036B (zh) 神经网络压缩方法、装置、计算机设备及存储介质
CN112106081A (zh) 提供综合机器学习服务的应用开发平台和软件开发套件
TWI767000B (zh) 產生波形之方法及電腦儲存媒體
US20180350109A1 (en) Method and device for data quantization
US20190340492A1 (en) Design flow for quantized neural networks
US20230042221A1 (en) Modifying digital images utilizing a language guided image editing model
TWI744724B (zh) 處理卷積神經網路的方法
US20240185086A1 (en) Model distillation method and related device
JP2022169743A (ja) 情報抽出方法、装置、電子機器及び記憶媒体
KR102508860B1 (ko) 이미지에서의 키 포인트 위치의 인식 방법, 장치, 전자기기 및 매체
JP2023547010A (ja) 知識の蒸留に基づくモデルトレーニング方法、装置、電子機器
US20220004849A1 (en) Image processing neural networks with dynamic filter activation
WO2016142285A1 (fr) Procédé et appareil de recherche d'images à l'aide d'opérateurs d'analyse dispersants
WO2023020456A1 (fr) Procédé et appareil de quantification de modèle de réseau, dispositif et support de stockage
US20220044109A1 (en) Quantization-aware training of quantized neural networks
Huai et al. Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
JP6467893B2 (ja) 情報処理システム、情報処理方法、及び、プログラム
CN117315758A (zh) 面部表情的检测方法、装置、电子设备及存储介质
US20230046088A1 (en) Method for training student network and method for recognizing image
US10530387B1 (en) Estimating an optimal ordering for data compression

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22874110

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE