WO2023060959A1 - Neural network model quantification method, system and device, and computer-readable medium - Google Patents

Neural network model quantification method, system and device, and computer-readable medium Download PDF

Info

Publication number
WO2023060959A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
quantization
neural network
network model
layer
Prior art date
Application number
PCT/CN2022/105317
Other languages
French (fr)
Chinese (zh)
Inventor
陈其宾
李锐
张晖
Original Assignee
山东浪潮科学研究院有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 山东浪潮科学研究院有限公司
Publication of WO2023060959A1 publication Critical patent/WO2023060959A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Definitions

  • The present invention relates to the technical field of neural networks, and in particular to a neural network model quantization method, system, device, and computer-readable medium.
  • In recent years, neural network models have been widely used in many fields and have achieved very good results.
  • However, because of their high complexity and large size, neural network models suffer from low inference efficiency and long inference times, especially when running on low-performance mobile devices and low-power devices. Therefore, how to design a model with low resource consumption that can predict in real time while guaranteeing prediction accuracy has become a practical problem.
  • On low-power devices such as microcontrollers, models with low resource consumption are required.
  • In fields with strict real-time requirements, models that can predict in real time are required.
  • This problem is generally approached by designing an efficient model architecture, designing a model architecture suited to specific hardware, network pruning, knowledge distillation, or model quantization. Among these, model quantization has achieved good results: quantizing a model from floating-point to fixed-point types effectively reduces the model size and improves inference speed.
  • The technical task of the present invention is to address the above deficiencies by providing a neural network model quantization method, system, device, and computer-readable medium that solve the technical problem of how to avoid calculating activation value quantization factors during inference.
  • The neural network model quantization method of the present invention calculates the activation value quantization factor of each layer of the neural network model by minimizing an error equation before the model performs inference; the method includes the following steps (a consolidated sketch of these steps appears at the end of this section):
  • For the target model, compute the maximum absolute value of the model weights and derive the model weight quantization factor from the quantization range;
  • For each layer of the target model, perform model inference with the quantized fixed-point weights and activation values, and dequantize the inference result into an int32 data type;
  • Each operator is quantized by asymmetric quantization: the floating-point model weights are quantized into an int8 data type and the activation values into a uint8 data type, yielding the final quantized model.
  • The model weights are quantized to the int8 type, with a quantization range of [-128, 127].
  • A test data set is obtained, the mean square error between the quantized and unquantized output of each layer of the target model is computed on that data set, and the activation value quantization factor is obtained by minimizing the mean square error.
  • The mean square error formula is:

    $$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

    where $y_i$ denotes the unquantized output and $\hat{y}_i$ the quantized output.
  • The neural network model quantization system of the present invention is used to calculate, before the neural network model performs inference, the activation value quantization factor of each layer by minimizing an error equation; the system includes:
  • a construction and training module, used to construct a neural network model and train it to obtain a floating-point neural network model as the target model;
  • a quantization factor calculation module, applied to the target model and used to compute the maximum absolute value of the model weights and derive the model weight quantization factor from the quantization range;
  • an activation value quantization factor calculation module, applied to each layer of the target model and used to calculate the layer's activation value quantization factor by minimizing the mean square error;
  • an inference dequantization module, applied to each layer of the target model and used to perform model inference with the quantized fixed-point weights and activation values and to dequantize the inference result into an int32 data type;
  • a final quantization module, applied to each layer of the target model and used to quantize each operator by asymmetric quantization, quantizing the floating-point model weights into an int8 data type and the activation values into a uint8 data type to obtain the final quantized model.
  • The model weights are quantized to the int8 type, with a quantization range of [-128, 127].
  • The activation value quantization factor calculation module calculates the activation value quantization factor as follows:
  • obtain the test data set, compute the mean square error between the quantized and unquantized output of each layer of the target model, and obtain the activation value quantization factor by minimizing the mean square error.
  • The mean square error formula is:

    $$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

    where $y_i$ denotes the unquantized output and $\hat{y}_i$ the quantized output.
  • The device of the present invention includes at least one memory and at least one processor;
  • the at least one memory is used to store a machine-readable program;
  • the at least one processor is used to invoke the machine-readable program to execute the method of any one of the first aspect.
  • The computer-readable medium of the present invention stores computer instructions which, when executed by a processor, cause the processor to execute the method of any one of the first aspect.
  • The neural network model quantization method, system, device, and computer-readable medium of the present invention have the following advantage: the activation value quantization factor of each layer is computed in advance of model inference by minimizing the mean square error, which preserves model accuracy while improving inference speed, since no quantization factor has to be computed at inference time.
  • Fig. 1 is a flow chart of the neural network model quantization method of Embodiment 1.
  • Embodiments of the present invention provide a neural network model quantization method, system, device, and computer-readable medium for solving the technical problem of how to avoid calculating activation value quantization factors during inference.
  • The neural network model quantization method of the present invention calculates the activation value quantization factor of each layer of the neural network model by minimizing an error equation; the method includes the following steps:
  • The quantization factor of the model weights is calculated in step S200.
  • The quantization factor of the model weights is derived from the quantization range.
  • The model weights are quantized into the int8 type, so the quantization range is [-128, 127].
  • Step S300 obtains the activation value quantization factor of each layer by minimizing the mean square error: based on a portion of the test data set, the mean square error between each layer's quantized and unquantized output is computed, and the scale that minimizes this error is taken as the activation value quantization factor.
  • The mean square error formula is:

    $$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

    where $y_i$ denotes the unquantized output and $\hat{y}_i$ the quantized output.
  • Each layer of the model repeats steps S400 and S500, and the model output is finally obtained.
  • The neural network model quantization system of the present invention is used to calculate, before the neural network model performs inference, the activation value quantization factor of each layer by minimizing an error equation.
  • The system includes a construction and training module, a quantization factor calculation module, an activation value quantization factor calculation module, an inference dequantization module, and a final quantization module.
  • The construction and training module is used to construct a neural network model and train it to obtain a floating-point neural network model as the target model; the quantization factor calculation module is applied to the target model and computes the maximum absolute value of the model weights to derive the weight quantization factor from the quantization range; the activation value quantization factor calculation module is applied to each layer of the target model and calculates the layer's activation value quantization factor by minimizing the mean square error; the inference dequantization module is applied to each layer of the target model, performs model inference with the quantized fixed-point weights and activation values, and dequantizes the inference result into an int32 data type; the final quantization module is applied to each layer of the target model and quantizes each operator by asymmetric quantization, quantizing the floating-point model weights into an int8 data type and the activation values into a uint8 data type to obtain the final quantized model.
  • The model weights are quantized to the int8 type, with a quantization range of [-128, 127].
  • The activation value quantization factor calculation module calculates the activation value quantization factor as follows: obtain the test data set, compute the mean square error between the quantized and unquantized output of each layer of the target model, and obtain the activation value quantization factor by minimizing the mean square error.
  • The mean square error formula is:

    $$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

    where $y_i$ denotes the unquantized output and $\hat{y}_i$ the quantized output.
  • The device of the present invention includes at least one memory and at least one processor; the at least one memory stores a machine-readable program, and the at least one processor invokes the machine-readable program to execute the method disclosed in Embodiment 1 of the present invention.
  • An embodiment of the present invention also provides a computer-readable medium storing computer instructions which, when executed by a processor, cause the processor to execute the method disclosed in Embodiment 1 of the present invention.
  • A system or device equipped with a storage medium may be provided, the storage medium storing software program code that realizes the functions of any of the above embodiments, with the computer (or the CPU or MPU) of the system or device reading and executing the program code stored in the storage medium.
  • The program code read from the storage medium can itself realize the functions of any of the above embodiments, so the program code and the storage medium storing it form part of the present invention.
  • Examples of storage media for providing the program code include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, and DVD+RW), magnetic tape, non-volatile memory cards, and ROM.
  • The program code can be downloaded from a server computer via a communication network.
  • The program code read from the storage medium may be written into a memory provided on an expansion board inserted into the computer, or into a memory provided in an expansion unit connected to the computer, after which the instructions of the program code cause a CPU mounted on the expansion board or expansion unit to perform part or all of the actual operations, thereby realizing the functions of any of the above embodiments.
  • the hardware unit may be implemented mechanically or electrically.
  • a hardware unit may include permanently dedicated circuitry or logic (such as a dedicated processor, FPGA or ASIC) to perform the corresponding operations.
  • A hardware unit may also include programmable logic or circuits (such as a general-purpose processor or other programmable processor) that can be temporarily configured by software to complete the corresponding operations.
  • The specific implementation (mechanical means, a dedicated permanent circuit, or a temporarily configured circuit) can be determined based on cost and time considerations.
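
To make the five steps listed above concrete, here is a minimal NumPy sketch of the whole flow on a hypothetical single dense layer. The array shapes, the 127 divisor, the per-tensor scales, and the grid search are illustrative assumptions, not details fixed by this publication.

```python
import numpy as np

# Hypothetical one-layer model, used only to walk through the five steps.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8)).astype(np.float32)   # trained float weights (step 1)
x = rng.standard_normal((4, 16)).astype(np.float32)   # small calibration batch

# Step 2: weight quantization factor from max |W| over the int8 range [-128, 127].
w_scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / w_scale), -128, 127).astype(np.int8)

# Step 3 (done before inference): pick the activation scale that minimizes the
# MSE between the float layer output and its quantize/dequantize round trip.
y = x @ W
def roundtrip_mse(s):
    zp = np.round(-y.min() / s)
    q = np.clip(np.round(y / s) + zp, 0, 255)          # uint8 range
    return np.mean((y - (q - zp) * s) ** 2)
candidates = np.linspace(np.ptp(y) / 255 * 0.5, np.ptp(y) / 255 * 1.5, 128)
a_scale = min(candidates, key=roundtrip_mse)           # stored for inference time

# Steps 4-5: integer inference with the pre-computed factors; the input is
# quantized asymmetrically to uint8, the matmul accumulates in int32, and the
# int32 result is rescaled back to float.
x_scale = np.ptp(x) / 255.0
x_zp = int(np.round(-x.min() / x_scale))
x_q = np.clip(np.round(x / x_scale) + x_zp, 0, 255).astype(np.uint8)
acc = (x_q.astype(np.int32) - x_zp) @ W_q.astype(np.int32)   # int32 result
y_hat = acc.astype(np.float32) * (x_scale * w_scale)         # dequantize
```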

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention belongs to the technical field of neural networks, and disclosed are a neural network model quantification method, system and device, and a computer-readable medium, wherein the technical problem to be solved is how to avoid calculating activation value quantization factors during inference. The method comprises the following steps: constructing a neural network model and training the neural network model; for a target model, calculating a model weight quantization factor on the basis of a quantization range by calculating the maximum value of an absolute value of a model weight; for each layer of the target model, by minimizing a mean square error, calculating an activation value quantization factor of each layer of the target model; for each layer of the target model, performing model inference by means of a weight and activation value of a quantized fixed-point type, and inversely quantizing the inference result into an int32 data type; and for each layer of the target model, quantizing each operator by means of asymmetric quantization, quantizing a weight of a floating-point type model into an int8 data type, and quantizing an activation value into a uint8 data type.

Description

Neural network model quantization method, system, device and computer-readable medium

Technical Field

The present invention relates to the technical field of neural networks, and in particular to a neural network model quantization method, system, device, and computer-readable medium.

Background Art

In recent years, neural network models have been widely used in many fields and have achieved very good results. However, because of their high complexity and large size, neural network models suffer from low inference efficiency and long inference times, especially when running on low-performance mobile devices and low-power devices. Therefore, how to design a model with low resource consumption that can predict in real time while guaranteeing prediction accuracy has become a practical problem. On low-power devices such as microcontrollers, models with low resource consumption are required. In fields with strict real-time requirements, such as speech recognition and autonomous driving, models that can predict in real time are required. This problem is generally approached by designing an efficient model architecture, designing a model architecture suited to specific hardware, network pruning, knowledge distillation, or model quantization. Among these, model quantization has achieved good results: quantizing a model from floating-point to fixed-point types effectively reduces the model size while improving inference speed.

To improve model inference speed, how to avoid calculating activation value quantization factors during inference is the technical problem that needs to be solved.

Summary of the Invention

The technical task of the present invention is to address the above deficiencies by providing a neural network model quantization method, system, device, and computer-readable medium that solve the technical problem of how to avoid calculating activation value quantization factors during inference.
In a first aspect, the neural network model quantization method of the present invention calculates the activation value quantization factor of each layer of the neural network model by minimizing an error equation before the model performs inference. The method includes the following steps:

constructing a neural network model and training it to obtain a floating-point neural network model as the target model;

for the target model, computing the maximum absolute value of the model weights and deriving the model weight quantization factor from the quantization range;

for each layer of the target model, calculating the layer's activation value quantization factor by minimizing the mean square error;

for each layer of the target model, performing model inference with the quantized fixed-point weights and activation values, and dequantizing the inference result into an int32 data type;

for each layer of the target model, quantizing each operator by asymmetric quantization, quantizing the floating-point model weights into an int8 data type and the activation values into a uint8 data type, to obtain the final quantized model.
Preferably, for the target model, the model weights are quantized to the int8 type, with a quantization range of [-128, 127].
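
As a hedged illustration of this preferred weight scheme, the following NumPy sketch derives the weight quantization factor from the maximum absolute weight and the [-128, 127] range. Dividing by 127 so that the largest magnitude stays representable is a common convention assumed here, not a detail stated in the text, and the weight shape is hypothetical.

```python
import numpy as np

def weight_quant_factor(w: np.ndarray) -> float:
    # Symmetric int8 scheme: the largest |w| is mapped to the edge of [-128, 127].
    return np.abs(w).max() / 127.0

w = np.random.randn(64, 32).astype(np.float32)       # stand-in layer weights
scale = weight_quant_factor(w)
w_q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
w_restored = w_q.astype(np.float32) * scale          # dequantized approximation
```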
Preferably, a test data set is obtained, the mean square error between the quantized and unquantized output of each layer of the target model is computed on the test data set, and the activation value quantization factor is obtained by minimizing the mean square error.
Preferably, the mean square error formula is:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $y_i$ denotes the unquantized output and $\hat{y}_i$ the quantized output.
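
A minimal sketch of this calibration step, assuming a uint8 target range and a simple grid search over candidate scales (the publication does not fix a particular minimization procedure, so the search strategy and the 0.5x-1.5x search window are assumptions):

```python
import numpy as np

def activation_quant_factor(y_float: np.ndarray, n_candidates: int = 256) -> float:
    """Return the scale whose uint8 quantize/dequantize round trip minimizes
    the mean square error against the unquantized layer output."""
    base = max(np.ptp(y_float), 1e-8) / 255.0        # naive min/max scale
    best_s, best_mse = base, np.inf
    for s in np.linspace(0.5 * base, 1.5 * base, n_candidates):
        zp = np.round(-y_float.min() / s)            # uint8 zero point
        q = np.clip(np.round(y_float / s) + zp, 0, 255)
        mse = np.mean((y_float - (q - zp) * s) ** 2) # the MSE objective above
        if mse < best_mse:
            best_s, best_mse = s, mse
    return float(best_s)
```

In practice the search would run once per layer over outputs collected from the calibration portion of the test data, so that nothing has to be minimized at inference time.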
In a second aspect, the neural network model quantization system of the present invention is used to calculate, before the neural network model performs inference, the activation value quantization factor of each layer by minimizing an error equation. The system includes:

a construction and training module, used to construct a neural network model and train it to obtain a floating-point neural network model as the target model;

a quantization factor calculation module, applied to the target model and used to compute the maximum absolute value of the model weights and derive the model weight quantization factor from the quantization range;

an activation value quantization factor calculation module, applied to each layer of the target model and used to calculate the layer's activation value quantization factor by minimizing the mean square error;

an inference dequantization module, applied to each layer of the target model and used to perform model inference with the quantized fixed-point weights and activation values and to dequantize the inference result into an int32 data type;

a final quantization module, applied to each layer of the target model and used to quantize each operator by asymmetric quantization, quantizing the floating-point model weights into an int8 data type and the activation values into a uint8 data type, to obtain the final quantized model.
Preferably, for the target model, the model weights are quantized to the int8 type, with a quantization range of [-128, 127].

Preferably, the activation value quantization factor calculation module calculates the activation value quantization factor as follows:

obtain the test data set, compute the mean square error between the quantized and unquantized output of each layer of the target model, and obtain the activation value quantization factor by minimizing the mean square error.
Preferably, the mean square error formula is:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $y_i$ denotes the unquantized output and $\hat{y}_i$ the quantized output.
In a third aspect, the device of the present invention includes at least one memory and at least one processor;

the at least one memory is used to store a machine-readable program;

the at least one processor is used to invoke the machine-readable program to execute the method of any one of the first aspect.

In a fourth aspect, the computer-readable medium of the present invention stores computer instructions which, when executed by a processor, cause the processor to execute the method of any one of the first aspect.

The neural network model quantization method, system, device, and computer-readable medium of the present invention have the following advantage: the activation value quantization factor of each layer is computed in advance of model inference by minimizing the mean square error, which preserves model accuracy while improving inference speed, since no quantization factor has to be computed at inference time.
Description of Drawings

To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.

The present invention is further described below with reference to the drawings.

Fig. 1 is a flow chart of the neural network model quantization method of Embodiment 1.

Detailed Description

The present invention is further described below with reference to the drawings and specific embodiments, so that those skilled in the art can better understand and implement it. The embodiments given do not limit the present invention, and, where no conflict arises, the embodiments of the present invention and the technical features therein may be combined with one another.

Embodiments of the present invention provide a neural network model quantization method, system, device, and computer-readable medium for solving the technical problem of how to avoid calculating activation value quantization factors during inference.

Embodiment 1:
The neural network model quantization method of the present invention calculates the activation value quantization factor of each layer of the neural network model by minimizing an error equation before the model performs inference. The method includes the following steps:

S100: construct a neural network model and train it to obtain a floating-point neural network model as the target model;

S200: for the target model, compute the maximum absolute value of the model weights and derive the model weight quantization factor from the quantization range;

S300: for each layer of the target model, calculate the layer's activation value quantization factor by minimizing the mean square error;

S400: for each layer of the target model, perform model inference with the quantized fixed-point weights and activation values, and dequantize the inference result into an int32 data type;

S500: for each layer of the target model, quantize each operator by asymmetric quantization, quantizing the floating-point model weights into an int8 data type and the activation values into a uint8 data type, to obtain the final quantized model.
In this embodiment, step S200 calculates the quantization factor of the model weights: the maximum absolute value of the weights is computed, and the factor is derived from the quantization range. The weights are quantized to the int8 type, so the quantization range is [-128, 127].
Step S300 obtains the activation value quantization factor of each layer by minimizing the mean square error: based on a portion of the test data set, the mean square error between each layer's quantized output and its unquantized output is computed, and the scale that minimizes this error is taken as the activation value quantization factor. The mean square error formula is:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $y_i$ denotes the unquantized output and $\hat{y}_i$ the quantized output.
Each layer of the model repeats steps S400 and S500, and the model output is finally obtained.
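
A hedged sketch of this per-layer loop (steps S400 and S500) for dense layers, assuming per-tensor factors that were all computed beforehand; the layer structure and names are illustrative, not taken from the publication:

```python
import numpy as np

def quantized_layer(x, w_q, w_scale, a_scale, a_zp):
    """One S400/S500 round: quantize the activations to uint8 with the
    pre-computed factor, multiply against int8 weights with int32
    accumulation, then rescale the int32 result back to float."""
    x_q = np.clip(np.round(x / a_scale) + a_zp, 0, 255).astype(np.uint8)
    acc = (x_q.astype(np.int32) - a_zp) @ w_q.astype(np.int32)  # int32 result
    return acc.astype(np.float32) * (a_scale * w_scale)

def run_quantized_model(x, layers):
    # `layers` holds, per layer, the quantized weights and the factors
    # obtained in S200/S300; nothing is re-estimated during inference.
    for w_q, w_scale, a_scale, a_zp in layers:
        x = quantized_layer(x, w_q, w_scale, a_scale, a_zp)
    return x
```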
Embodiment 2:
The neural network model quantization system of the present invention is used to calculate, before the neural network model performs inference, the activation value quantization factor of each layer by minimizing an error equation. The system includes a construction and training module, a quantization factor calculation module, an activation value quantization factor calculation module, an inference dequantization module, and a final quantization module. The construction and training module is used to construct a neural network model and train it to obtain a floating-point neural network model as the target model. The quantization factor calculation module is applied to the target model and computes the maximum absolute value of the model weights to derive the weight quantization factor from the quantization range. The activation value quantization factor calculation module is applied to each layer of the target model and calculates the layer's activation value quantization factor by minimizing the mean square error. The inference dequantization module is applied to each layer of the target model, performs model inference with the quantized fixed-point weights and activation values, and dequantizes the inference result into an int32 data type. The final quantization module is applied to each layer of the target model, quantizes each operator by asymmetric quantization, quantizing the floating-point model weights into an int8 data type and the activation values into a uint8 data type, and thereby obtains the final quantized model.
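
The final quantization module's asymmetric scheme can be sketched as follows. The min/max-based scale and zero point shown here are one standard way to realize asymmetric uint8 quantization and are an assumption, since the publication does not spell out the formula:

```python
import numpy as np

def asym_quantize_uint8(x: np.ndarray):
    """Asymmetric quantization: scale and zero point cover [x.min(), x.max()]."""
    scale = max(float(x.max() - x.min()), 1e-8) / 255.0
    zero_point = int(np.round(-x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def asym_dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale
```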
For the target model, the model weights are quantized to the int8 type, with a quantization range of [-128, 127].
The activation value quantization factor calculation module calculates the activation value quantization factor as follows: obtain the test data set, compute the mean square error between each layer's quantized and unquantized output, and take the scale that minimizes it as the activation value quantization factor. The mean square error formula is:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $y_i$ denotes the unquantized output and $\hat{y}_i$ the quantized output.
Embodiment 3:

The device of the present invention includes at least one memory and at least one processor; the at least one memory stores a machine-readable program, and the at least one processor invokes the machine-readable program to execute the method disclosed in Embodiment 1 of the present invention.

Embodiment 4:

An embodiment of the present invention also provides a computer-readable medium storing computer instructions which, when executed by a processor, cause the processor to execute the method disclosed in Embodiment 1 of the present invention. Specifically, a system or device equipped with a storage medium may be provided, the storage medium storing software program code that realizes the functions of any of the above embodiments, with the computer (or the CPU or MPU) of the system or device reading and executing the program code stored in the storage medium.
In this case, the program code read from the storage medium can itself realize the functions of any of the above embodiments, so the program code and the storage medium storing it form part of the present invention.

Examples of storage media for providing the program code include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, and DVD+RW), magnetic tape, non-volatile memory cards, and ROM. Alternatively, the program code can be downloaded from a server computer over a communication network.

In addition, it should be clear that the functions of any of the above embodiments can be realized not only by executing the program code read out by a computer, but also by having the operating system running on the computer perform part or all of the actual operations under the instructions of the program code.

Furthermore, the program code read from the storage medium may be written into a memory provided on an expansion board inserted into the computer, or into a memory provided in an expansion unit connected to the computer, after which the instructions of the program code cause a CPU mounted on the expansion board or expansion unit to perform part or all of the actual operations, thereby realizing the functions of any of the above embodiments.

It should be noted that not all of the steps and modules in the above flows and system structure diagrams are necessary; some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and may be adjusted as required. The system structures described in the above embodiments may be physical or logical structures; that is, some modules may be realized by the same physical entity, some modules may be realized by several physical entities, and some modules may be realized jointly by components of independent devices.

In the above embodiments, a hardware unit may be implemented mechanically or electrically. For example, a hardware unit may include permanently dedicated circuits or logic (such as a dedicated processor, FPGA, or ASIC) to complete the corresponding operations. A hardware unit may also include programmable logic or circuits (such as a general-purpose processor or other programmable processor) that can be temporarily configured by software to complete the corresponding operations. The specific implementation (mechanical means, a dedicated permanent circuit, or a temporarily configured circuit) can be determined based on cost and time considerations.

The present invention has been shown and described in detail above with reference to the drawings and preferred embodiments; however, the present invention is not limited to the disclosed embodiments. Based on the above embodiments, those skilled in the art will appreciate that further embodiments can be obtained by combining means from the different embodiments, and such embodiments also fall within the protection scope of the present invention.

Claims (10)

  1. A neural network model quantization method, characterized in that, before a neural network model performs inference, the activation value quantization factor of each layer of the neural network model is calculated by minimizing an error equation, the method comprising the following steps:

    constructing a neural network model and training it to obtain a floating-point neural network model as the target model;

    for the target model, computing the maximum absolute value of the model weights and deriving the model weight quantization factor from the quantization range;

    for each layer of the target model, calculating the layer's activation value quantization factor by minimizing the mean square error;

    for each layer of the target model, performing model inference with the quantized fixed-point weights and activation values, and dequantizing the inference result into an int32 data type;

    for each layer of the target model, quantizing each operator by asymmetric quantization, quantizing the floating-point model weights into an int8 data type and the activation values into a uint8 data type, to obtain the final quantized model.
  2. The neural network model quantization method according to claim 1, characterized in that, for the target model, the model weights are quantized to the int8 type, with a quantization range of [-128, 127].
  3. The neural network model quantization method according to claim 1 or 2, characterized in that a test data set is obtained, the mean square error between the quantized and unquantized output of each layer of the target model is computed on the test data set, and the activation value quantization factor is obtained by minimizing the mean square error.
  4. The neural network model quantization method according to claim 3, characterized in that the mean square error formula is:

    $$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

    where $y_i$ denotes the unquantized output and $\hat{y}_i$ the quantized output.
  5. A neural network model quantization system, characterized in that it is used to calculate, before a neural network model performs inference, the activation value quantization factor of each layer of the neural network model by minimizing an error equation, the system comprising:

    a construction and training module, used to construct a neural network model and train it to obtain a floating-point neural network model as the target model;

    a quantization factor calculation module, applied to the target model and used to compute the maximum absolute value of the model weights and derive the model weight quantization factor from the quantization range;

    an activation value quantization factor calculation module, applied to each layer of the target model and used to calculate the layer's activation value quantization factor by minimizing the mean square error;

    an inference dequantization module, applied to each layer of the target model and used to perform model inference with the quantized fixed-point weights and activation values and to dequantize the inference result into an int32 data type;

    a final quantization module, applied to each layer of the target model and used to quantize each operator by asymmetric quantization, quantizing the floating-point model weights into an int8 data type and the activation values into a uint8 data type, to obtain the final quantized model.
  6. The neural network model quantization system according to claim 5, characterized in that, for the target model, the model weights are quantized to the int8 type, with a quantization range of [-128, 127].
  7. The neural network model quantization system according to claim 5 or 6, characterized in that the activation value quantization factor calculation module calculates the activation value quantization factor as follows:

    obtaining the test data set, computing the mean square error between the quantized and unquantized output of each layer of the target model, and obtaining the activation value quantization factor by minimizing the mean square error.
  8. The neural network model quantization system according to claim 7, characterized in that the mean square error formula is:

    $$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

    where $y_i$ denotes the unquantized output and $\hat{y}_i$ the quantized output.
  9. A device, characterized by comprising at least one memory and at least one processor;

    the at least one memory being used to store a machine-readable program;

    the at least one processor being used to invoke the machine-readable program to execute the method of any one of claims 1 to 4.

  10. A computer-readable medium, characterized in that computer instructions are stored on the computer-readable medium, and when executed by a processor, the computer instructions cause the processor to execute the method of any one of claims 1 to 4.
PCT/CN2022/105317 2021-10-13 2022-07-13 Neural network model quantification method, system and device, and computer-readable medium WO2023060959A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111190372.1A CN114021691A (en) 2021-10-13 2021-10-13 Neural network model quantification method, system, device and computer readable medium
CN202111190372.1 2021-10-13

Publications (1)

Publication Number Publication Date
WO2023060959A1 (en)

Family

ID=80056227

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/105317 WO2023060959A1 (en) 2021-10-13 2022-07-13 Neural network model quantification method, system and device, and computer-readable medium

Country Status (2)

Country Link
CN (1) CN114021691A (en)
WO (1) WO2023060959A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114021691A (en) * 2021-10-13 2022-02-08 山东浪潮科学研究院有限公司 Neural network model quantification method, system, device and computer readable medium
CN114492778A (en) * 2022-02-16 2022-05-13 安谋科技(中国)有限公司 Operation method of neural network model, readable medium and electronic device
CN114821660A (en) * 2022-05-12 2022-07-29 山东浪潮科学研究院有限公司 Pedestrian detection inference method based on embedded equipment
CN115357381A (en) * 2022-08-11 2022-11-18 山东浪潮科学研究院有限公司 Memory optimization method and system for deep learning inference of embedded equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814955A (en) * 2020-06-19 2020-10-23 浙江大华技术股份有限公司 Method and apparatus for quantizing neural network model, and computer storage medium
CN111950716A (en) * 2020-08-25 2020-11-17 云知声智能科技股份有限公司 Quantification method and system for optimizing int8
CN111950715A (en) * 2020-08-24 2020-11-17 云知声智能科技股份有限公司 8-bit integer full-quantization inference method and device based on self-adaptive dynamic shift
US20210004679A1 (en) * 2019-07-01 2021-01-07 Baidu Usa Llc Asymmetric quantization for compression and for acceleration of inference for neural networks
CN112766484A (en) * 2020-12-30 2021-05-07 上海熠知电子科技有限公司 Floating point neural network model quantization system and method
CN114021691A (en) * 2021-10-13 2022-02-08 山东浪潮科学研究院有限公司 Neural network model quantification method, system, device and computer readable medium


Also Published As

Publication number Publication date
CN114021691A (en) 2022-02-08

Similar Documents

Publication Publication Date Title
WO2023060959A1 (en) Neural network model quantification method, system and device, and computer-readable medium
US11580719B2 (en) Dynamic quantization for deep neural network inference system and method
CN110210621B (en) Improved target detection method based on residual error network
CN110379416A (en) A kind of neural network language model training method, device, equipment and storage medium
CN110609474B (en) Data center energy efficiency optimization method based on reinforcement learning
JP2003076585A5 (en)
CN111078853B (en) Question-answering model optimization method, device, computer equipment and storage medium
CN110795235A (en) Method and system for deep learning and cooperation of mobile web
CN115567405A (en) Network flow gray prediction method based on self-adaptive feedback regulation mechanism
WO2022246986A1 (en) Data processing method, apparatus and device, and computer-readable storage medium
JP2013074365A (en) Method, program and system for processing kalman filter
CN112633516B (en) Performance prediction and machine learning compiling optimization method and device
CN112052945A (en) Neural network training method, neural network training device and electronic equipment
CN106971023B (en) Wheel disc special-shaped hole structure design method based on hyperelliptic curve
CN113673532B (en) Target detection method and device based on quantitative model
CN113709249B (en) Safe balanced unloading method and system for driving assisting service
CN108520299A (en) Activation value quantization method and device between grade
TWI763975B (en) System and method for reducing computational complexity of artificial neural network
CN113284102A (en) Fan blade damage intelligent detection method and device based on unmanned aerial vehicle
Puangpontip et al. Energy usage of deep learning in smart cities
CN110399121A (en) Private clound management system health degree design method, equipment and medium based on piecewise function
CN113991638B (en) Prediction method for generating power of new energy station aiming at different places
CN117931211A (en) Model deployment method, device, apparatus, chip and storage medium
CN111105019A (en) Neural network operation device and operation method
CN117648163A (en) Application migration CPU estimation method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22879906

Country of ref document: EP

Kind code of ref document: A1