CN113850374A - Neural network model quantization method, electronic device, and medium

Neural network model quantization method, electronic device, and medium

Info

Publication number
CN113850374A
Authority
CN
China
Prior art keywords
quantization
neural network
network model
quantized
value range
Prior art date
Legal status
Pending
Application number
CN202111196316.9A
Other languages
Chinese (zh)
Inventor
许礼武
余宗桥
韩冥生
Current Assignee
ARM Technology China Co Ltd
Original Assignee
ARM Technology China Co Ltd
Priority date
Application filed by ARM Technology China Co Ltd
Priority to CN202111196316.9A
Publication of CN113850374A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of artificial intelligence, and in particular to a quantization method for a neural network model, an electronic device, and a medium. In the quantization method of the present application, the signs of the weight parameters of each layer of the neural network model are taken into account: a shrinkage factor |scale_(Z-X)| is calculated for the weight parameters greater than or equal to zero and a shrinkage factor |scale_(Z'-Y)| for the weight parameters less than zero, and the weighted sum of |scale_(Z-X)| and |scale_(Z'-Y)| gives the final shrinkage factor. This method makes the weight parameters of the neural network model before and after quantization closer, so that the storage space and computing resources occupied by the weight parameters are reduced while the accuracy of the quantized neural network model remains close to that of the model before quantization.

Description

Neural network model quantization method, electronic device, and medium
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to a quantization method for a neural network model, an image recognition method, an electronic device, and a medium.
Background
A trained neural network model has millions or even tens of millions of parameters, such as the weight and bias parameters in each network layer, and these parameters are stored as 32-bit floating-point values or higher. Because the data volume of these parameters is huge, a large amount of storage space and computing resources is consumed throughout the convolution calculation. Quantization aims to compress the size of the neural network model so that it can be conveniently deployed on terminals with limited computing resources, such as mobile phones.
The current practice is to map the parameters of the neural network model into a value range represented by fewer than 32 bits, to reduce the storage space occupied by the model and the computing resources it consumes. However, because the value range that 32 bits can represent is much larger than the smaller target value range, the weight or bias parameters of the neural network model suffer losses during quantization, which may reduce the accuracy of the neural network model and affect the user experience.
Disclosure of Invention
In view of the above, the present application provides a quantization method of a neural network model, an image recognition method, an electronic device and a medium, which can solve the above technical problems. The quantization method of the present application is described below.
In a first aspect, the present application provides a method for quantizing a neural network model, the method comprising: determining, according to the quantization value range of the data to be quantized and the value range of the target value range, a first quantization parameter and a second quantization parameter required for quantizing the data to be quantized to the target value range, wherein the first quantization parameter corresponds to the data to be quantized that is greater than 0 in the quantization value range, and the second quantization parameter corresponds to the data to be quantized that is less than 0 in the quantization value range; determining a third quantization parameter of the neural network model according to the first quantization parameter and the second quantization parameter; and quantizing the data to be quantized to the target value range according to the third quantization parameter.
When this method quantizes the data to be quantized in the neural network model, the signs of the data are taken into account: the quantization parameter corresponding to the positive values and the quantization parameter corresponding to the negative values are used together to determine the final quantization parameter of the model (the third quantization parameter, which corresponds to the final shrinkage factor of the neural network in the specific embodiments below). The quantization scale of the neural network model is therefore more reasonable, and the precision loss of the quantized model is smaller. The precision loss may be measured by the similarity between the result of floating-point inference of the neural network model and the result of the quantized model: the higher that similarity, the higher the quantization precision. In practice, this appears as the difference between floating-point inference results and quantized inference results. Taking an image recognition neural network model as an example, if the recognition rate obtained by running inference in floating point is 80%, then a recognition rate of the quantized inference model of 79% or higher, i.e., closer to 80%, is better.
Further, in a possible implementation, the data to be quantized may include any one or more of the following: the weight parameters and bias parameters in any one or more of the network layers forming the neural network model; or the input data of any one or more network layers. A weight parameter is an actual value associated with an element that indicates the importance of that element in predicting the final value; taking the image recognition network model as an example, a weight parameter represents the importance of a certain pixel of the image in the image recognition result. In a possible implementation, the data to be quantized may further include the activation functions of each network layer in the neural network model; it should be understood that this application does not limit the type of the data to be quantized.
With reference to the first aspect and possible implementations of the first aspect, in another possible implementation of the first aspect, the input data includes image data. The image data may be the original image data input into the neural network model, or feature map data obtained by performing convolution operations on the original image data in a certain network layer. In a possible implementation, taking the neural network model to be an image recognition network model containing 100 network layers as an example, the image data may be the raw image data input to the first network layer of the model, or the feature map data of the raw image data after the convolution operations of any one or more network layers (for example, the 10th layer, or the 20th to 30th layers).
In a possible implementation manner, the quantization value range of the data to be quantized refers to a numerical range determined by values of each data in the data to be quantized. For example, if the data to be quantized is [ -22.00, -19.00, -17.00, 10.00, 20.00, 30.00], the range of the data to be quantized is greater than or equal to-22.00 and less than or equal to 30.00.
It should be understood that, in other possible implementations, the data to be quantized may contain special data that is significantly higher and/or lower than the other data to be quantized. In this case, the special data is still quantized using the quantization parameter determined from the quantization value range of the other data and the target value range, and when its quantization result falls outside the target value range, it is truncated, i.e., the maximum or minimum value of the target value range is used directly as its quantization result. For example, suppose the data to be quantized is [ -600.00, -22.00, -19.00, -17.00, 10.00, 20.00, 30.00, 200.00] and the target value range is the 8-bit range [ -128, 127]; the quantization value range determined by the ordinary data is then from -22.00 to 30.00 inclusive. If quantizing -600.00 according to this value range and the target value range yields a result outside [ -128, 127], the result is truncated: the boundary of the target value range with the same sign as -600.00 is used directly, so the final quantization result of -600.00 is -128. If quantizing 200.00 does not produce a result outside [ -128, 127], its quantization result need not be truncated. Whether special data exists may be determined, for example, from the difference between each datum and the average of the data to be quantized; it should be understood that this application does not limit the method used.
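As a concrete illustration of the truncation just described, the following Python sketch quantizes the example data and clamps out-of-range results; the function name and the particular choice of scale (derived here from the ordinary data range [ -22.00, 30.00]) are illustrative assumptions, not taken from this application.

    def quantize_with_truncation(values, scale, q_min=-128, q_max=127):
        quantized = []
        for v in values:
            q = round(v * scale)           # symmetric quantization: x_quant = x_float * scale
            q = max(q_min, min(q, q_max))  # truncate out-of-range results to the boundary
            quantized.append(q)
        return quantized

    # Hypothetical scale derived from the ordinary data range [-22.00, 30.00].
    scale = 30.00 / 127
    print(quantize_with_truncation(
        [-600.00, -22.00, -19.00, -17.00, 10.00, 20.00, 30.00, 200.00], scale))
    # -600.00 quantizes past -128 and is truncated to -128; 200.00 (-> 47) stays in range.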
It is understood that the target value range is the value range to which the data to be quantized needs to be quantized. In a possible implementation, the target value range may be the range [ -128, 127] representable by an 8-bit integer, or a range representable by another data type. The choice of target value range is related to the storage and computing resources of the electronic device on which the neural network model runs: when those resources are large, the target value range may also be large, to minimize the influence of quantization on the accuracy of the model; when they are small, the target value range may be small, to ease porting of the model. The specific way of determining the target value range is described below and not repeated here.
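For illustration only, a minimal sketch of how a signed-integer bit width determines such a target value range (the helper below is hypothetical, not part of this application):

    def target_value_range(bits):
        """Signed integer value range representable by the given bit width."""
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

    print(target_value_range(8))   # (-128, 127)
    print(target_value_range(16))  # (-32768, 32767)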
In a possible implementation, the first quantization parameter is the first shrinkage factor in the specific embodiments below and corresponds to the data to be quantized whose value is greater than or equal to 0, and the second quantization parameter is the second shrinkage factor in the specific embodiments below and corresponds to the data to be quantized whose value is less than or equal to 0. With reference to the first aspect and its possible implementations, in another possible implementation of the first aspect, the data to be quantized corresponding to the first quantization parameter may also equal 0; likewise, the data to be quantized corresponding to the second quantization parameter may also equal 0. That is, in one possible implementation, the data corresponding to the first quantization parameter may be the data greater than or equal to 0 (the positive numbers and 0), and the data corresponding to the second quantization parameter may be the data less than or equal to 0 (the negative numbers and 0).
With reference to the first aspect and possible implementation manners of the first aspect, in another possible implementation manner of the first aspect, the method for determining the first quantization parameter required for quantizing the data to be quantized to the target value range according to the quantization value range of the data to be quantized and the value range of the target value range includes: and determining a first maximum value in the data to be quantized, which is greater than or equal to 0, in the quantization value range, wherein the ratio of the first maximum value to the maximum value in the positive value interval is a first quantization parameter. In a possible implementation manner, the range of the target value range includes a positive value interval and a negative value interval, and then the first quantization parameter is a ratio of a first maximum value in the range of the data to be quantized to a maximum value in the positive value interval. For example, if the range of the target value range is [ -128, 127], the positive value interval of the target value range is [0, 127], the negative value interval is [ -128, 0], and meanwhile, if the range of the data to be quantized is [ -22.00, 32.00], the first maximum value greater than or equal to 0 in the data to be quantized is 32.00, and the first quantization parameter is 32.00/127.
With reference to the first aspect and possible implementations of the first aspect, in another possible implementation of the first aspect, the method for determining the second quantization parameter required for quantizing the data to be quantized to the target value range according to the quantization value range and the target value range includes: determining a first minimum value among the data to be quantized that is smaller than 0 in the quantization value range, wherein the ratio of the first minimum value to the minimum value of the negative value interval is the second quantization parameter. Consistent with the calculation of the first quantization parameter, taking the target value range [ -128, 127] and the data range [ -22.00, 32.00] as an example, the positive value interval of the target value range is [0, 127] and the negative value interval is [ -128, 0]; the first minimum value among the data to be quantized that is less than or equal to 0 is -22.00, so the second quantization parameter is -22.00/-128.
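The two calculations above can be sketched together as follows; the function name is illustrative, and the example reuses the data range [ -22.00, 32.00] and target range [ -128, 127] from the text.

    def split_quantization_params(data, q_min=-128, q_max=127):
        first_max = max(v for v in data if v >= 0)  # first maximum value (>= 0)
        first_min = min(v for v in data if v < 0)   # first minimum value (< 0)
        return first_max / q_max, first_min / q_min

    data = [-22.00, -19.00, -17.00, 10.00, 20.00, 32.00]
    first_param, second_param = split_quantization_params(data)
    print(first_param)   # 32.00 / 127  ~ 0.2520
    print(second_param)  # -22.00 / -128 = 0.171875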
With reference to the first aspect and possible implementations of the first aspect, in another possible implementation of the first aspect, the method for determining the third quantization parameter of the neural network model according to the first quantization parameter and the second quantization parameter includes: gradually increasing a first weight value from 0 by a preset amount, and calculating the weighted sum of the first quantization parameter weighted by each first weight value and the second quantization parameter weighted by the corresponding second weight value, to obtain a plurality of third quantization parameters, wherein the sum of the first weight value and the second weight value is 1. Quantizing the data to be quantized to the target value range according to the third quantization parameter then includes: quantizing the data to be quantized to the target value range according to the plurality of third quantization parameters to obtain a plurality of quantization results of the neural network model; and determining, from the plurality of quantization results, a first quantization result whose similarity to the data to be quantized is greater than a preset similarity, and taking the first quantization result as the quantization result of the neural network model. The first weight value corresponds to the weight α in the specific embodiments below. Through a large number of experiments and tests, the inventors found that the quantization result of the neural network model is good when the first weight value lies between 0 and 1 inclusive, so this application describes its quantization method taking that range of the weight α as an example. It should be understood that the principles and concepts of the quantization method of this application apply equally to the quantization of neural network models when the value range of the weight α is different.
With reference to the first aspect and possible implementations of the first aspect, in another possible implementation of the first aspect, determining the third quantization parameter of the neural network model according to the first and second quantization parameters includes: determining the value range of the first weight value or the second weight value according to the relationship between the attribute parameters of the neural network model and the first and second quantization parameters, wherein the first weight value corresponds to the first quantization parameter and the second weight value corresponds to the second quantization parameter; within the value range of the first weight value, gradually increasing the first weight value by a preset amount and calculating the weighted sum of the first quantization parameter corresponding to each first weight value and the second quantization parameter corresponding to each second weight value, to obtain a plurality of third quantization parameters, wherein the sum of the first and second weight values is 1. Quantizing the data to be quantized to the target value range according to the third quantization parameter then includes: quantizing the data to the target value range according to the plurality of third quantization parameters to obtain a plurality of quantization results, and selecting the first quantization result whose similarity to the data to be quantized exceeds a preset similarity as the quantization result of the neural network model. It can be understood that determining the third quantization parameter by increasing the first weight value directly from 0 by a preset amount requires a relatively large amount of computation. To further improve the quantization efficiency of the neural network model, in a possible implementation the value range of the first weight value may be narrowed according to the relationship between the attribute parameters of the neural network model and the first and second quantization parameters. The attribute parameters are parameters specific to each neural network model, and this application does not limit their values. For example, for the image recognition neural network model, the attribute parameters include a first attribute value β and a second attribute value γ; after many experiments, with β equal to 0.5 and γ equal to 0.095, the value range of the first weight value in the image recognition neural network model can be narrowed according to the relationship between the first and second quantization parameters and β and γ, and within that range the quantization result of the neural network model is best, i.e., the accuracy of the quantized model is high.
With reference to the first aspect and possible implementations of the first aspect, in another possible implementation of the first aspect, the value range of the preset amount is greater than 0 and less than 1. It should be understood that the value range of the preset amount is related to the value range of the weight value: if the weight value ranges from 0 to 1 inclusive, the preset amount naturally cannot exceed that range, i.e., it is also greater than 0 and less than 1; if the weight value ranged from 0 to 10 inclusive, the preset amount could be greater than 0 and less than 10. This application is not limited in this respect.
In a second aspect, the present application provides an image recognition method, including: acquiring an image to be recognized; and recognizing the image with a neural network model quantized by the quantization method of any implementation of the first aspect. It is to be understood that, in a possible implementation, the quantization method of any possible implementation of the first aspect may be applied to neural network models including, but not limited to, image recognition neural network models, object detection neural network models, speech segmentation neural network models, and the like. Taking the image recognition neural network model as an example, after it is quantized by the quantization method of any possible implementation of the first aspect, its scale is reduced: both the storage space it occupies and the computing resources consumed during image recognition decrease, while, as mentioned above, its recognition accuracy does not drop significantly. The quantized image recognition neural network model can therefore be ported to electronic devices with limited computing or storage resources, such as mobile phones, so that users can use the corresponding image recognition functions, for example face recognition authentication, on those devices, improving the user experience. The image to be recognized may be a face image of a user, a fingerprint image, a two-dimensional image, or a three-dimensional image (e.g., a depth image); it should be understood that this application does not limit the type or form of the image to be recognized.
In a third aspect, the present application provides an electronic device, comprising:
a memory storing instructions;
a processor coupled to the memory, wherein the instructions, when executed by the processor, cause the electronic device to implement any of the possible implementations of the first and second aspects.
In a fourth aspect, the present application provides a computer-readable storage medium having instructions stored therein, wherein the instructions, when executed on a computer, cause the computer to implement any of the possible implementations of the first and second aspects.
In a fifth aspect, the present application provides a computer program product which, when run on a computer, causes the computer to perform the method of any of the above aspects and any of their possible implementations.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the embodiments or the description of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on them without inventive effort.
FIG. 1 is a diagram of an example neural network model system provided in some embodiments;
FIG. 2(A) is a schematic diagram of an example of the location of a zero point in symmetric quantization according to some embodiments;
FIG. 2(B) is a schematic diagram of an example of zero point location in asymmetric quantization according to some embodiments;
FIG. 3 is a schematic diagram of input data for an object detection neural network model provided by some embodiments;
fig. 4 is a flowchart illustrating an example quantization method according to some embodiments.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Various embodiments of the present application are described below with reference to the drawings.
Fig. 1 is a schematic block diagram of a neural network model system 100 provided in some embodiments, and as shown in fig. 1, the neural network model system 100 includes a quantization module 110 and a convolution module 120.
The quantization module 110 is configured to quantize input data of the i-th network layer of the neural network model, where i is a positive integer. In some embodiments, the input data may comprise an input image for the layer-i convolution calculation; for example, if the neural network model is an image recognition network model, the input data may comprise an image input to the i-th layer. In other embodiments, the input data may include the weight parameters of the i-th network layer, where a weight parameter is an actual value associated with an element that indicates the importance of that element in predicting the final value; continuing with the image recognition network model as an example, a weight parameter represents the importance of a certain pixel of the image in the image recognition result. In other embodiments, the input data may further include the bias parameters, activation functions, and the like of the i-th network layer, which is not limited in this application.
Here, quantization refers to the process of mapping a set of numbers in an original value range to another, target value range through a mathematical transformation. For example, assuming the original range is [ -22.00, 32.00] represented in 32-bit floating point and the target range is [ -128, 127] represented in 8-bit integer, quantization maps [ -22.00, 32.00] into [ -128, 127], i.e., the 8-bit range [ -128, 127] represents the 32-bit range [ -22.00, 32.00]. The purpose of quantization is to reduce the size of the neural network model and the storage space it occupies. It can be understood that, since the original value range can generally represent far more values than the target value range (for example, a 32-bit float can represent decimals while an 8-bit integer can only represent integers), quantization necessarily involves rounding, truncation, or similar processing, which changes the actual values of some weight parameters of the neural network model and thus affects the precision of the model. This will be described later in connection with fig. 2 to 3.
The convolution module 120 is configured to perform convolution calculation on the quantized input data of the i-th network layer and the quantized weight parameters, obtaining the convolution result of the i-th layer. In some embodiments, when i equals 1, the input data of the i-th layer is the original input picture. In other embodiments, when i is greater than 1, the input data of the i-th layer is feature map data. Feature map data represent the calculation result of a network layer and are an intermediate result for the whole convolutional neural network model; the input data of the i-th network layer is the feature map data of the (i-1)-th layer, i.e., the convolution result of the (i-1)-th network layer. For example, if i is 2, the input data of the layer-2 network layer is the feature map data of layer 1, which is the convolution result of the layer-1 network layer.
It will be appreciated that each layer network layer has its own convolution calculation model, i.e. each layer network layer has its own weight parameters and bias parameters. For example, assume that the convolution model of the i-layer network layer is shown in the following equation (1):
y=∑ax+b (1)
wherein, a is a weight parameter, b is a bias parameter, x is input data input to the i-th layer network layer, and the convolution result y of the i-th layer network layer is obtained through the calculation of the formula (1).
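A minimal numeric sketch of equation (1), for illustration only:

    def layer_output(weights, inputs, bias):
        """y = sum(a * x) + b for one network layer, per equation (1)."""
        return sum(a * x for a, x in zip(weights, inputs)) + bias

    print(layer_output([0.5, -0.25], [2.0, 4.0], 0.1))  # 0.5*2.0 + (-0.25)*4.0 + 0.1 = 0.1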
It should be understood that although the network layers of the neural network model include multiple layers, it does not mean that all the network layers included in the neural network model need to be quantized. That is, in some embodiments, only a portion of the network layer may be quantized, and the portion of the network layer may be a continuous network layer or a discontinuous network layer. For example, assuming that the network layer has 100 layers, quantization processing may be performed on the 10 th to 90 th layers, or quantization processing may be performed on the 10 th, 20 th, and 30 th layer network layers, and the application does not limit the number and the positions of the network layers that need to be quantized.
It should also be understood that the above equation (1) is only exemplary, and is only for illustrating the relationship among the weight parameter, the bias parameter, and the convolution result of the i-th network layer, and does not constitute any limitation on the convolution calculation model of the i-th network layer.
It should also be understood that the system 100 may also include other modules, such as an input module, a pooling module, and the like. For supporting the system to perform other functions of the neural network model, which is not limited in this application.
It is also understood that the system 100 may be a chip or an apparatus, which may include the quantization module 110 and the convolution module 120, etc. In some embodiments, the apparatus may be an electronic device with computing capability, or the chip may be integrated on an electronic device with computing capability, where the electronic device may include, but is not limited to, a smartphone, a tablet, a server, a supercomputer, a wearable electronic device, a smart appliance, and so on, and the application is not limited to the kind of this electronic device.
For convenience of description, the quantization method in each embodiment of the present application is described below by taking an example that the quantization module 110 in the neural network model system 100 in fig. 1 quantizes the weight parameter of a certain layer in the neural network model.
As mentioned above, each parameter in the neural network model may be stored and calculated as a 32-bit or 64-bit floating-point value, which may occupy tens or hundreds of megabytes of storage space and a large amount of computing resources on the electronic device. This is unfavorable for porting the neural network model to electronic devices with limited computing resources or storage space and may affect the user experience. For example, it is not easy for developers to port a trained face recognition model to a mobile phone so that the user can complete face authentication on the phone and then use functions such as mobile payment and identity authentication. Likewise, it is not easy to port a trained image classification model to a mobile phone so that the user can perform image retrieval with functions such as search-by-image.
In order to reduce the storage space occupied by the neural network model and the computing resources it consumes, some embodiments of the present application provide a quantization method in which the quantization module 110 takes the weight parameters with the largest and smallest absolute values in the neural network model as the original value range of the model, and then maps this original value range into a target value range smaller than 32 bits, thereby reducing the storage space occupied by the model and the computing resources consumed during convolution calculation.
Specifically, in some embodiments, the quantization module 110 first determines the maximum value max and minimum value min of the absolute values of the weight parameters of the neural network model that need to be quantized, i.e., the original value range of the weight parameters. The quantization module 110 then determines the target value range to which the model needs to be quantized, calculates the shrinkage factor scale and the zero point zero_point of the model from the original and target value ranges, and finally determines the quantized weight parameters of the neural network model from scale, zero_point, and the weight parameters before quantization.
For example, in some embodiments, the target value range is typically an 8-bit integer value range [ -128, 127], but it is understood that in other embodiments, the target value range may also be a 16-bit value range, or a 24-bit value range, or other value range smaller than 32-bit, and it should be understood that the value range of the target value range is not limited in any way by this application.
The shrinkage factor scale may be understood as the multiple by which the neural network model is reduced from the original value range to the target value range, i.e., the quantization scale of the neural network model. In some embodiments, as described above, the shrinkage factor scale may be determined from the original value range of the weight parameters of the neural network model and the target value range. In some embodiments, quantization module 110 may determine the shrinkage factor scale by the following equation (2):
scale = |x_float_max - x_float_min| / |x_quant_max - x_quant_min|   (2)
wherein x_float_max denotes the weight parameter with the largest absolute value in the original value range of the neural network model before quantization, x_float_min denotes the weight parameter with the smallest absolute value in that original value range, x_quant_max denotes the value with the largest absolute value in the target value range, and x_quant_min denotes the value with the smallest absolute value in the target value range.
The zero point zero_point refers to the position in the quantized target value range corresponding to the zero of the original value range before quantization. In some embodiments, when the quantization is symmetric, as shown in fig. 2(A), the zero of the original value range corresponds to the zero of the target value range, and zero_point is 0. In other embodiments, when the quantization is asymmetric, as shown in fig. 2(B), the zero of the original value range does not correspond to the zero of the target value range, and zero_point can be calculated by the following formula (3):
zero_point = x_quant_max - x_float_max × scale   (3)
wherein x_float_max denotes the weight parameter with the largest absolute value in the original value range of the neural network model before quantization, x_quant_max denotes the value with the largest absolute value in the target value range, and scale denotes the shrinkage factor of the neural network model from the original value range to the target value range.
After the shrinkage factor scale and the zero point zero_point are both determined, the quantization module 110 calculates the weight parameter corresponding to each pre-quantization weight parameter of the neural network model according to the following formula (4):
x_quant = x_float × scale + zero_point   (4)
wherein x_float denotes a weight parameter of the neural network model before quantization, x_quant denotes the corresponding weight parameter after quantization, zero_point denotes the position in the quantized target value range corresponding to the zero of the original value range before quantization, and scale denotes the shrinkage factor of the neural network model from the original value range to the target value range.
In symmetric quantization, since zero_point is 0, the quantization module 110 may calculate the quantized weight parameters of the neural network model according to the following formula (5):
x_quant = x_float × scale   (5)
For example, assume the weight parameters of the neural network model lie between -22.00 and 32.00 inclusive, i.e., in [ -22.00, 32.00], and the target value range is the 8-bit integer range, which can represent values between -128 and 127 inclusive, i.e., [ -128, 127].
Because the quantization is symmetric, the zero of the original value range of the neural network model coincides with the zero of the target value range, i.e., zero_point is 0. The largest absolute value among the weight parameters of the neural network model is |32.00|, i.e., x_float_max = 32.00. The quantization module 110 may therefore calculate scale = 32.00/128 = 0.2500 using equation (2) above, and then calculate the quantized weight parameters of the neural network model using equation (5). Here, only the quantized weight parameters corresponding to the maximum MAX (32.00) and minimum MIN (-22.00) weight parameters are calculated as an example: according to equation (5), the quantized weight parameter corresponding to MAX (32.00) is 8 and that corresponding to MIN (-22.00) is -5.5; since the integer type can only represent integers, -5.5 is rounded to -6.
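The example can be checked with the following sketch, which follows the application's convention x_quant = x_float × scale (Python's round() rounds -5.5 to -6, matching the text):

    scale = 32.00 / 128   # equation (2) with the minimum absolute values taken as 0
    zero_point = 0        # symmetric quantization

    def quantize(x_float):
        return round(x_float * scale + zero_point)  # equation (4); reduces to (5) here

    print(quantize(32.00))   # MAX: 32.00 * 0.25 = 8
    print(quantize(-22.00))  # MIN: -22.00 * 0.25 = -5.5, rounded to -6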
As can be seen from the calculations of equations (2) to (5), when computing the shrinkage factor of the neural network model, the quantization module 110 uses only the absolute values of the weight parameters; the signs of the weight parameters are not considered, i.e., the influence of positive versus negative weights on the convolution result is ignored. It can be understood that, since the signs are ignored, when the absolute values of the negative weights are smaller than the absolute values of the positive weights, equation (2) yields a larger shrinkage factor; computing the quantized weight parameters with this larger shrinkage factor reduces the degree to which the neural network is compressed, and a larger-scale neural network model is still unfavorable for porting to resource-limited electronic devices.
Moreover, the sign of each weight parameter in the neural network model is used to distinguish whether the corresponding element is a target element. More specifically, taking the case where the electronic device uses the neural network model to detect the target object Q shown in fig. 3: the weight parameters corresponding to pixels within the outline L of the target object Q are positive (or negative), while those corresponding to pixels outside the outline L are negative (or positive). If the signs of the weight parameters are not considered, the quantization results of the weights inside and outside the outline L are affected, and the accuracy with which the neural network model detects the target object Q is reduced.
In order to solve the above technical problem, an embodiment of the present application provides a quantization method for a neural network model. In this method, the quantization module 110 separately calculates a shrinkage factor corresponding to the value range of the weight parameters greater than or equal to 0 and a shrinkage factor corresponding to the value range of the weight parameters less than or equal to 0, and then calculates the weighted sum of the two. In this way, an accurate shrinkage factor for the neural network model is obtained, so that the quantization scale of the model is more reasonable and its accuracy is better preserved.
Next, the process of quantifying the weight parameters of the neural network model by the quantifying method of the neural network model of the present application is described with reference to the neural network system 100 shown in fig. 1 and the symmetric quantifying process shown in fig. 2 (a).
It should be understood that the quantization method of the neural network model of the present application is also applicable to the quantization of various input data of the neural network model mentioned above, such as bias parameters, activation functions, images, and the like, and the present application is not limited thereto.
It should also be understood that, for convenience of description, the following quantizes the weight parameters of one network layer of the neural network model. As mentioned above, in other embodiments, the weight parameters of all network layers or of a subset of network layers may also be quantized, and multiple continuous or discontinuous network layers may be quantized at the same time, which is not limited in this application.
Specifically, as shown in fig. 4, the method 400 includes:
401: Determine a first shrinkage factor and a second shrinkage factor for the weight parameters of the neural network model according to the value range of the target value range.
The value range of the target value range may refer to the foregoing description. In some embodiments, the target value range is preset by developers according to the storage space, computing resources, and the like of the electronic device on which the neural network model runs. For example, if the device is one with limited storage space, such as a smartphone, the target value range may be a smaller range, such as the 8-bit integer range [ -128, 127]; if the device has relatively more storage space, such as a notebook computer, the target value range may be a larger range, such as that of a 16-bit floating-point type. This application does not limit how the target value range is determined.
It is understood that the weight parameters of the neural network model form a value range containing a positive interval and a negative interval. Taking weight parameters [ -80.00, -60.00, 0.00, 40.00, 80.00, 100.00] as an example, the value range may be divided into a first value range [ -80.00, -60.00, 0.00] and a second value range [0.00, 40.00, 80.00, 100.00]. Similarly, the target value range [ -128, 127] can be divided into [ -128, 0] and [0, 127]. Combining the value range of the target value range with the value ranges of the weight parameters, the quantization module 110 may use equation (2) above to calculate the first shrinkage factor corresponding to the first value range, scale_negative = |-80.00 - 0| / |-128 - 0| = 80/128 = 0.6250, and the second shrinkage factor corresponding to the second value range, scale_positive = |100.00 - 0| / 127 = 100/127 = 0.7874.
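A short sketch of this step on the example weights (variable names are illustrative):

    weights = [-80.00, -60.00, 0.00, 40.00, 80.00, 100.00]

    # Equation (2) applied per sign, with the minimum absolute values taken as 0:
    scale_negative = abs(min(weights) - 0) / abs(-128 - 0)  # |-80.00 - 0| / |-128 - 0| = 0.6250
    scale_positive = abs(max(weights) - 0) / abs(127 - 0)   # |100.00 - 0| / |127 - 0| ~ 0.7874

    print(scale_negative, scale_positive)  # 0.625 0.7874015748031497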
402: Calculate the weighted sum of the first shrinkage factor and the second shrinkage factor to obtain the shrinkage factor of the neural network model.
After obtaining a first shrinkage factor corresponding to the negative weight parameter and a second shrinkage factor corresponding to the positive weight parameter of the neural network model, the quantization module 110 calculates a weighted sum of the first shrinkage factor and the second shrinkage factor to obtain a shrinkage factor considering the sign of the weight parameter of the neural network model.
Specifically, in some embodiments, the quantization module 110 assigns a weight α to the first shrinkage factor (so the weight of the second shrinkage factor is 1 - α), then increases α from 0 by a preset amount and calculates the weighted sum of the first and second shrinkage factors corresponding to each α, obtaining a plurality of shrinkage factors of the neural network model.
In other embodiments, the quantization module 110 may first determine the value range of α according to the relationship between the first and second shrinkage factors of the neural network model and the attribute values of the model, then gradually increase α within that range by a preset amount and calculate the weighted sum of the first and second shrinkage factors corresponding to each α, obtaining a plurality of shrinkage factors of the neural network model. The manner of calculating the weighted sum of the first and second shrinkage factors is described in detail below and not repeated here.
403: Quantize the neural network model according to its shrinkage factor.
In some embodiments, after obtaining the plurality of shrinkage factors of the neural network model, the quantization module 110 quantizes the weight parameters [ -80.00, -60.00, 0.00, 40.00, 80.00, 100.00] of the neural network model with each of them using equation (5), obtains the corresponding quantization results, then calculates the similarity between each quantization result and the weight parameters before quantization, and selects the quantization result with the highest similarity as the final quantization result of the neural network model.
In some embodiments, the way to calculate the similarity of the pre-and post-quantization neural network model weight parameters includes, but is not limited to, calculating euclidean distances between the pre-and post-quantization weight parameters, spectral information divergence, and the like. The specific manner of quantifying the weights of the neural network model will be described below, and will not be described herein again.
Through the method 400, the quantization module 110 obtains a shrinkage factor that takes the signs of the weight parameters of the neural network model into account; quantizing the neural network model with this shrinkage factor yields a quantized model whose quantization scale is more reasonable and whose precision is better preserved.
The general flow of the quantization method of the present application is described above, and the specific calculation processes involved in the method 400 of the present application are described below corresponding to the respective steps in the method 400 described above.
In some embodiments, corresponding to 402 above, the quantization module 110 may calculate the weighted sum of the first and second shrinkage factors by the following equation (6):
scale = |scale_negative| × α + (1 - α) × |scale_positive|   (6)
wherein scale denotes the shrinkage factor of the neural network model weights, scale_negative denotes the first shrinkage factor calculated from the weights less than or equal to 0 in the neural network model, scale_positive denotes the second shrinkage factor calculated from the weights greater than or equal to 0, and α is the weight of scale_negative. The value range of the weight α is from 0 to 1 inclusive.
Then, the quantization module 110 may sweep α from 0 to 1, increasing it by a preset amount each time, and calculate the shrinkage factor of the neural network model corresponding to each α. In some embodiments, the preset amount may be any positive number, such as 0.1, 0.01, or 1, which is not limited in this application. It should be understood, however, that the smaller the preset amount, the more accurate the resulting weight α, and the better the quantization effect achievable with the corresponding shrinkage factor of the neural network model.
Specifically, taking the weight parameters of the neural network model as [ -80.00, -60.00, 0.00, 40.00, 80.00, 100.00] and the preset amount as 0.2, the quantization module 110 assigns α values from 0 to 1 and calculates the shrinkage factor of the neural network model for each α, i.e., scale = 0.6250 × α + (1 - α) × 0.7874, obtaining the contents shown in table 1 below:
TABLE 1
α:     0.0     0.2     0.4     0.6     0.8     1.0
scale: 0.7874  0.7549  0.7224  0.6900  0.6575  0.6250
As can be seen from table 1 above, when α is 0, the scaling factor scale of the neural network model is 0.7874, when α is 0.2, the scaling factor scale of the neural network model is 0.7549, when α is 0.4, the scaling factor scale of the neural network model is 0.7224, when α is 0.6, the scaling factor scale of the neural network model is 0.6900, when α is 0.8, the scaling factor scale of the neural network model is 0.6575, and when α is 1.0, the scaling factor scale of the neural network model is 0.6250.
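Table 1 can be reproduced with the following sketch of the α sweep (illustrative code, not from this application):

    scale_negative, scale_positive = 0.6250, 0.7874

    for k in range(6):
        alpha = 0.2 * k  # preset amount 0.2, swept from 0 to 1
        # equation (6): scale = |scale_negative| * alpha + (1 - alpha) * |scale_positive|
        scale = abs(scale_negative) * alpha + (1 - alpha) * abs(scale_positive)
        print(f"alpha = {alpha:.1f}  ->  scale = {scale:.4f}")
    # prints 0.7874, 0.7549, 0.7224, 0.6900, 0.6575, 0.6250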
Thereafter, corresponding to 403 above, in some embodiments the quantization module 110 quantizes the weight parameters [ -80.00, -60.00, 0.00, 40.00, 80.00, 100.00] of the neural network model with each shrinkage factor in table 1 according to equation (5) above, obtaining several sets of quantized weight parameters, as shown in table 2 below:
TABLE 2
scale     quantized weight parameters
0.7874    {-63, -47, 0, 31, 63, 79}
0.7549    {-60, -45, 0, 30, 60, 75}
0.7224    {-58, -43, 0, 29, 58, 72}
0.6900    {-55, -41, 0, 28, 55, 69}
0.6575    {-53, -39, 0, 26, 53, 66}
0.6250    {-50, -38, 0, 25, 50, 63}
As can be seen from table 2 above, when the shrinkage factor scale of the neural network model is 0.7874, the quantized weight parameters are {-63, -47, 0, 31, 63, 79}; when scale is 0.7549, they are {-60, -45, 0, 30, 60, 75}; when scale is 0.7224, they are {-58, -43, 0, 29, 58, 72}; when scale is 0.6900, they are {-55, -41, 0, 28, 55, 69}; when scale is 0.6575, they are {-53, -39, 0, 26, 53, 66}; and when scale is 0.6250, they are {-50, -38, 0, 25, 50, 63}.
Then, the quantization module 110 calculates the similarity between the quantized weight parameters corresponding to each shrinkage factor in Table 2 above and the weight parameters before quantization, determines the quantized weight parameters closest to the pre-quantization weight parameters, and uses them as the final quantization result of the neural network model. Specifically, in some embodiments, the quantization module 110 may calculate the similarity between the weights of the neural network model before and after quantization by means of the spectral information divergence (SID) defined in equations (6) to (8) below.
SID(A, B) = D(A||B) + D(B||A)    (6)

D(A||B) = Σ_i p_i · log(p_i / q_i)    (7)

D(B||A) = Σ_i q_i · log(q_i / p_i)    (8)
Where A denotes the weight parameters of the neural network model before quantization and may be expressed as (p1, p2, p3, …, pn)^T, B denotes the weight parameters of the neural network model after quantization and may be expressed as (q1, q2, q3, …, qn)^T, D(A||B) denotes the divergence of the pre-quantization weight parameters relative to the post-quantization weight parameters, D(B||A) denotes the divergence of the post-quantization weight parameters relative to the pre-quantization weight parameters, and SID(A, B) denotes the spectral information divergence between the weight parameters before and after quantization. It can be understood that the lower the spectral information divergence, the more similar the weight parameters of the neural network model before and after quantization, i.e., the smaller the influence of quantization on the accuracy of the neural network model.
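A sketch of the SID computation per equations (6) to (8) follows. Since the original figures defining p_i and q_i are not recoverable from the text, normalizing the absolute weight values into probability-like vectors is an assumption made here, not the patent's stated definition.

    import numpy as np

    def spectral_information_divergence(a, b, eps=1e-12):
        # SID(A, B) = D(A||B) + D(B||A), equations (6) to (8).
        # p_i, q_i: components of A and B normalized to sum to 1; absolute
        # values are used since weights may be negative or zero (assumption).
        p = np.abs(np.asarray(a, dtype=np.float64)) + eps
        q = np.abs(np.asarray(b, dtype=np.float64)) + eps
        p, q = p / p.sum(), q / q.sum()
        return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))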
Calculating, by equations (6) to (8), the spectral information divergence between the pre- and post-quantization weight parameters of the neural network model for each shrinkage factor yields Table 3 below:
TABLE 3

[Table image not recoverable: it lists, for each shrinkage factor in Table 2, the spectral information divergence between the weight parameters before and after quantization; the minimum value is 1.7654, at scale = 0.7874.]
As can be seen from Table 3 above, the spectral information divergence between the weight parameters of the neural network model before and after quantization is minimal (1.7654) when the shrinkage factor is 0.7874, which, per Table 1 above, corresponds to a weight α of 0. Therefore, for the neural network model with weight parameters [-80.00, -60.00, 0.00, 40.00, 80.00, 100.00], the shrinkage factor with the best quantization effect is 0.7874, and the corresponding weight α is 0.
Comparing Table 3 with the quantized weight parameters obtained by equations (1) to (3) above (Table 4 below) shows that:
Table 4:

[Table image not recoverable: it lists the weight parameters quantized by equations (1) to (3) and the corresponding spectral information divergence, 1.7968.]
the quantization method of the present application brings the weight parameters before and after quantization closer together: the resulting spectral information divergence (1.7654) is smaller than the 1.7968 obtained for the weight parameters quantized using equations (1) to (3), thereby ensuring that the quantized neural network model has the same or similar accuracy as before quantization.
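Tying the sketches above together, the selection loop might look as follows. The text does not fully specify whether the SID comparison is made against the raw quantized integers or their rescaled values, so the raw values are used here; per the patent's Table 3, the pair selected for the running example is α = 0, scale = 0.7874 under the patent's own SID figures.

    def best_scale(weights, step=0.2):
        # Sweep alpha per equation (5), quantize with each candidate scale,
        # and keep the (alpha, scale) pair whose quantized weights have the
        # minimum SID against the original weights.
        s_neg, s_pos = shrinkage_factors(weights)
        return min(sweep_alpha(s_neg, s_pos, step),
                   key=lambda c: spectral_information_divergence(
                       weights, quantize(weights, c[1])))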
The specific implementation details of the quantization method of the present application have been introduced above. As can be seen, the value of the weight α is determined by stepping from zero to one by a preset amount and selecting the best value. To further improve the efficiency of calculating the weight α, the inventors, after extensive testing and training, obtained a relationship between the weight parameters of a neural network model and the value range of the weight α: when the weight parameters of the neural network model satisfy a certain constraint condition, the value range of the weight α is correspondingly determined. In this way, when calculating the weight α, the quantization module 110 may first evaluate the relationship between the weight parameters of the neural network model and the constraint conditions to obtain a narrower value range for the weight α, and then adjust the value of α by the preset amount within that range until the shrinkage factor corresponding to the minimum spectral information divergence before and after quantization is obtained.
The correspondence between the constraint conditions satisfied by scale_negative and scale_positive of the neural network model and the value range of the weight α is described below.
The relationships among scale_negative, scale_positive, and the parameters β and γ of the neural network model, together with the corresponding value ranges of α, are shown in Table 5 below:
TABLE 5

[Table image not recoverable: it maps each constraint condition satisfied by scale_negative, scale_positive and the parameters β and γ (e.g., condition 1) to a value range of the weight α, such as 0 ≤ α < 0.2.]
The parameters β and γ are statistical values related to the type of the neural network model, the number of network layers, and the like, but they are fixed for a specific neural network model. Taking an image recognition neural network model as an example, β is 0.15 and γ is 0.095. Continuing with this example, the process by which the quantization module 110 determines the weight α using the relationship between the weight parameters of the neural network model and the constraint conditions is described below.
Specifically, when quantizing the image recognition neural network model, the quantization module 110 may first calculate the shrinkage factor scale_negative of the weight parameters less than or equal to 0, the shrinkage factor scale_positive of the weight parameters greater than or equal to 0, and the variance of the weight parameters of the image recognition neural network model. When the relationship among scale_negative, scale_positive, and the variance of the weight parameters satisfies condition 1 in Table 5, the value range of α is 0 ≤ α < 0.2. That is, the quantization module 110 increases α from 0 toward 0.2 by the preset amount, calculates the shrinkage factor scale of the image recognition neural network model for each α value, computes the similarity between the weight parameters before and after quantization for each shrinkage factor scale, and from these similarities determines the shrinkage factor with the best quantization effect. This shrinkage factor maximizes the similarity of the weight parameters before and after quantization (i.e., minimizes their spectral information divergence), so that the accuracy of the quantized image recognition neural network model remains as consistent as possible with that before quantization.
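The control flow this enables could be sketched as follows. Since Table 5 itself is not recoverable from the text, the condition below is a pure placeholder standing in for "condition 1", and the β-threshold form is hypothetical.

    def alpha_range_from_constraints(s_neg, s_pos, variance, beta=0.15, gamma=0.095):
        # Placeholder lookup standing in for Table 5: map the relationship
        # among scale_negative, scale_positive and the weight variance to a
        # narrowed search interval for alpha. The actual conditions (and the
        # role of gamma) are not recoverable from the text.
        if abs(s_pos - s_neg) > beta * variance:  # hypothetical "condition 1"
            return 0.0, 0.2
        return 0.0, 1.0  # no condition matched: fall back to the full sweep

The best_scale sweep sketched earlier would then simply be run over the returned interval instead of the full range [0, 1].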
It should be understood that, in the embodiments of the present application, the quantization of the weight parameters, biases, and the like of the neural network model may be performed by a model quantizer. If the neural network model belongs to the image processing domain, quantization of its input data (e.g., images, feature maps) may be performed by an image quantizer, while quantization of its weight parameters, biases, and the like may be performed by a model quantizer; the image quantizer and the model quantizer may be integrated in the quantization module 110 or provided separately. This is not limited by the present application.
Similarly, the convolution calculation performed in the convolution module 120 may be carried out by multipliers and adders, which may be integrated in the convolution module 120 or provided separately. This is likewise not limited by the present application.
An embodiment of the present application further provides an electronic device, including: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, wherein the processor, when executing the computer program, implements the steps of the quantization method of the above embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing apparatus/terminal apparatus, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, or a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals or telecommunication signals.
The embodiments of the present application further provide an image recognition method, comprising: acquiring, by an electronic device, an image to be recognized; and then recognizing the image to be recognized using a neural network model, wherein the neural network model is quantized by the quantization method of the above embodiments. In this way, the neural network model used by the image recognition method can be made more compact while its recognition accuracy is maintained, reducing the storage space and computing resources it occupies and facilitating its porting.
Embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the quantization method of the above embodiments.
The embodiments of the present application further provide a computer program product which, when run on an electronic device, causes the electronic device to implement the steps of the above embodiments.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described or detailed in a certain embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/quantization module 110 and method may be implemented in other ways. For example, the apparatus/network device embodiments described above are merely illustrative; the division into modules or units is only one logical division, and other divisions are possible in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (13)

1. A method of quantifying a neural network model, the method comprising:
determining a first quantization parameter and a second quantization parameter required for quantizing data to be quantized to a target value range according to a quantization value range of the data to be quantized and the target value range, wherein the first quantization parameter corresponds to the data to be quantized which is greater than 0 in the quantization value range, and the second quantization parameter corresponds to the data to be quantized which is less than 0 in the quantization value range;
determining a third quantization parameter of the neural network model according to the first quantization parameter and the second quantization parameter;
and quantizing the data to be quantized to the target value range according to the third quantization parameter.
2. The method of claim 1, wherein the data to be quantized comprises any one or more of:
a weight parameter and a bias parameter in any one or more network layers of a plurality of network layers forming the neural network model; or input data for any one or more network layers.
3. The method of claim 2, wherein the input data comprises image data.
4. The method according to claim 1, wherein the first quantization parameter further corresponds to the data to be quantized which is equal to 0 in the quantization value range.
5. The method according to claim 1 or 4, wherein the second quantization parameter further corresponds to the data to be quantized which is equal to 0 in the quantization value range.
6. The method of claim 1, wherein the target value range comprises a positive value interval and a negative value interval, and
wherein determining the first quantization parameter required for quantizing the data to be quantized to the target value range according to the quantization value range of the data to be quantized and the target value range comprises:
determining a first maximum value among the data to be quantized which is greater than or equal to 0 in the quantization value range, wherein the ratio of the first maximum value to the maximum value of the positive value interval is the first quantization parameter.
7. The method of claim 6, wherein the target value range comprises a positive value interval and a negative value interval, and
wherein determining the second quantization parameter required for quantizing the data to be quantized to the target value range according to the quantization value range of the data to be quantized and the target value range comprises:
determining a first minimum value among the data to be quantized which is less than 0 in the quantization value range, wherein the ratio of the first minimum value to the minimum value of the negative value interval is the second quantization parameter.
8. The method of claim 1, wherein determining the third quantization parameter of the neural network model based on the first quantization parameter and the second quantization parameter comprises:
increasing a first weight value of the first quantization parameter stepwise from 0 by a preset amount, and calculating a weighted sum of the first quantization parameter corresponding to each first weight value and the second quantization parameter corresponding to each second weight value to obtain a plurality of third quantization parameters, wherein the sum of the first weight value and the second weight value is 1; and
the quantizing the data to be quantized to the target value range according to the third quantization parameter includes:
quantizing the data to be quantized to the target value range according to the third quantization parameters to obtain a plurality of quantization results of the neural network model;
determining, from the plurality of quantization results, a first quantization result whose similarity to the data to be quantized is greater than a preset similarity, and using the first quantization result as the quantization result of the neural network model.
9. The method of claim 1, wherein determining the third quantization parameter of the neural network model based on the first quantization parameter and the second quantization parameter comprises:
determining a value range of a first weight value or a second weight value according to a relation between an attribute parameter of the neural network model and the first quantization parameter and the second quantization parameter, wherein the first weight value corresponds to the first quantization parameter, and the second weight value corresponds to the second quantization parameter;
increasing the first weight value stepwise by a preset amount within the value range of the first weight value, and calculating a weighted sum of the first quantization parameter corresponding to each first weight value and the second quantization parameter corresponding to each second weight value to obtain a plurality of third quantization parameters, wherein the sum of the first weight value and the second weight value is 1; and
the quantizing the data to be quantized to the target value range according to the third quantization parameter includes:
quantizing the data to be quantized to the target value range according to the third quantization parameters to obtain a plurality of quantization results of the neural network model;
determining, from the plurality of quantization results, a first quantization result whose similarity to the data to be quantized is greater than a preset similarity, and using the first quantization result as the quantization result of the neural network model.
10. The method according to claim 8 or 9, wherein the preset amount is greater than 0 and less than 1.
11. An image recognition method, characterized in that the method comprises:
acquiring an image to be identified;
identifying the image to be identified by using a neural network model, wherein the neural network model is quantized by the quantization method of any one of claims 1 to 10.
12. An electronic device, characterized in that the electronic device comprises:
a memory storing instructions;
a processor coupled with the memory, wherein the instructions stored by the memory, when executed by the processor, cause the electronic device to perform the method of any one of claims 1 to 11.
13. A computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 11.
CN202111196316.9A 2021-10-14 2021-10-14 Neural network model quantization method, electronic device, and medium Pending CN113850374A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111196316.9A CN113850374A (en) 2021-10-14 2021-10-14 Neural network model quantization method, electronic device, and medium

Publications (1)

Publication Number Publication Date
CN113850374A true CN113850374A (en) 2021-12-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination