CN116956989A - Quantification method and device of normalization operator in neural network model and electronic equipment

Info

Publication number
CN116956989A
Authority
CN
China
Prior art keywords
point
fixed
quantization
fixed point
outputs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310955099.XA
Other languages
Chinese (zh)
Inventor
许礼武
余宗桥
黄敦博
周生伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARM Technology China Co Ltd
Original Assignee
ARM Technology China Co Ltd
Priority date: 2023-07-31 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Filing date: 2023-07-31
Application filed by ARM Technology China Co Ltd
Priority to CN202310955099.XA
Publication of CN116956989A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 5/00 Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F 5/01 Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
    • G06F 5/015 Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, having at least two separately controlled shifting levels, e.g. using shifting matrices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F 7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F 7/48 Methods or arrangements for performing computations using exclusively denominational number representation, using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F 7/483 Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Nonlinear Science (AREA)
  • Data Mining & Analysis (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The embodiments of the application provide a quantization method and apparatus for a normalization operator in a neural network model, an electronic device, and a computer-readable storage medium, and relate to the field of neural networks. The method comprises the following steps: obtaining a first quantization coefficient according to the floating-point output data, the quantization bit width, the floating-point scaling parameter, and the first shift number; obtaining a fixed-point normalization result according to the fixed-point input data, the second shift number, and the quantization bit width; obtaining initial fixed-point output data according to the fixed-point scaling parameter, the fixed-point translation parameter, and the fixed-point normalization result; and quantizing the initial fixed-point output data according to the first quantization coefficient and shifting the quantization result by the first shift number to obtain the fixed-point output data. In the embodiments of the application, the mean and the variance do not need to be computed in full, which avoids the errors introduced by the traditional mean and variance calculation method; part of the division operations are replaced by shifts based on the first shift number and the second shift number, which greatly reduces the amount of computation and improves computational efficiency.

Description

Quantification method and device of normalization operator in neural network model and electronic equipment
Technical Field
The application relates to the technical field of neural networks, in particular to a quantization method and device of normalization operators in a neural network model and electronic equipment.
Background
Artificial intelligence (AI) technology is being applied more and more widely in production and daily life. Data such as text, video, audio, and images can be processed by neural network models to obtain the desired results. For example, a face recognition model built into a mobile phone can process a captured face image to identify the person; as another example, a text recognition model built into the phone can process a passage of text on a web page to determine its source; as yet another example, an audio matching model built into the phone can process audio to identify sounds or songs, and so on.
Because a neural network model is a resource-intensive algorithm with a high computational cost and a large memory footprint, the neural network model built into an electronic device is usually a quantized neural network model. Neural network model quantization generally refers to converting the high-precision floating-point computations in a neural network model into fixed-point computations, resulting in a fixed-point neural network model.
The normalization operator is an operator that can be applied in a neural network model to process feature data during the operation of the model. For example, in a face recognition model, the normalization operator in the normalization layer may process the feature data associated with the face image produced by the previous layer, thereby obtaining new, processed feature data associated with the face image.
Because the square-root computation in the normalization operator is a floating-point computation, when the normalization operator normalizes fixed-point data in the neural network, the fixed-point data is first dequantized, the floating-point operation is performed, and the result is then quantized again before fixed-point data is output. This approach involves a large amount of computation, a high computational cost, and a low computation speed.
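For reference, the sketch below (not taken from the patent; the function name, the symmetric convention Q = X * S, and the sample values are assumptions for illustration) shows this conventional dequantize / float-normalize / requantize flow that the embodiments of the application aim to avoid.

```python
import numpy as np

def naive_quantized_layernorm(q_x, s_in, s_out, gamma, beta, eps=1e-8):
    """Conventional approach: dequantize -> floating-point normalization -> requantize."""
    x = q_x.astype(np.float64) / s_in            # dequantize (Q = X * S, so X = Q / S)
    mu = x.mean()                                # floating-point mean
    var = x.var()                                # floating-point variance
    norm = (x - mu) / np.sqrt(var + eps)         # floating-point normalization
    y = gamma * norm + beta                      # scale and translate
    return np.round(y * s_out).astype(np.int32)  # requantize the result

q_x = np.array([120, -30, 64, 10], dtype=np.int32)   # hypothetical fixed-point inputs
print(naive_quantized_layernorm(q_x, s_in=16.0, s_out=64.0, gamma=1.0, beta=0.0))
```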
Disclosure of Invention
Embodiments of the present application provide a method, apparatus, electronic device, computer-readable storage medium, and computer program product for quantifying normalization operators in neural network models, which can solve the above-mentioned problems in the prior art. The technical scheme is as follows:
according to an aspect of the embodiment of the present application, there is provided a quantization method of a normalization operator in a neural network model, the method including:
determining fixed-point input data Q_inputs of the normalization operator, a quantization bit width b, a floating-point scaling parameter γ, a fixed-point scaling parameter Q_γ, a fixed-point translation parameter Q_β, floating-point output data outputs, a first shift number N_outputs, and a second shift number exp;
obtaining a first quantization coefficient M_outputs according to the floating-point output data outputs, the quantization bit width b, the floating-point scaling parameter γ, and the first shift number N_outputs;
obtaining a fixed-point normalization result Q_norm according to the fixed-point input data Q_inputs, the second shift number exp, and the quantization bit width b;
obtaining initial fixed-point output data according to the fixed-point scaling parameter Q_γ, the fixed-point translation parameter Q_β, and the fixed-point normalization result Q_norm; quantizing the initial fixed-point output data according to the first quantization coefficient M_outputs, and shifting the quantization result by the first shift number N_outputs to obtain fixed-point output data Q_outputs;
wherein the first shift number N_outputs does not exceed the quantization bit width b, the input data is a feature vector representing preset information, the second shift number exp is even, and 2^exp is the power of 2 closest to the length N of the feature vector.
Based on the above embodiments, as an alternative embodiment, obtaining the first quantization coefficient M_outputs according to the floating-point output data outputs, the quantization bit width b, the floating-point scaling parameter γ, and the first shift number N_outputs comprises:
determining a second quantization coefficient output_scale according to a first difference between the maximum value and the minimum value of the floating-point output data outputs and the quantization bit width b;
determining a third quantization coefficient γ_scale according to a second difference between the maximum value and the minimum value of the floating-point scaling parameter γ and the quantization bit width b;
determining a fourth quantization coefficient norm_scale according to the length N and the quantization bit width b;
determining a fifth quantization coefficient according to the second quantization coefficient output_scale, the third quantization coefficient γ_scale, and the fourth quantization coefficient norm_scale;
shifting the fifth quantization coefficient by the first shift number N_outputs to obtain the first quantization coefficient M_outputs.
Based on the above embodiment, as an alternative embodiment, obtaining the fixed-point normalization result Q_norm according to the fixed-point input data Q_inputs, the second shift number exp, and the quantization bit width b comprises:
averaging the fixed-point input data Q_inputs to obtain a fixed-point mean Q_μ;
determining the degree of difference between the fixed-point input data Q_inputs and the fixed-point mean Q_μ, and shifting the degree of difference by the second shift number exp to determine the fixed-point reciprocal square root of the variance Q_alpha;
determining an initial fixed-point normalization result Q_norm_tmp according to the fixed-point input data Q_inputs, the fixed-point mean Q_μ, and the fixed-point reciprocal square root of the variance Q_alpha;
shifting the initial fixed-point normalization result Q_norm_tmp to the number of bits of the quantization bit width according to the number of bits of Q_alpha and the second shift number exp, thereby determining the fixed-point normalization result Q_norm.
Based on the above embodiment, as an alternative embodiment, the fixed-point scaling parameter Q_γ is determined as follows:
determining a third quantization coefficient γ_scale according to a third difference between the maximum value and the minimum value of the floating-point scaling parameter γ and the quantization bit width b;
determining the fixed-point scaling parameter Q_γ according to the floating-point scaling parameter γ and the third quantization coefficient γ_scale.
On the basis of the above embodiment, as an alternative embodiment, determining the fixed-point translation parameter Q_β comprises:
determining a fourth quantization coefficient norm_scale according to the length N and the quantization bit width b;
determining a sixth quantization coefficient β_scale according to the quantization coefficient norm_scale and the quantization coefficient γ_scale;
determining a floating-point translation parameter β, and determining the fixed-point translation parameter Q_β according to the floating-point translation parameter β and the quantization coefficient β_scale.
Based on the above embodiment, as an alternative embodiment, determining the initial fixed-point normalization result Q_norm_tmp according to the fixed-point input data Q_inputs, the fixed-point mean Q_μ, and the fixed-point reciprocal square root of the variance Q_alpha comprises:
determining a fourth difference between the fixed-point input data Q_inputs and the fixed-point mean Q_μ;
taking the product of the fourth difference and the fixed-point reciprocal square root of the variance Q_alpha as the initial fixed-point normalization result Q_norm_tmp.
On the basis of the above embodiment, as an alternative embodiment, the preset information is any one of image, text, audio, and environmental information.
According to another aspect of an embodiment of the present application, there is provided a quantization apparatus for a normalization operator in a neural network model, the apparatus including:
a preparation module for determining fixed-point input data Q_inputs of the normalization operator, a quantization bit width b, a floating-point scaling parameter γ, a fixed-point scaling parameter Q_γ, a fixed-point translation parameter Q_β, floating-point output data outputs, a first shift number N_outputs, and a second shift number exp;
a quantization coefficient determining module for obtaining a first quantization coefficient M_outputs according to the floating-point output data outputs, the quantization bit width b, the floating-point scaling parameter γ, and the first shift number N_outputs;
a fixed-point normalization module for obtaining a fixed-point normalization result Q_norm according to the fixed-point input data Q_inputs, the second shift number exp, and the quantization bit width b;
an output module for obtaining initial fixed-point output data according to the fixed-point scaling parameter Q_γ, the fixed-point translation parameter Q_β, and the fixed-point normalization result Q_norm; quantizing the initial fixed-point output data according to the first quantization coefficient M_outputs, and shifting the quantization result by the first shift number N_outputs to obtain fixed-point output data Q_outputs;
wherein the first shift number N_outputs does not exceed the quantization bit width b, the input data is a feature vector representing preset information, the second shift number exp is even, and 2^exp is the power of 2 closest to the length N of the feature vector.
According to another aspect of an embodiment of the present application, there is provided an electronic device including a memory, a processor, and a computer program stored on the memory, the processor executing the computer program to implement the steps of the above method.
According to a further aspect of embodiments of the present application, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above method.
According to an aspect of an embodiment of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the steps of the above method.
The technical scheme provided by the embodiment of the application has the beneficial effects that:
the embodiment of the application does not need to totally calculate the mean value and the variance, avoids errors caused by the traditional mean value and variance calculation method, adopts a shifting mode based on the first shifting number and the second shifting number for partial division operation, greatly reduces the operation amount and improves the operation efficiency.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a schematic view of a scene provided by an embodiment of the present application;
FIG. 2 is a schematic flow chart of a method for quantifying normalization operators in a neural network model according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a quantization apparatus for normalization operators in a neural network model according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the drawings in the present application. It should be understood that the embodiments described below with reference to the drawings are exemplary descriptions for explaining the technical solutions of the embodiments of the present application, and the technical solutions of the embodiments of the present application are not limited.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and "comprising," when used in this specification, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof, all of which may be included in the present specification. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein indicates that at least one of the items defined by the term, e.g., "a and/or B" may be implemented as "a", or as "B", or as "a and B".
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
First, several terms related to the present application are described and explained:
(1) Symmetric quantization: a floating-point number X in the range [X_min, X_max] is mapped by a quantization coefficient to a fixed-point number Q in the range [Q_min, Q_max], as shown in equation (1):
Q=X*S (1)
the quantization parameter includes a scaling factor S, which is the minimum scale of different floating point numbers X quantized to the same fixed point number Q. It can be appreciated that for symmetric quantization, after the scaling factor S in the quantization parameter is determined, the quantized fixed-point number Q can be obtained according to the input floating-point number X through the quantization parameter. It will be appreciated that the range of fixed point numbers Q corresponds to quantized data types (i.e., data types to which the fixed point numbers Q correspond), where the quantized data types include: int32, int16, int8, int4, uint32, uint16, uint8, or uint4, and the like. For example, the quantized data type is int8, i.e. the fixed point number Q is int8 type, then [ Q min ,Q max ]In particular [ -128, 127](i.e., [ -2n-1,2 n-1)]Wherein n=8). The floating point number X may be a statistical value obtained by counting historical training data. For example, in the history training process, X has a maximum statistical value of 10 and a minimum statistical value of-20, then [ X ] min ,X max ]Can be determined as [ -20,10]. In other embodiments, [ X ] min ,X max ]Or may be a preset empirical value.
For symmetric quantization, equation (2) gives a formula for obtaining the quantization coefficient S for int-type quantization, where the floating-point number X lies in the range [X_min, X_max] and the fixed-point number Q lies in the range [Q_min, Q_max]:
It should be noted that, in other embodiments of the present application, the quantization coefficient S may also be calculated by variants of the calculation method shown in equation (2); no limitation is imposed here.
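Because the image for equation (2) is not reproduced above, the following is only a hedged illustration of one common symmetric-quantization convention consistent with Q = X * S; the patent's exact formula for S may differ in detail.

```python
import numpy as np

def symmetric_quant_coeff(x_min, x_max, b):
    """One common symmetric convention: S = (2**(b-1) - 1) / max(|x_min|, |x_max|),
    so that Q = round(X * S) stays inside [-2**(b-1), 2**(b-1) - 1].
    This is an assumption; the patent's equation (2) is not reproduced here."""
    q_max = 2 ** (b - 1) - 1
    return q_max / max(abs(x_min), abs(x_max))

def quantize(x, s, b):
    q_min, q_max = -2 ** (b - 1), 2 ** (b - 1) - 1
    return np.clip(np.round(x * s), q_min, q_max).astype(np.int32)

# Using the example statistics from the text: X observed in [-20, 10], int8 (b = 8)
s = symmetric_quant_coeff(-20.0, 10.0, b=8)
print(s, quantize(np.array([-20.0, 0.0, 10.0]), s, b=8))
```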
(2) Normalization operator
A neural network model typically involves the superposition of data across multiple layers, and after the data are superimposed, the range of the data interval may change. This change may cause the gradient to descend slowly during training of the neural network model, resulting in a slow convergence rate, or it may reduce the accuracy of the neural network model, or lead to a non-uniform data interval range, and so on.
To avoid this, it is often necessary to normalize the data in the neural network model. When the data in the neural network model need to be normalized, a normalization layer is added to the neural network model and a normalization operator is deployed in that layer, so that the data in the neural network model are normalized by the normalization operator.
In the embodiments of the application, the normalization operator can be an InstanceNorm (IN) operator, a LayerNorm (LN) operator, a GroupNorm (GN) operator, a SwitchableNorm (SN) operator, or the like.
The data structure input into the normalization operator may be [Z, C, H, W], where Z is the number of samples (e.g., images), C is the number of channels, H is the height of the samples, and W is the width of the samples. The InstanceNorm operator normalizes the data over the H and W dimensions. The LayerNorm operator normalizes over the C, H, and W dimensions. The GroupNorm operator groups the channels C and then normalizes each group. SwitchableNorm normalizes the data by combining the BN, LN, and IN normalization methods. The InstanceNorm, LayerNorm, GroupNorm, and SwitchableNorm operators can then scale and translate their respective normalization results according to the scaling and translation coefficients, and output the scaled and translated data.
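To make the channel groupings concrete, the sketch below (assumed, not part of the patent) shows which axes of a [Z, C, H, W] tensor each operator reduces over when computing its mean and variance.

```python
import numpy as np

def normalize_over(x, axes, eps=1e-8):
    """Normalize x over the given axes; mean and variance are computed per remaining index."""
    mu = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.randn(2, 4, 8, 8)                    # [Z, C, H, W]
inst = normalize_over(x, axes=(2, 3))              # InstanceNorm: over H, W per sample and channel
layer = normalize_over(x, axes=(1, 2, 3))          # LayerNorm: over C, H, W per sample
groups = 2                                         # GroupNorm: split C into groups, normalize each group
grp = normalize_over(x.reshape(2, groups, 4 // groups, 8, 8), axes=(2, 3, 4)).reshape(x.shape)
```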
In some embodiments of the present application, the input data inputs represent feature data related to the text, audio, images, etc. to be processed by the normalization operator; the normalization result norm represents the corresponding normalized feature data of the text, video, audio, images, etc. after processing by the normalization operator; and the output data outputs are the corresponding feature result data of the text, video, audio, images, etc. after processing by the normalization operator in the neural network model, where the feature result data are obtained by scaling and translating the normalized feature data.
It can be appreciated that the technical scheme of the application can be applied to any scene in which data such as characters, audio, images and the like are required to be processed through a neural network model.
In the related art, the InstanceNorm operator normalizes the H and W dimensions of the input data; LayerNorm normalizes the C, H, and W dimensions; GroupNorm groups the channels C and then normalizes; and SwitchableNorm combines the BN, LN, and IN normalization methods. The normalized result is then scaled and translated according to the scaling parameter γ and the translation parameter β. The normalization operator computes the mean and variance at every forward inference, so the accuracy of the mean and variance computation must also be considered during quantization.
The related-art schemes all need to compute the mean and variance, which introduces error.
Edge computing devices typically support only fixed-point operations. When a neural network model is adapted to such a device, it is first quantized, i.e., the floating-point parameters in the model are mapped to integer parameters and a quantization coefficient is determined for the activation response of each layer; these quantization coefficients are computed in an offline stage.
In order to better understand the present solution, an application scenario of the technical solution of the present application will be described first.
The application provides a quantization method, a quantization device, electronic equipment, a computer readable storage medium and a computer program product of a normalization operator in a neural network model, and aims to solve the technical problems in the prior art.
The technical solutions of the embodiments of the present application and technical effects produced by the technical solutions of the present application are described below by describing several exemplary embodiments. It should be noted that the following embodiments may be referred to, or combined with each other, and the description will not be repeated for the same terms, similar features, similar implementation steps, and the like in different embodiments.
Fig. 1 shows a scene diagram of a terminal 100 for recognizing an acquired face image through a face recognition model. As shown in fig. 1, the terminal 100 is deployed with a face recognition model, wherein the face recognition model is obtained through training in a floating point domain by a server 200. A face recognition model is taken as an example of the neural network model in fig. 1. In other embodiments, the neural network model may also be other models, such as a speech recognition model, a text recognition model.
It should be noted that the terminal 100 includes, but is not limited to, a mobile phone, a tablet computer, a smart screen, a wearable device (e.g., a watch, a bracelet, a helmet, an earphone, etc.), a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), and the like. The server 200 may be a single server or a server cluster composed of multiple servers. A mobile phone is taken as an example of the terminal 100 in fig. 1. Because the face recognition model is a resource-intensive algorithm, the terminal 100 would face a large amount of computation and a slow computation speed if it processed face image data with a built-in floating-point-domain face recognition model. Therefore, it is generally necessary to quantize the face recognition model to reduce the amount of computation and increase the computation speed.
In the embodiment of the present application, the face image data may be obtained by shooting the user by the terminal 100, may be pre-stored, or may be transmitted to the terminal 100 by another device, which is not limited.
In some embodiments, the server 200 quantizes the face recognition model trained in the floating point domain, obtains a quantized face recognition model, and then deploys the quantized face recognition model to the terminal 100.
In other implementations, the face recognition model to be trained is deployed in the terminal 100, the model is trained in the floating-point domain by the terminal 100, and the trained face recognition model is then quantized to obtain the quantized face recognition model. It can be appreciated that when the terminal 100 uses the quantized face recognition model, high-precision floating-point calculations can be converted into fixed-point calculations, which reduces the amount of computation and increases the computation speed.
In an embodiment of the present application, the face recognition model in fig. 1 may include a normalization layer and other layers (e.g., an input layer, an activation layer, an output layer, etc.), where the normalization layer includes a normalization operator, such as an InstanceNorm operator, a LayerNorm operator, a GroupNorm operator, or a SwitchableNorm operator. The normalization operator in the normalization layer is used for normalizing the feature data associated with the input face image data (i.e., the input data of the normalization layer). The input data inputs of the normalization layer are the feature data output by the layer preceding the normalization layer (e.g., the activation layer).
The embodiment of the application adopts symmetric quantization, and the maximum and minimum values of each layer's parameters and activation values need to be collected in the offline stage to calculate the quantization parameter S. The general formula for computing the normalization result norm is first:
outputs=γ*norm+β (6)
where the input data is a feature vector with multiple dimensions, the component of dimension i is denoted x_i, the length of the feature vector is N, δ denotes the variance of the floating-point input data, ε denotes a minimal floating-point number greater than 0 (1e-8), μ denotes the mean of the floating-point input data, norm denotes the normalization result, γ and β denote the scaling parameter and the translation parameter respectively, and outputs is the floating-point output data, representing the layer normalization (LayerNorm) result. Ignoring ε and combining equations (3), (4), and (5) yields equation (7):
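The images of equations (3) to (5) and (7) are not reproduced in this text; based on the symbol definitions above, they presumably correspond to the standard layer-normalization relations sketched below (a reconstruction, not the patent's exact typesetting):

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \quad (3)
\qquad
\delta = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 \quad (4)
\qquad
\mathrm{norm}_i = \frac{x_i - \mu}{\sqrt{\delta + \epsilon}} \quad (5)
\qquad
\mathrm{norm}_i \approx \frac{(x_i - \mu)\sqrt{N}}{\sqrt{\sum_{j=1}^{N}(x_j - \mu)^2}} \quad (\epsilon\ \text{ignored}) \quad (7)
```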
the following describes the process steps of the quantization stage and fixed point forward operation:
assuming the quantization bit width is b, then Q in equation 2 max =2 b-1 -1,Q min =-2 b-1
Offline quantization steps:
Step 1: Separately collect the maximum and minimum values of the floating-point output data outputs in the normalization operator, and the maximum and minimum values of the floating-point scaling parameter γ.
Step 2: From equation (7), the maximum and minimum values of the normalization result norm can be derived; substituting them into equation (2) gives the quantization coefficient norm_scale of the normalization result. Since the quantization bit width b and the length N of the feature vector are known, the quantization coefficient norm_scale of the normalization result in the embodiment of the present application can be obtained easily.
Further, the embodiment of the present application calculates the power of 2 closest to N, exp = ceil(log2(N)), so that equation (7) can be transformed into equation (8). After this change, the division by N in equation (5) becomes a right-shift operation; replacing the division with a shift greatly improves performance. Since the second shift number exp needs to be an even number, exp = exp + (exp & 1) is applied.
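As an illustration of the step above, the short sketch below (helper name assumed) computes the second shift number exp as described: exp = ceil(log2(N)), then forced even with exp += exp & 1, so that a right shift by exp can stand in for a division by a power of two close to N.

```python
import math

def second_shift_number(n):
    """Second shift number as described in the text: ceil(log2(N)), forced even."""
    exp = math.ceil(math.log2(n))
    exp += exp & 1          # exp must be even so that exp / 2 is an integer shift later on
    return exp

for n in (49, 64, 100):
    exp = second_shift_number(n)
    print(n, exp, 2 ** exp)  # 2**exp is the power of two used in place of N
```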
In the fixed-point operation, equation (8) is multiplied by norm_scale; this is further simplified into equation (9), which removes the multiplication by the quantization coefficient.
Note that in equation (9), the fixed-point result of the reciprocal square-root term is in Q31 format.
The second shift number exp obtained in Step 2 needs to be stored for use in the online forward operation.
Step 3: Substituting the statistics of the floating-point scaling parameter γ into equation (2) gives the quantization coefficient γ_scale of the scaling parameter; the floating-point scaling parameter γ is then updated to the fixed-point scaling parameter Q_γ: Q_γ = γ * γ_scale.
To allow addition and subtraction on the same scale, the quantization coefficient of the translation parameter is β_scale = norm_scale * γ_scale, so the fixed-point translation parameter is Q_β = β * β_scale.
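As a small numeric illustration of Step 3 (all values hypothetical, and the scale formulas below are assumptions standing in for the unreproduced equations), the parameter-quantization relations Q_γ = γ * γ_scale, β_scale = norm_scale * γ_scale, and Q_β = β * β_scale can be sketched as:

```python
import math

b, N = 8, 64                                     # hypothetical bit width and vector length
gamma, beta = 1.2, -0.35                         # hypothetical floating-point γ and β
gamma_scale = (2 ** (b - 1) - 1) / 1.2           # assumed: from equation (2) with |γ| up to 1.2
norm_scale = (2 ** (b - 1) - 1) / math.sqrt(N)   # assumed bound of about sqrt(N) on |norm|
q_gamma = round(gamma * gamma_scale)             # fixed-point scaling parameter Q_γ
beta_scale = norm_scale * gamma_scale            # sixth quantization coefficient β_scale
q_beta = round(beta * beta_scale)                # fixed-point translation parameter Q_β
print(q_gamma, q_beta)                           # e.g. 127 and -588
```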
Step 4: Equations (10), (11), and (12) can be obtained from equation (6).
In the above formulas, the first shift number N_outputs does not exceed the quantization bit width b. The quantization coefficient output_scale of the output data is obtained by substituting the maximum and minimum values of the floating-point output data into equation (2); similarly, the quantization coefficient norm_scale of the normalization result is obtained by substituting the maximum and minimum values of the floating-point normalization result into equation (2). Based on output_scale, γ_scale, and norm_scale, the result of equation (12) can be obtained, which is then shifted by the first shift number N_outputs to obtain the quantization coefficient M_outputs.
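The images for equations (10) to (12) are not reproduced above, so the sketch below only illustrates the generic pattern of Step 4 (names and the rounding convention are assumptions): a positive floating-point factor, such as the fifth quantization coefficient derived from output_scale, γ_scale, and norm_scale, is represented as an integer multiplier M_outputs together with a right shift by N_outputs, with N_outputs not exceeding the bit width b.

```python
def to_multiplier_and_shift(scale, b):
    """Represent a positive float `scale` as (m, n) with scale ~ m / 2**n and n <= b."""
    n = b                                # use the largest allowed shift for best resolution
    m = round(scale * (1 << n))          # integer multiplier M_outputs
    return m, n

fifth_coeff = 0.3785                     # hypothetical value of the fifth quantization coefficient
m_out, n_out = to_multiplier_and_shift(fifth_coeff, b=8)
print(m_out, n_out, m_out / (1 << n_out))   # 97 8 0.37890625, close to the original factor
```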
The fixed-point calculation process of the embodiment of the application is as follows:
Step 1: Average the fixed-point input data Q_inputs to obtain the fixed-point mean Q_μ. Based on the fixed-point input data Q_inputs and the fixed-point mean Q_μ, equation (13) is evaluated to yield a result in Q31 format.
Note that >> exp in equation (13) represents a right shift by the second shift number exp.
Further, Q_alpha is shifted right by 16 bits to obtain a Q15-format result: Q_alpha = Q_alpha >> 16.
Step 2: Equation (9) is rewritten as:
Q_norm_tmp = (Q_inputs - Q_μ) * Q_alpha (14)
where Q_norm_tmp represents the initial fixed-point normalization result; its bit width is greater than the quantization bit width b because Q_alpha occupies a bit width of 15 bits.
In order not to lose accuracy, the 15-bit shift of this step is folded into equation (9). Note that the division in equation (9) is implemented as a right shift, so the 15 bits of right shift required by equation (14) are merged into it; in the end, Q_norm_tmp only needs to be shifted right by (16 + exp/2 - b) bits to obtain the fixed-point normalization result Q_norm.
Step 3: The final result, i.e., the fixed-point output data Q_outputs, is calculated according to equation (15):
Q_outputs = (Q_γ * Q_norm + Q_β) * M_outputs / 2^N_outputs (15)
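Putting the three fixed-point steps together, the following Python sketch is an illustration only: the Q31/Q15 bookkeeping is simplified (the reciprocal square root is produced directly as a Q15 value), and q_gamma, q_beta, m_out, and n_out are hypothetical values chosen to be roughly self-consistent for γ close to 1 and β = 0.

```python
import numpy as np

def fixed_point_layernorm(q_in, exp, b, q_gamma, q_beta, m_out, n_out):
    """Simplified fixed-point forward pass following equations (13)-(15)."""
    q_in = q_in.astype(np.int64)
    q_mu = int(q_in.sum()) // q_in.size              # fixed-point mean Q_mu
    diff = q_in - q_mu
    acc = int((diff * diff).sum()) >> exp            # ">> exp" as in equation (13)
    # Reciprocal square root kept as a Q15 fixed-point value (the text derives it
    # in Q31 and then shifts right by 16 bits to reach Q15).
    q_alpha = int(round((1 << 15) / np.sqrt(max(acc, 1))))
    q_norm_tmp = diff * q_alpha                      # equation (14)
    q_norm = q_norm_tmp >> (16 + exp // 2 - b)       # final right shift given in the text
    return ((q_gamma * q_norm + q_beta) * m_out) >> n_out   # equation (15)

q_in = np.array([120, -30, 64, 10, -90, 42, 7, -15], dtype=np.int64)
exp = 4                                              # ceil(log2(8)) = 3, forced even -> 4
out = fixed_point_layernorm(q_in, exp, b=8, q_gamma=64, q_beta=0, m_out=4, n_out=8)
print(out)   # roughly the int8-scale normalization of the inputs when gamma ~ 1, beta = 0
```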
The embodiment of the application provides a quantization method for a normalization operator in a neural network model, as shown in fig. 2, comprising the following steps:
S101, determining fixed-point input data Q_inputs of the normalization operator, a quantization bit width b, a floating-point scaling parameter γ, a fixed-point scaling parameter Q_γ, a fixed-point translation parameter Q_β, floating-point output data outputs, a first shift number N_outputs, and a second shift number exp;
S102, obtaining a first quantization coefficient M_outputs according to the floating-point output data outputs, the quantization bit width b, the floating-point scaling parameter γ, and the first shift number N_outputs;
S103, obtaining a fixed-point normalization result Q_norm according to the fixed-point input data Q_inputs, the second shift number exp, and the quantization bit width b;
S104, obtaining initial fixed-point output data according to the fixed-point scaling parameter Q_γ, the fixed-point translation parameter Q_β, and the fixed-point normalization result Q_norm; quantizing the initial fixed-point output data according to the first quantization coefficient M_outputs, and shifting the quantization result by the first shift number N_outputs to obtain fixed-point output data Q_outputs;
wherein the first shift number N_outputs does not exceed the quantization bit width b, the input data is a feature vector representing preset information, the second shift number exp is even, and 2^exp is the power of 2 closest to the length N of the feature vector.
The method of the embodiment of the application does not need to compute the mean and the variance in full, which avoids the errors introduced by the traditional mean and variance calculation method; part of the division operations are replaced by shifts based on the first shift number and the second shift number, which greatly reduces the amount of computation and improves computational efficiency.
Based on the above embodiments, as an alternative embodiment, obtaining the first quantization coefficient M_outputs according to the floating-point output data outputs, the quantization bit width b, the floating-point scaling parameter γ, and the first shift number N_outputs comprises:
S201, determining a second quantization coefficient output_scale according to a first difference between the maximum value and the minimum value of the floating-point output data outputs and the quantization bit width b.
Specifically, the embodiment of the application substitutes the maximum value and the minimum value of the floating-point output data and the quantization bit width b into equation (2) to obtain the quantization coefficient of the floating-point output data, namely the second quantization coefficient output_scale.
S202, determining a third quantization coefficient γ_scale according to a second difference between the maximum value and the minimum value of the floating-point scaling parameter γ and the quantization bit width b.
Specifically, the embodiment of the application substitutes the maximum value and the minimum value of the floating-point scaling parameter γ and the quantization bit width b into equation (2) to obtain the quantization coefficient of the scaling parameter, namely the third quantization coefficient γ_scale.
S203, determining a fourth quantization coefficient norm_scale according to the length N and the quantization bit width b.
According to the corresponding formula, the quantization coefficient of the normalization result, that is, the fourth quantization coefficient norm_scale, can be obtained.
S204, determining a fifth quantization coefficient according to the second quantization coefficient output_scale, the third quantization coefficient γ_scale, and the fourth quantization coefficient norm_scale.
Specifically, the fifth quantization coefficient may be expressed by the formula corresponding to the left-hand side of equation (12).
S205, shifting the fifth quantization coefficient by the first shift number N_outputs to obtain the first quantization coefficient M_outputs.
Based on the above embodiments, as an alternative embodiment, obtaining the fixed-point normalization result Q_norm according to the fixed-point input data Q_inputs, the second shift number exp, and the quantization bit width b comprises:
S301, averaging the fixed-point input data Q_inputs to obtain a fixed-point mean Q_μ.
S302, determining the degree of difference between the fixed-point input data Q_inputs and the fixed-point mean Q_μ, and shifting the degree of difference by the second shift number exp to determine the fixed-point reciprocal square root of the variance Q_alpha.
Specifically, the embodiment of the application may determine the degree of difference between the fixed-point input data Q_inputs and the fixed-point mean Q_μ according to the corresponding formula, shift it based on the second shift number exp, and take the reciprocal of the square root of the shifted result to obtain the fixed-point reciprocal square root of the variance Q_alpha.
S303, determining the initial fixed-point normalization result Q_norm_tmp according to the fixed-point input data Q_inputs, the fixed-point mean Q_μ, and the fixed-point reciprocal square root of the variance Q_alpha.
The embodiment of the application can obtain the initial fixed-point normalization result Q_norm_tmp according to equation (14).
S304, shifting the initial fixed-point normalization result Q_norm_tmp to the number of bits of the quantization bit width according to the number of bits of Q_alpha and the second shift number exp, thereby determining the fixed-point normalization result Q_norm.
Note that Q_norm_tmp represents the initial fixed-point normalization result; its bit width is greater than the quantization bit width b because Q_alpha occupies a bit width of 15 bits.
In order not to lose accuracy, the 15-bit shift of this step is folded into equation (9). Note that the division in equation (9) is implemented as a right shift, so the 15 bits of right shift required by equation (14) are merged into it; in the end, Q_norm_tmp only needs to be shifted right by (16 + exp/2 - b) bits to obtain the fixed-point normalization result Q_norm.
Based on the above embodiments, as an alternative embodiment, determining the fixed-point scaling parameter Q_γ comprises:
S401, determining a third quantization coefficient γ_scale according to a third difference between the maximum value and the minimum value of the floating-point scaling parameter γ and the quantization bit width b;
S402, determining the fixed-point scaling parameter Q_γ according to the floating-point scaling parameter γ and the third quantization coefficient γ_scale.
Specifically, the embodiment of the application substitutes the statistics of the floating-point scaling parameter γ into equation (2) to obtain the quantization coefficient γ_scale of the scaling parameter, and then updates the floating-point scaling parameter γ to the fixed-point scaling parameter Q_γ: Q_γ = γ * γ_scale.
Based on the above embodiments, as an alternative embodiment, determining the fixed-point translation parameter Q_β comprises:
S501, determining the quantization coefficient norm_scale of the normalization result according to the length N and the quantization bit width b.
Specifically, the embodiment of the application obtains the quantization coefficient of the normalization result, namely the fourth quantization coefficient norm_scale, according to the corresponding formula.
S502, determining the quantization coefficient β_scale of the translation parameter according to the quantization coefficient norm_scale and the quantization coefficient γ_scale.
Specifically, the quantization coefficient of the translation parameter, namely the sixth quantization coefficient β_scale, is obtained according to the formula β_scale = norm_scale * γ_scale.
S503, determining the floating-point translation parameter β, and determining the fixed-point translation parameter Q_β according to the floating-point translation parameter β and the quantization coefficient β_scale.
Specifically, the fixed-point translation parameter Q_β is determined according to the formula Q_β = β * β_scale.
Based on the above embodiments, as an alternative embodiment, determining the initial fixed-point normalization result Q_norm_tmp according to the fixed-point input data Q_inputs, the fixed-point mean Q_μ, and the fixed-point reciprocal square root of the variance Q_alpha comprises:
determining a fourth difference between the fixed-point input data Q_inputs and the fixed-point mean Q_μ;
taking the product of the fourth difference and the fixed-point reciprocal square root of the variance Q_alpha as the initial fixed-point normalization result Q_norm_tmp.
The embodiment of the application provides a quantization apparatus for a normalization operator in a neural network model. As shown in fig. 3, the quantization apparatus may comprise: a preparation module 301, a quantization coefficient determining module 302, a fixed-point normalization module 303, and an output module 304, wherein:
the preparation module 301 is configured to determine fixed-point input data Q_inputs of the normalization operator, a quantization bit width b, a floating-point scaling parameter γ, a fixed-point scaling parameter Q_γ, a fixed-point translation parameter Q_β, floating-point output data outputs, a first shift number N_outputs, and a second shift number exp;
the quantization coefficient determining module 302 is configured to obtain a first quantization coefficient M_outputs according to the floating-point output data outputs, the quantization bit width b, the floating-point scaling parameter γ, and the first shift number N_outputs;
the fixed-point normalization module 303 is configured to obtain a fixed-point normalization result Q_norm according to the fixed-point input data Q_inputs, the second shift number exp, and the quantization bit width b;
the output module 304 is configured to obtain initial fixed-point output data according to the fixed-point scaling parameter Q_γ, the fixed-point translation parameter Q_β, and the fixed-point normalization result Q_norm; quantize the initial fixed-point output data according to the first quantization coefficient M_outputs, and shift the quantization result by the first shift number N_outputs to obtain fixed-point output data Q_outputs;
wherein the first shift number N_outputs does not exceed the quantization bit width b, the input data is a feature vector representing preset information, the second shift number exp is even, and 2^exp is the power of 2 closest to the length N of the feature vector.
The device of the embodiment of the present application may perform the method provided by the embodiment of the present application, and its implementation principle is similar, and actions performed by each module in the device of the embodiment of the present application correspond to steps in the method of the embodiment of the present application, and detailed functional descriptions of each module of the device may be referred to the descriptions in the corresponding methods shown in the foregoing, which are not repeated herein.
The embodiment of the application provides an electronic device, comprising a memory, a processor, and a computer program stored on the memory; the processor executes the computer program to implement the steps of the quantization method of the normalization operator in the neural network model. Compared with the related art, the following is achieved: the embodiment of the application does not need to compute the mean and the variance in full, which avoids the errors introduced by the traditional mean and variance calculation method; part of the division operations are replaced by shifts based on the first shift number and the second shift number, which greatly reduces the amount of computation and improves computational efficiency.
In an alternative embodiment, an electronic device is provided, as shown in fig. 4, the electronic device 4000 shown in fig. 4 includes: a processor 4001 and a memory 4003. Wherein the processor 4001 is coupled to the memory 4003, such as via a bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004, the transceiver 4004 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data, etc. It should be noted that, in practical applications, the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 is not limited to the embodiment of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor 4001 may also be a combination that implements computing functionality, for example, a combination including one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 4002 may include a path for transferring information between the aforementioned components. Bus 4002 may be a PCI (Peripheral Component Interconnect) bus or an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 4, but this does not mean that there is only one bus or one type of bus.
Memory 4003 may be, but is not limited to, ROM (Read-Only Memory) or another type of static storage device that can store static information and instructions, RAM (Random Access Memory) or another type of dynamic storage device that can store information and instructions, EEPROM (Electrically Erasable Programmable Read-Only Memory), CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media, other magnetic storage devices, or any other medium that can be used to carry or store a computer program and that can be read by a computer.
The memory 4003 is used for storing a computer program for executing an embodiment of the present application, and is controlled to be executed by the processor 4001. The processor 4001 is configured to execute a computer program stored in the memory 4003 to realize the steps shown in the foregoing method embodiment.
Embodiments of the present application provide a computer readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of the foregoing method embodiments and corresponding content.
The embodiment of the application also provides a computer program product, which comprises a computer program, wherein the computer program can realize the steps and corresponding contents of the embodiment of the method when being executed by a processor.
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and in the claims and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments of the application described herein may be implemented in other sequences than those illustrated or otherwise described.
It should be understood that, although various operation steps are indicated by arrows in the flowcharts of the embodiments of the present application, the order in which these steps are implemented is not limited to the order indicated by the arrows. In some implementations of embodiments of the application, the implementation steps in the flowcharts may be performed in other orders as desired, unless explicitly stated herein. Furthermore, some or all of the steps in the flowcharts may include multiple sub-steps or multiple stages based on the actual implementation scenario. Some or all of these sub-steps or phases may be performed at the same time, or each of these sub-steps or phases may be performed at different times, respectively. In the case of different execution time, the execution sequence of the sub-steps or stages can be flexibly configured according to the requirement, which is not limited by the embodiment of the present application.
The foregoing is merely an optional implementation manner of some of the implementation scenarios of the present application, and it should be noted that, for those skilled in the art, other similar implementation manners based on the technical ideas of the present application are adopted without departing from the technical ideas of the scheme of the present application, and the implementation manner is also within the protection scope of the embodiments of the present application.

Claims (10)

1. A method for quantizing a normalization operator in a neural network model, comprising:
determining fixed-point input data Q_inputs of the normalization operator, a quantization bit width b, a floating-point scaling parameter γ, a fixed-point scaling parameter Q_γ, a fixed-point translation parameter Q_β, floating-point output data outputs, a first shift number N_outputs, and a second shift number exp;
obtaining a first quantization coefficient M_outputs according to the floating-point output data outputs, the quantization bit width b, the floating-point scaling parameter γ, and the first shift number N_outputs;
obtaining a fixed-point normalization result Q_norm according to the fixed-point input data Q_inputs, the second shift number exp, and the quantization bit width b;
obtaining initial fixed-point output data according to the fixed-point scaling parameter Q_γ, the fixed-point translation parameter Q_β, and the fixed-point normalization result Q_norm; quantizing the initial fixed-point output data according to the first quantization coefficient M_outputs, and shifting the quantization result by the first shift number N_outputs to obtain fixed-point output data Q_outputs;
wherein the first shift number N_outputs does not exceed the quantization bit width b, the input data is a feature vector representing preset information, the second shift number exp is even, and 2^exp is the power of 2 closest to the length N of the feature vector.
2. The method of claim 1, wherein obtaining the first quantization coefficient M_outputs according to the floating-point output data outputs, the quantization bit width b, the floating-point scaling parameter γ, and the first shift number N_outputs comprises:
determining a second quantization coefficient output_scale according to a first difference between the maximum value and the minimum value of the floating-point output data outputs and the quantization bit width b;
determining a third quantization coefficient γ_scale according to a second difference between the maximum value and the minimum value of the floating-point scaling parameter γ and the quantization bit width b;
determining a fourth quantization coefficient norm_scale according to the length N and the quantization bit width b;
determining a fifth quantization coefficient according to the second quantization coefficient output_scale, the third quantization coefficient γ_scale, and the fourth quantization coefficient norm_scale;
shifting the fifth quantization coefficient by the first shift number N_outputs to obtain the first quantization coefficient M_outputs.
3. The method according to claim 1, wherein obtaining the fixed-point normalization result Q_norm according to the fixed-point input data Q_inputs, the second shift number exp, and the quantization bit width b comprises:
averaging the fixed-point input data Q_inputs to obtain a fixed-point mean Q_μ;
determining the degree of difference between the fixed-point input data Q_inputs and the fixed-point mean Q_μ, and shifting the degree of difference by the second shift number exp to determine the fixed-point reciprocal square root of the variance Q_alpha;
determining an initial fixed-point normalization result Q_norm_tmp according to the fixed-point input data Q_inputs, the fixed-point mean Q_μ, and the fixed-point reciprocal square root of the variance Q_alpha;
shifting the initial fixed-point normalization result Q_norm_tmp to the number of bits of the quantization bit width according to the number of bits of Q_alpha and the second shift number exp, thereby determining the fixed-point normalization result Q_norm.
4. The method of claim 1, wherein determining the fixed-point scaling parameter Q_γ comprises:
determining a third quantization coefficient γ_scale according to a third difference between the maximum value and the minimum value of the floating-point scaling parameter γ and the quantization bit width b;
determining the fixed-point scaling parameter Q_γ according to the floating-point scaling parameter γ and the third quantization coefficient γ_scale.
5. The method of claim 4, wherein determining the fixed-point translation parameter Q_β comprises:
determining a fourth quantization coefficient norm_scale according to the length N and the quantization bit width b;
determining a sixth quantization coefficient β_scale according to the quantization coefficient norm_scale and the quantization coefficient γ_scale;
determining a floating-point translation parameter β, and determining the fixed-point translation parameter Q_β according to the floating-point translation parameter β and the quantization coefficient β_scale.
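The sketch below illustrates one way claims 4 and 5 could be realized in Python. The range-based γ_scale, the √N-based norm_scale, and the choice β_scale = γ_scale · norm_scale (so that β lands on the same scale as the γ·norm product) are all assumptions; the claims only state which quantities each coefficient depends on.

```python
import numpy as np

def quantize_affine_parameters(gamma_fp, beta_fp, n_len, b=8):
    """Sketch of claims 4 and 5 under assumed scale formulas."""
    gamma_fp = np.asarray(gamma_fp, dtype=np.float64)
    beta_fp = np.asarray(beta_fp, dtype=np.float64)
    levels = (1 << b) - 1

    # Third quantization coefficient gamma_scale from the range of gamma (claim 4),
    # then the fixed-point scaling parameter Q_gamma.
    gamma_scale = levels / (np.max(gamma_fp) - np.min(gamma_fp))
    q_gamma = np.round(gamma_fp * gamma_scale).astype(np.int32)

    # Fourth quantization coefficient norm_scale from N and b (assumed formula,
    # matching the convention that the normalized result spans roughly +-sqrt(N)).
    norm_scale = ((1 << (b - 1)) - 1) / np.sqrt(n_len)

    # Sixth quantization coefficient beta_scale (claim 5): beta is assumed to be
    # quantized onto the same scale as the gamma * norm product.
    beta_scale = gamma_scale * norm_scale
    q_beta = np.round(beta_fp * beta_scale).astype(np.int32)

    return q_gamma, q_beta
```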
6. The method of claim 3, wherein determining the initial fixed-point normalization result Q_norm_tmp according to the fixed-point input data Q_inputs, the fixed-point mean Q_μ, and the reciprocal square root of the fixed-point variance Q_α comprises:
determining a fourth difference between the fixed-point input data Q_inputs and the fixed-point mean Q_μ;
taking the product of the fourth difference and the reciprocal square root of the fixed-point variance Q_α as the initial fixed-point normalization result Q_norm_tmp.
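A tiny worked example of this step, with made-up integer values purely for illustration:

```python
import numpy as np

# Q_norm_tmp is the element-wise product of (Q_inputs - Q_mu) and Q_alpha.
q_inputs = np.array([12, 4, 9, 7], dtype=np.int64)
q_mu = 8       # fixed-point mean (illustrative value)
q_alpha = 6    # reciprocal square root of the variance (illustrative value)

q_norm_tmp = (q_inputs - q_mu) * q_alpha
print(q_norm_tmp)  # [ 24 -24   6  -6]
```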
7. The method according to any one of claims 1-6, wherein the preset information is any one of image, text, audio, and environmental information.
8. A quantization apparatus for a normalization operator in a neural network model, comprising:
a preparation module, configured to determine fixed-point input data Q_inputs of the normalization operator, a quantization bit width b, a floating-point scaling parameter γ, a fixed-point scaling parameter Q_γ, a fixed-point translation parameter Q_β, floating-point output data output, a first shift number N_outputs, and a second shift number exp;
a quantization coefficient determining module, configured to obtain a first quantization coefficient M_outputs according to the floating-point output data output, the quantization bit width b, the floating-point scaling parameter γ, and the first shift number N_outputs;
a fixed-point normalization module, configured to obtain a fixed-point normalization result Q_norm according to the fixed-point input data Q_inputs, the second shift number exp, and the quantization bit width b;
an output module, configured to obtain initial fixed-point output data according to the fixed-point scaling parameter Q_γ, the fixed-point translation parameter Q_β, and the fixed-point normalization result Q_norm; quantize the initial fixed-point output data according to the first quantization coefficient M_outputs, and shift the quantization result by the first shift number N_outputs to obtain fixed-point output data Q_outputs;
wherein the first shift number N_outputs does not exceed the quantization bit width b, the input data is a feature vector representing preset information, the second shift number exp is even, and 2^exp is the power of 2 closest to the length N of the feature vector.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to carry out the steps of the method according to any one of claims 1-7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1-7.
CN202310955099.XA 2023-07-31 2023-07-31 Quantification method and device of normalization operator in neural network model and electronic equipment Pending CN116956989A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310955099.XA CN116956989A (en) 2023-07-31 2023-07-31 Quantification method and device of normalization operator in neural network model and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310955099.XA CN116956989A (en) 2023-07-31 2023-07-31 Quantification method and device of normalization operator in neural network model and electronic equipment

Publications (1)

Publication Number Publication Date
CN116956989A true CN116956989A (en) 2023-10-27

Family

ID=88446130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310955099.XA Pending CN116956989A (en) 2023-07-31 2023-07-31 Quantification method and device of normalization operator in neural network model and electronic equipment

Country Status (1)

Country Link
CN (1) CN116956989A (en)

Similar Documents

Publication Publication Date Title
WO2019238029A1 (en) Convolutional neural network system, and method for quantifying convolutional neural network
CN110929865B (en) Network quantification method, service processing method and related product
EP4087239A1 (en) Image compression method and apparatus
CN110036384B (en) Information processing apparatus, information processing method, and computer program
JP7231731B2 (en) Adaptive quantization method and apparatus, device, medium
US20220004884A1 (en) Convolutional Neural Network Computing Acceleration Method and Apparatus, Device, and Medium
CN110888623B (en) Data conversion method, multiplier, adder, terminal device and storage medium
CN111240746A (en) Floating point data inverse quantization and quantization method and equipment
CN113780523A (en) Image processing method, image processing device, terminal equipment and storage medium
CN112200299B (en) Neural network computing device, data processing method and device
CN107220025B (en) Apparatus for processing multiply-add operation and method for processing multiply-add operation
CN112561050B (en) Neural network model training method and device
CN116956989A (en) Quantification method and device of normalization operator in neural network model and electronic equipment
CN111767993A (en) INT8 quantization method, system, device and storage medium for convolutional neural network
CN116306709A (en) Data processing method, medium and electronic equipment
JP2021033994A (en) Text processing method, apparatus, device and computer readable storage medium
CN112418388A (en) Method and device for realizing deep convolutional neural network processing
WO2019205064A1 (en) Neural network acceleration apparatus and method
CN111930670B (en) Heterogeneous intelligent processing quantization device, quantization method, electronic device and storage medium
KR20230076641A (en) Apparatus and method for floating-point operations
CN110574024A (en) Information processing apparatus, information processing method, and computer program
JP2021076900A (en) Data processing apparatus and operation method thereof, and program
CN116341572A (en) Data processing method, device, medium and electronic equipment
TWI776090B (en) Computer-readable storage medium, computer-implemented method and compute logic section
US20210334635A1 (en) Neural network accelerator configured to perform operation on logarithm domain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination