CN116956989A - Quantification method and device of normalization operator in neural network model and electronic equipment

Info

Publication number
CN116956989A
Authority
CN
China
Prior art keywords
point
fixed
quantization
fixed point
outputs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310955099.XA
Other languages
Chinese (zh)
Inventor
许礼武
余宗桥
黄敦博
周生伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARM Technology China Co Ltd
Original Assignee
ARM Technology China Co Ltd
Priority date: 2023-07-31 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Filing date: 2023-07-31
Application filed by ARM Technology China Co Ltd
Priority to CN202310955099.XA
Publication of CN116956989A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 5/00 Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F 5/01 Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
    • G06F 5/015 Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, having at least two separately controlled shifting levels, e.g. using shifting matrices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F 7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F 7/48 Methods or arrangements for performing computations using exclusively denominational number representation, using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F 7/483 Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Nonlinear Science (AREA)
  • Data Mining & Analysis (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The embodiments of the application provide a quantization method and apparatus for a normalization operator in a neural network model, an electronic device, and a computer-readable storage medium, and relate to the field of neural networks. The method comprises the following steps: obtaining a first quantization coefficient according to the floating-point output data, the quantization bit width, the floating-point scaling parameter, and the first shift number; obtaining a fixed-point normalization result according to the fixed-point input data, the second shift number, and the quantization bit width; obtaining initial fixed-point output data according to the fixed-point scaling parameter, the fixed-point translation parameter, and the fixed-point normalization result; and quantizing the initial fixed-point output data according to the first quantization coefficient and shifting the quantization result by the first shift number to obtain the fixed-point output data. In the embodiments of the application, the mean and the variance do not need to be computed in full, which avoids the errors introduced by the traditional mean and variance calculation method; part of the division operations are replaced by shifts based on the first shift number and the second shift number, which greatly reduces the amount of computation and improves computational efficiency.

Description

Quantification method and device of normalization operator in neural network model and electronic equipment
Technical Field
The application relates to the technical field of neural networks, in particular to a quantization method and device of normalization operators in a neural network model and electronic equipment.
Background
Artificial intelligence (AI) technology is being applied more and more widely in production and daily life. Data such as text, video, audio, and images can be processed by neural network models to obtain the desired results. For example, a face recognition model built into a mobile phone can process a captured face image to identify the person; as another example, a text recognition model built into the phone can process a passage of text on a web page to determine its source; as yet another example, an audio matching model built into the phone can process audio to identify sounds or songs, and so on.
Because a neural network model is a resource-intensive algorithm with a high computational cost and a large memory footprint, the neural network model built into an electronic device is usually a quantized neural network model. Neural network model quantization generally refers to converting the high-precision floating-point computations in a neural network model into fixed-point computations, resulting in a fixed-point neural network model.
The normalization operator is an operator that can be applied in a neural network model to process feature data during the operation of the model. For example, in a face recognition model, the normalization operator in the normalization layer may process the feature data associated with the face image produced by the previous layer, thereby obtaining new, processed feature data associated with the face image.
Because the square-root computation in the normalization operator is a floating-point computation, when the normalization operator normalizes fixed-point data in the neural network, the fixed-point data is first dequantized, the floating-point operation is performed, and the result is then quantized again before fixed-point data is output. This approach involves a large amount of computation, a high computational cost, and a low computation speed.
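For reference, the sketch below (not taken from the patent; the function name, the symmetric convention Q = X * S, and the sample values are assumptions for illustration) shows this conventional dequantize / float-normalize / requantize flow that the embodiments of the application aim to avoid.

```python
import numpy as np

def naive_quantized_layernorm(q_x, s_in, s_out, gamma, beta, eps=1e-8):
    """Conventional approach: dequantize -> floating-point normalization -> requantize."""
    x = q_x.astype(np.float64) / s_in            # dequantize (Q = X * S, so X = Q / S)
    mu = x.mean()                                # floating-point mean
    var = x.var()                                # floating-point variance
    norm = (x - mu) / np.sqrt(var + eps)         # floating-point normalization
    y = gamma * norm + beta                      # scale and translate
    return np.round(y * s_out).astype(np.int32)  # requantize the result

q_x = np.array([120, -30, 64, 10], dtype=np.int32)   # hypothetical fixed-point inputs
print(naive_quantized_layernorm(q_x, s_in=16.0, s_out=64.0, gamma=1.0, beta=0.0))
```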
Disclosure of Invention
Embodiments of the present application provide a method, apparatus, electronic device, computer-readable storage medium, and computer program product for quantifying normalization operators in neural network models, which can solve the above-mentioned problems in the prior art. The technical scheme is as follows:
according to an aspect of the embodiment of the present application, there is provided a quantization method of a normalization operator in a neural network model, the method including:
determining fixed-point input data Q_inputs of the normalization operator, a quantization bit width b, a floating-point scaling parameter γ, a fixed-point scaling parameter Q_γ, a fixed-point translation parameter Q_β, floating-point output data outputs, a first shift number N_outputs, and a second shift number exp;
obtaining a first quantization coefficient M_outputs according to the floating-point output data outputs, the quantization bit width b, the floating-point scaling parameter γ, and the first shift number N_outputs;
obtaining a fixed-point normalization result Q_norm according to the fixed-point input data Q_inputs, the second shift number exp, and the quantization bit width b;
obtaining initial fixed-point output data according to the fixed-point scaling parameter Q_γ, the fixed-point translation parameter Q_β, and the fixed-point normalization result Q_norm; quantizing the initial fixed-point output data according to the first quantization coefficient M_outputs, and shifting the quantization result by the first shift number N_outputs to obtain fixed-point output data Q_outputs;
wherein the first shift number N_outputs does not exceed the quantization bit width b, the input data is a feature vector representing preset information, the second shift number exp is even, and 2^exp is the power of 2 closest to the length N of the feature vector.
Based on the above embodiments, as an alternative embodiment, obtaining the first quantization coefficient M_outputs according to the floating-point output data outputs, the quantization bit width b, the floating-point scaling parameter γ, and the first shift number N_outputs comprises:
determining a second quantization coefficient output_scale according to a first difference between the maximum value and the minimum value of the floating-point output data outputs and the quantization bit width b;
determining a third quantization coefficient γ_scale according to a second difference between the maximum value and the minimum value of the floating-point scaling parameter γ and the quantization bit width b;
determining a fourth quantization coefficient norm_scale according to the length N and the quantization bit width b;
determining a fifth quantization coefficient according to the second quantization coefficient output_scale, the third quantization coefficient γ_scale, and the fourth quantization coefficient norm_scale;
shifting the fifth quantization coefficient by the first shift number N_outputs to obtain the first quantization coefficient M_outputs.
Based on the above embodiment, as an alternative embodiment, obtaining the fixed-point normalization result Q_norm according to the fixed-point input data Q_inputs, the second shift number exp, and the quantization bit width b comprises:
averaging the fixed-point input data Q_inputs to obtain a fixed-point mean Q_μ;
determining the degree of difference between the fixed-point input data Q_inputs and the fixed-point mean Q_μ, and shifting the degree of difference by the second shift number exp to determine the fixed-point reciprocal square root of the variance Q_alpha;
determining an initial fixed-point normalization result Q_norm_tmp according to the fixed-point input data Q_inputs, the fixed-point mean Q_μ, and the fixed-point reciprocal square root of the variance Q_alpha;
shifting the initial fixed-point normalization result Q_norm_tmp to the number of bits of the quantization bit width according to the number of bits of Q_alpha and the second shift number exp, thereby determining the fixed-point normalization result Q_norm.
Based on the above embodiment, as an alternative embodiment, the fixed-point scaling parameter Q_γ is determined as follows:
determining a third quantization coefficient γ_scale according to a third difference between the maximum value and the minimum value of the floating-point scaling parameter γ and the quantization bit width b;
determining the fixed-point scaling parameter Q_γ according to the floating-point scaling parameter γ and the third quantization coefficient γ_scale.
On the basis of the above embodiment, as an alternative embodiment, determining the fixed-point translation parameter Q_β comprises:
determining a fourth quantization coefficient norm_scale according to the length N and the quantization bit width b;
determining a sixth quantization coefficient β_scale according to the quantization coefficient norm_scale and the quantization coefficient γ_scale;
determining a floating-point translation parameter β, and determining the fixed-point translation parameter Q_β according to the floating-point translation parameter β and the quantization coefficient β_scale.
Based on the above embodiment, as an alternative embodiment, determining the initial fixed-point normalization result Q_norm_tmp according to the fixed-point input data Q_inputs, the fixed-point mean Q_μ, and the fixed-point reciprocal square root of the variance Q_alpha comprises:
determining a fourth difference between the fixed-point input data Q_inputs and the fixed-point mean Q_μ;
taking the product of the fourth difference and the fixed-point reciprocal square root of the variance Q_alpha as the initial fixed-point normalization result Q_norm_tmp.
On the basis of the above embodiment, as an alternative embodiment, the preset information is any one of image, text, audio, and environmental information.
According to another aspect of an embodiment of the present application, there is provided a quantization apparatus for a normalization operator in a neural network model, the apparatus including:
a preparation module for determining fixed-point input data Q_inputs of the normalization operator, a quantization bit width b, a floating-point scaling parameter γ, a fixed-point scaling parameter Q_γ, a fixed-point translation parameter Q_β, floating-point output data outputs, a first shift number N_outputs, and a second shift number exp;
a quantization coefficient determining module for obtaining a first quantization coefficient M_outputs according to the floating-point output data outputs, the quantization bit width b, the floating-point scaling parameter γ, and the first shift number N_outputs;
a fixed-point normalization module for obtaining a fixed-point normalization result Q_norm according to the fixed-point input data Q_inputs, the second shift number exp, and the quantization bit width b;
an output module for obtaining initial fixed-point output data according to the fixed-point scaling parameter Q_γ, the fixed-point translation parameter Q_β, and the fixed-point normalization result Q_norm; quantizing the initial fixed-point output data according to the first quantization coefficient M_outputs, and shifting the quantization result by the first shift number N_outputs to obtain fixed-point output data Q_outputs;
wherein the first shift number N_outputs does not exceed the quantization bit width b, the input data is a feature vector representing preset information, the second shift number exp is even, and 2^exp is the power of 2 closest to the length N of the feature vector.
According to another aspect of an embodiment of the present application, there is provided an electronic device including a memory, a processor, and a computer program stored on the memory, the processor executing the computer program to implement the steps of the above method.
According to a further aspect of embodiments of the present application, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above method.
According to an aspect of an embodiment of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the steps of the above method.
The technical scheme provided by the embodiment of the application has the beneficial effects that:
the embodiment of the application does not need to totally calculate the mean value and the variance, avoids errors caused by the traditional mean value and variance calculation method, adopts a shifting mode based on the first shifting number and the second shifting number for partial division operation, greatly reduces the operation amount and improves the operation efficiency.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a schematic view of a scene provided by an embodiment of the present application;
FIG. 2 is a schematic flow chart of a method for quantifying normalization operators in a neural network model according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a quantization apparatus for normalization operators in a neural network model according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the drawings in the present application. It should be understood that the embodiments described below with reference to the drawings are exemplary descriptions for explaining the technical solutions of the embodiments of the present application, and the technical solutions of the embodiments of the present application are not limited.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and "comprising," when used in this specification, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof, all of which may be included in the present specification. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein indicates that at least one of the items defined by the term, e.g., "a and/or B" may be implemented as "a", or as "B", or as "a and B".
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
First, several terms related to the present application are described and explained:
(1) Symmetric quantization: a floating-point number X in the range [X_min, X_max] is mapped by a quantization coefficient to a fixed-point number Q in the range [Q_min, Q_max], as shown in equation (1):
Q=X*S (1)
the quantization parameter includes a scaling factor S, which is the minimum scale of different floating point numbers X quantized to the same fixed point number Q. It can be appreciated that for symmetric quantization, after the scaling factor S in the quantization parameter is determined, the quantized fixed-point number Q can be obtained according to the input floating-point number X through the quantization parameter. It will be appreciated that the range of fixed point numbers Q corresponds to quantized data types (i.e., data types to which the fixed point numbers Q correspond), where the quantized data types include: int32, int16, int8, int4, uint32, uint16, uint8, or uint4, and the like. For example, the quantized data type is int8, i.e. the fixed point number Q is int8 type, then [ Q min ,Q max ]In particular [ -128, 127](i.e., [ -2n-1,2 n-1)]Wherein n=8). The floating point number X may be a statistical value obtained by counting historical training data. For example, in the history training process, X has a maximum statistical value of 10 and a minimum statistical value of-20, then [ X ] min ,X max ]Can be determined as [ -20,10]. In other embodiments, [ X ] min ,X max ]Or may be a preset empirical value.
For symmetric quantization, equation (2) gives a formula for obtaining the quantization coefficient S for int-type quantization, where the floating-point number X lies in the range [X_min, X_max] and the fixed-point number Q lies in the range [Q_min, Q_max]:
It should be noted that, in other embodiments of the present application, the quantization coefficient S may also be calculated by variants of the calculation method shown in equation (2); no limitation is imposed here.
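Because the image for equation (2) is not reproduced above, the following is only a hedged illustration of one common symmetric-quantization convention consistent with Q = X * S; the patent's exact formula for S may differ in detail.

```python
import numpy as np

def symmetric_quant_coeff(x_min, x_max, b):
    """One common symmetric convention: S = (2**(b-1) - 1) / max(|x_min|, |x_max|),
    so that Q = round(X * S) stays inside [-2**(b-1), 2**(b-1) - 1].
    This is an assumption; the patent's equation (2) is not reproduced here."""
    q_max = 2 ** (b - 1) - 1
    return q_max / max(abs(x_min), abs(x_max))

def quantize(x, s, b):
    q_min, q_max = -2 ** (b - 1), 2 ** (b - 1) - 1
    return np.clip(np.round(x * s), q_min, q_max).astype(np.int32)

# Using the example statistics from the text: X observed in [-20, 10], int8 (b = 8)
s = symmetric_quant_coeff(-20.0, 10.0, b=8)
print(s, quantize(np.array([-20.0, 0.0, 10.0]), s, b=8))
```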
(2) Normalization operator
A neural network model typically involves the superposition of data across multiple layers, and after the data are superimposed, the range of the data interval may change. This change may cause the gradient to descend slowly during training of the neural network model, resulting in a slow convergence rate, or it may reduce the accuracy of the neural network model, or lead to a non-uniform data interval range, and so on.
To avoid this, it is often necessary to normalize the data in the neural network model. When the data in the neural network model need to be normalized, a normalization layer is added to the neural network model and a normalization operator is deployed in that layer, so that the data in the neural network model are normalized by the normalization operator.
In the embodiments of the application, the normalization operator can be an InstanceNorm (IN) operator, a LayerNorm (LN) operator, a GroupNorm (GN) operator, a SwitchableNorm (SN) operator, or the like.
The data structure input into the normalization operator may be [Z, C, H, W], where Z is the number of samples (e.g., images), C is the number of channels, H is the height of the samples, and W is the width of the samples. The InstanceNorm operator normalizes the data over the H and W dimensions. The LayerNorm operator normalizes over the C, H, and W dimensions. The GroupNorm operator groups the channels C and then normalizes each group. SwitchableNorm normalizes the data by combining the BN, LN, and IN normalization methods. The InstanceNorm, LayerNorm, GroupNorm, and SwitchableNorm operators can then scale and translate their respective normalization results according to the scaling and translation coefficients, and output the scaled and translated data.
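To make the channel groupings concrete, the sketch below (assumed, not part of the patent) shows which axes of a [Z, C, H, W] tensor each operator reduces over when computing its mean and variance.

```python
import numpy as np

def normalize_over(x, axes, eps=1e-8):
    """Normalize x over the given axes; mean and variance are computed per remaining index."""
    mu = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.randn(2, 4, 8, 8)                    # [Z, C, H, W]
inst = normalize_over(x, axes=(2, 3))              # InstanceNorm: over H, W per sample and channel
layer = normalize_over(x, axes=(1, 2, 3))          # LayerNorm: over C, H, W per sample
groups = 2                                         # GroupNorm: split C into groups, normalize each group
grp = normalize_over(x.reshape(2, groups, 4 // groups, 8, 8), axes=(2, 3, 4)).reshape(x.shape)
```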
In some embodiments of the present application, the input data inputs represent feature data related to the text, audio, images, etc. to be processed by the normalization operator; the normalization result norm represents the corresponding normalized feature data of the text, video, audio, images, etc. after processing by the normalization operator; and the output data outputs are the corresponding feature result data of the text, video, audio, images, etc. after processing by the normalization operator in the neural network model, where the feature result data are obtained by scaling and translating the normalized feature data.
It can be appreciated that the technical scheme of the application can be applied to any scene in which data such as characters, audio, images and the like are required to be processed through a neural network model.
In the related art, the InstanceNorm operator normalizes the H and W dimensions of the input data; LayerNorm normalizes the C, H, and W dimensions; GroupNorm groups the channels C and then normalizes; and SwitchableNorm combines the BN, LN, and IN normalization methods. The normalized result is then scaled and translated according to the scaling parameter γ and the translation parameter β. The normalization operator computes the mean and variance at every forward inference, so the accuracy of the mean and variance computation must also be considered during quantization.
The related-art schemes all need to compute the mean and variance, which introduces error.
Edge computing devices typically support only fixed-point operations. When a neural network model is adapted to such a device, it is first quantized, i.e., the floating-point parameters in the model are mapped to integer parameters and a quantization coefficient is determined for the activation response of each layer; these quantization coefficients are computed in an offline stage.
In order to better understand the present solution, an application scenario of the technical solution of the present application will be described first.
The application provides a quantization method, a quantization device, electronic equipment, a computer readable storage medium and a computer program product of a normalization operator in a neural network model, and aims to solve the technical problems in the prior art.
The technical solutions of the embodiments of the present application and technical effects produced by the technical solutions of the present application are described below by describing several exemplary embodiments. It should be noted that the following embodiments may be referred to, or combined with each other, and the description will not be repeated for the same terms, similar features, similar implementation steps, and the like in different embodiments.
Fig. 1 shows a scene diagram of a terminal 100 for recognizing an acquired face image through a face recognition model. As shown in fig. 1, the terminal 100 is deployed with a face recognition model, wherein the face recognition model is obtained through training in a floating point domain by a server 200. A face recognition model is taken as an example of the neural network model in fig. 1. In other embodiments, the neural network model may also be other models, such as a speech recognition model, a text recognition model.
It should be noted that the terminal 100 includes, but is not limited to, a mobile phone, a tablet computer, a smart screen, a wearable device (e.g., a watch, a bracelet, a helmet, an earphone, etc.), a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), and the like. The server 200 may be a single server or a server cluster composed of multiple servers. A mobile phone is taken as an example of the terminal 100 in fig. 1. Because the face recognition model is a resource-intensive algorithm, the terminal 100 would face a large amount of computation and a slow computation speed if it processed face image data with a built-in floating-point-domain face recognition model. Therefore, it is generally necessary to quantize the face recognition model to reduce the amount of computation and increase the computation speed.
In the embodiment of the present application, the face image data may be obtained by shooting the user by the terminal 100, may be pre-stored, or may be transmitted to the terminal 100 by another device, which is not limited.
In some embodiments, the server 200 quantizes the face recognition model trained in the floating point domain, obtains a quantized face recognition model, and then deploys the quantized face recognition model to the terminal 100.
In other implementations, the face recognition model to be trained is deployed in the terminal 100, the model is trained in the floating-point domain by the terminal 100, and the trained face recognition model is then quantized to obtain the quantized face recognition model. It can be appreciated that when the terminal 100 uses the quantized face recognition model, high-precision floating-point calculations can be converted into fixed-point calculations, which reduces the amount of computation and increases the computation speed.
In an embodiment of the present application, the face recognition model in fig. 1 may include a normalization layer and other layers (e.g., an input layer, an activation layer, an output layer, etc.), where the normalization layer includes a normalization operator, such as an InstanceNorm operator, a LayerNorm operator, a GroupNorm operator, or a SwitchableNorm operator. The normalization operator in the normalization layer is used for normalizing the feature data associated with the input face image data (i.e., the input data of the normalization layer). The input data inputs of the normalization layer are the feature data output by the layer preceding the normalization layer (e.g., the activation layer).
The embodiment of the application adopts symmetric quantization, and the maximum and minimum values of each layer's parameters and activation values need to be collected in the offline stage to calculate the quantization parameter S. The general formula for computing the normalization result norm is first:
outputs=γ*norm+β (6)
where the input data is a feature vector with multiple dimensions, the component of dimension i is denoted x_i, the length of the feature vector is N, δ denotes the variance of the floating-point input data, ε denotes a minimal floating-point number greater than 0 (1e-8), μ denotes the mean of the floating-point input data, norm denotes the normalization result, γ and β denote the scaling parameter and the translation parameter respectively, and outputs is the floating-point output data, representing the layer normalization (LayerNorm) result. Ignoring ε and combining equations (3), (4), and (5) yields equation (7):
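The images of equations (3) to (5) and (7) are not reproduced in this text; based on the symbol definitions above, they presumably correspond to the standard layer-normalization relations sketched below (a reconstruction, not the patent's exact typesetting):

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \quad (3)
\qquad
\delta = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 \quad (4)
\qquad
\mathrm{norm}_i = \frac{x_i - \mu}{\sqrt{\delta + \epsilon}} \quad (5)
\qquad
\mathrm{norm}_i \approx \frac{(x_i - \mu)\sqrt{N}}{\sqrt{\sum_{j=1}^{N}(x_j - \mu)^2}} \quad (\epsilon\ \text{ignored}) \quad (7)
```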
the following describes the process steps of the quantization stage and fixed point forward operation:
assuming the quantization bit width is b, then Q in equation 2 max =2 b-1 -1,Q min =-2 b-1
Offline quantization steps:
Step 1: Separately collect the maximum and minimum values of the floating-point output data outputs in the normalization operator, and the maximum and minimum values of the floating-point scaling parameter γ.
Step 2: From equation (7), the maximum and minimum values of the normalization result norm can be derived; substituting them into equation (2) gives the quantization coefficient norm_scale of the normalization result. Since the quantization bit width b and the length N of the feature vector are known, the quantization coefficient norm_scale of the normalization result in the embodiment of the present application can be obtained easily.
Further, the embodiment of the present application calculates the power of 2 closest to N, exp = ceil(log2(N)), so that equation (7) can be transformed into equation (8). After this change, the division by N in equation (5) becomes a right-shift operation; replacing the division with a shift greatly improves performance. Since the second shift number exp needs to be an even number, exp = exp + (exp & 1) is applied.
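As an illustration of the step above, the short sketch below (helper name assumed) computes the second shift number exp as described: exp = ceil(log2(N)), then forced even with exp += exp & 1, so that a right shift by exp can stand in for a division by a power of two close to N.

```python
import math

def second_shift_number(n):
    """Second shift number as described in the text: ceil(log2(N)), forced even."""
    exp = math.ceil(math.log2(n))
    exp += exp & 1          # exp must be even so that exp / 2 is an integer shift later on
    return exp

for n in (49, 64, 100):
    exp = second_shift_number(n)
    print(n, exp, 2 ** exp)  # 2**exp is the power of two used in place of N
```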
In the fixed-point operation, equation (8) is multiplied by norm_scale; this is further simplified into equation (9), which removes the multiplication by the quantization coefficient.
Note that in equation (9), the fixed-point result of the reciprocal square-root term is in Q31 format.
The second shift number exp obtained in Step 2 needs to be stored for use in the online forward operation.
Step 3: Substituting the statistics of the floating-point scaling parameter γ into equation (2) gives the quantization coefficient γ_scale of the scaling parameter; the floating-point scaling parameter γ is then updated to the fixed-point scaling parameter Q_γ: Q_γ = γ * γ_scale.
To allow addition and subtraction on the same scale, the quantization coefficient of the translation parameter is β_scale = norm_scale * γ_scale, so the fixed-point translation parameter is Q_β = β * β_scale.
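As a small numeric illustration of Step 3 (all values hypothetical, and the scale formulas below are assumptions standing in for the unreproduced equations), the parameter-quantization relations Q_γ = γ * γ_scale, β_scale = norm_scale * γ_scale, and Q_β = β * β_scale can be sketched as:

```python
import math

b, N = 8, 64                                     # hypothetical bit width and vector length
gamma, beta = 1.2, -0.35                         # hypothetical floating-point γ and β
gamma_scale = (2 ** (b - 1) - 1) / 1.2           # assumed: from equation (2) with |γ| up to 1.2
norm_scale = (2 ** (b - 1) - 1) / math.sqrt(N)   # assumed bound of about sqrt(N) on |norm|
q_gamma = round(gamma * gamma_scale)             # fixed-point scaling parameter Q_γ
beta_scale = norm_scale * gamma_scale            # sixth quantization coefficient β_scale
q_beta = round(beta * beta_scale)                # fixed-point translation parameter Q_β
print(q_gamma, q_beta)                           # e.g. 127 and -588
```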
Step 4: Equations (10), (11), and (12) can be obtained from equation (6).
In the above formulas, the first shift number N_outputs does not exceed the quantization bit width b. The quantization coefficient output_scale of the output data is obtained by substituting the maximum and minimum values of the floating-point output data into equation (2); similarly, the quantization coefficient norm_scale of the normalization result is obtained by substituting the maximum and minimum values of the floating-point normalization result into equation (2). Based on output_scale, γ_scale, and norm_scale, the result of equation (12) can be obtained, which is then shifted by the first shift number N_outputs to obtain the quantization coefficient M_outputs.
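The images for equations (10) to (12) are not reproduced above, so the sketch below only illustrates the generic pattern of Step 4 (names and the rounding convention are assumptions): a positive floating-point factor, such as the fifth quantization coefficient derived from output_scale, γ_scale, and norm_scale, is represented as an integer multiplier M_outputs together with a right shift by N_outputs, with N_outputs not exceeding the bit width b.

```python
def to_multiplier_and_shift(scale, b):
    """Represent a positive float `scale` as (m, n) with scale ~ m / 2**n and n <= b."""
    n = b                                # use the largest allowed shift for best resolution
    m = round(scale * (1 << n))          # integer multiplier M_outputs
    return m, n

fifth_coeff = 0.3785                     # hypothetical value of the fifth quantization coefficient
m_out, n_out = to_multiplier_and_shift(fifth_coeff, b=8)
print(m_out, n_out, m_out / (1 << n_out))   # 97 8 0.37890625, close to the original factor
```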
The fixed-point calculation process of the embodiment of the application is as follows:
Step 1: Average the fixed-point input data Q_inputs to obtain the fixed-point mean Q_μ. Based on the fixed-point input data Q_inputs and the fixed-point mean Q_μ, equation (13) is evaluated to yield a result in Q31 format.
Note that >> exp in equation (13) represents a right shift by the second shift number exp.
Further, Q_alpha is shifted right by 16 bits to obtain a Q15-format result: Q_alpha = Q_alpha >> 16.
Step 2: Equation (9) is rewritten as:
Q_norm_tmp = (Q_inputs - Q_μ) * Q_alpha (14)
where Q_norm_tmp represents the initial fixed-point normalization result; its bit width is greater than the quantization bit width b because Q_alpha occupies a bit width of 15 bits.
In order not to lose accuracy, the 15-bit shift of this step is folded into equation (9). Note that the division in equation (9) is implemented as a right shift, so the 15 bits of right shift required by equation (14) are merged into it; in the end, Q_norm_tmp only needs to be shifted right by (16 + exp/2 - b) bits to obtain the fixed-point normalization result Q_norm.
Step 3: The final result, i.e., the fixed-point output data Q_outputs, is calculated according to equation (15):
Q_outputs = (Q_γ * Q_norm + Q_β) * M_outputs / 2^N_outputs (15)
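Putting the three fixed-point steps together, the following Python sketch is an illustration only: the Q31/Q15 bookkeeping is simplified (the reciprocal square root is produced directly as a Q15 value), and q_gamma, q_beta, m_out, and n_out are hypothetical values chosen to be roughly self-consistent for γ close to 1 and β = 0.

```python
import numpy as np

def fixed_point_layernorm(q_in, exp, b, q_gamma, q_beta, m_out, n_out):
    """Simplified fixed-point forward pass following equations (13)-(15)."""
    q_in = q_in.astype(np.int64)
    q_mu = int(q_in.sum()) // q_in.size              # fixed-point mean Q_mu
    diff = q_in - q_mu
    acc = int((diff * diff).sum()) >> exp            # ">> exp" as in equation (13)
    # Reciprocal square root kept as a Q15 fixed-point value (the text derives it
    # in Q31 and then shifts right by 16 bits to reach Q15).
    q_alpha = int(round((1 << 15) / np.sqrt(max(acc, 1))))
    q_norm_tmp = diff * q_alpha                      # equation (14)
    q_norm = q_norm_tmp >> (16 + exp // 2 - b)       # final right shift given in the text
    return ((q_gamma * q_norm + q_beta) * m_out) >> n_out   # equation (15)

q_in = np.array([120, -30, 64, 10, -90, 42, 7, -15], dtype=np.int64)
exp = 4                                              # ceil(log2(8)) = 3, forced even -> 4
out = fixed_point_layernorm(q_in, exp, b=8, q_gamma=64, q_beta=0, m_out=4, n_out=8)
print(out)   # roughly the int8-scale normalization of the inputs when gamma ~ 1, beta = 0
```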
The embodiment of the application provides a quantization method for a normalization operator in a neural network model, as shown in fig. 2, comprising the following steps:
S101, determining fixed-point input data Q_inputs of the normalization operator, a quantization bit width b, a floating-point scaling parameter γ, a fixed-point scaling parameter Q_γ, a fixed-point translation parameter Q_β, floating-point output data outputs, a first shift number N_outputs, and a second shift number exp;
S102, obtaining a first quantization coefficient M_outputs according to the floating-point output data outputs, the quantization bit width b, the floating-point scaling parameter γ, and the first shift number N_outputs;
S103, obtaining a fixed-point normalization result Q_norm according to the fixed-point input data Q_inputs, the second shift number exp, and the quantization bit width b;
S104, obtaining initial fixed-point output data according to the fixed-point scaling parameter Q_γ, the fixed-point translation parameter Q_β, and the fixed-point normalization result Q_norm; quantizing the initial fixed-point output data according to the first quantization coefficient M_outputs, and shifting the quantization result by the first shift number N_outputs to obtain fixed-point output data Q_outputs;
wherein the first shift number N_outputs does not exceed the quantization bit width b, the input data is a feature vector representing preset information, the second shift number exp is even, and 2^exp is the power of 2 closest to the length N of the feature vector.
The method of the embodiment of the application does not need to compute the mean and the variance in full, which avoids the errors introduced by the traditional mean and variance calculation method; part of the division operations are replaced by shifts based on the first shift number and the second shift number, which greatly reduces the amount of computation and improves computational efficiency.
Based on the above embodiments, as an alternative embodiment, obtaining the first quantization coefficient M_outputs according to the floating-point output data outputs, the quantization bit width b, the floating-point scaling parameter γ, and the first shift number N_outputs comprises:
S201, determining a second quantization coefficient output_scale according to a first difference between the maximum value and the minimum value of the floating-point output data outputs and the quantization bit width b.
Specifically, the embodiment of the application substitutes the maximum value and the minimum value of the floating-point output data and the quantization bit width b into equation (2) to obtain the quantization coefficient of the floating-point output data, namely the second quantization coefficient output_scale.
S202, determining a third quantization coefficient γ_scale according to a second difference between the maximum value and the minimum value of the floating-point scaling parameter γ and the quantization bit width b.
Specifically, the embodiment of the application substitutes the maximum value and the minimum value of the floating-point scaling parameter γ and the quantization bit width b into equation (2) to obtain the quantization coefficient of the scaling parameter, namely the third quantization coefficient γ_scale.
S203, determining a fourth quantization coefficient norm_scale according to the length N and the quantization bit width b.
According to the corresponding formula, the quantization coefficient of the normalization result, that is, the fourth quantization coefficient norm_scale, can be obtained.
S204, determining a fifth quantization coefficient according to the second quantization coefficient output_scale, the third quantization coefficient γ_scale, and the fourth quantization coefficient norm_scale.
Specifically, the fifth quantization coefficient may be expressed by the formula corresponding to the left-hand side of equation (12).
S205, shifting the fifth quantization coefficient by the first shift number N_outputs to obtain the first quantization coefficient M_outputs.
Based on the above embodiments, as an alternative embodiment, obtaining the fixed-point normalization result Q_norm according to the fixed-point input data Q_inputs, the second shift number exp, and the quantization bit width b comprises:
S301, averaging the fixed-point input data Q_inputs to obtain a fixed-point mean Q_μ.
S302, determining the degree of difference between the fixed-point input data Q_inputs and the fixed-point mean Q_μ, and shifting the degree of difference by the second shift number exp to determine the fixed-point reciprocal square root of the variance Q_alpha.
Specifically, the embodiment of the application may determine the degree of difference between the fixed-point input data Q_inputs and the fixed-point mean Q_μ according to the corresponding formula, shift it based on the second shift number exp, and take the reciprocal of the square root of the shifted result to obtain the fixed-point reciprocal square root of the variance Q_alpha.
S303, determining the initial fixed-point normalization result Q_norm_tmp according to the fixed-point input data Q_inputs, the fixed-point mean Q_μ, and the fixed-point reciprocal square root of the variance Q_alpha.
The embodiment of the application can obtain the initial fixed-point normalization result Q_norm_tmp according to equation (14).
S304, shifting the initial fixed-point normalization result Q_norm_tmp to the number of bits of the quantization bit width according to the number of bits of Q_alpha and the second shift number exp, thereby determining the fixed-point normalization result Q_norm.
Note that Q_norm_tmp represents the initial fixed-point normalization result; its bit width is greater than the quantization bit width b because Q_alpha occupies a bit width of 15 bits.
In order not to lose accuracy, the 15-bit shift of this step is folded into equation (9). Note that the division in equation (9) is implemented as a right shift, so the 15 bits of right shift required by equation (14) are merged into it; in the end, Q_norm_tmp only needs to be shifted right by (16 + exp/2 - b) bits to obtain the fixed-point normalization result Q_norm.
Based on the above embodiments, as an alternative embodiment, determining the fixed-point scaling parameter Q_γ comprises:
S401, determining a third quantization coefficient γ_scale according to a third difference between the maximum value and the minimum value of the floating-point scaling parameter γ and the quantization bit width b;
S402, determining the fixed-point scaling parameter Q_γ according to the floating-point scaling parameter γ and the third quantization coefficient γ_scale.
Specifically, the embodiment of the application substitutes the statistics of the floating-point scaling parameter γ into equation (2) to obtain the quantization coefficient γ_scale of the scaling parameter, and then updates the floating-point scaling parameter γ to the fixed-point scaling parameter Q_γ: Q_γ = γ * γ_scale.
Based on the above embodiments, as an alternative embodiment, determining the fixed-point translation parameter Q_β comprises:
S501, determining the quantization coefficient norm_scale of the normalization result according to the length N and the quantization bit width b.
Specifically, the embodiment of the application obtains the quantization coefficient of the normalization result, namely the fourth quantization coefficient norm_scale, according to the corresponding formula.
S502, determining the quantization coefficient β_scale of the translation parameter according to the quantization coefficient norm_scale and the quantization coefficient γ_scale.
Specifically, the quantization coefficient of the translation parameter, namely the sixth quantization coefficient β_scale, is obtained according to the formula β_scale = norm_scale * γ_scale.
S503, determining the floating-point translation parameter β, and determining the fixed-point translation parameter Q_β according to the floating-point translation parameter β and the quantization coefficient β_scale.
Specifically, the fixed-point translation parameter Q_β is determined according to the formula Q_β = β * β_scale.
Based on the above embodiments, as an alternative embodiment, determining the initial fixed-point normalization result Q_norm_tmp according to the fixed-point input data Q_inputs, the fixed-point mean Q_μ, and the fixed-point reciprocal square root of the variance Q_alpha comprises:
determining a fourth difference between the fixed-point input data Q_inputs and the fixed-point mean Q_μ;
taking the product of the fourth difference and the fixed-point reciprocal square root of the variance Q_alpha as the initial fixed-point normalization result Q_norm_tmp.
The embodiment of the application provides a quantization apparatus for a normalization operator in a neural network model. As shown in fig. 3, the quantization apparatus may comprise: a preparation module 301, a quantization coefficient determining module 302, a fixed-point normalization module 303, and an output module 304, wherein:
the preparation module 301 is configured to determine fixed-point input data Q_inputs of the normalization operator, a quantization bit width b, a floating-point scaling parameter γ, a fixed-point scaling parameter Q_γ, a fixed-point translation parameter Q_β, floating-point output data outputs, a first shift number N_outputs, and a second shift number exp;
the quantization coefficient determining module 302 is configured to obtain a first quantization coefficient M_outputs according to the floating-point output data outputs, the quantization bit width b, the floating-point scaling parameter γ, and the first shift number N_outputs;
the fixed-point normalization module 303 is configured to obtain a fixed-point normalization result Q_norm according to the fixed-point input data Q_inputs, the second shift number exp, and the quantization bit width b;
the output module 304 is configured to obtain initial fixed-point output data according to the fixed-point scaling parameter Q_γ, the fixed-point translation parameter Q_β, and the fixed-point normalization result Q_norm; quantize the initial fixed-point output data according to the first quantization coefficient M_outputs, and shift the quantization result by the first shift number N_outputs to obtain fixed-point output data Q_outputs;
wherein the first shift number N_outputs does not exceed the quantization bit width b, the input data is a feature vector representing preset information, the second shift number exp is even, and 2^exp is the power of 2 closest to the length N of the feature vector.
The device of the embodiment of the present application may perform the method provided by the embodiment of the present application, and its implementation principle is similar, and actions performed by each module in the device of the embodiment of the present application correspond to steps in the method of the embodiment of the present application, and detailed functional descriptions of each module of the device may be referred to the descriptions in the corresponding methods shown in the foregoing, which are not repeated herein.
The embodiment of the application provides an electronic device, comprising a memory, a processor, and a computer program stored on the memory; the processor executes the computer program to implement the steps of the quantization method of the normalization operator in the neural network model. Compared with the related art, the following is achieved: the embodiment of the application does not need to compute the mean and the variance in full, which avoids the errors introduced by the traditional mean and variance calculation method; part of the division operations are replaced by shifts based on the first shift number and the second shift number, which greatly reduces the amount of computation and improves computational efficiency.
In an alternative embodiment, an electronic device is provided, as shown in fig. 4, the electronic device 4000 shown in fig. 4 includes: a processor 4001 and a memory 4003. Wherein the processor 4001 is coupled to the memory 4003, such as via a bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004, the transceiver 4004 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data, etc. It should be noted that, in practical applications, the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 is not limited to the embodiment of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor 4001 may also be a combination that implements computing functionality, for example, a combination including one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 4002 may include a path for transferring information between the aforementioned components. Bus 4002 may be a PCI (Peripheral Component Interconnect) bus or an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 4, but this does not mean that there is only one bus or one type of bus.
Memory 4003 may be, but is not limited to, ROM (Read-Only Memory) or another type of static storage device that can store static information and instructions, RAM (Random Access Memory) or another type of dynamic storage device that can store information and instructions, EEPROM (Electrically Erasable Programmable Read-Only Memory), CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media, other magnetic storage devices, or any other medium that can be used to carry or store a computer program and that can be read by a computer.
The memory 4003 is used for storing a computer program for executing an embodiment of the present application, and is controlled to be executed by the processor 4001. The processor 4001 is configured to execute a computer program stored in the memory 4003 to realize the steps shown in the foregoing method embodiment.
Embodiments of the present application provide a computer readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of the foregoing method embodiments and corresponding content.
The embodiment of the application also provides a computer program product, which comprises a computer program, wherein the computer program can realize the steps and corresponding contents of the embodiment of the method when being executed by a processor.
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and in the claims and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments of the application described herein may be implemented in other sequences than those illustrated or otherwise described.
It should be understood that, although various operation steps are indicated by arrows in the flowcharts of the embodiments of the present application, the order in which these steps are implemented is not limited to the order indicated by the arrows. In some implementations of embodiments of the application, the implementation steps in the flowcharts may be performed in other orders as desired, unless explicitly stated herein. Furthermore, some or all of the steps in the flowcharts may include multiple sub-steps or multiple stages based on the actual implementation scenario. Some or all of these sub-steps or phases may be performed at the same time, or each of these sub-steps or phases may be performed at different times, respectively. In the case of different execution time, the execution sequence of the sub-steps or stages can be flexibly configured according to the requirement, which is not limited by the embodiment of the present application.
The foregoing is merely an optional implementation manner of some of the implementation scenarios of the present application, and it should be noted that, for those skilled in the art, other similar implementation manners based on the technical ideas of the present application are adopted without departing from the technical ideas of the scheme of the present application, and the implementation manner is also within the protection scope of the embodiments of the present application.

Claims (10)

1. A method for quantizing a normalization operator in a neural network model, comprising:
determining fixed-point input data Q_inputs of the normalization operator, a quantization bit width b, a floating-point scaling parameter γ, a fixed-point scaling parameter Q_γ, a fixed-point translation parameter Q_β, floating-point output data outputs, a first shift number N_outputs, and a second shift number exp;
obtaining a first quantization coefficient M_outputs according to the floating-point output data outputs, the quantization bit width b, the floating-point scaling parameter γ, and the first shift number N_outputs;
obtaining a fixed-point normalization result Q_norm according to the fixed-point input data Q_inputs, the second shift number exp, and the quantization bit width b;
obtaining initial fixed-point output data according to the fixed-point scaling parameter Q_γ, the fixed-point translation parameter Q_β, and the fixed-point normalization result Q_norm; quantizing the initial fixed-point output data according to the first quantization coefficient M_outputs, and shifting the quantization result by the first shift number N_outputs to obtain fixed-point output data Q_outputs;
wherein the first shift number N_outputs does not exceed the quantization bit width b, the input data is a feature vector representing preset information, the second shift number exp is even, and 2^exp is the power of 2 closest to the length N of the feature vector.
2. The method of claim 1, wherein obtaining the first quantization coefficient M_outputs according to the floating-point output data outputs, the quantization bit width b, the floating-point scaling parameter γ, and the first shift number N_outputs comprises:
determining a second quantization coefficient output_scale according to a first difference between the maximum value and the minimum value of the floating-point output data outputs and the quantization bit width b;
determining a third quantization coefficient γ_scale according to a second difference between the maximum value and the minimum value of the floating-point scaling parameter γ and the quantization bit width b;
determining a fourth quantization coefficient norm_scale according to the length N and the quantization bit width b;
determining a fifth quantization coefficient according to the second quantization coefficient output_scale, the third quantization coefficient γ_scale, and the fourth quantization coefficient norm_scale;
shifting the fifth quantization coefficient by the first shift number N_outputs to obtain the first quantization coefficient M_outputs.
3. The method according to claim 1, wherein obtaining the fixed-point normalization result Q_norm according to the fixed-point input data Q_inputs, the second shift number exp, and the quantization bit width b comprises:
averaging the fixed-point input data Q_inputs to obtain a fixed-point mean Q_μ;
determining the degree of difference between the fixed-point input data Q_inputs and the fixed-point mean Q_μ, and shifting the degree of difference by the second shift number exp to determine the fixed-point reciprocal square root of the variance Q_alpha;
determining an initial fixed-point normalization result Q_norm_tmp according to the fixed-point input data Q_inputs, the fixed-point mean Q_μ, and the fixed-point reciprocal square root of the variance Q_alpha;
shifting the initial fixed-point normalization result Q_norm_tmp to the number of bits of the quantization bit width according to the number of bits of Q_alpha and the second shift number exp, thereby determining the fixed-point normalization result Q_norm.
4. The method of claim 1, wherein determining the fixed-point scaling parameter Q_γ comprises:
determining a third quantization coefficient γ_scale according to a third difference between the maximum value and the minimum value of the floating-point scaling parameter γ and the quantization bit width b;
determining the fixed-point scaling parameter Q_γ according to the floating-point scaling parameter γ and the third quantization coefficient γ_scale.
5. The method of claim 4, wherein determining the fixed-point translation parameter Q_β comprises:
determining a fourth quantization coefficient norm_scale according to the length N and the quantization bit width b;
determining a sixth quantization coefficient β_scale according to the quantization coefficient norm_scale and the quantization coefficient γ_scale;
determining a floating-point translation parameter β, and determining the fixed-point translation parameter Q_β according to the floating-point translation parameter β and the quantization coefficient β_scale.
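The sketch below illustrates one way claims 4 and 5 could be realized in Python. The range-based γ_scale, the √N-based norm_scale, and the choice β_scale = γ_scale · norm_scale (so that β lands on the same scale as the γ·norm product) are all assumptions; the claims only state which quantities each coefficient depends on.

```python
import numpy as np

def quantize_affine_parameters(gamma_fp, beta_fp, n_len, b=8):
    """Sketch of claims 4 and 5 under assumed scale formulas."""
    gamma_fp = np.asarray(gamma_fp, dtype=np.float64)
    beta_fp = np.asarray(beta_fp, dtype=np.float64)
    levels = (1 << b) - 1

    # Third quantization coefficient gamma_scale from the range of gamma (claim 4),
    # then the fixed-point scaling parameter Q_gamma.
    gamma_scale = levels / (np.max(gamma_fp) - np.min(gamma_fp))
    q_gamma = np.round(gamma_fp * gamma_scale).astype(np.int32)

    # Fourth quantization coefficient norm_scale from N and b (assumed formula,
    # matching the convention that the normalized result spans roughly +-sqrt(N)).
    norm_scale = ((1 << (b - 1)) - 1) / np.sqrt(n_len)

    # Sixth quantization coefficient beta_scale (claim 5): beta is assumed to be
    # quantized onto the same scale as the gamma * norm product.
    beta_scale = gamma_scale * norm_scale
    q_beta = np.round(beta_fp * beta_scale).astype(np.int32)

    return q_gamma, q_beta
```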
6. The method of claim 3, wherein determining the initial fixed-point normalization result Q_norm_tmp according to the fixed-point input data Q_inputs, the fixed-point mean Q_μ, and the reciprocal square root of the fixed-point variance Q_α comprises:
determining a fourth difference between the fixed-point input data Q_inputs and the fixed-point mean Q_μ;
taking the product of the fourth difference and the reciprocal square root of the fixed-point variance Q_α as the initial fixed-point normalization result Q_norm_tmp.
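A tiny worked example of this step, with made-up integer values purely for illustration:

```python
import numpy as np

# Q_norm_tmp is the element-wise product of (Q_inputs - Q_mu) and Q_alpha.
q_inputs = np.array([12, 4, 9, 7], dtype=np.int64)
q_mu = 8       # fixed-point mean (illustrative value)
q_alpha = 6    # reciprocal square root of the variance (illustrative value)

q_norm_tmp = (q_inputs - q_mu) * q_alpha
print(q_norm_tmp)  # [ 24 -24   6  -6]
```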
7. The method according to any one of claims 1-6, wherein the preset information is any one of image, text, audio, and environmental information.
8. A quantization apparatus for a normalization operator in a neural network model, comprising:
a preparation module, configured to determine fixed-point input data Q_inputs of the normalization operator, a quantization bit width b, a floating-point scaling parameter γ, a fixed-point scaling parameter Q_γ, a fixed-point translation parameter Q_β, floating-point output data output, a first shift number N_outputs, and a second shift number exp;
a quantization coefficient determining module, configured to obtain a first quantization coefficient M_outputs according to the floating-point output data output, the quantization bit width b, the floating-point scaling parameter γ, and the first shift number N_outputs;
a fixed-point normalization module, configured to obtain a fixed-point normalization result Q_norm according to the fixed-point input data Q_inputs, the second shift number exp, and the quantization bit width b;
an output module, configured to obtain initial fixed-point output data according to the fixed-point scaling parameter Q_γ, the fixed-point translation parameter Q_β, and the fixed-point normalization result Q_norm; quantize the initial fixed-point output data according to the first quantization coefficient M_outputs, and shift the quantization result by the first shift number N_outputs to obtain fixed-point output data Q_outputs;
wherein the first shift number N_outputs does not exceed the quantization bit width b, the input data is a feature vector representing preset information, the second shift number exp is even, and 2^exp is the power of 2 closest to the length N of the feature vector.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to carry out the steps of the method according to any one of claims 1-7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1-7.
CN202310955099.XA 2023-07-31 2023-07-31 Quantification method and device of normalization operator in neural network model and electronic equipment Pending CN116956989A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310955099.XA CN116956989A (en) 2023-07-31 2023-07-31 Quantification method and device of normalization operator in neural network model and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310955099.XA CN116956989A (en) 2023-07-31 2023-07-31 Quantification method and device of normalization operator in neural network model and electronic equipment

Publications (1)

Publication Number Publication Date
CN116956989A true CN116956989A (en) 2023-10-27

Family

ID=88446130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310955099.XA Pending CN116956989A (en) 2023-07-31 2023-07-31 Quantification method and device of normalization operator in neural network model and electronic equipment

Country Status (1)

Country Link
CN (1) CN116956989A (en)

Similar Documents

Publication Publication Date Title
WO2019238029A1 (en) Convolutional neural network system, and method for quantifying convolutional neural network
CN110929865B (en) Network quantification method, service processing method and related product
EP4087239A1 (en) Image compression method and apparatus
CN110036384B (en) Information processing apparatus, information processing method, and computer program
JP7231731B2 (en) Adaptive quantization method and apparatus, device, medium
US20220004884A1 (en) Convolutional Neural Network Computing Acceleration Method and Apparatus, Device, and Medium
CN110888623B (en) Data conversion method, multiplier, adder, terminal device and storage medium
CN111240746A (en) Floating point data inverse quantization and quantization method and equipment
CN113780523A (en) Image processing method, image processing device, terminal equipment and storage medium
CN112200299B (en) Neural network computing device, data processing method and device
CN107220025B (en) Apparatus for processing multiply-add operation and method for processing multiply-add operation
CN112561050B (en) Neural network model training method and device
CN116956989A (en) Quantification method and device of normalization operator in neural network model and electronic equipment
CN111767993A (en) INT8 quantization method, system, device and storage medium for convolutional neural network
CN116306709A (en) Data processing method, medium and electronic equipment
JP2021033994A (en) Text processing method, apparatus, device and computer readable storage medium
CN112418388A (en) Method and device for realizing deep convolutional neural network processing
WO2019205064A1 (en) Neural network acceleration apparatus and method
CN111930670B (en) Heterogeneous intelligent processing quantization device, quantization method, electronic device and storage medium
KR20230076641A (en) Apparatus and method for floating-point operations
CN110574024A (en) Information processing apparatus, information processing method, and computer program
JP2021076900A (en) Data processing apparatus and operation method thereof, and program
CN116341572A (en) Data processing method, device, medium and electronic equipment
TWI776090B (en) Computer-readable storage medium, computer-implemented method and compute logic section
US20210334635A1 (en) Neural network accelerator configured to perform operation on logarithm domain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination