CN114841325A - Data processing method and medium of neural network model and electronic device - Google Patents

Data processing method and medium of neural network model and electronic device

Info

Publication number
CN114841325A
CN114841325A
Authority
CN
China
Prior art keywords
parameter
data
quantization parameter
fixed point
threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210556167.0A
Other languages
Chinese (zh)
Inventor
章小龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARM Technology China Co Ltd
Original Assignee
ARM Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ARM Technology China Co Ltd filed Critical ARM Technology China Co Ltd
Priority to CN202210556167.0A priority Critical patent/CN114841325A/en
Publication of CN114841325A publication Critical patent/CN114841325A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/048 — Activation functions
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 — Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 — Complex mathematical operations
    • G06F 17/15 — Correlation function computation including computation of convolution operations
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/08 — Learning methods
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present application relates to the field of artificial intelligence technologies, and in particular, to a data processing method and medium for a neural network model, and an electronic device. The method comprises the following steps: determining floating point type data to be processed in a neural network model; quantizing the floating point type data into fixed point numbers through a post-quantization parameter, wherein the post-quantization parameter is determined according to a preset floating point threshold parameter; and using the fixed point numbers as fixed point input data of a first ThresholdReLU activation operation included in the neural network model to obtain fixed point output data, wherein the first ThresholdReLU activation operation is associated with the post-quantization parameter. With the post-quantization parameter, floating point type data greater than the floating point threshold parameter quantizes to fixed point numbers greater than the fixed point threshold parameter obtained by quantizing the floating point threshold parameter, which improves the accuracy of the model operation result.

Description

Data processing method and medium for neural network model and electronic device
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a data processing method and medium for a neural network model, and an electronic device.
Background
With the development of Artificial Intelligence (AI), neural network models are ever more widely applied in the field of artificial intelligence, for example in application scenarios such as image recognition, object detection, reinforcement learning, information recommendation, and user behavior prediction. Because a neural network is a resource-intensive algorithm with high calculation cost and memory consumption, the neural network model generally needs to be quantized in order to reduce its parameter quantity, the amount of computation during operation, the calculation cost, and the memory occupation: high-precision floating point operations in the neural network model are converted into fixed point operations to obtain a fixed point neural network model. The Thresholded Rectified Linear Unit (ThresholdReLU) activation function is a commonly used activation function and is widely applied in neural network models, for example in the activation layer operators of neural networks, to perform nonlinear activation operations. Therefore, quantizing a neural network model often involves quantizing the ThresholdReLU activation function.
During operation of the neural network model, some data to be subjected to the ThresholdReLU activation operation may have a value in the floating point domain that is greater than the threshold parameter of the ThresholdReLU activation operation, while its value in the fixed point domain is only equal to the quantized threshold parameter. As a result, the ThresholdReLU activation performed in the floating point domain produces a nonzero result, but the floating point value corresponding to the fixed point result of the same activation performed in the fixed point domain is 0, which degrades the accuracy of the operation result of the neural network model.
Disclosure of Invention
In order to solve the above problems, the present application provides a data processing method of a neural network model, a medium, and an electronic device.
In a first aspect, an embodiment of the present application provides a data processing method for a neural network model, which is applied to an electronic device, and includes:
determining floating point type data to be processed in a neural network model;
quantizing the floating point type data to obtain fixed point numbers through post quantization parameters, wherein the post quantization parameters are determined according to preset floating point threshold parameters;
using the fixed point number as fixed point input data of a first Thresholded Rectified Linear Unit (ThresholdReLU) activation operation included in the neural network model to obtain fixed point output data, wherein the first ThresholdReLU activation operation is associated with the post-quantization parameter.
It is to be understood that, in the embodiment of the present application, the post-quantization parameter is a second quantization parameter, the preset floating-point threshold parameter is a threshold parameter preset in the floating-point domain, and the first threshold relu activation operation is a threshold relu activation operation performed in the fixed-point domain.
It can be understood that the data processing method of the neural network model provided in the embodiment of the present application quantizes floating point type data into fixed point numbers using the post-quantization parameter, and uses those fixed point numbers as the fixed point input data of the first ThresholdReLU activation operation to obtain fixed point output data. By merely adjusting the quantization parameter, this avoids the situation in which floating point data greater than the floating point threshold parameter (whose ThresholdReLU output in the floating point domain is greater than 0) quantizes to a fixed point value no greater than the quantized floating point threshold parameter, so that the floating point value corresponding to the fixed point output is 0. With the post-quantization parameter, a fixed point number obtained by quantizing floating point data greater than the threshold parameter is also greater than the fixed point threshold parameter obtained by quantizing the floating point threshold parameter with the post-quantization parameter; therefore the floating point value corresponding to the fixed point output data of the ThresholdReLU activation operation performed in the fixed point domain is not 0, improving the accuracy of the operation result of the neural network model.
In a possible implementation of the first aspect, the post-quantization parameter is determined according to a floating-point threshold parameter and a pre-quantization parameter.
It is to be understood that, in the embodiment of the present application, the previous quantization parameter is the first quantization parameter.
It can be understood that, in the embodiment of the present application, the post quantization parameter is determined according to the floating point threshold parameter and the pre quantization parameter, that is, the post quantization parameter can be determined only by knowing the floating point threshold parameter and the pre quantization parameter, and the logic is simple.
In one possible implementation of the first aspect, the post-quantization parameter is determined by a method including,
determining a pre-quantization parameter according to the range of floating point type data to be processed in the neural network model and the data type of the fixed point number;
and obtaining a rear quantization parameter according to the front quantization parameter and the floating point threshold parameter.
It can be understood that, in the embodiment of the present application, determining the pre-quantization parameter according to the range of the floating point type data to be processed in the neural network model and the data type of the fixed point number, and then deriving the post-quantization parameter from the pre-quantization parameter and the floating point threshold parameter, makes the relationship between the two quantization parameters explicit, so that the difference and connection between the post-quantization parameter and the pre-quantization parameter can be better understood.
In a possible implementation of the first aspect, the deriving the post-quantization parameter according to the pre-quantization parameter and the floating-point threshold parameter includes,
quantizing the floating point threshold parameter according to the previous quantization parameter to obtain a first fixed point threshold parameter;
and obtaining a post-quantization parameter according to the first fixed point threshold parameter and the floating point threshold parameter.
It can be understood that, in the embodiment of the present application, quantizing the floating point threshold parameter with the pre-quantization parameter to obtain the first fixed point threshold parameter, and then deriving the post-quantization parameter from the first fixed point threshold parameter and the floating point threshold parameter, makes the relationship between the first fixed point threshold parameter and the post-quantization parameter explicit, which is helpful for discussing how the post-quantization parameter is obtained from the first fixed point threshold parameter under different parity conditions.
In a possible implementation of the first aspect, the obtaining of the post-quantization parameter according to the first fixed-point threshold parameter and the floating-point threshold parameter includes,
under the condition that the first fixed point threshold parameter is an odd number, obtaining a post-quantization parameter according to an odd number quantization parameter formula, wherein the odd number quantization parameter formula comprises the first fixed point threshold parameter and a floating point threshold parameter;
and under the condition that the first fixed point threshold parameter is an even number, obtaining a post-quantization parameter according to an even-number quantization parameter formula, wherein the even-number quantization parameter formula comprises the first fixed point threshold parameter and a floating point threshold parameter.
It can be understood that, in the embodiment of the present application, because of the specific rounding operation in the process of quantizing floating-point data into fixed-point numbers, when the parity of the first fixed-point threshold parameter is different, the corresponding odd quantization parameter formula and even quantization parameter formula may be used, so that according to the difference in parity of the first fixed-point threshold parameter, the corresponding odd quantization parameter formula and even quantization parameter formula are provided, and further, the corresponding post-quantization parameter may be obtained in a targeted and accurate manner, so as to provide multiple obtaining manners for the obtained post-quantization parameter.
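The summary does not disclose the odd and even quantization parameter formulas themselves. Purely as a hypothetical illustration consistent with the behavior described (the function name `post_quant_param` and the exact formulas below are assumptions, not taken from this application): if the post-quantization parameter is chosen so that the floating point threshold lands exactly on a half-integer multiple of the new scale, round-half-to-even keeps the threshold at the even fixed point value while every input strictly above the threshold rounds to a strictly larger fixed point value. Exact rational arithmetic is used so the tie-breaking behavior is deterministic.

```python
from fractions import Fraction

def rnd(v: Fraction) -> int:
    """Round to nearest integer, ties to even (the round() rule described above).
    Python's round() on a Fraction is exact and breaks ties toward even."""
    return round(v)

def post_quant_param(a: Fraction, s_pre: Fraction) -> Fraction:
    """Hypothetical post-quantization parameter: place the floating point
    threshold a on a half-integer multiple of the new scale, choosing the
    half-integer so that its round-half-to-even result is an even number."""
    m = rnd(a / s_pre)                       # first fixed point threshold parameter
    if m % 2 == 0:                           # even case: a / s_post == m + 0.5
        return a / (m + Fraction(1, 2))
    return a / (m + Fraction(3, 2))          # odd case: a / s_post == m + 1.5

a, s_pre = Fraction("4.1"), Fraction(2)
s_post = post_quant_param(a, s_pre)          # 4.1 / 2.5 = 41/25
a_q2 = rnd(a / s_post)                       # round(2.5) -> 2 (tie to even)
x = Fraction("4.5")                          # x > a, floating point output nonzero
x_q = rnd(x / s_post)                        # round(2.7439...) -> 3, strictly > a_q2
```

Any x > a satisfies x / s_post > a / s_post, so its rounded value is at least one more than the fixed point threshold; this is the targeted property, sketched here under the stated assumptions rather than as the application's actual formulas.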
In a possible implementation of the first aspect, the obtaining of the post-quantization parameter from the first fixed-point threshold parameter and the floating-point threshold parameter includes,
and obtaining the post-quantization parameter according to a combined quantization parameter formula, wherein the combined quantization parameter formula comprises a first fixed-point threshold parameter and a floating-point threshold parameter.
It can be understood that, in the embodiment of the present application, the odd quantization parameter formula and the even quantization parameter formula are combined according to the combined quantization parameter formula, so that the parity of the first fixed point threshold parameter does not need to be considered, and the post-quantization parameter can be obtained more directly.
In a possible implementation of the first aspect, the first threshold relu activation operation is determined according to a post-quantization parameter and a floating-point threshold parameter.
It is understood that, in the embodiment of the present application, the first ThresholdReLU activation operation is determined according to the post-quantization parameter and the floating point threshold parameter; that is, the first ThresholdReLU activation operation can be determined simply by knowing the floating point threshold parameter and the post-quantization parameter, and the logic is simple.
In one possible implementation of the first aspect, the first threshold relu activation operation is determined by a method including,
quantizing the floating point threshold parameter according to the post-quantization parameter to obtain a second fixed point threshold parameter;
and obtaining a ThresholdReLU activation function for performing the first ThresholdReLU activation operation according to the second fixed point threshold parameter.
It can be understood that, in the embodiment of the present application, the floating point threshold parameter is quantized according to the post-quantization parameter to obtain the second fixed point threshold parameter, and the manner of obtaining the second fixed point threshold parameter is the same as the manner of quantizing the floating point type data to be processed, and the logic is scientific; according to the second fixed point threshold parameter, a threshold ReLU activation function for performing the first threshold ReLU activation operation can be obtained, the logic is simple, and the expanded application is convenient.
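The two steps above can be sketched as follows (a minimal illustration; the function names are ours, not from this application): quantize the floating point threshold parameter with the post-quantization parameter to obtain the second fixed point threshold parameter, then build the activation that compares fixed point inputs against it.

```python
def make_quantized_threshold_relu(a: float, s_post: float):
    """Illustrative sketch: derive the second fixed point threshold parameter
    from the floating point threshold a and the post-quantization parameter
    s_post, and return the resulting fixed point ThresholdReLU."""
    a_q2 = round(a / s_post)  # second fixed point threshold parameter

    def threshold_relu_q(x_q: int) -> int:
        # Pass the fixed point input through only when it exceeds the threshold.
        return x_q if x_q > a_q2 else 0

    return threshold_relu_q

# Example values (assumed for illustration): a = 10.1, s_post = 2.0, so a_q2 = 5.
act = make_quantized_threshold_relu(10.1, 2.0)
```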
In one possible implementation of the first aspect, the range of floating point type data to be processed in the neural network model is determined during operation of the neural network model.
It can be understood that, in the embodiment of the present application, the range of the floating point type data to be processed can be obtained by operating the neural network model, and the obtaining mode is convenient.
In a possible implementation of the first aspect, the data type of the fixed-point number includes at least one of the following: int32, int16, int8, int4, uint32, uint16, uint8, or uint4.
It can be understood that, in the embodiment of the present application, the data type of the fixed point number may include multiple data types, so that the application scope is wider.
In a possible implementation of the first aspect, the floating point type data to be processed in the neural network model is obtained based on at least one of image data, audio data, text data, and video data.
It can be understood that in the embodiment of the application, the neural network has a self-learning function, an association storage function and an ability of efficiently searching for an optimal solution, and can well process image data, audio data, text data and video data.
In one possible implementation of the first aspect, the quantization is a symmetric quantization.
It can be understood that, in the embodiment of the present application, the floating point data is quantized into the fixed point data by using symmetric quantization, so that the calculation speed of the neural network model can be increased, and the power consumption of the device can be reduced.
In a second aspect, an embodiment of the present application provides a data processing apparatus of a neural network model, including a first determining unit, configured to determine floating point type data to be processed in the neural network model;
the device comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for quantizing the floating point type data to obtain fixed point numbers through post-quantization parameters, and the post-quantization parameters are determined according to preset floating point threshold parameters;
and the second obtaining unit is used for taking the fixed point number as fixed point input data of a first threshold ReLU activation operation included in the neural network model to obtain fixed point output data, wherein the first threshold ReLU activation operation is associated with the post-quantization parameter.
It is to be understood that, in the embodiment of the present application, the post-quantization parameter is a second quantization parameter, and the first threshold relu activation operation is a threshold relu activation operation performed in a fixed-point domain.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium, including: the computer-readable medium may have stored thereon instructions for executing the method according to any one of the first aspect and various possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer program product, including: the computer program product comprises instructions which, when executed by one or more processors, are operable to implement the method as described above in the first aspect and any one of its various possible implementations.
In a fifth aspect, an embodiment of the present application provides an electronic device, including:
a memory for storing instructions, an
One or more processors configured to execute the instructions, when the instructions are executed by the one or more processors, to perform the method as described in the first aspect above and any one of the various possible implementations of the first aspect.
Based on the scheme, the method has the following beneficial effects:
according to the data processing method of the neural network model, a post-quantization parameter is obtained by adjusting a pre-quantization parameter, a fixed point number is obtained by quantizing floating point type data to be processed according to the post-quantization parameter, and the fixed point number is used as fixed point input data of a first threshold ReLU activation operation included in the neural network model to obtain fixed point output data. Therefore, the problem that under the condition of using the original quantization parameter, due to the fact that an integer function is used, an error exists in the process of quantizing the floating point type data to the fixed point type data, the floating point value corresponding to the fixed point output data obtained by performing the first threshold ReLU activating operation on a part of parameters which are larger than the floating point threshold value is 0, and then the error of the operation result of the neural network model is large can be solved. Moreover, the method for adjusting the quantization parameters is simple in logic and easy to operate, only the preset floating point threshold parameter and the original quantization parameter need to be concerned, and the accuracy of the running result of the neural network model can be improved in the process of processing image data, audio data, text data and video data.
Drawings
FIG. 1 illustrates a diagram of a ThresholdReLU activation function according to some embodiments of the present application;
FIG. 2 illustrates a schematic diagram of a symmetric quantization, according to some embodiments of the present application;
fig. 3 illustrates a scene diagram of the terminal 100 recognizing a captured face image through a face recognition model according to some embodiments of the present application;
FIG. 4 illustrates a graph of input data versus fixed point input data with a quantization parameter of 2, according to some embodiments of the present application;
FIG. 5 illustrates a flow chart for using an adjusted quantization parameter to derive a quantized ThresholdReLU activation function, according to some embodiments of the present application;
FIG. 6a illustrates a schematic diagram where a first fixed point threshold parameter value is an even number, according to some embodiments of the present application;
FIG. 6b illustrates a schematic diagram where a first fixed point threshold parameter value is an odd number, according to some embodiments of the present application;
FIG. 7a illustrates a schematic diagram of obtaining a corresponding second fixed point threshold parameter when a first fixed point threshold parameter value is an even number, according to some embodiments of the present application;
FIG. 7b illustrates a diagram of obtaining a corresponding second fixed point threshold parameter when the first fixed point threshold parameter value is odd, according to some embodiments of the present application;
FIG. 8 illustrates a flow diagram of neural network model data processing, according to some embodiments of the present application;
FIG. 9 illustrates a data processing apparatus diagram of a neural network model, in accordance with some embodiments of the present application;
fig. 10 illustrates a schematic diagram of a terminal 100, according to some embodiments of the present application.
Detailed Description
The illustrative embodiments of the present application include, but are not limited to, a data processing method, medium, and electronic device of a neural network model.
In order to more clearly illustrate the aspects of the embodiments of the present application, some terms referred to in the embodiments of the present application are explained below.
Quantization of a neural network model: converting floating point operations in the neural network model into fixed point operations.
Thresholded Rectified Linear Unit (ThresholdReLU) activation function
The expression of the ThresholdReLU activation function in the floating point domain is shown in Equation 1:

y = x, if x > a
y = 0, if x ≤ a    (Equation 1)

where x is the input data of the ThresholdReLU activation function, a is the threshold parameter of the ThresholdReLU activation function (a ≥ 0), and y is the output data of the ThresholdReLU activation function. Fig. 1 shows a schematic diagram of the ThresholdReLU activation function in some embodiments of the present application. For example, if the threshold parameter a is 10.1, input data of 10.5 produces output data of 10.5, and input data of 9.5 produces output data of 0.
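Equation 1 can be stated directly as a short function (a sketch reproducing the example above):

```python
def threshold_relu(x: float, a: float) -> float:
    """ThresholdReLU in the floating point domain (Equation 1):
    output x when x > a, otherwise output 0."""
    return x if x > a else 0.0
```

With the threshold parameter a = 10.1, this reproduces the example: input 10.5 yields 10.5, and input 9.5 yields 0.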
Thresholded Rectified Linear Unit (ThresholdReLU) activation function quantization
ThresholdReLU activation function quantization can be realized by symmetric quantization or asymmetric quantization. In the present application, the ThresholdReLU activation function is quantized by symmetric quantization to obtain the quantized ThresholdReLU activation function.
For ease of understanding, symmetric quantization is described below.
Symmetric quantization:
will range from [ X min ,X max ]Is mapped to the range [ Q ] by equation 2 min ,Q max ]Number of fixed points Q:
q ═ round (X/S) (equation 2)
Wherein the fixed point number Q is a signed integer number (int), its range [ Q min ,Q max ]Specifically [ -2 [ ] n-1 ,2 n-1 -1]Where n is the number of quantized bits, e.g. the fixed point number Q is int8 type, [ Q ] min ,Q max ]In particular [ -128,127]]As shown in FIG. 2, the floating point number X is quantized to the range of [ -128,127]The number of fixed points.
The round() function is a rounding function that rounds the value of X/S to an integer. In some embodiments of the present application, the rounding rule of the round() function may be: let A be the largest integer smaller than X/S; when X/S is greater than A + 0.5, round(X/S) is A + 1; when X/S is less than A + 0.5, round(X/S) is A; and when X/S is exactly A + 0.5, round(X/S) is the even number nearest to X/S. Specifically, when X/S equals A + 0.5 and A is even, round(X/S) is A; when X/S equals A + 0.5 and A is odd, round(X/S) is A + 1. For example, if X/S is 1.5, round(X/S) is 2 (the even number nearest to 1.5); if X/S is 3.5, round(X/S) is 4 (the even number nearest to 3.5). It is understood that in other embodiments the same result may be achieved by other rounding rules, which are not described here again. It will also be appreciated that in other implementations the rounding function may be a ceil() ceiling function.
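The tie-breaking rule just described, round half to even ("banker's rounding"), matches the behavior of, for instance, Python's built-in round():

```python
# Python's built-in round() uses round-half-to-even, the same tie rule
# described for the round() function above.
tie_low  = round(1.5)  # tie between 1 and 2; the even neighbor is 2
tie_high = round(3.5)  # tie between 3 and 4; the even neighbor is 4
tie_even = round(2.5)  # tie between 2 and 3; the even neighbor is 2
no_tie   = round(2.7)  # no tie: ordinary nearest-integer rounding
```

The half-integer values above are exactly representable in binary floating point, so the tie rule is exercised exactly as stated.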
S is the quantization parameter for quantizing a floating point number to a fixed point number, that is, the smallest scale at which different floating point numbers X quantize to the same fixed point number Q. S may be obtained from a quantization parameter formula, for example Equation 3 below, or from other modified formulas, without specific limitation here. The max() function in Equation 3 is a maximum function, and |·| denotes the absolute value:

S = max(|X_min|, |X_max|) / Q_max    (Equation 3)

It is understood that, for a fixed point number, the corresponding floating point number can be determined by inverse quantization. In some embodiments, transforming Equation 2 yields the inverse quantization Equation 2', from which the floating point number corresponding to a fixed point number is obtained:

X = Q × S    (Equation 2')
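Equations 2, 2', and 3 can be sketched together (function names are illustrative); the int8 example values below are the ones used later in the description:

```python
def quant_param(x_min: float, x_max: float, q_max: int) -> float:
    """Quantization parameter per Equation 3: S = max(|X_min|, |X_max|) / Q_max."""
    return max(abs(x_min), abs(x_max)) / q_max

def quantize(x: float, s: float) -> int:
    """Equation 2: Q = round(X / S). Python's round() ties to even,
    matching the round() rule described in the text."""
    return round(x / s)

def dequantize(q: int, s: float) -> float:
    """Inverse quantization per Equation 2': X = Q * S."""
    return q * s

# int8 example consistent with the description: range [-6.2, 254.0], Q_max = 127.
S = quant_param(-6.2, 254.0, 127)  # S = 254.0 / 127 = 2.0
```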
According to Equation 1, quantizing the ThresholdReLU activation function requires quantizing the input data x, the threshold parameter a, and the output data y of the activation function; the fixed point numbers obtained by quantizing them are denoted fixed point input data x_q, fixed point threshold parameter a_q, and fixed point output data y_q, respectively.
In some implementations, the fixed point input data x_q, fixed point threshold parameter a_q, and fixed point output data y_q are obtained by symmetrically quantizing the input data x, the threshold parameter a, and the output data y, respectively. The relationship among the fixed point threshold parameter a_q, fixed point input data x_q, and fixed point output data y_q is shown in Equation 4 below.
It should be explained that, in Equation 4 below, the fixed point input data and the fixed point output data have the same data type, and the quantization parameter Sa in the symmetric quantization of the threshold parameter a and the quantization parameter Sy in the symmetric quantization of the output data y have the same value as the quantization parameter Sx of the input data x. For convenience of explanation, the quantization parameters Sa, Sy, and Sx are hereinafter collectively referred to as the quantization parameter S.
The quantized ThresholdReLU activation function may also take a modified form of Equation 4; the specific form is not limited here.
y_q = x_q, if x_q > a_q
y_q = 0, if x_q ≤ a_q    (Equation 4)

According to Equation 4, the received fixed point input data x_q is compared with the fixed point threshold parameter a_q to obtain the fixed point output data y_q. For example, if the fixed point threshold parameter a_q is 10 and the fixed point input data x_q is 11, the fixed point output data y_q is 11; if the fixed point input data x_q is 9, the fixed point output data y_q is 0.
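Equation 4 is a single comparison in the fixed point domain; a sketch reproducing the example just given (a_q = 10):

```python
def threshold_relu_fixed(x_q: int, a_q: int) -> int:
    """Fixed point ThresholdReLU per Equation 4:
    y_q = x_q when x_q > a_q, otherwise y_q = 0."""
    return x_q if x_q > a_q else 0
```

Note that an input exactly equal to the fixed point threshold outputs 0; this strict comparison is what makes the rounding behavior around the threshold matter.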
For better understanding of the present solution, an application scenario of the technical solution of the present application will be described first.
Fig. 3 shows a scene diagram of the terminal 100 recognizing the acquired face image through the face recognition model. As shown in fig. 3, the terminal 100 is deployed with a face recognition model quantized by the server 200. After the terminal 100 acquires the face image of the user, the face recognition may be performed on the acquired face image through a quantized face recognition model, so as to obtain a face recognition result.
The face recognition model is usually quantized to reduce the amount of data. For example, the trained face recognition model is quantized by the server 200, and the quantized face recognition model (hereinafter referred to as the quantization model) is then deployed to the terminal 100. When performing face recognition with the quantization model, the terminal 100 usually uses the quantized ThresholdReLU activation function to perform the activation operation in the activation layer operator. After receiving floating point data to be subjected to the ThresholdReLU activation operation, the terminal quantizes the floating point data by the same method used to obtain the quantized ThresholdReLU activation function, and uses the resulting fixed point data as the fixed point input data of the ThresholdReLU activation function to obtain the fixed point output data of the activation operation. It is understood that, in the face recognition scenario, the input data for the activation operation in the face recognition model may be face image data.
In some implementations, the server 200 performs multiple rounds of model training in advance using the unquantized face recognition model, obtains the range of the input data of the ThresholdReLU activation operation required in the quantization formula, and substitutes the obtained input data range and the range of admissible values corresponding to the data type of the fixed point input data into the quantization parameter formula to obtain the quantization parameter S; the threshold parameter a is then quantized according to the quantization parameter S to obtain the fixed point threshold parameter a_q. After floating point data to be subjected to the ThresholdReLU activation operation is received, the floating point data is quantized according to the obtained quantization parameter S to obtain fixed point data, which is used as the fixed point input data x_q in equation 4 and compared with the fixed point threshold parameter a_q to obtain the fixed point output data. It can be understood that the obtained fixed point output data may be inversely quantized according to the quantization parameter S to obtain the corresponding floating point output data.
For example, take the threshold parameter a of the ThresholdReLU activation function with a value of 4.1. In the scenario shown in fig. 3, the server 200 performs multiple rounds of model training in advance using the unquantized face recognition model and determines that, during training, the range of the input data of the ThresholdReLU activation function in the floating point domain in the face recognition model is [-6.2, 254.0]. The fixed point input data type is int8, whose corresponding range of admissible values is [-128, 127]. The quantization parameter S in the symmetric quantization method is then obtained according to equation 3 as S = 254.0/127 = 2, and the corresponding symmetric quantization equation 2 becomes Q = round(X/2). From this symmetric quantization formula, the relationship between the input data x and the fixed point input data x_q can be obtained; fig. 4 shows this relationship when the quantization parameter is 2. Symmetrically quantizing the threshold parameter a with a value of 4.1 gives the fixed point threshold parameter a_q = round(4.1/2) = 2, which determines the quantized ThresholdReLU activation function. After floating point data to be subjected to the ThresholdReLU activation operation is received, the floating point data is quantized according to the quantization parameter S with a value of 2 to obtain fixed point data, which is substituted into equation 4 as the fixed point input data x_q to obtain the fixed point output data y_q. It can be understood that an inverse quantization operation may be performed on the obtained fixed point output data according to equation 2 to obtain the corresponding floating point output data.
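The quantization flow described above (equations 2 to 4) can be sketched as follows. The function names are illustrative, not from this description, and the tie-breaking behavior of round() (ties toward the lower integer, so round(2.5) = 2) is inferred from the worked examples in this description.

```python
import math

def quant_scale(float_max, fixed_max):
    # Equation 3 as used in the example above: S = 254.0 / 127 = 2
    return float_max / fixed_max

def quantize(x, s):
    # Equation 2: Q = round(x / S), with ties rounded toward the
    # lower integer as implied by the worked examples
    return math.ceil(x / s - 0.5)

def thresholdrelu_q(x_q, a_q):
    # Equation 4: quantized ThresholdReLU
    return x_q if x_q > a_q else 0

s = quant_scale(254.0, 127)         # S = 2.0
a_q = quantize(4.1, s)              # round(4.1 / 2) = 2
print(s, a_q)                       # 2.0 2
print(thresholdrelu_q(11, 10))      # 11: input above the fixed point threshold
print(thresholdrelu_q(9, 10))       # 0: input at or below the threshold
```

The two thresholdrelu_q calls reproduce the numerical example given at the start of this passage (a_q = 10, x_q = 11 and x_q = 9).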
However, when the quantized ThresholdReLU activation function is used to activate floating point data, some floating point data whose output under the unquantized ThresholdReLU activation function is not 0 may nevertheless yield, under the quantized function, fixed point output data whose corresponding floating point output is 0, thereby affecting the result of face recognition.
For example, as shown in fig. 4, the fixed point threshold parameter a_q corresponding to the threshold parameter a = 4.1 is 2. According to the characteristics of the ThresholdReLU activation function, when the input data x is less than or equal to the threshold parameter a, the output data obtained by performing the activation operation is 0, and when the input data x is greater than the threshold parameter a, the output data is the original value. If, during the running of the face recognition model, floating point data to be activated is greater than the threshold parameter a but the fixed point input data x_q obtained after quantization equals the fixed point threshold parameter a_q, then, according to equation 4, the fixed point output data y_q obtained from x_q is 0.
For convenience of explanation, the following terms are used below: input data whose fixed point input data x_q obtained after quantization equals the fixed point threshold parameter a_q is called "same-value data"; same-value data whose value is less than or equal to the threshold parameter a is called "left same-value data"; same-value data whose value is greater than the threshold parameter a is called "right same-value data". For the same threshold parameter a, different quantization parameters result in different same-value data ranges.
For example, the input data in the interval l1 shown in fig. 4 are all same-value data; the specific range of the interval l1 is [3, 5], and the fixed point value after quantization of the data in this interval is 2, equal to the fixed point threshold parameter a_q. The input data in the interval l2 are all left same-value data; the specific range of the interval l2 is [3, 4.1], the values are less than or equal to the threshold parameter a = 4.1, and the fixed point value after quantization is 2, equal to the fixed point threshold parameter a_q. The input data in the interval l3 are all right same-value data; the specific range of the interval l3 is (4.1, 5], the values are greater than the threshold parameter a = 4.1, and the fixed point value after quantization is 2, equal to the fixed point threshold parameter a_q.
Take the right same-value data 4.3 in the interval l3 as an example. Applying the obtained symmetric quantization to 4.3, specifically round(4.3/2) = 2, gives the fixed point data 2, which is used as the fixed point input data of the quantized ThresholdReLU activation function. From equation 4, since the fixed point value 2 equals the fixed point threshold parameter a_q = 2, the fixed point output data obtained by the quantized ThresholdReLU activation function is 0; inversely quantizing the fixed point output data 0 according to equation 2 gives floating point output data 0. However, when 4.3 is activated by the unquantized ThresholdReLU activation function, according to equation 1, since 4.3 is greater than the threshold parameter 4.1, the output data obtained is 4.3. The difference between the floating point output data obtained by the fixed point operation (0) and that obtained by the floating point operation (4.3) is large, which affects the result of face recognition.
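The mismatch just described can be reproduced numerically. This is an illustrative sketch (the names are not from this description), with the half-down tie rounding inferred from the worked examples.

```python
import math

def quantize(x, s):
    # Equation 2 with ties rounded toward the lower integer,
    # as implied by the worked examples
    return math.ceil(x / s - 0.5)

a, s = 4.1, 2.0                    # threshold parameter and quantization parameter
a_q = quantize(a, s)               # fixed point threshold parameter: 2

x = 4.3                            # right same-value data from interval l3
x_q = quantize(x, s)               # round(4.3 / 2) = 2, equal to a_q

y_q = x_q if x_q > a_q else 0      # equation 4: 0, because x_q == a_q
y_float = y_q * s                  # inverse quantization: 0.0

y_ref = x if x > a else 0          # equation 1, unquantized ThresholdReLU: 4.3
print(y_float, y_ref)              # 0.0 versus 4.3: the large gap described above
```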
Therefore, the present application provides a data processing method for a neural network model that adjusts the quantization parameter so that the maximum value of the new same-value data range is the threshold parameter a; as a result, the fixed point value obtained by quantizing the original right same-value data according to the adjusted quantization parameter is larger than the fixed point value obtained by quantizing the threshold parameter according to the adjusted quantization parameter. It will be appreciated that, after the quantization parameter is changed, the fixed point value of floating point data obtained from the changed quantization parameter may be the same as or different from the fixed point value obtained from the original quantization parameter. For the same threshold parameter a, different quantization parameters correspond to different same-value data ranges.
It should be noted that neural network models are widely applied to video, speech, image, and text. When a neural network model is applied to images, the input data for the activation operation during model running may be image data or intermediate data obtained from the image data, and the output data obtained after the activation operation is data used for image processing; for example, in the above face recognition scenario, the input data for the activation operation may be face image data or intermediate data obtained from it, and the output is data used for face image processing. When the model is applied to audio and video processing, the input data for the activation operation may be audio/video data or intermediate data obtained from it, and the output is still audio/video data; for example, in a smart home speech recognition scenario, the input data for the activation operation may be speech data or intermediate data obtained from it, and the output is data used for speech processing. When the model is applied to word processing, the input data for the activation operation may be text data or intermediate data obtained from the text data, and the output is data used for text processing.
For example, in the above example, the quantization parameter S in symmetric quantization is adjusted to obtain a new quantization parameter S', such that when the right same-value data in the interval l3 shown in fig. 4 is quantized according to the adjusted quantization parameter S', the fixed point input data obtained is larger than the fixed point threshold parameter a_q obtained by quantizing the threshold parameter a according to the adjusted quantization parameter.
Specifically, an original quantization parameter (hereinafter referred to as the first quantization parameter) is obtained, and the threshold parameter a is quantized according to the first quantization parameter to obtain a first fixed point threshold parameter a_q1. The first quantization parameter is then adjusted according to the first fixed point threshold parameter a_q1 and the threshold parameter a in a preset adjustment mode to obtain an adjusted quantization parameter (hereinafter referred to as the second quantization parameter), and the threshold parameter a is quantized according to the second quantization parameter to obtain a second fixed point threshold parameter a_q2. If the maximum value of the same-value data range corresponding to the second quantization parameter is the threshold parameter a, that is, if the right same-value data set is empty, then the fixed point value obtained by quantizing the original right same-value data according to the second quantization parameter must be greater than the second fixed point threshold parameter a_q2. The quantized ThresholdReLU activation function is then determined according to the second fixed point threshold parameter a_q2.
For example, in the above example, the threshold parameter 4.1 is less than the maximum value 5 of the same-value data range. The first quantization parameter S is adjusted according to the first fixed point threshold parameter a_q1 with a value of 2 and the threshold parameter a with a value of 4.1 to obtain a second quantization parameter S', and the threshold parameter a is quantized according to the second quantization parameter to obtain a second fixed point threshold parameter a_q2, where the maximum value of the same-value data range corresponding to the second quantization parameter is the threshold parameter a = 4.1. After 4.3 is quantized using the second quantization parameter S', the fixed point value obtained is larger than the second fixed point threshold parameter. The quantized ThresholdReLU activation function is then determined according to the second fixed point threshold parameter a_q2.
Therefore, in the actual operation process of the model, after floating point data to be subjected to ThresholdReLU activation operation is received, the floating point data is quantized by using a second quantization parameter to obtain fixed point data; and then using the obtained fixed point data as fixed point input data of a ThresholdReLU activation function, thereby obtaining fixed point output data of activation operation.
Therefore, with the technical scheme of the present application, when the quantized ThresholdReLU activation function is used to activate floating point data during the running of the face recognition model, floating point data whose output under the unquantized ThresholdReLU activation function is not 0 no longer produces fixed point output data whose corresponding floating point output is 0; that is, for such data, the output obtained by activating with the quantized ThresholdReLU activation function is also not 0, thereby improving the accuracy of the model running result.
For example, in the above example, take the floating point data 4.3 to be subjected to the ThresholdReLU activation operation. With the quantized ThresholdReLU activation function obtained using the first quantization parameter, the fixed point output data for 4.3 is 0. In contrast, with the quantized ThresholdReLU activation function obtained using the second quantization parameter, the fixed point value obtained by quantizing 4.3 according to the second quantization parameter is greater than the second fixed point threshold parameter a_q2, so the output for 4.3 is not 0, thereby improving the accuracy of the model running result.
It can be understood that the above method, which adjusts the quantization parameter so that the fixed point number obtained by quantizing floating point data greater than the threshold parameter is larger than the fixed point threshold parameter obtained by quantizing the threshold parameter, is not limited to symmetric quantization performed with the round() rounding function; symmetric quantization implemented with other rounding functions also falls within the protection scope of the present application.
It should be noted that the quantized ThresholdReLU activation function in the neural network model may be determined by the terminal 100 or by the server 200. The terminal 100 includes, but is not limited to, one of a mobile phone, a tablet computer, a smart screen, a wearable device (e.g., a watch, a bracelet, a helmet, a headset, etc.), an in-vehicle device, an Augmented Reality (AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA), and other electronic devices. The server 200 may be a single server or a server cluster including a plurality of servers.
Taking the case where the server 200 determines the quantized activation function in the neural network model as an example, the process of obtaining the quantized ThresholdReLU activation function will be described with reference to the face recognition scenario shown in fig. 3. Specifically, referring to the flowchart shown in fig. 5, a process for obtaining a quantized ThresholdReLU activation function using an adjusted quantization parameter according to an embodiment of the present application includes the following steps:
S501, determining a first quantization parameter based on floating point input data for the activation operation during the running of the neural network model.
It is understood that the floating point input data for the activation operation during the running of the neural network model may be video data, audio data, image data, text data, or data associated with such data. Taking the application of the neural network model to video as an example, the floating point input data for the activation operation is video data or other intermediate data obtained from the video data, such as feature data or vectors; when the neural network model is applied to other aspects, the floating point input data is analogous and is not repeated here.
In some implementations, when the activation operation is performed according to the neural network model, the first quantization parameter is obtained using the input data range of the ThresholdReLU activation operation in the floating point domain, the range of admissible values corresponding to the data type of the fixed point input data of the ThresholdReLU activation operation in the fixed point domain, and the quantization parameter formula. The input data range of the ThresholdReLU activation operation in the floating point domain is determined during neural network model training; the data type of the fixed point input data may be int32, int16, int8, int4, uint32, uint16, uint8, or uint4.
For example, in the scenario shown in fig. 3, the server 200 performs multiple rounds of model training in advance using the unquantized face recognition model and obtains a maximum value of 254 and a minimum value of -6 for the floating point input data of the ThresholdReLU activation operation in the face recognition model during training, giving a floating point input data range of [-6, 254]. If the data type of the fixed point input data of the ThresholdReLU activation operation in the fixed point domain is determined to be int8, the corresponding range of admissible values is [-128, 127]. Substituting the maximum value 254 and the minimum value -6 of the floating point input data and the maximum value 127 of the fixed point input data into equation 3 gives the first quantization parameter S = 2.
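Step S501 can be sketched as below. The exact form of equation 3 is reconstructed from the worked numbers (the larger input magnitude divided by the largest admissible fixed point value), so the function here is an inference, not a quotation of the patent's formula.

```python
def first_quant_param(x_min, x_max, fixed_max):
    # Equation 3 as reconstructed from the example: the larger magnitude
    # of the floating point input range divided by the largest
    # admissible fixed point value
    return max(abs(x_min), abs(x_max)) / fixed_max

# Ranges from the face recognition example: floats in [-6, 254], int8 in [-128, 127]
s1 = first_quant_param(-6.0, 254.0, 127)
print(s1)  # 2.0
```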
S502, quantizing the threshold parameter a according to the first quantization parameter to obtain a first fixed point threshold parameter aq1.
In some implementations, the first quantization parameter is substituted into a symmetric quantization formula to obtain a first fixed point threshold parameter aq1 corresponding to the threshold parameter a.
For example, the threshold parameter a is 4.1. Substituting the first quantization parameter value 2 obtained in the above step into the symmetric quantization formula 2, the value of the first fixed point threshold parameter aq1 corresponding to the threshold parameter 4.1 is round(4.1/2) = 2.
For example, the threshold parameter a is 5.8. Substituting the first quantization parameter value 2 obtained in the above step into the symmetric quantization formula 2, the value of the first fixed point threshold parameter aq1 corresponding to the threshold parameter 5.8 is round(5.8/2) = 3.
S503, adjusting the first quantization parameter according to the first fixed point threshold parameter aq1 and the threshold parameter a to obtain a second quantization parameter, where the maximum value of the same-value data range corresponding to the second quantization parameter is the threshold parameter a.
It is to be understood that, in some implementations, the second quantization parameter may be obtained according to the parity of the value of the first fixed point threshold parameter aq1, using the quantization parameter formula corresponding to that parity; in other implementations, the second quantization parameter may be obtained directly from a combined quantization parameter formula without considering the parity of aq1.
In some implementations, in a case that the obtained first fixed point threshold parameter aq1 is an even number, the first fixed point threshold parameter aq1 and the threshold parameter a are substituted into the following formula 5 to obtain a second quantization parameter, so that the maximum value of the same-value data range corresponding to the second quantization parameter is the threshold parameter a. For convenience of explanation, in the case where the value of the first fixed point threshold parameter aq1 is an even number, equation 5 used to obtain the second quantization parameter will be referred to as an even quantization parameter equation.
S' = a / (aq1 + 0.5)     (equation 5)
In equation 5, S' is the second quantization parameter, aq1 is the first fixed point threshold parameter, and a is the threshold parameter.
For example, in fig. 6a, the threshold parameter a is 4.1, and the first fixed point threshold parameter aq1 has a value of 2, which is even. Substituting aq1 = 2 and a = 4.1 into equation 5 gives the second quantization parameter S' = 4.1/(2+0.5) = 1.64. The corresponding relationship between the floating point input data x and the fixed point input data x_q obtained with S' = 1.64 is shown in fig. 7a.
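A minimal sketch of equation 5 (the even case) with the numbers from fig. 6a; the function name is illustrative.

```python
def second_quant_param_even(a, a_q1):
    # Equation 5: S' = a / (aq1 + 0.5), used when aq1 is even
    return a / (a_q1 + 0.5)

s2 = second_quant_param_even(4.1, 2)
print(s2)  # approximately 1.64, matching the example
```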
In some implementations, in the case that the obtained first fixed point threshold parameter aq1 is an odd number, the first fixed point threshold parameter aq1 and the threshold parameter a are substituted into the following formula 6 to obtain the second quantization parameter S', so that the maximum value of the same-value data range corresponding to S' is the threshold parameter a. For convenience of explanation, in the case where the value of the first fixed point threshold parameter aq1 is an odd number, equation 6 used to obtain the second quantization parameter will be referred to as the odd quantization parameter equation.
S' = a / (aq1 - 0.5)     (equation 6)
In equation 6, S' is the second quantization parameter, aq1 is the first fixed point threshold parameter, and a is the threshold parameter.
For example, in fig. 6b, the threshold parameter a is 5.8, and the first fixed point threshold parameter aq1 has a value of 3, which is odd. Substituting aq1 = 3 and a = 5.8 into equation 6 gives the second quantization parameter S' = 5.8/(3-0.5) = 2.32. The corresponding relationship between the floating point input data x and the fixed point input data x_q obtained with S' = 2.32 is shown in fig. 7b.
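A minimal sketch of equation 6 (the odd case) with the numbers from fig. 6b; the function name is illustrative.

```python
def second_quant_param_odd(a, a_q1):
    # Equation 6: S' = a / (aq1 - 0.5), used when aq1 is odd
    return a / (a_q1 - 0.5)

s2 = second_quant_param_odd(5.8, 3)
print(s2)  # approximately 2.32, matching the example
```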
It is to be understood that, in other implementations, the first fixed point threshold parameter aq1 and the threshold parameter a are substituted into equation 7 to obtain the second quantization parameter S', so that the maximum value of the same-value data range corresponding to S' is the threshold parameter a. For convenience of explanation, the quantization parameter formula that does not consider the parity of the value of aq1 is referred to as the combined quantization parameter formula; that is, the following equation 7 used to obtain the second quantization parameter is referred to as the combined quantization parameter formula.
S' = a / (aq1 - mod(aq1, 2) + 0.5)     (equation 7)
In equation 7, S' is the second quantization parameter, aq1 is the first fixed point threshold parameter, a is the threshold parameter, and mod(aq1, 2) is the remainder function, i.e., the remainder after dividing aq1 (the dividend) by 2 (the divisor).
For example, in fig. 6a, the threshold parameter a is 4.1 and the first fixed point threshold parameter aq1 has a value of 2. Substituting aq1 = 2 and a = 4.1 into equation 7, with the remainder function mod(2, 2) = 0, gives S' = 4.1/(2-0+0.5) = 1.64. The corresponding relationship between the floating point input data x and the fixed point input data x_q obtained with S' = 1.64 is shown in fig. 7a.
For example, in fig. 6b, the threshold parameter a is 5.8 and the first fixed point threshold parameter aq1 has a value of 3. Substituting aq1 = 3 and a = 5.8 into equation 7, with the remainder function mod(3, 2) = 1, gives S' = 5.8/(3-1+0.5) = 2.32. The corresponding relationship between the floating point input data x and the fixed point input data x_q obtained with S' = 2.32 is shown in fig. 7b.
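The combined formula (equation 7) can be checked against both worked examples; this sketch uses Python's % operator for mod(aq1, 2), and the function name is illustrative.

```python
def second_quant_param(a, a_q1):
    # Equation 7: S' = a / (aq1 - mod(aq1, 2) + 0.5), covering both parities
    return a / (a_q1 - a_q1 % 2 + 0.5)

# Even case (fig. 6a): mod(2, 2) = 0, so S' = 4.1 / 2.5, approximately 1.64
# Odd case  (fig. 6b): mod(3, 2) = 1, so S' = 5.8 / 2.5, approximately 2.32
print(second_quant_param(4.1, 2), second_quant_param(5.8, 3))
```

As the comments show, equation 7 reduces to the even quantization parameter formula when aq1 is even and to the odd quantization parameter formula when aq1 is odd, so both paths give the same S'.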
It will be appreciated that, for the same first fixed point threshold parameter aq1 and threshold parameter a, the value of the second quantization parameter obtained from the odd/even quantization parameter formulas is the same as that obtained from the combined quantization parameter formula.
S504, quantizing the threshold parameter a according to the second quantization parameter to obtain a second fixed point threshold parameter aq2.
In some implementations, the second quantization parameter obtained according to the combined quantization parameter formula or the second quantization parameter obtained according to the odd/even quantization parameter formula is substituted into the symmetric quantization formula 2, and the threshold parameter a is quantized to obtain the second fixed point threshold parameter aq 2.
For example, the threshold parameter a is 4.1. Substituting the second quantization parameter value 1.64 obtained in the above steps (by the combined or even quantization parameter formula) into the symmetric quantization formula 2 gives the second fixed point threshold parameter aq2 = round(4.1/1.64) = 2. Fig. 7a shows that when the threshold parameter a is 4.1, the corresponding value of aq2 is 2; at this time, the maximum value of the floating point input data range whose quantized fixed point value equals aq2 is the threshold parameter a = 4.1. Thus, when the second quantization parameter value is 1.64, the maximum value of the corresponding same-value data range is the threshold parameter a = 4.1.
For another example, the threshold parameter a is 5.8. Substituting the second quantization parameter value 2.32 obtained in the above steps (by the combined or odd quantization parameter formula) into the symmetric quantization formula 2 gives the second fixed point threshold parameter aq2 = round(5.8/2.32) = 2. Fig. 7b shows that when the threshold parameter a is 5.8, the corresponding value of aq2 is 2; at this time, the maximum value of the floating point input data range whose quantized fixed point value equals aq2 is the threshold parameter a = 5.8. Thus, when the second quantization parameter value is 2.32, the maximum value of the corresponding same-value data range is the threshold parameter a = 5.8.
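Step S504 can be checked numerically. Two assumptions are made in this sketch: the half-down tie rounding (round(2.5) = 2) inferred from the examples, and a small epsilon that guards against floating point noise, since a/S' lands exactly on a .5 boundary by construction of equations 5 to 7.

```python
import math

def quantize(x, s, eps=1e-9):
    # Equation 2 with ties rounded toward the lower integer;
    # eps absorbs floating point noise at the exact .5 tie
    return math.ceil(x / s - 0.5 - eps)

for a, a_q1 in [(4.1, 2), (5.8, 3)]:
    s2 = a / (a_q1 - a_q1 % 2 + 0.5)   # equation 7
    a_q2 = quantize(a, s2)             # both examples give aq2 = 2
    print(a, round(s2, 2), a_q2)
    # any input even slightly above a now quantizes strictly above aq2:
    assert quantize(a + 1e-6, s2) > a_q2
```

The final assertion verifies the key property of S503/S504: the threshold parameter a itself is the maximum value of the same-value data range, so the right same-value data set is empty.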
S505, obtaining a quantized ThresholdReLU activation function for the activation operation during the running of the neural network model according to the second fixed point threshold parameter aq2.
It can be understood that, during the running of the neural network model, the activation operation is performed on the data to be activated using the quantized ThresholdReLU activation function.
In some implementations, the second fixed point threshold parameter aq2 obtained from the second quantization parameter is substituted into equation 4 to obtain a quantized ThresholdReLU activation function.
For example, if the value of the second fixed point threshold parameter aq2 obtained when the threshold parameter a is 4.1 in the above example is 2, then substituting aq2 = 2 into equation 4 gives the quantized ThresholdReLU activation function shown in equation 8 below. During its running, the neural network model performs the activation operation on the data to be activated using the operation logic shown in equation 8.
y_q = x_q, if x_q > 2; y_q = 0, if x_q ≤ 2     (equation 8)
For another example, if the value of the second fixed point threshold parameter aq2 obtained when the threshold parameter a is 5.8 in the above example is 2, then substituting aq2 = 2 into equation 4 gives the quantized ThresholdReLU activation function shown in equation 9 below. During its running, the neural network model performs the activation operation on the data to be activated using the operation logic shown in equation 9.
y_q = x_q, if x_q > 2; y_q = 0, if x_q ≤ 2     (equation 9)
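Putting equation 8 together with the adjusted quantization parameter, the earlier problem case x = 4.3 can be re-checked. This is an illustrative sketch with the same half-down rounding assumption as above.

```python
import math

def quantize(x, s, eps=1e-9):
    # Equation 2, ties toward the lower integer, with a guard epsilon
    return math.ceil(x / s - 0.5 - eps)

def thresholdrelu_q(x_q, a_q2=2):
    # Equation 8: y_q = x_q if x_q > 2, else 0
    return x_q if x_q > a_q2 else 0

s2 = 4.1 / 2.5                     # second quantization parameter, approximately 1.64
x_q = quantize(4.3, s2)            # 4.3 now quantizes to 3, above aq2 = 2
y_q = thresholdrelu_q(x_q)         # 3: no longer zeroed out
print(x_q, y_q, y_q * s2)          # dequantized output near 4.92 instead of 0
```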
It is to be understood that, for the above method of obtaining a quantized ThresholdReLU activation function, the execution sequence of steps S501 to S505 is only an example; in other embodiments, other execution sequences may be adopted, and some steps may be split or merged, which is not limited herein.
It can be understood that, after the quantized ThresholdReLU activation function is obtained, it is used during the running of the neural network model to perform the ThresholdReLU activation operation and obtain fixed point output data. The fixed point output data may result from performing the ThresholdReLU activation operation on video data, audio data, image data, text data, and so on. For example, in the scenario shown in fig. 3, in the process of processing a face image with the face recognition model, the fixed point output data may be image data obtained after performing the ThresholdReLU activation operation on the face image data.
To better understand the technical solution of the embodiments of the present application, the following describes an example in which the terminal 100 performs the activation operation while running the neural network model to process data. Taking the flowchart shown in fig. 8 as an example, after the server 200 creates the quantized ThresholdReLU activation function using the flow shown in fig. 5 and sends it to the terminal 100, the terminal 100 runs the deployed neural network model. While the model processes data, the terminal uses the determined quantized ThresholdReLU activation function to obtain a fixed point output result from the floating point input data to be subjected to the ThresholdReLU activation operation. The specific flow includes the following steps:
S801, receiving floating point data to be subjected to the ThresholdReLU activation operation during operation of the neural network model.
It is understood that the floating point data received for the ThresholdReLU activation operation during operation of the neural network model may be floating point video data, audio data, image data, text data, or data related thereto.
For example, in the face recognition scenario shown in fig. 3, the floating point data received for the ThresholdReLU activation operation during operation of the face recognition model is image data, and the value of the floating point data is 4.3.
S802, quantizing the floating point data to obtain a fixed point number, wherein floating point data larger than the threshold parameter is quantized to a fixed point number larger than the quantized threshold parameter in the ThresholdReLU activation function.
It is understood that the fixed point obtained by quantizing the floating point data is fixed point type video data, audio data, image data, character data, or the like.
For example, in the flow of obtaining the quantized ThresholdReLU activation function shown in fig. 5, if the threshold parameter a is 4.1 and the corresponding second quantization parameter is 1.64, the floating point image data with the value 4.3 in step S801 is quantized to a fixed point number according to the symmetric quantization formula 2: round(4.3/1.64) = 3, so the obtained fixed point image data has the value 3.
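The symmetric quantization step of S802 can be sketched as follows (a minimal illustration of the round-to-nearest division in formula 2; the helper name is not from the patent):

```python
def symmetric_quantize(x: float, scale: float) -> int:
    """Symmetric quantization: divide the floating point value by the
    quantization parameter (scale) and round to the nearest integer."""
    return round(x / scale)

# Worked example from the text: second quantization parameter 1.64,
# floating point input 4.3.  4.3 / 1.64 ≈ 2.62, which rounds to the
# fixed point number 3.
```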
S803, using the obtained fixed point number as the fixed point input data of the quantized ThresholdReLU activation function to obtain the fixed point output data of the activation operation during operation of the neural network model.
It can be understood that, during operation of the neural network model, fixed point video, audio, image, text, and other data are used as the fixed point input data of the quantized ThresholdReLU activation function to perform the activation operation, so as to obtain the fixed point output data of the activation operation. The type of the fixed point output data matches the type of the input data: when the input data of the activation operation is image data, the obtained fixed point output data is also image data; and likewise for video data, audio data, and text data.
For example, according to the flow shown in fig. 5, the quantized ThresholdReLU activation function corresponding to the threshold parameter a of 4.1 is formula 8. Substituting the fixed point image data value 3 into formula 8, the value 3 is greater than the fixed point threshold parameter aq2 value 2, so the fixed point output data y_q is 3; that is, the image data obtained by performing the activation operation has the value 3.
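Putting steps S801 through S803 together, the end-to-end flow for the worked example might look like this (a sketch under the assumptions above — symmetric quantization per formula 2 followed by the quantized activation of formula 8; the function name is illustrative):

```python
def run_threshold_relu(x: float, scale: float, a_q2: int) -> int:
    """S801-S803: receive floating point data, quantize it to a fixed
    point number, then apply the quantized ThresholdReLU activation."""
    x_q = round(x / scale)           # S802: symmetric quantization
    return x_q if x_q > a_q2 else 0  # S803: quantized activation

# Worked example: x = 4.3, scale = 1.64, a_q2 = 2.
# round(4.3 / 1.64) = 3 > 2, so the fixed point output is 3.
```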
It is understood that the execution sequence of steps S801 to S803 is only an example, and in other embodiments, other execution sequences may also be adopted, and some steps may also be split or merged, which is not limited herein.
It will be appreciated that the neural network model deployed to the terminal 100 may also be run by the server 200.
Fig. 9 illustrates a schematic diagram of a data processing apparatus of a neural network model. As shown, the apparatus comprises:
the first determining unit is used for determining floating point data to be processed in the neural network model;
the first acquisition unit is used for quantizing the determined floating point data through a second quantization parameter to obtain a fixed point number, wherein the second quantization parameter is determined according to a floating point threshold parameter;
and the second obtaining unit is used for taking the obtained fixed point number as fixed point input data of a ThresholdReLU activation operation included in the neural network model to obtain fixed point output data, wherein the ThresholdReLU activation operation is associated with the second quantization parameter.
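The three logical units of the apparatus in fig. 9 can be sketched as one class (class and method names are illustrative, not from the patent; the second quantization parameter and fixed point threshold are assumed to have been determined ahead of time, as in the flow of fig. 5):

```python
class NeuralNetworkDataProcessor:
    """Sketch of the apparatus of fig. 9: determining unit,
    first acquisition unit, and second acquisition unit."""

    def __init__(self, second_quant_param: float, a_q2: int):
        # The second quantization parameter is determined from the
        # floating point threshold parameter before the model runs.
        self.scale = second_quant_param
        self.a_q2 = a_q2

    def determine(self, data: float) -> float:
        """First determining unit: the floating point data to process."""
        return data

    def quantize(self, x: float) -> int:
        """First acquisition unit: quantize the floating point data
        through the second quantization parameter to a fixed point number."""
        return round(x / self.scale)

    def activate(self, x_q: int) -> int:
        """Second acquisition unit: ThresholdReLU activation operation
        associated with the second quantization parameter."""
        return x_q if x_q > self.a_q2 else 0
```

Chaining the three units on the worked example (4.3 with scale 1.64 and fixed point threshold 2) reproduces the fixed point output data of 3.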
To facilitate understanding of the technical solutions of the embodiments of the present application, a hardware structure of the terminal 100 is described below.
Further, fig. 10 illustrates a schematic structural diagram of a terminal 100, according to some embodiments of the present application. As shown in fig. 10, terminal 100 includes one or more processors 101, a system Memory 102, a Non-Volatile Memory (NVM) 103, a communication interface 104, an input/output (I/O) device 105, system control logic 106, and instructions 107.
Wherein: the processor 101 may include one or more processing units, for example, processing modules or processing circuits that may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), a microcontroller (MCU, Micro-programmed Control Unit), an Artificial Intelligence (AI) processor, or a programmable logic device such as a Field Programmable Gate Array (FPGA), and may include one or more single-core or multi-core processors. The AI processor may include a Neural-network Processing Unit (NPU), an AIPU, and the like.
The system Memory 102 is a volatile Memory, such as a Random-Access Memory (RAM), a Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), and the like. The system memory is used for temporarily storing data and/or instructions, for example, in some embodiments, the system memory 102 may be used for storing the related instructions for performing the neural network model data processing method, and the like.
Non-volatile memory 103 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. In some embodiments, the non-volatile memory 103 may include any suitable non-volatile memory such as flash memory and/or any suitable non-volatile storage device, such as a Hard Disk Drive (HDD), Compact Disc (CD), Digital Versatile Disc (DVD), Solid-State Drive (SSD), and the like. In some embodiments, the non-volatile memory 103 may also be a removable storage medium, such as a Secure Digital (SD) memory card or the like.
In particular, the system memory 102 and the non-volatile memory 103 may each include a temporary copy and a permanent copy of the instructions 107. The instructions 107 may include instructions that, when executed by at least one of the processors 101, cause the terminal 100 to implement the data processing method of the neural network model provided in the embodiments of the present application.
The communication interface 104 may include a transceiver to provide a wired or wireless communication interface for the terminal 100 to communicate with any other suitable device over one or more networks. In some embodiments, the communication interface 104 may be integrated with other components of the terminal 100, for example the communication interface 104 may be integrated in the processor 101. In some embodiments, the terminal 100 may communicate with other devices through the communication interface 104.
Input/output (I/O) devices 105 can include input devices such as a keyboard, mouse, etc., output devices such as a display, etc., and a user can interact with terminal 100 through input/output (I/O) devices 105.
System control logic 106 may include any suitable interface controllers to provide any suitable interfaces with the other modules of terminal 100. For example, in some embodiments, system control logic 106 may include one or more memory controllers to provide an interface to system memory 102 and non-volatile memory 103.
In some embodiments, at least one of the processors 101 may be packaged together with logic for one or more controllers of the System control logic 106 to form a System In Package (SiP). In other embodiments, at least one of the processors 101 may also be integrated on the same Chip with logic for one or more controllers of the System control logic 106 to form a System-on-Chip (SoC).
It is understood that the terminal 100 may be any electronic device capable of operating a neural network model, including but not limited to a mobile phone, a wearable device (e.g., a smart watch, etc.), a tablet, a desktop, a laptop, a handheld computer, a notebook, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a Personal Digital Assistant (PDA), an Augmented Reality (AR)/Virtual Reality (VR) device, etc., and the embodiments of the present application are not limited thereto.
It will be appreciated that the configuration of terminal 100 shown in fig. 10 is merely an example, and in other embodiments, terminal 100 may include more or fewer components than shown, or some components may be combined, or some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this Application, a processing system includes any system having a Processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, Compact Disc Read-Only Memories (CD-ROMs), magneto-optical disks, Read-Only Memories (ROMs), Random Access Memories (RAMs), Erasable Programmable Read-Only Memories (EPROMs), Electrically Erasable Programmable Read-Only Memories (EEPROMs), magnetic or optical cards, flash memory, or a tangible machine-readable memory used to transmit information over the Internet in an electrical, optical, acoustical, or other form of propagated signal (e.g., carrier waves, infrared signals, digital signals). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some features of structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a manner and/or order different from that shown in the illustrative figures. In addition, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments, may not be included or may be combined with other features.
It should be noted that, in the embodiments of the apparatuses in the present application, each unit/module is a logical unit/module, and physically, one logical unit/module may be one physical unit/module, or may be a part of one physical unit/module, and may also be implemented by a combination of multiple physical units/modules, where the physical implementation manner of the logical unit/module itself is not the most important, and the combination of the functions implemented by the logical unit/module is the key to solve the technical problem provided by the present application. Furthermore, in order to highlight the innovative part of the present application, the above-mentioned device embodiments of the present application do not introduce units/modules which are not so closely related to solve the technical problems presented in the present application, which does not indicate that no other units/modules exist in the above-mentioned device embodiments.
It is noted that, in the examples and specification of this patent, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims (16)

1. A data processing method of a neural network model is applied to electronic equipment and is characterized by comprising the following steps:
determining floating point type data to be processed in a neural network model;
quantizing the floating point type data through a post-quantization parameter to obtain a fixed point number, wherein the post-quantization parameter is determined according to a preset floating point threshold parameter;
and taking the fixed point number as fixed point input data of a first threshold correction linear unit activation operation included in the neural network model to obtain fixed point output data, wherein the first threshold correction linear unit activation operation is associated with the post-quantization parameter.
2. The method of claim 1, wherein the post-quantization parameter is determined according to the floating-point threshold parameter and a pre-quantization parameter.
3. The method of claim 1 or 2, wherein the post-quantization parameter is determined by a method comprising,
determining a pre-quantization parameter according to the range of floating point type data to be processed in the neural network model and the data type of the fixed point number;
and obtaining the post-quantization parameter according to the pre-quantization parameter and the floating point threshold parameter.
4. The method of claim 3, wherein deriving the post-quantization parameter from the pre-quantization parameter and the floating-point threshold parameter comprises,
quantizing the floating point threshold parameter according to the pre-quantization parameter to obtain a first fixed point threshold parameter;
and obtaining the post-quantization parameter according to the first fixed point threshold parameter and the floating point threshold parameter.
5. The method of claim 4, wherein deriving the post-quantization parameter from the first fixed point threshold parameter and the floating-point threshold parameter comprises,
under the condition that the first fixed point threshold parameter is an odd number, obtaining the post-quantization parameter according to an odd number quantization parameter formula, wherein the odd number quantization parameter formula comprises the first fixed point threshold parameter and the floating point threshold parameter;
and under the condition that the first fixed point threshold parameter is an even number, obtaining the post-quantization parameter according to an even number quantization parameter formula, wherein the even number quantization parameter formula comprises the first fixed point threshold parameter and the floating point threshold parameter.
6. The method of claim 4, wherein deriving the post-quantization parameter from the first fixed point threshold parameter and the floating-point threshold parameter comprises,
and obtaining the post-quantization parameter according to a combined quantization parameter formula, wherein the combined quantization parameter formula comprises the first fixed point threshold parameter and the floating point threshold parameter.
7. The method of claim 5 or 6, wherein the first threshold modifying linear unit activation operation is determined based on the post-quantization parameter and the floating-point threshold parameter.
8. The method of claim 1 or 7, wherein the first threshold modifying linear cell activation operation is determined by a method comprising,
quantizing the floating point threshold parameter according to the post-quantization parameter to obtain a second fixed point threshold parameter;
and obtaining a threshold correction linear unit activation function for performing the first threshold correction linear unit activation operation according to the second fixed point threshold parameter.
9. The method of claim 3, wherein the range of floating point data to be processed in the neural network model is determined during operation of the neural network model.
10. The method of claim 3, wherein the data type of the fixed-point number comprises at least one of: int32, int16, int8, int4, uint32, uint16, uint8, or uint4.
11. The method of claim 1, wherein the floating point type data to be processed in the neural network model is obtained based on at least one of image data, audio data, text data, and video data.
12. The method of claim 1, wherein the quantization is a symmetric quantization.
13. A data processing apparatus of a neural network model, comprising,
the first determining unit is used for determining floating point type data to be processed in the neural network model;
the device comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for quantizing the floating point type data to obtain fixed point numbers through post-quantization parameters, and the post-quantization parameters are determined according to preset floating point threshold parameters;
and the second acquisition unit is used for taking the fixed point number as fixed point input data of activation operation of a first threshold correction linear unit included in the neural network model to obtain fixed point output data, wherein the activation operation of the first threshold correction linear unit is associated with the post-quantization parameter.
14. A computer-readable storage medium having stored thereon instructions for performing the method of any one of claims 1-12 on an electronic device.
15. A computer program product comprising instructions for implementing the method of any one of claims 1-12 when executed by one or more processors.
16. An electronic device, comprising:
a memory for storing instructions, an
one or more processors, wherein when the instructions are executed by the one or more processors, the one or more processors perform the method of any one of claims 1-12.
CN202210556167.0A 2022-05-20 2022-05-20 Data processing method and medium of neural network model and electronic device Pending CN114841325A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210556167.0A CN114841325A (en) 2022-05-20 2022-05-20 Data processing method and medium of neural network model and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210556167.0A CN114841325A (en) 2022-05-20 2022-05-20 Data processing method and medium of neural network model and electronic device

Publications (1)

Publication Number Publication Date
CN114841325A true CN114841325A (en) 2022-08-02

Family

ID=82572384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210556167.0A Pending CN114841325A (en) 2022-05-20 2022-05-20 Data processing method and medium of neural network model and electronic device

Country Status (1)

Country Link
CN (1) CN114841325A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115294108A (en) * 2022-09-29 2022-11-04 深圳比特微电子科技有限公司 Target detection method, target detection model quantification device, and medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115294108A (en) * 2022-09-29 2022-11-04 深圳比特微电子科技有限公司 Target detection method, target detection model quantification device, and medium
CN115294108B (en) * 2022-09-29 2022-12-16 深圳比特微电子科技有限公司 Target detection method, target detection model quantification device, and medium

Similar Documents

Publication Publication Date Title
US20240104378A1 (en) Dynamic quantization of neural networks
US11393492B2 (en) Voice activity detection method, method for establishing voice activity detection model, computer device, and storage medium
US10713818B1 (en) Image compression with recurrent neural networks
WO2021179587A1 (en) Neural network model quantification method and apparatus, electronic device and computer-readable storage medium
EP4087239A1 (en) Image compression method and apparatus
CN110929865B (en) Network quantification method, service processing method and related product
US20200302283A1 (en) Mixed precision training of an artificial neural network
CN112687266B (en) Speech recognition method, device, computer equipment and storage medium
CN110728350A (en) Quantification for machine learning models
CN114612996A (en) Method for operating neural network model, medium, program product, and electronic device
CN109960484B (en) Audio volume acquisition method and device, storage medium and terminal
CN114841325A (en) Data processing method and medium of neural network model and electronic device
US10318891B1 (en) Geometry encoder
WO2021012148A1 (en) Data processing method and apparatus based on deep neural network, and mobile device
CN114328898A (en) Text abstract generating method and device, equipment, medium and product thereof
CN112580805A (en) Method and device for quantizing neural network model
CN110570877B (en) Sign language video generation method, electronic device and computer readable storage medium
CN112101543A (en) Neural network model determination method and device, electronic equipment and readable storage medium
US10891758B2 (en) Geometry encoder
CN112561050B (en) Neural network model training method and device
CN115705486A (en) Method and device for training quantitative model, electronic equipment and readable storage medium
US11861452B1 (en) Quantized softmax layer for neural networks
CN116306709A (en) Data processing method, medium and electronic equipment
CN110852202A (en) Video segmentation method and device, computing equipment and storage medium
US20210303975A1 (en) Compression and decompression of weight values

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination