CN114444686A - Method and device for quantizing model parameters of convolutional neural network and related device - Google Patents

Method and device for quantizing model parameters of convolutional neural network and related device Download PDF

Info

Publication number
CN114444686A
Authority
CN
China
Prior art keywords
layer
neural network
model parameters
convolutional neural
convolutional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111676325.8A
Other languages
Chinese (zh)
Inventor
温东超
梁玲燕
赵雅倩
史宏志
崔星辰
张英杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd filed Critical Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN202111676325.8A
Publication of CN114444686A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method for quantizing the model parameters of a convolutional neural network, which comprises the following steps: acquiring a data set of a digital classification task; constructing a convolutional neural network for the digital classification task; and training the convolutional neural network with the data set, the trained convolutional neural network being used to execute the digital classification task. In the forward propagation process of the quantized convolutional layer, the model parameters are quantized based on a sign function; the model parameters include weights and activation values. In the back propagation process of the quantized convolutional layer, the model parameters are quantized based on a parameterized hyperbolic tangent function, and the parameters in the hyperbolic tangent function are determined based on the data distribution of the model parameters of the quantized convolutional layer, so that the model accuracy of digital classification can be improved. The application also discloses a model parameter quantization apparatus of a convolutional neural network, a computer-readable storage medium and an electronic device, which have the above beneficial effects.

Description

Method and device for quantizing model parameters of convolutional neural network and related device
Technical Field
The present disclosure relates to the field of neural network technologies, and in particular, to a method and an apparatus for quantizing a model parameter of a convolutional neural network, and a related apparatus.
Background
In recent years, deep learning techniques have developed rapidly in the field of computer vision (e.g., digital classification, object detection, instance segmentation, face recognition, etc.) and are widely applied in practical scenarios. To achieve high performance (high accuracy), engineers typically design computer vision algorithms using neural networks (e.g., the residual networks ResNet50 or ResNet100) as backbone networks, which can be tens or even hundreds of layers deep. In practical applications, a high-performance GPU card is installed in the application server to perform the model inference task. Without the support of high-performance GPU cards, however, the application of these computer vision algorithms is limited.
How to reduce storage and computation, that is, how to make models lightweight, is a field receiving great attention. Typical lightweight techniques for deep convolutional neural network models include model pruning, low-rank matrix decomposition, low-bit quantization, and so on. Among these methods, low-bit quantization (in particular, binary quantization) is the most effective model weight-reduction technique. A binary convolutional neural network replaces floating-point operations with bit-wise operations, and model inference can be significantly accelerated by using the bit-operation instructions of a CPU (central processing unit), a GPU (graphics processing unit) or other dedicated hardware.
In existing binary quantization schemes, an approximation function replaces the derivative of the sign function in the back propagation process. Although this avoids the vanishing-gradient problem during training, such a design causes a mismatch between the forward and backward functions, and this mismatch degrades model accuracy.
Therefore, how to improve the model accuracy of the digital classification is a technical problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The present application aims to provide a method and an apparatus for quantizing model parameters of a convolutional neural network, a computer-readable storage medium, and an electronic device, which can improve the model accuracy of digital classification.
In order to solve the above technical problem, the present application provides a method for quantizing a model parameter of a convolutional neural network, including:
acquiring a data set of a digital classification task; wherein the data set contains digital images and corresponding digital labels;
constructing a convolutional neural network of a digital classification task; the convolutional neural network comprises a floating point convolutional layer, a quantization convolutional layer, a pooling layer, a normalization layer, a nonlinear layer, a full-connection layer, a classification layer and a loss calculation layer;
training the convolutional neural network by using the data set, wherein the trained convolutional neural network is used for executing a digital classification task;
wherein model parameters are quantized based on a sign function in a forward propagation process of the quantized convolutional layer; wherein the model parameters include weights and activation values;
quantizing the model parameters based on a parameterized hyperbolic tangent function in a back propagation process of the quantized convolutional layer; wherein parameters in the hyperbolic tangent function are determined based on a data distribution of model parameters of the quantized convolutional layer.
Optionally, the convolutional neural network includes a plurality of structural units, a full connection layer, a classification layer, and a loss calculation layer, which are sequentially connected;
the structural units comprise a first type structural unit, a second type structural unit and a third type structural unit; the first type structure unit comprises a floating point convolution layer, a normalization layer and a nonlinear layer which are sequentially connected; the second type structural unit comprises a quantization convolution layer, a normalization layer and a nonlinear layer which are connected in sequence; the third type of structural unit comprises a quantization convolution layer, a pooling layer, a normalization layer and a nonlinear layer which are connected in sequence.
Optionally, in the forward propagation process of the quantized convolutional layer, quantizing the model parameter based on a sign function, including:
calculating an intermediate quantization value corresponding to the model parameter based on the sign function, and calculating a scaling factor corresponding to the model parameter;
and determining the product of the intermediate quantization value and the scaling factor as the final quantization value of the model parameter.
Optionally, after quantizing the model parameters based on the sign function, the method further includes:
and carrying out inner product operation of bit operation on the quantized value corresponding to the weight and the quantized value corresponding to the activation value to obtain the output of the quantized convolutional layer.
Optionally, the quantizing the model parameters based on the parameterized hyperbolic tangent function includes:
calculating a quantized value corresponding to the model parameter based on a target hyperbolic tangent function;
the target hyperbolic tangent function specifically includes: y (x) ═ c0*tanh(c1*x);
Wherein x is the model parameter, y (x) is the quantization value, c0And c1Is a hyper-parameter, c1And determining data distribution based on the model parameters of the quantized convolutional layer.
Optionally, the method further includes:
calculating c1 under a target constraint condition; wherein the target constraint condition is that the ratio of the number of target model parameters satisfying a preset condition to the total number of model parameters is greater than or equal to a preset ratio r, and the preset condition is tanh(c1 * x) ∈ (-1, -r) ∪ (r, 1).
Optionally, the method further includes:
and determining the corresponding data distribution of the model parameters of the quantized convolutional layer in different intervals.
The present application also provides a model parameter quantization apparatus of a convolutional neural network, the apparatus including:
the acquisition module is used for acquiring a data set of the digital classification task; wherein the data set contains digital images and corresponding digital labels;
the building module is used for building a convolutional neural network of the digital classification task; the convolutional neural network comprises a floating point convolutional layer, a quantization convolutional layer, a pooling layer, a normalization layer, a nonlinear layer, a full-connection layer, a classification layer and a loss calculation layer;
a first quantization module for quantizing model parameters based on a sign function in a forward propagation process of the quantized convolutional layer; wherein the model parameters include weights and activation values;
a second quantization module for quantizing the model parameters based on a hyperbolic tangent function in a back propagation process of the quantized convolutional layer; wherein parameters in the hyperbolic tangent function are determined based on a distribution of model parameters of the quantized convolutional layer;
and the training module is used for training the convolutional neural network by using the data set, and the trained convolutional neural network is used for executing a digital classification task.
The present application also provides a computer readable storage medium having stored thereon a computer program which, when executed, implements the steps performed by the above-described method for quantifying model parameters of a convolutional neural network.
The application also provides an electronic device, which comprises a memory and a processor, wherein the memory is stored with a computer program, and the processor realizes the steps executed by the model parameter quantization method of the convolutional neural network when calling the computer program in the memory.
The application provides a method for quantizing the model parameters of a convolutional neural network, which comprises the following steps: acquiring a data set of a digital classification task, wherein the data set contains digital images and corresponding digital labels; constructing a convolutional neural network of the digital classification task, wherein the convolutional neural network comprises a floating point convolutional layer, a quantization convolutional layer, a pooling layer, a normalization layer, a nonlinear layer, a full-connection layer, a classification layer and a loss calculation layer; and training the convolutional neural network by using the data set, the trained convolutional neural network being used for executing the digital classification task. The model parameters are quantized based on a sign function in the forward propagation process of the quantized convolutional layer, wherein the model parameters include weights and activation values; the model parameters are quantized based on a parameterized hyperbolic tangent function in the back propagation process of the quantized convolutional layer, wherein parameters in the hyperbolic tangent function are determined based on the data distribution of the model parameters of the quantized convolutional layer.
The present application obtains a data set of the digital classification task and constructs a convolutional neural network for the digital classification task. After the convolutional neural network is trained, the trained network is used to perform the digital classification task. In the forward propagation process of the quantized convolutional layer, the method quantizes the model parameters based on a sign function; in the back propagation process of the quantized convolutional layer, it quantizes the model parameters based on a parameterized hyperbolic tangent function. The quantization function is dynamically determined from the data distribution of the weight values and activation values during training, and different quantization functions are adopted for the weight values and activation values of each layer, so the model accuracy of digital classification can be improved. The application also provides a model parameter quantization apparatus of a convolutional neural network, a computer-readable storage medium and an electronic device, which have the above beneficial effects and are not repeated here.
Drawings
In order to more clearly illustrate the embodiments of the present application, the drawings needed for the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of a method for quantizing a model parameter of a convolutional neural network according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a first type of structural unit (F-type unit) provided in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a second type structural unit (type A unit) provided in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a third type of structural unit (type B unit) provided in an embodiment of the present application;
FIG. 5 is a block diagram of a convolutional neural network for training a digital classification system according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the function y(x) when c0 and c1 take different values;
FIG. 7 is a schematic diagram of the function y'(x) when c0 and c1 take different values;
fig. 8 is a schematic structural diagram of a model parameter quantization apparatus of a convolutional neural network according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of a method for quantizing model parameters of a convolutional neural network according to an embodiment of the present disclosure.
The specific steps may include:
s101: acquiring a data set of a digital classification task;
wherein the data set contains digital images and corresponding digital labels. For the digital classification task, the data set consists of images and class labels (ground-truth labels). For example, for the digits 0 to 9, the input set contains color or grayscale images of the digits 0 to 9, each image is 32 x 32 pixels in size, and the label of each image is the corresponding digit. The training samples are divided into 10 subsets, each subset corresponding to one digit and containing at least one image. In order to train a highly accurate model, each subset should contain enough samples (images), e.g., thousands or tens of thousands of samples. Furthermore, the training samples should be diverse, for example including background changes, font changes, lighting changes, etc.
S102: constructing a convolutional neural network of a digital classification task;
the convolutional neural network comprises a floating point convolutional layer, a quantization convolutional layer, a pooling layer, a normalization layer, a nonlinear layer, a full-connection layer, a classification layer and a loss calculation layer.
As a preferred embodiment, the convolutional neural network constructed in this embodiment includes a plurality of structural units, a full connection layer, a classification layer, and a loss calculation layer, which are sequentially connected; the structural units comprise a first type structural unit, a second type structural unit and a third type structural unit; the first type structure unit comprises a floating point convolution layer, a normalization layer and a nonlinear layer which are sequentially connected; the second type structural unit comprises a quantization convolution layer, a normalization layer and a nonlinear layer which are connected in sequence; the third type of structural unit comprises a quantization convolution layer, a pooling layer, a normalization layer and a nonlinear layer which are connected in sequence.
Referring to FIG. 2, a first-type structural unit (F-type unit) includes a floating-point convolution layer, a batch normalization layer and a non-linear layer; the arrows in FIG. 2 indicate the flow of data. Referring to FIG. 3, a second-type structural unit (A-type unit) includes a quantized convolution layer, a batch normalization layer and a non-linear layer; the arrows in FIG. 3 indicate the flow of data. Referring to FIG. 4, a third-type structural unit (B-type unit) includes a quantized convolution layer, a pooling layer, a batch normalization layer and a non-linear layer; the arrows in FIG. 4 indicate the flow of data.
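For illustration only, the three structural units can be sketched in PyTorch as follows; the QuantConv2d class is a hypothetical placeholder standing in for the quantized convolutional layer whose forward and backward quantization is described below, and the layer arguments are arbitrary.

```python
import torch.nn as nn

class QuantConv2d(nn.Conv2d):
    """Hypothetical placeholder: a Conv2d whose weights and activations would be
    binarized during forward/backward propagation as described in this embodiment."""
    pass

def f_unit(c_in, c_out):
    # First-type (F) unit: floating-point conv -> batch normalization -> HardTanh
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(c_out),
        nn.Hardtanh())

def a_unit(c_in, c_out):
    # Second-type (A) unit: quantized conv -> batch normalization -> HardTanh
    return nn.Sequential(
        QuantConv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(c_out),
        nn.Hardtanh())

def b_unit(c_in, c_out):
    # Third-type (B) unit: quantized conv -> 2x2 max-pooling -> batch normalization -> HardTanh
    return nn.Sequential(
        QuantConv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
        nn.MaxPool2d(kernel_size=2, stride=2),
        nn.BatchNorm2d(c_out),
        nn.Hardtanh())
```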
S103: training the convolutional neural network by using the data set, wherein the trained convolutional neural network is used for executing a digital classification task;
wherein the model parameters are quantized based on a sign function in the forward propagation process of the quantized convolutional layer, the model parameters including weights and activation values; and the model parameters are quantized based on a parameterized hyperbolic tangent function in the back propagation process of the quantized convolutional layer, wherein the parameters in the hyperbolic tangent function are determined based on the data distribution of the model parameters of the quantized convolutional layer.
Specifically, the process of quantizing the model parameters based on the sign function includes: calculating an intermediate quantization value corresponding to the model parameter based on the sign function, and calculating a scaling factor corresponding to the model parameter; and determining the product of the intermediate quantization value and the scaling factor as the final quantization value of the model parameter.
After quantizing the model parameters based on the sign function, an inner product operation of bit operation may be performed on the quantized values corresponding to the weights and the quantized values corresponding to the activation values to obtain an output of the quantized convolutional layer.
In a specific implementation, the forward propagation process of the quantized convolutional layer is as follows. This embodiment calculates the binary quantized weight by formula (1), where W represents the weight of a certain layer (in floating-point numbers), B_w represents the binary quantized weight corresponding to W, and B_w is the sign of each element in W. In order to improve the representational accuracy of the neural network, this embodiment calculates the weight scaling factor s by formula (2), where ||W||_1 represents the 1-norm of W and n is the total number of weights. Q(W) represents the approximation of W, and sign() is the sign function.
B_w = sign(W);   formula (1)
s = ||W||_1 / n;   formula (2)
Q(W) = s * B_w;   formula (3)
This embodiment calculates the binary quantized activation value by formula (4), where A denotes the activation value of a certain layer, B_A represents the binary quantized activation value corresponding to A, and B_A is the sign of each element in A.
Q(A) = B_A = sign(A);   formula (4)
Based on the above equations (3) and (4), the present embodiment calculates the convolution operation by equation (5):
Z = Q(W) ⊙ Q(A) = (B_w ⊙ B_A) * s;   formula (5)
where Z represents the convolution output and ⊙ denotes the inner product operation implemented with bit operations; if the quantization of the activation value also introduces a scaling factor, it is multiplied in the same way.
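For illustration only, the following NumPy sketch mirrors formulas (1) to (5); the tensor shapes are arbitrary and an element-wise product is used in place of the bit-wise inner product that an actual implementation would employ.

```python
import numpy as np

def quantize_weights(W):
    B_w = np.sign(W)                  # formula (1): B_w = sign(W)
    B_w[B_w == 0] = 1                 # keep values in {-1, +1} if any element is exactly zero
    s = np.abs(W).sum() / W.size      # formula (2): s = ||W||_1 / n
    return B_w, s                     # formula (3): Q(W) = s * B_w

def quantize_activations(A):
    B_a = np.sign(A)                  # formula (4): Q(A) = B_A = sign(A)
    B_a[B_a == 0] = 1
    return B_a

W = np.random.randn(128, 3, 3)        # toy weight tensor
A = np.random.randn(128, 3, 3)        # toy activation patch of the same shape
B_w, s = quantize_weights(W)
B_a = quantize_activations(A)
Z = (B_w * B_a).sum() * s             # formula (5): Z = (B_w ⊙ B_A) * s
```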
Further, the present embodiment may quantify the model parameters based on a parameterized hyperbolic tangent function in the following manner:
calculating a quantized value corresponding to the model parameter based on a target hyperbolic tangent function; the target hyperbolic tangent function is y(x) = c0 * tanh(c1 * x), where x is the model parameter, y(x) is the quantized value, c0 and c1 are hyper-parameters, and c1 is determined based on the data distribution of the model parameters of the quantized convolutional layer.
In a specific implementation, the back propagation process of neural network training is as follows. This embodiment approximates the derivative of the sign function sign() with the derivative of equation (6), i.e., equation (9). When equation (6) is used for weight quantization, x represents W; when equation (6) is used for activation-value quantization, x represents A.
y(x) = c0 * tanh(c1 * x);   formula (6)
c1 = f(x, t);   formula (7)
c0 = θ(c1);   formula (8)
y'(x) = c0 * c1 * (1 - tanh^2(c1 * x));   formula (9)
In the above equation (6), x represents the input of the quantization function, and y(x) represents the output of the quantization function for input x. tanh() is the hyperbolic tangent function, "*" denotes element-wise multiplication, and c0 and c1 are hyper-parameters.
In equation (7), t represents the index of the training iteration, from the first training iteration to the last. For example, if the total number of iterations is 500, t is an integer ranging from 1 to 500 (or from 0 to 499).
Formula (7) indicates that c1 is determined from the distribution of the data x at training iteration t.
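For illustration only, formulas (6), (8) and (9) can be written as the following sketch; here c1 is passed in as an argument, standing in for the value produced by formula (7), and k is the expansion factor discussed in connection with equation (8) further below.

```python
import numpy as np

def soft_quantizer(x, c1, k=1.0):
    c0 = k / c1                                   # formula (8): c0 = theta(c1), here c0 = k / c1
    y = c0 * np.tanh(c1 * x)                      # formula (6): quantization function
    dy = c0 * c1 * (1.0 - np.tanh(c1 * x) ** 2)   # formula (9): its derivative
    return y, dy
```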
As a possible implementation, this embodiment may also calculate c1 under a target constraint condition, wherein the target constraint condition is that the ratio of the number of target model parameters satisfying a preset condition to the total number of model parameters is greater than or equal to a preset ratio r, and the preset condition is tanh(c1 * x) ∈ (-1, -r) ∪ (r, 1).
In one embodiment, the ratio r is determined based on the total number of training iterations and the current iteration number (e.g., r is 5% at the beginning of training and 95% at the end of training, and r increases from 5% to 95% during training). c1 is then determined based on the ratio r and the data distribution, so that at least a fraction r of the elements are mapped by the function tanh() into (-1.0, -0.95) or (0.95, 1.0). The value 0.95 is empirical; other values such as 0.98 may be used, and values between 0.95 and 0.99 are recommended.
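For illustration only, one possible way of deriving c1 from the data and the ratio r is sketched below; the quantile-based rule and the linear schedule of r are assumptions consistent with, but not prescribed by, the description above.

```python
import numpy as np

def ratio_schedule(t, total_iters, r_start=0.05, r_end=0.95):
    # r grows linearly from 5% at the start of training to 95% at the end
    return r_start + (r_end - r_start) * t / max(total_iters - 1, 1)

def choose_c1(x, r, threshold=0.95):
    # tanh(c1 * |x|) > threshold iff c1 * |x| > atanh(threshold), so choosing
    # c1 = atanh(threshold) / q, with q the (1 - r) quantile of |x|, maps at
    # least a fraction r of the elements into (-1, -threshold) or (threshold, 1).
    q = np.quantile(np.abs(x), 1.0 - r)
    q = max(float(q), 1e-8)            # guard against an all-zero tail
    return np.arctanh(threshold) / q
```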
Further, in this embodiment, the data distribution of the model parameter of the quantized convolutional layer corresponding to different intervals may also be determined.
One way of calculating the above data distribution is as follows:
Step A1: calculate the maximum value v_max of the input x of the quantization function;
Step A2: divide the interval [0, v_max] into a number of bins according to the maximum value v_max;
Step A3: project the elements of the input x that are above zero into the bins of Step A2 to obtain a histogram of the data distribution.
Another way of calculating the above data distribution is as follows:
Step B1: calculate the absolute value of each element in the input x of the quantization function;
Step B2: calculate the maximum value v_max of these absolute values;
Step B3: divide the interval [0, v_max] into a number of bins according to the maximum value v_max;
Step B4: project the absolute value of each element of the input x into the bins of Step B3 to obtain a histogram of the data distribution.
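For illustration only, the two histogram constructions can be sketched as follows; the number of bins is an arbitrary choice, and the input is assumed to contain at least some positive values.

```python
import numpy as np

def histogram_a(x, bins=256):
    v_max = float(x.max())                                            # Step A1
    pos = x[x > 0]                                                    # Step A3: values above zero
    hist, edges = np.histogram(pos, bins=bins, range=(0.0, v_max))    # Steps A2-A3
    return hist, edges

def histogram_b(x, bins=256):
    a = np.abs(x)                                                     # Step B1
    v_max = float(a.max())                                            # Step B2
    hist, edges = np.histogram(a, bins=bins, range=(0.0, v_max))      # Steps B3-B4
    return hist, edges
```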
In the above equation (8), a simplified form is c0 = 1/c1. A more general (expanded) form is c0 = k/c1, where k is an expansion factor, for example k = 2 or 3.
Equation (9) is the derivative of y(x) in equation (6). Please refer to FIG. 6 and FIG. 7: FIG. 6 is a schematic diagram of the function y(x) = c0 * tanh(c1 * x) when c0 and c1 take different values, and FIG. 7 is a schematic diagram of the function y'(x) = c0 * c1 * (1 - tanh^2(c1 * x)) when c0 and c1 take different values.
The embodiment acquires a data set of the digital classification task and constructs a convolutional neural network of the digital classification task. And after the convolutional neural network is trained, performing a digital classification task by using the trained convolutional neural network. In the forward propagation process of the quantized convolutional layer, the present embodiment quantizes model parameters based on a sign function; and quantizing the model parameters based on a parameterized hyperbolic tangent function in the back propagation process of the quantized convolutional layer. The quantization function of the present embodiment is dynamically determined according to the data distribution of the weight value and the activation value in the training process, and different quantization functions are applied to the weight value and the activation value in each layer, so that the present embodiment can improve the model accuracy of the digital classification.
The flow described in the above embodiment is explained below by an embodiment in practical use.
In a general convolutional neural network, the model parameters (such as the weight values of the convolutional layers) are expressed as floating-point numbers, and the activation values generated by the network at run time are also expressed as floating-point numbers. The model parameters produced by the convolutional neural network quantization algorithm contain only the two values +1 and -1, and the activation values generated during operation of the network are likewise expressed as +1 and -1. Under this configuration, the convolution operations of the convolutional neural network can be implemented with bit-wise operations, which accelerates inference and reduces storage overhead during inference. Here, "operation" refers to operations such as convolution and pooling in a neural network.
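For illustration only, the following toy example shows why binarized values allow the convolution's inner product to be computed with bit operations: packing a {-1, +1} vector into a machine word (bit 1 for +1, bit 0 for -1) reduces the dot product to an XOR followed by a population count.

```python
def pack(vec):
    # vec is a list of +1 / -1 values; bit i is set when vec[i] == +1
    bits = 0
    for i, v in enumerate(vec):
        if v == 1:
            bits |= 1 << i
    return bits

def binary_dot(x_bits, y_bits, n):
    # dot(x, y) = n - 2 * popcount(x XOR y) for +-1 vectors of length n
    return n - 2 * bin(x_bits ^ y_bits).count("1")

a = [1, -1, -1, 1, 1]
b = [1, 1, -1, -1, 1]
assert binary_dot(pack(a), pack(b), len(a)) == sum(u * v for u, v in zip(a, b))
```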
In a typical binary convolutional neural network training, floating point weight values and floating point activation values are quantized to the binary values { -1, +1} using a piece-wise function during forward propagation. The most commonly used piecewise function is the sign function sign (x) as follows.
sign(x) = +1 if x >= 0, and sign(x) = -1 if x < 0.
Because the derivative of the above sign function is zero almost everywhere (d sign(x)/dx = 0 for all x ≠ 0), the gradient disappears in the back propagation process. The disappearance of the gradient prevents the training process of the neural network from converging and from obtaining a high-accuracy model.
To solve the gradient-vanishing problem, the derivative of the sign function described above is typically replaced with an approximate surrogate; a common choice (shown here only as an example) is the clipped estimator whose value is 1 for |x| <= 1 and 0 otherwise.
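For illustration only, this common workaround can be written as a PyTorch autograd function whose forward pass is sign() and whose backward pass uses the clipped approximation; the clipping threshold of 1 is an assumption based on common practice, not taken from this disclosure.

```python
import torch

class SignSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)          # forward: the sign function

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # backward: pass the gradient through only where |x| <= 1
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)
```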
although the problem of gradient vanishing during training can be solved by using the above approximation function instead of the derivative of the sign function in the back propagation process, such a design results in a mismatch between the forward and reverse functions. The mismatch between the forward and reverse functions results in a poor model accuracy.
In the related art, a binary convolutional neural network quantization method is provided which uses the parameterized hyperbolic tangent function tanh(v * x) as the quantization function, where v is a hyper-parameter and x is the argument. At the beginning of training, v takes a small value (e.g., v = 1); during the training iterations, v gradually increases to a very large value (e.g., v = 1000), and tanh(v * x) approaches the sign function sign(x).
y(x) = tanh(v * x)
The derivative of the above function is:
y'(x) = v * (1 - tanh^2(v * x))
when training is finished, the model parameters are quantized by using a sign function (sign function), so that a binary convolution neural network model is obtained. In the process of model inference, the above document quantizes the activation value using a sign function (sign function) to obtain a binarized activation value, and then performs convolution operation using bit operation.
The above method has the following two problems: ① the value of v is set according to the experience of a development engineer, without considering the data distribution characteristics of the weight values and activation values; ② the same quantization function is used in every layer of the whole network, without considering the data distribution characteristics of each layer. These two problems make the training process of the neural network unstable and the model accuracy low.
In order to solve the above technical problem, an embodiment of the present application provides a convolutional neural network quantization scheme: during the training of a convolutional neural network, a binary convolutional neural network can be generated by quantizing the weight values and activation values with the scheme provided in this embodiment. This embodiment proposes an adaptive quantization function whose parameters are dynamically determined from the data distribution of the weight values of the convolutional neural network model and the data distribution of its activation values, without depending on the personal experience of an engineer. Further, this embodiment proposes a layer-by-layer quantization solution in which a specific quantization function is applied to each layer. With these two features, the accuracy of the trained binary convolutional neural network can be improved and its convergence accelerated. A specific implementation of this embodiment is a digital classification task, which includes the following steps:
step 1: a task related data set is input.
The data set related to the task includes a training set and a labeling truth value.
Step 2: a neural network is constructed.
The convolutional neural network of this embodiment is composed of convolutional layers, batch normalization layers, pooling layers, fully-connected layers, nonlinear layers, and the like.
Referring to fig. 5, fig. 5 is a structural diagram of a convolutional neural network for training a digital classification system according to an embodiment of the present application, and the neural network for digital classification shown in fig. 5 can be constructed by using the three basic structural units shown in fig. 2, fig. 3, and fig. 4.
The A1 structural unit comprises:
floating-point convolution layer: the convolution layer is a floating point convolution operation with a parameter of 3 x (3 x 3) x 128. The first 3 indicates the number of channels of the input image (the number of channels is set to 3 when the input image is a color image; the number of channels is set to 1 when the input image is a single-channel grayscale image); (3 x 3) represents a convolution kernel whose width is 3 and whose height is also 3; 128 denotes the total number of convolution kernels, which is also the number of channels in the output layer. In order to keep the size of the output feature map consistent with that of the input image, zero padding operation is carried out on the input image when convolution operation is carried out, and the step size of the convolution operation is 1.
A normalization layer: the Normalization layer is a Batch Normalization layer.
Non-linear layer: this example uses HardTanh as the nonlinear layer. The mathematical expression for the non-linear layer is:
HardTanh(x) = -1 for x < -1; HardTanh(x) = x for -1 <= x <= 1; HardTanh(x) = 1 for x > 1;
where x represents the input.
The A2 structural unit comprises:
Quantized convolutional layer: the convolution is a quantized convolution operation with parameters 128 x (3 x 3) x 128, where the first 128 represents the number of channels of the previous layer's output feature map; (3 x 3) denotes the convolution kernel, which is 3 wide and 3 high; 128 denotes the total number of convolution kernels. In order to keep the size of the output feature map consistent with that of the input feature map, a zero-padding operation is applied to the input feature map during the convolution operation, and the step size of the convolution operation is 1.
A normalization layer: the Normalization layer is a Batch Normalization layer.
Non-linear layer: this example uses HardTanh as the nonlinear layer.
The B1 structural unit comprises:
Quantized convolutional layer: the convolution is a quantized convolution operation with parameters 128 x (3 x 3) x 128, where the first 128 represents the number of channels of the previous layer's output feature map; (3 x 3) denotes the convolution kernel, which is 3 wide and 3 high; 128 denotes the total number of convolution kernels. In order to keep the size of the output feature map consistent with that of the input feature map, a zero-padding operation is applied to the input feature map during the convolution operation, and the step size of the convolution operation is 1.
Pooling layer (pooling): the present invention uses a maximum pooling (max-pooling) operation wherein: the pooling kernel is 2 x 2 and the step size is 2. The resolution of the feature map output through the pooling layer is 1/2 the resolution of the input feature map.
A normalization layer: the Normalization layer is a Batch Normalization layer.
Non-linear layer: this example uses HardTanh as the nonlinear layer.
The A3 structural unit comprises:
Quantized convolutional layer: the convolution is a quantized convolution operation with parameters 128 x (3 x 3) x 256, where the first 128 represents the number of channels of the previous layer's output feature map; (3 x 3) denotes the convolution kernel, which is 3 wide and 3 high; 256 denotes the total number of convolution kernels. In order to keep the size of the output feature map consistent with that of the input feature map, a zero-padding operation is applied to the input feature map during the convolution operation, and the step size of the convolution operation is 1.
A normalization layer: the Normalization layer is a Batch Normalization layer.
Non-linear layer: this example uses HardTanh as the nonlinear layer.
The A4 structural unit comprises:
Quantized convolutional layer: the convolution is a quantized convolution operation with parameters 256 x (3 x 3) x 256, where the first 256 represents the number of channels of the previous layer's output feature map; (3 x 3) denotes the convolution kernel, which is 3 wide and 3 high; 256 denotes the total number of convolution kernels. In order to keep the size of the output feature map consistent with that of the input feature map, a zero-padding operation is applied to the input feature map during the convolution operation, and the step size of the convolution operation is 1.
A normalization layer: the Normalization layer is a Batch Normalization layer.
Non-linear layer: this example uses HardTanh as the nonlinear layer.
The B2 structural unit comprises:
Quantized convolutional layer: the convolution is a quantized convolution operation with parameters 256 x (3 x 3) x 256, where the first 256 represents the number of channels of the previous layer's output feature map; (3 x 3) denotes the convolution kernel, which is 3 wide and 3 high; 256 denotes the total number of convolution kernels. In order to keep the size of the output feature map consistent with that of the input feature map, a zero-padding operation is applied to the input feature map during the convolution operation, and the step size of the convolution operation is 1.
A pooling layer: this example uses a maximum pooling operation, where the pooling kernel is 2 x 2 and the step size is 2. The resolution of the feature map output by the pooling layer is 1/2 the resolution of the input feature map.
A normalization layer: the Normalization layer is a Batch Normalization layer.
Non-linear layer: this example uses HardTanh as the nonlinear layer.
The A5 structural unit comprises:
Quantized convolutional layer: the convolution is a quantized convolution operation with parameters 256 x (3 x 3) x 512, where the first 256 represents the number of channels of the previous layer's output feature map; (3 x 3) denotes the convolution kernel, which is 3 wide and 3 high; 512 denotes the total number of convolution kernels. In order to keep the size of the output feature map consistent with that of the input feature map, a zero-padding operation is applied to the input feature map during the convolution operation, and the step size of the convolution operation is 1.
A normalization layer: the Normalization layer is a Batch Normalization layer.
Non-linear layer: this example uses HardTanh as the nonlinear layer.
The A6 structural unit comprises:
Quantized convolutional layer: the convolution is a quantized convolution operation with parameters 512 x (3 x 3) x 512, where the first 512 represents the number of channels of the previous layer's output feature map; (3 x 3) denotes the convolution kernel, which is 3 wide and 3 high; 512 denotes the total number of convolution kernels. In order to keep the size of the output feature map consistent with that of the input feature map, a zero-padding operation is applied to the input feature map during the convolution operation, and the step size of the convolution operation is 1.
Normalization layer: the Normalization layer is a Batch Normalization layer.
Non-linear layer: this example uses HardTanh as the nonlinear layer.
The A7 structural unit comprises:
Quantized convolutional layer: the convolution is a quantized convolution operation with parameters 512 x (3 x 3) x 512, where the first 512 represents the number of channels of the previous layer's output feature map; (3 x 3) denotes the convolution kernel, which is 3 wide and 3 high; 512 denotes the total number of convolution kernels. In order to keep the size of the output feature map consistent with that of the input feature map, a zero-padding operation is applied to the input feature map during the convolution operation, and the step size of the convolution operation is 1.
A normalization layer: the Normalization layer is a Batch Normalization layer.
Non-linear layer: this example uses HardTanh as the nonlinear layer.
The B3 structural unit comprises:
Quantized convolutional layer: the convolution is a quantized convolution operation with parameters 512 x (3 x 3) x 512, where the first 512 represents the number of channels of the previous layer's output feature map; (3 x 3) denotes the convolution kernel, which is 3 wide and 3 high; 512 denotes the total number of convolution kernels. In order to keep the size of the output feature map consistent with that of the input feature map, a zero-padding operation is applied to the input feature map during the convolution operation, and the step size of the convolution operation is 1.
A pooling layer: this example uses a maximum pooling operation, where the pooling kernel is 2 x 2 and the step size is 2. The resolution of the feature map output by the pooling layer is 1/2 the resolution of the input feature map.
Normalization layer: the Normalization layer is a Batch Normalization layer.
Non-linear layer: this example uses HardTanh as the nonlinear layer.
The fully-connected layer is a floating-point fully-connected operation with parameters (512 x 4 x 4) x 10, where 512 corresponds to the number of channels of the output feature map of unit B3; 4 x 4 corresponds to the width and height of the output feature map of unit B3; and 10 corresponds to the total number of digit classes (i.e., the 10 digits). The Softmax layer corresponds to the softmax function. Loss layer: for the digital classification problem, cross-entropy loss is used here as the loss function.
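For illustration only, the classifier head described above can be sketched as follows; PyTorch's CrossEntropyLoss combines the softmax layer and the cross-entropy loss into a single operation.

```python
import torch.nn as nn

head = nn.Sequential(
    nn.Flatten(),                 # flatten the 512 x 4 x 4 output of unit B3 to 8192 features
    nn.Linear(512 * 4 * 4, 10))   # fully-connected layer mapping to the 10 digit classes
criterion = nn.CrossEntropyLoss() # softmax + cross-entropy loss
```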
And step 3: and training the neural network.
In this embodiment, SGD(m) is used to optimize the weight parameters, with an initial learning rate of 0.01, a momentum of 0.9 and a weight decay of 1 × 10^-4; the total number of training epochs is 500, and the learning rate is reduced to 1/10 of its value at the 250th epoch and the 400th epoch. SGD / SGD(m) refers to stochastic gradient descent / stochastic gradient descent with momentum. In this embodiment, all operations other than the quantized convolutional layer are trained with the conventional training rules of deep neural networks, and the model parameters are quantized based on the sign function in the forward propagation process of the quantized convolutional layer, wherein the model parameters include weights and activation values; the model parameters are quantized based on the parameterized hyperbolic tangent function in the back propagation process of the quantized convolutional layer, wherein the parameters in the hyperbolic tangent function are determined based on the data distribution of the model parameters of the quantized convolutional layer.
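For illustration only, the training hyper-parameters above correspond to the following PyTorch setup; the model variable is assumed to be the network constructed in step 2, and the per-batch forward/backward pass is omitted.

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[250, 400], gamma=0.1)   # divide the learning rate by 10 at epochs 250 and 400

for epoch in range(500):
    # ... one pass over the training set: forward, loss, backward, optimizer.step() per batch ...
    scheduler.step()
```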
And 4, step 4: neural network reasoning.
After the neural network training is finished, the binary quantized weights B_w and the scaling factors s of the quantized convolutional layers are saved. Layers other than the quantized convolutional layers are processed using the operations of a conventional deep neural network. In a quantized convolutional layer, during neural network inference, when the current layer outputs an activation value A, sign() is used to convert A into B_A, and the convolution operation is then performed using equation (5).
The differences between the present embodiment and the related art are as follows: the quantization function of this embodiment is dynamically determined according to the data distribution of the weight values and activation values during training, whereas in the related art it is preset according to the experience of engineers; this embodiment adopts different quantization functions for the weight values and the activation values in each layer, whereas the related art employs the same quantization function; and this embodiment considers the data distribution characteristics of different layers and adopts a different quantization function for each layer, whereas the related art uses the same quantization function in different layers.
This embodiment provides a new binary quantization scheme and introduces a dynamic quantization mechanism and a layer-by-layer quantization mechanism, so that the training process is more stable and the model accuracy is higher. The method provided by the invention is not limited to a specific task: it can be used for image classification tasks, as well as for object detection and image segmentation tasks.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a model parameter quantization apparatus of a convolutional neural network according to an embodiment of the present disclosure;
the apparatus may include:
an obtaining module 801, configured to obtain a data set of a digital classification task; wherein the data set contains digital images and corresponding digital labels;
a building module 802 for building a convolutional neural network of the digital classification task; the convolutional neural network comprises a floating point convolutional layer, a quantization convolutional layer, a pooling layer, a normalization layer, a nonlinear layer, a full-connection layer, a classification layer and a loss calculation layer;
a first quantization module 803, configured to quantize the model parameters based on a sign function in a forward propagation process of the quantized convolutional layer; wherein the model parameters include weights and activation values;
a second quantization module 804, configured to quantize the model parameter based on a hyperbolic tangent function in a back propagation process of the quantized convolutional layer; wherein parameters in the hyperbolic tangent function are determined based on a distribution of model parameters of the quantized convolutional layer;
a training module 805 configured to train the convolutional neural network using the data set, where the trained convolutional neural network is used to perform a digital classification task.
The embodiment acquires a data set of the digital classification task and constructs a convolutional neural network of the digital classification task. And after the convolutional neural network is trained, performing a digital classification task by using the trained convolutional neural network. In the forward propagation process of the quantized convolutional layer, the present embodiment quantizes model parameters based on a sign function; and quantizing the model parameters based on a parameterized hyperbolic tangent function in the back propagation process of the quantized convolutional layer. The quantization function of the embodiment is dynamically determined according to the data distribution of the weight value and the activation value in the training process, and different quantization functions are adopted for the weight value and the activation value in each layer, so that the embodiment can improve the model accuracy.
Further, the convolutional neural network comprises a plurality of structural units, a full connection layer, a classification layer and a loss calculation layer which are sequentially connected;
the structural units comprise a first type structural unit, a second type structural unit and a third type structural unit; the first type structure unit comprises a floating point convolution layer, a normalization layer and a nonlinear layer which are sequentially connected; the second type structural unit comprises a quantization convolution layer, a normalization layer and a nonlinear layer which are connected in sequence; the third type of structural unit comprises a quantization convolution layer, a pooling layer, a normalization layer and a nonlinear layer which are connected in sequence.
Further, the first quantization module 803 is configured to calculate an intermediate quantization value corresponding to the model parameter based on a sign function, and calculate a scaling factor corresponding to the model parameter; and for determining a product of the intermediate quantization value and the scaling factor as a final quantization value of the model parameter.
Further, the method also comprises the following steps:
and a quantized convolutional layer output module, configured to, after the model parameters are quantized based on the sign function, perform an inner product operation implemented with bit operations on the quantized value corresponding to the weight and the quantized value corresponding to the activation value, so as to obtain the output of the quantized convolutional layer.
Further, the second quantization module 804 is configured to calculate a quantized value corresponding to the model parameter based on a target hyperbolic tangent function, the target hyperbolic tangent function being: y(x) = c0 * tanh(c1 * x); where x is the model parameter, y(x) is the quantized value, c0 and c1 are hyper-parameters, and c1 is determined based on the data distribution of the model parameters of the quantized convolutional layer.
Further, the method also comprises the following steps:
a calculation module, configured to calculate c1 under a target constraint condition; wherein the target constraint condition is that the ratio of the number of target model parameters satisfying a preset condition to the total number of model parameters is greater than or equal to a preset ratio r, and the preset condition is tanh(c1 * x) ∈ (-1, -r) ∪ (r, 1).
Further, the method also comprises the following steps:
and the data distribution determining module is used for determining the corresponding data distribution of the model parameters of the quantized convolutional layer in different intervals.
Since the embodiments of the apparatus portion and the method portion correspond to each other, please refer to the description of the embodiments of the method portion for the embodiments of the apparatus portion, which is not repeated here.
The present application also provides a computer readable storage medium having stored thereon a computer program which, when executed, may implement the steps provided by the above-described embodiments. The computer-readable storage medium may include: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The application further provides an electronic device, which may include a memory and a processor, where the memory stores a computer program, and the processor may implement the steps provided by the foregoing embodiments when calling the computer program in the memory. Of course, the electronic device may also include various network interfaces, power supplies, and the like.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method for quantizing model parameters of a convolutional neural network, comprising:
acquiring a data set of a digital classification task; wherein the data set contains digital images and corresponding digital labels;
constructing a convolutional neural network of a digital classification task; the convolutional neural network comprises a floating point convolutional layer, a quantization convolutional layer, a pooling layer, a normalization layer, a nonlinear layer, a full-connection layer, a classification layer and a loss calculation layer;
training the convolutional neural network by using the data set, wherein the trained convolutional neural network is used for executing a digital classification task;
wherein model parameters are quantized based on a sign function in a forward propagation process of the quantized convolutional layer; wherein the model parameters include weights and activation values;
quantizing the model parameters based on a parameterized hyperbolic tangent function in a back propagation process of the quantized convolutional layer; wherein parameters in the hyperbolic tangent function are determined based on a data distribution of model parameters of the quantized convolutional layer.
2. The model parameter quantization method of claim 1, wherein the convolutional neural network comprises a plurality of structural units, a fully-connected layer, a classification layer, and a loss computation layer, which are sequentially connected;
the structural units comprise a first type structural unit, a second type structural unit and a third type structural unit; the first type structure unit comprises a floating point convolution layer, a normalization layer and a nonlinear layer which are sequentially connected; the second type structural unit comprises a quantization convolution layer, a normalization layer and a nonlinear layer which are connected in sequence; the third type of structural unit comprises a quantization convolution layer, a pooling layer, a normalization layer and a nonlinear layer which are connected in sequence.
3. The method of claim 1, wherein quantizing the model parameters based on a sign function during the forward propagation of the quantized convolutional layer comprises:
calculating an intermediate quantization value corresponding to the model parameter based on the sign function, and calculating a scaling factor corresponding to the model parameter;
and determining the product of the intermediate quantization value and the scaling factor as the final quantization value of the model parameter.
4. The method of claim 1, wherein after quantizing the model parameters based on the sign function, the method further comprises:
performing a bit-wise inner product operation on the quantized values corresponding to the weights and the quantized values corresponding to the activation values to obtain the output of the quantized convolutional layer.
5. The method of claim 1, wherein the quantizing the model parameters based on the parameterized hyperbolic tangent function comprises:
calculating a quantized value corresponding to the model parameter based on a target hyperbolic tangent function;
the target hyperbolic tangent function specifically includes: y (x) ═ c0*tanh(c1*x);
Wherein x is the model parameter, y (x) is the quantization value, c0And c1Is a hyper-parameter, c1And determining data distribution based on the model parameters of the quantized convolutional layer.
6. The method of claim 5, further comprising:
calculating c1 under a target constraint condition; wherein the target constraint condition is that the ratio of the number of target model parameters satisfying a preset condition to the total number of model parameters is greater than or equal to a preset ratio r, and the preset condition is specifically tanh(c1 * x) ∈ (-1, -r) ∪ (r, 1).
7. The method of model parameter quantization of claim 1, further comprising:
and determining the data distribution of the model parameters of the quantized convolutional layer over different intervals.
8. An apparatus for quantizing a model parameter of a convolutional neural network, comprising:
the acquisition module is used for acquiring a data set of the digital classification task; wherein the data set contains digital images and corresponding digital labels;
the building module is used for building a convolutional neural network for the digital classification task; wherein the convolutional neural network comprises a floating-point convolutional layer, a quantized convolutional layer, a pooling layer, a normalization layer, a nonlinear layer, a fully-connected layer, a classification layer and a loss calculation layer;
a first quantization module for quantizing model parameters based on a sign function in a forward propagation process of the quantized convolutional layer; wherein the model parameters include weights and activation values;
a second quantization module for quantizing the model parameters based on a parameterized hyperbolic tangent function in a back propagation process of the quantized convolutional layer; wherein parameters in the hyperbolic tangent function are determined based on a data distribution of the model parameters of the quantized convolutional layer;
and the training module is used for training the convolutional neural network by using the data set, and the trained convolutional neural network is used for executing a digital classification task.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the method for model parameter quantization of a convolutional neural network as claimed in any one of claims 1 to 7 when said computer program is executed.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the method for model parameter quantization of a convolutional neural network as claimed in any one of claims 1 to 7.
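For illustration only, the following Python sketch (using PyTorch, which is an assumption; the claims name no framework) shows one way the scheme of claims 1, 3 and 5 could be realized: the forward pass quantizes a parameter with the sign function and a scaling factor, while the backward pass substitutes the gradient of the parameterized hyperbolic tangent y(x) = c0 * tanh(c1 * x) for the zero gradient of the sign function. The name QuantizeSign and the choice of a single per-tensor scaling factor are illustrative assumptions, not taken from the patent text.

import torch

class QuantizeSign(torch.autograd.Function):
    # Forward: sign(x) multiplied by a scaling factor (claim 3).
    # Backward: gradient of y(x) = c0 * tanh(c1 * x) (claims 1 and 5).
    @staticmethod
    def forward(ctx, x, c0, c1):
        ctx.save_for_backward(x)
        ctx.c0, ctx.c1 = c0, c1
        scale = x.abs().mean()            # scaling factor; a per-tensor mean is assumed here
        return torch.sign(x) * scale      # intermediate quantization value * scaling factor

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        c0, c1 = ctx.c0, ctx.c1
        grad_x = grad_out * c0 * c1 * (1.0 - torch.tanh(c1 * x) ** 2)   # d/dx of c0 * tanh(c1 * x)
        return grad_x, None, None

# usage: quantize the weights of a quantized convolutional layer before the convolution
w = torch.randn(16, 8, 3, 3, requires_grad=True)
w_q = QuantizeSign.apply(w, 1.0, 2.5)     # c0 and c1 would be chosen from the weight distribution
w_q.sum().backward()                       # gradients flow through the tanh surrogate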
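The three structural units of claim 2 map naturally onto standard deep-learning building blocks. In the sketch below, QuantConv2d is a hypothetical quantized convolutional layer (simplified here to sign-based weight binarization with a straight-through trick; the patent's backward pass instead uses the parameterized tanh shown above), and the use of batch normalization, ReLU and max pooling as the normalization, nonlinear and pooling layers is an assumption.

import torch
import torch.nn as nn
import torch.nn.functional as F

class QuantConv2d(nn.Conv2d):
    # hypothetical quantized convolutional layer: binarize weights with sign() and a scaling factor
    def forward(self, x):
        scale = self.weight.abs().mean()
        w_q = torch.sign(self.weight) * scale
        w_q = self.weight + (w_q - self.weight).detach()   # keep gradients flowing to the float weights
        return F.conv2d(x, w_q, self.bias, self.stride, self.padding, self.dilation, self.groups)

def unit_type1(cin, cout):   # first type: floating-point conv -> normalization -> nonlinearity
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU())

def unit_type2(cin, cout):   # second type: quantized conv -> normalization -> nonlinearity
    return nn.Sequential(QuantConv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU())

def unit_type3(cin, cout):   # third type: quantized conv -> pooling -> normalization -> nonlinearity
    return nn.Sequential(QuantConv2d(cin, cout, 3, padding=1), nn.MaxPool2d(2),
                         nn.BatchNorm2d(cout), nn.ReLU())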
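Claim 4's bit-wise inner product can be illustrated with the common binary-network convention that +1 is encoded as bit 1 and -1 as bit 0 (an encoding assumed here, not stated in the claim): the dot product of two {-1, +1} vectors then equals 2 * popcount(XNOR(w, a)) - n. A pure-Python sketch, not an optimized kernel:

def binary_inner_product(w_bits: int, a_bits: int, n: int) -> int:
    # inner product of two {-1, +1} vectors packed as n-bit integers (bit 1 -> +1, bit 0 -> -1)
    xnor = ~(w_bits ^ a_bits) & ((1 << n) - 1)   # bitwise XNOR, masked to n bits
    popcount = bin(xnor).count("1")              # number of matching positions
    return 2 * popcount - n                      # matches minus mismatches

# example: w = [+1, -1, +1, +1], a = [+1, +1, -1, +1]  ->  dot product = 0
w_bits, a_bits = 0b1011, 0b1101
print(binary_inner_product(w_bits, a_bits, 4))   # prints 0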
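Claim 6 selects c1 so that at least a fraction r of the model parameters x of the layer satisfy tanh(c1 * x) ∈ (-1, -r) ∪ (r, 1), i.e. the surrogate is close to saturation for most parameters. The grid search below is only one possible way to satisfy that constraint and is an assumption of this sketch:

import numpy as np

def choose_c1(x, r=0.9, grid=np.linspace(0.1, 100.0, 1000)):
    # smallest c1 on the grid for which the fraction of parameters with
    # |tanh(c1 * x)| > r reaches the preset ratio r
    x = np.ravel(x)
    for c1 in grid:
        if np.mean(np.abs(np.tanh(c1 * x)) > r) >= r:
            return float(c1)
    return float(grid[-1])                # fall back to the largest candidate

# example on synthetic Gaussian weights of a quantized convolutional layer
weights = np.random.randn(16, 8, 3, 3) * 0.5
print(choose_c1(weights, r=0.9))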
CN202111676325.8A 2021-12-31 2021-12-31 Method and device for quantizing model parameters of convolutional neural network and related device Pending CN114444686A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111676325.8A CN114444686A (en) 2021-12-31 2021-12-31 Method and device for quantizing model parameters of convolutional neural network and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111676325.8A CN114444686A (en) 2021-12-31 2021-12-31 Method and device for quantizing model parameters of convolutional neural network and related device

Publications (1)

Publication Number Publication Date
CN114444686A true CN114444686A (en) 2022-05-06

Family

ID=81365438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111676325.8A Pending CN114444686A (en) 2021-12-31 2021-12-31 Method and device for quantizing model parameters of convolutional neural network and related device

Country Status (1)

Country Link
CN (1) CN114444686A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024012171A1 (en) * 2022-07-15 2024-01-18 华为技术有限公司 Binary quantization method, neural network training method, device and storage medium
CN116468079A (en) * 2023-04-13 2023-07-21 上海处理器技术创新中心 Method for training deep neural network model and related product
CN116468079B (en) * 2023-04-13 2024-05-24 上海处理器技术创新中心 Method for training deep neural network model and related product

Similar Documents

Publication Publication Date Title
Wan et al. Tbn: Convolutional neural network with ternary inputs and binary weights
Fang et al. Post-training piecewise linear quantization for deep neural networks
Micikevicius et al. Mixed precision training
US20200302276A1 (en) Artificial intelligence semiconductor chip having weights of variable compression ratio
CN110659725B (en) Neural network model compression and acceleration method, data processing method and device
CN110378383B (en) Picture classification method based on Keras framework and deep neural network
US11481613B2 (en) Execution method, execution device, learning method, learning device, and recording medium for deep neural network
CN113424202A (en) Adjusting activation compression for neural network training
KR20190050698A (en) Method for optimizing neural networks
WO2018140294A1 (en) Neural network based on fixed-point operations
CN110852439A (en) Neural network model compression and acceleration method, data processing method and device
CN114444686A (en) Method and device for quantizing model parameters of convolutional neural network and related device
CN111783974A (en) Model construction and image processing method and device, hardware platform and storage medium
Langroudi et al. Positnn framework: Tapered precision deep learning inference for the edge
CN111160000B (en) Composition automatic scoring method, device terminal equipment and storage medium
CN110874625A (en) Deep neural network quantification method and device
CN113632106A (en) Hybrid precision training of artificial neural networks
CN113222102A (en) Optimization method for neural network model quantification
CN114091650A (en) Searching method and application of deep convolutional neural network architecture
Floropoulos et al. Complete vector quantization of feedforward neural networks
CN115080139A (en) Efficient quantization for neural network deployment and execution
CN112686384A (en) Bit-width-adaptive neural network quantization method and device
US11429771B2 (en) Hardware-implemented argmax layer
CN112381147A (en) Dynamic picture similarity model establishing method and device and similarity calculating method and device
Choi et al. Approximate computing techniques for deep neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination