CN109344893B - Image classification method based on mobile terminal


Info

Publication number
CN109344893B
CN109344893B (application CN201811119618.4A)
Authority
CN
China
Prior art keywords
neural network
deep neural
weight
codebook
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811119618.4A
Other languages
Chinese (zh)
Other versions
CN109344893A (en)
Inventor
陈靓影
徐如意
饶川
刘乐元
张坤
彭世新
刘小迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central China Normal University
Original Assignee
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central China Normal University
Priority to CN201811119618.4A
Publication of CN109344893A
Application granted
Publication of CN109344893B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an image classification method based on a mobile terminal. During training, the weights of the deep neural network model are quantized to powers of 2, so that multiplications can be implemented as efficient shift operations on embedded systems; the quantization codebook is updated dynamically, which effectively reduces quantization error and improves both the prediction performance of the model and its runtime efficiency on the mobile terminal. The invention also provides a system for realizing the method. The method has a marked compression effect on deep neural networks: it reduces the storage and computation resources consumed by large deep neural network models, facilitates the deployment of deep neural networks on resource-limited mobile terminals such as smart phones, and has strong practical applicability.

Description

Image classification method based on mobile terminal
Technical Field
The invention belongs to the field of image processing and pattern recognition, and particularly relates to an image classification method and system based on a mobile terminal.
Background
With the rapid development of internet technology, high-pixel cameras in smart phones and the universal coverage of mobile communication networks have flooded our lives with image data. How to sort massive image data into different categories on a mobile terminal such as a smart phone is therefore a technical problem that urgently needs solving.
In recent years, deep neural networks have stood out among machine learning methods, achieving remarkable breakthroughs in image classification performance and attracting wide attention. To obtain better features and improve performance, deep multi-layer network structures are often constructed. As a result, deep neural networks carry millions of parameters and consume significant computing and storage resources, which makes it very difficult to apply them on mobile terminals such as smart phones.
In order to deploy a deep neural network on embedded devices such as smart phones, the currently common approach is to compress the deep model, reducing the storage space it requires as much as possible while preserving classification performance. Much research has been carried out in this field, but problems remain, such as difficult convergence of compressed-network training, low classification accuracy, and low runtime efficiency on the mobile terminal.
Disclosure of Invention
Aiming at the problems and improvement requirements in the prior art, the invention provides an image classification method and system based on a mobile terminal, which quantize the weights in a deep neural network model to powers of 2, so that multiplications can be implemented as efficient shift operations on embedded systems. Unlike existing methods that adopt static quantization coding, the proposed method dynamically updates the quantization codebook during model training, effectively reducing quantization error and improving both the prediction performance of the model and its runtime efficiency on the mobile terminal.
An image classification method based on a mobile terminal comprises an off-line training stage and an on-line classification stage:
the off-line training stage specifically comprises:
s1 formulating a codebook:
for each layer of the deep neural network model, acquiring the maximum absolute value of the weights and quantizing it into an exponential form with base 2, thereby obtaining the quantization upper limit of the codebook; and determining the codebook of the current model under this upper limit;
s2 quantization weight:
carrying out exponential quantization on the weights in the deep neural network model, quantizing each weight to the closest value in the codebook;
s3 retraining the network model:
inputting a sample image, training the quantized deep neural network model, obtaining the cross entropy loss of the deep network in the forward process of training, and updating the weight parameters in the network according to the cross entropy loss in the backward process.
S4 iteration and termination:
iteratively executing steps S2 and S3 until the deep neural network model converges or the set number of training iterations is reached, then terminating the iteration and obtaining the final classifier;
the online classification stage specifically comprises: and sending the image to be classified into a classifier to obtain a classification result.
Further, the quantization upper limit of the codebook is expressed as:

$2^{n_2}$

where $n_2 = \lfloor \log_2(\max(|W_l|)) \rfloor$, $\lfloor\cdot\rfloor$ is the round-down (floor) operation, $W_l$ is the weight of the $l$-th layer of the deep neural network, $\max(\cdot)$ represents the maximum value, and $|\cdot|$ represents the absolute value.
Further, when quantized to $b$ bits, the codebook is expressed as $P_l = \{\pm 2^n\}$, $n \in [n_1, n_2]$, $n \in \mathbb{Z}$, where $l$ represents the $l$-th layer of the deep neural network, $n_1$ and $n_2$ are two integers satisfying $n_1 < n_2$ with $n_1 = n_2 - 2^{b-1} + 1$, and $\mathbb{Z}$ denotes the integers.
Further, when quantized to $b$ bits with 0 included, the codebook is expressed as $P_l = \{\pm 2^n, 0\}$, $n \in [n_1, n_2]$, $n \in \mathbb{Z}$, where $l$ represents the $l$-th layer of the deep neural network, $n_1$ and $n_2$ are two integers satisfying $n_1 < n_2$ with $n_1 = n_2 - 2^{b-2} + 1$, and $\mathbb{Z}$ denotes the integers.
Further, each layer of the deep neural network is quantized as follows:

$\hat{w} = I(w) \cdot 2^k, \quad w \in W_l$

where $\hat{w}$ is the quantized weight, $2^k$ is the codebook value closest to the absolute value $|w|$ of the weight $w$, and $I(w)$ is an indicator function giving the sign of $w$.
Further, when 0 is included in the codebook, each layer of the deep neural network is quantized as follows:

$\hat{w} = \begin{cases} I(w) \cdot 2^k, & |w| \ge 2^{n_1} \\ 0, & |w| < 2^{n_1} \end{cases} \quad w \in W_l$

where $\hat{w}$ is the quantized weight, $2^k$ is the codebook value closest to the absolute value $|w|$, and $I(w)$ is an indicator function giving the sign of $w$.
Further, the cross-entropy loss of the deep network is obtained in the forward process of training and is expressed as:

$E(\widetilde{W}) = L(\widetilde{W}) + \lambda R(\widetilde{W})$

where $L(\widetilde{W})$ is the loss of the network, $R(\widetilde{W})$ is a regularization term (an $L_2$-norm regularization term is adopted), $\lambda$ is the coefficient of the regularization term, $\widetilde{W}$ is the network weight after model compression, and $E(\widetilde{W})$ is the total loss function.
Further, the weight parameters in the network are updated in the backward process according to the cross-entropy loss:

$W^{(k+1)} = W^{(k)} - \gamma \dfrac{\partial E}{\partial \widetilde{W}^{(k)}}$

where $W^{(k)}$ is the weight of the network at the $k$-th iteration, $\gamma$ is the learning rate, and $\partial E / \partial \widetilde{W}^{(k)}$ is the gradient of the loss function with respect to the network weights.
An image classification system based on a mobile terminal comprises an offline training module and an online classification module:
the offline training module is configured to perform:
s1 formulating a codebook:
for each layer of the deep neural network model, acquiring the maximum absolute value of the weights and quantizing it into an exponential form with base 2, thereby obtaining the quantization upper limit of the codebook; and determining the codebook of the current model under this upper limit;
s2 quantization weight:
carrying out exponential quantization on the weights in the deep neural network model, quantizing each weight to the closest value in the codebook;
s3 retraining the network model:
inputting a sample image, training the quantized deep neural network model, obtaining the cross entropy loss of the deep network in the forward process of training, and updating the weight parameters in the network according to the cross entropy loss in the backward process.
S4 iteration and termination:
iteratively executing steps S2 and S3; when the deep neural network model converges or the set number of training iterations is reached, terminating the iteration to obtain the final classifier;
the online classification module is configured to: send the image to be classified into the classifier to obtain the classification result.
Compared with the prior art, the invention has the advantages and effects that:
1. The invention proposes a codebook that is dynamically updated to adapt to the weight parameters with larger absolute values in the network, reducing the influence of parameter quantization on model accuracy as much as possible;
2. The invention provides an alternating iterative algorithm for model training, in which the weight parameters and the codebook are updated alternately, so that the training process converges faster.
Drawings
FIG. 1 is a flowchart illustrating an implementation of an image classification method based on a mobile terminal according to the present invention;
FIG. 2 is a diagram illustrating the quantization rule for network weights;
Fig. 3 is a block diagram of image classification based on a mobile terminal according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
FIG. 1 is a flow chart of an implementation of the image classification method based on the mobile terminal according to the present invention. The method comprises two stages of off-line training and on-line classification:
the off-line training stage specifically comprises:
s1 formulating a codebook: acquiring the maximum absolute value of the weights in each layer of the randomly initialized or pre-trained deep neural network model so as to determine the quantization upper limit of the codebook, and deriving the codebook of the current model from the number of quantization bits.
In step S1, the maximum absolute value of the weights in each layer of the initial, unquantized deep neural network model is obtained and quantized into an exponential form with base 2; the specific quantization is as follows:
$2^{n_2}$

where $n_2 = \lfloor \log_2(\max(|W_l|)) \rfloor$, $\lfloor\cdot\rfloor$ is the round-down (floor) operation, $W_l$ is the weight of the $l$-th layer of the deep neural network, and $|\cdot|$ denotes the absolute value. Quantizing the maximum absolute weight of each layer in this way yields the upper limit of that layer's codebook.
In step S1, when quantizing to $b$ bits, the codebook may be expressed as $P_l = \{\pm 2^n\}$, $n \in [n_1, n_2]$, $n \in \mathbb{Z}$, where $l$ represents the $l$-th layer of the deep neural network, and $n_1$ and $n_2$ are two integers satisfying $n_1 < n_2$. Since there are $n_2 - n_1 + 1$ integers between $n_1$ and $n_2$, and the codebook contains equal numbers of positive and negative values, the codebook holds $2(n_2 - n_1 + 1) = 2^b$ values in total, i.e. $n_1 = n_2 - 2^{b-1} + 1$, which determines the codebook $P_l$. For example, with $b = 4$ bits the codebook holds 16 values, i.e. 8 exponents per sign, so $n_1 = n_2 - 7$.
In step S1, 0 may also be introduced into the codebook as a quantization value, in which case the codebook is denoted $P_l = \{\pm 2^n, 0\}$, $n \in [n_1, n_2]$, $n \in \mathbb{Z}$. Since 0 cannot be represented as a power $2^n$ with integer $n$, an additional bit is required to represent the quantized value 0, and here $n_1 = n_2 - 2^{b-2} + 1$; the other processing is unchanged.
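To make step S1 concrete, a minimal Python/NumPy sketch of the codebook construction follows, written directly from the formulas above; the function name build_codebook and its interface are illustrative assumptions, not part of the patent.

```python
import numpy as np

def build_codebook(weights, b, include_zero=False):
    """Sketch of step S1: derive a layer's power-of-2 codebook.

    weights      -- array holding the layer's weights W_l
    b            -- number of quantization bits
    include_zero -- whether 0 is added as an extra quantized value
    """
    # Upper limit of the codebook: n2 = floor(log2(max |W_l|))
    n2 = int(np.floor(np.log2(np.max(np.abs(weights)))))
    # Lower limit: n1 = n2 - 2^(b-1) + 1, or n2 - 2^(b-2) + 1 when a bit is spent on 0
    n1 = n2 - 2 ** (b - 2) + 1 if include_zero else n2 - 2 ** (b - 1) + 1
    exponents = np.arange(n1, n2 + 1)
    codebook = np.concatenate([2.0 ** exponents, -(2.0 ** exponents)])
    if include_zero:
        codebook = np.append(codebook, 0.0)
    return n1, n2, np.sort(codebook)
```

For instance, with $b = 4$ and $\max(|W_l|) = 0.9$ this gives $n_2 = -1$ and, without 0, exponents running from $n_1 = -8$ to $n_2 = -1$.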
S2 quantization weight: and quantizing the weights in the deep neural network model according to the codebook established in the step S1, and quantizing the weights into the closest value in the codebook.
In step S2, the weights in the deep neural network are quantized to the nearest value in the codebook. The specific quantization rule is shown in FIG. 2; each layer of the deep neural network is quantized as follows:
$\hat{w} = I(w) \cdot 2^k, \quad w \in W_l$

where $\hat{w}$ is the quantized weight, $2^k$ is the codebook value closest to the absolute value $|w|$ of the weight $w$, and $I(w)$ is an indicator function used to distinguish positive and negative weights in the network.
In step S2, when 0 is introduced into the codebook as a quantization value, weights falling below the lower limit of the codebook are truncated to 0; the corresponding quantization rule is:

$\hat{w} = \begin{cases} I(w) \cdot 2^k, & |w| \ge 2^{n_1} \\ 0, & |w| < 2^{n_1} \end{cases}$
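A minimal nearest-value sketch of step S2 follows, reusing build_codebook from the previous sketch; the brute-force argmin search is an illustrative choice for clarity, not the patent's prescribed implementation.

```python
import numpy as np

def quantize_weights(weights, codebook):
    """Sketch of step S2: map every weight of W_l to its nearest codebook value."""
    w = np.asarray(weights, dtype=np.float64).ravel()
    cb = np.asarray(codebook, dtype=np.float64)
    # For each weight, the index of the closest codebook entry; the sign is
    # preserved automatically because the codebook holds both +2^n and -2^n
    idx = np.argmin(np.abs(w[:, None] - cb[None, :]), axis=1)
    return cb[idx].reshape(np.shape(weights))
```

Because the codebook holds both signs (and 0 when included), the nearest-value search preserves the sign, matching $I(w)\cdot 2^k$, and maps sufficiently small weights to 0; the exact threshold of the truncation rule in FIG. 2 may differ from this nearest-value reading.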
s3 retraining the network model: the deep neural network quantized in step S2 is retrained. Training images with class labels are input; the cross-entropy loss of the deep network is obtained in the forward pass of training, and the weight parameters in the network are updated according to this loss in the backward pass.
In step S3, after the weights in the deep neural network have been quantized in step S2, the network is retrained. Retraining comprises two procedures: forward propagation and back propagation. In the forward propagation process, training data are input and the cross-entropy loss of the network is computed, defined as follows:
$E(\widetilde{W}) = L(\widetilde{W}) + \lambda R(\widetilde{W})$

where $L(\widetilde{W})$ is the loss of the network, $R(\widetilde{W})$ is an $L_2$-norm regularization term (the invention adopts the $L_2$ regularization term), $\lambda$ is the coefficient of the regularization term, $\widetilde{W}$ is the network weight after model compression, and $E(\widetilde{W})$ is the total loss function. In the back-propagation process, the residual error of the network is propagated from later layers back to earlier layers, and the network weights are updated according to the gradient computed from the residual, as follows:
$W^{(k+1)} = W^{(k)} - \gamma \dfrac{\partial E}{\partial W^{(k)}}$

where $W^{(k+1)}$ is the updated weight, $\gamma$ is the learning rate, and $\partial E / \partial W^{(k)}$ is the gradient of the loss function with respect to the network weights. For the quantized model, however, differentiating the indicator function $I(w)$ yields a gradient of 0, so the parameters cannot be updated directly. In the backward pass, the weights of the model are therefore handled with a straight-through approximation, taking

$\dfrac{\partial E}{\partial W} \approx \dfrac{\partial E}{\partial \widetilde{W}}$

so that in the actual back-propagation process the weights are updated as:

$W^{(k+1)} = W^{(k)} - \gamma \dfrac{\partial E}{\partial \widetilde{W}^{(k)}}$
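The following toy sketch illustrates this update in Python, assuming PyTorch for automatic differentiation; the stand-in loss, tensor shapes and codebook are illustrative only. The forward pass sees the quantized weights, while detaching the quantization residual passes the gradient through to the full-precision weights unchanged.

```python
import torch

def quantize_ste(w, codebook):
    """Forward: nearest codebook value; backward: identity (straight-through)."""
    idx = torch.argmin(torch.abs(w.reshape(-1, 1) - codebook.reshape(1, -1)), dim=1)
    w_q = codebook[idx].reshape(w.shape)
    return w + (w_q - w).detach()  # gradient w.r.t. w passes through unchanged

# Toy demonstration with a stand-in loss in place of the network's cross-entropy
w = torch.randn(8, requires_grad=True)                    # full-precision weights W
codebook = torch.tensor([s * 2.0 ** n for n in range(-4, 0) for s in (1.0, -1.0)])
opt = torch.optim.SGD([w], lr=0.1, weight_decay=0.001)    # weight decay ~ L2 term R

opt.zero_grad()
w_q = quantize_ste(w, codebook)
loss = (w_q - 0.3).pow(2).sum()   # placeholder for the cross-entropy loss E
loss.backward()                   # dE/dW is taken as dE/dW~ (straight-through)
opt.step()                        # update applied to the full-precision weights
```

After each such step the full-precision weights drift off the codebook, which is exactly why step S4 re-quantizes them.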
s4 iteration and termination: the weight update in step S3 destroys the original quantization, so steps S2 and S3 are performed iteratively; when the deep neural network model converges or the set number of training iterations is reached, the iteration terminates and the final quantized compression model is obtained.
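Putting steps S1 to S4 together, the offline training stage reduces to the alternating loop sketched below; model_layers, train_one_epoch and converged are hypothetical placeholders for the network structure, training pass and stopping test, and build_codebook/quantize_weights are the sketches given above.

```python
def offline_training(model_layers, train_one_epoch, converged, b, max_iters,
                     include_zero=False):
    """Sketch of S1-S4: alternately re-quantize the weights and retrain.

    model_layers    -- list of objects with a .weights ndarray (illustrative)
    train_one_epoch -- callable running one forward/backward training pass (S3)
    converged       -- callable returning True once training has converged (S4)
    """
    for _ in range(max_iters):
        for layer in model_layers:
            # S1: re-derive the codebook from the current (already updated) weights
            _, _, cb = build_codebook(layer.weights, b, include_zero)
            # S2: snap the weights onto the refreshed codebook
            layer.weights = quantize_weights(layer.weights, cb)
        train_one_epoch()   # S3: forward loss, backward weight update
        if converged():     # S4: stop on convergence or after max_iters
            break
```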
S5 image object classification: the unclassified images are sent to the quantization compression model obtained in step S4 for prediction, and the images are classified according to the prediction result.
The image classification method based on the mobile terminal quantizes the weights in the deep neural network model to powers of 2, so that multiplications can be implemented as efficient shift operations on embedded systems. Unlike existing methods that adopt static quantization coding, the proposed method dynamically updates the quantization codebook during model training, effectively reducing quantization error and improving both the prediction performance of the model and its runtime efficiency on the mobile terminal.
Example:
this example is to propose an image classification device based on a mobile terminal, which includes three modules: an image reading module, an image classification module and an image sorting module, as shown in fig. 3.
This example was tested on the standard dataset CIFAR-10. CIFAR-10 is an image classification dataset comprising 10 classes: airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships and trucks. All images are three-channel color images of size 32 × 32; the dataset contains 60000 images, of which 50000 form the training set and 10000 the validation set. The deep neural network employed in the experiments of this example is the residual network ResNet. The specific steps are as follows:
1. image reading
The test data are read one by one and scaled to 32 × 32.
2. Image classification
The read images are predicted using the dynamically quantized and compressed deep neural network.
The training process of the compressed deep neural network is as follows. The training data are first augmented: each original 32 × 32 image is zero-padded at the boundary to 36 × 36, randomly cropped back to 32 × 32, and then randomly flipped horizontally. The model is then dynamically quantized and encoded until it converges. Training runs for 80000 iterations, with a batch of 128 images fed to the network in each iteration; the initial learning rate is 0.1, reduced to 0.01 at 40000 iterations and to 0.001 after 60000 iterations. The coefficient of the $L_2$ regularization term used in training is set to 0.001.
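As a concrete reading of the embodiment's preprocessing and learning-rate schedule, a short hypothetical sketch with the values taken from the text (the helper functions themselves are not from the patent):

```python
import numpy as np

def augment(img):
    """Zero-pad a 32x32x3 image to 36x36, random-crop back to 32x32, random flip."""
    padded = np.pad(img, ((2, 2), (2, 2), (0, 0)), mode="constant")
    y, x = np.random.randint(0, 5, size=2)   # 36 - 32 + 1 = 5 crop offsets per axis
    crop = padded[y:y + 32, x:x + 32, :]
    return crop[:, ::-1, :] if np.random.rand() < 0.5 else crop

def learning_rate(step):
    """Piecewise-constant schedule: 0.1, then 0.01 at 40000, then 0.001 at 60000."""
    if step < 40000:
        return 0.1
    if step < 60000:
        return 0.01
    return 0.001
```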
The prediction results of image classification are shown in Table 1, which compares the cases with and without 0 introduced into the codebook.
Table 1 shows the effect of introducing 0 into the codebook, versus not introducing it, for ResNet at different depths and bit widths. The accuracies of the pre-trained 32-bit-wide models on the validation set are 0.9212, 0.9246, 0.9332 and 0.9323 for ResNet-20, ResNet-32, ResNet-44 and ResNet-56 respectively; the fifth and seventh columns of the table give the accuracy of the quantized model on the validation set minus that of the corresponding pre-trained 32-bit-wide model.
[Table 1: quantization results, reproduced as an image in the original document]
As can be seen from Table 1, the proposed method can compress the deep neural network model by a very high factor while maintaining high performance; even when the original model is compressed by a factor of 10.67, its performance drops only slightly.
3. Image arrangement
The pictures are sorted into folders of the corresponding categories according to the prediction results.
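A minimal sketch of what the image sorting module might do, assuming predictions arrive as (path, predicted label) pairs; the folder layout and names are illustrative:

```python
import os
import shutil

def sort_image(path, label, out_dir="classified"):
    """Move a classified image into the folder named after its predicted class."""
    dest = os.path.join(out_dir, str(label))
    os.makedirs(dest, exist_ok=True)
    shutil.move(path, os.path.join(dest, os.path.basename(path)))
```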
The method has a marked compression effect on deep neural networks: it reduces the storage and computation resources consumed by large deep neural network models, facilitates the deployment of deep neural networks on resource-limited mobile terminals such as smart phones, and has strong practical applicability.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (1)

1. An image classification method based on a mobile terminal is characterized by comprising an off-line training stage and an on-line classification stage:
the off-line training stage specifically comprises:
s1 formulating a codebook:
for each layer of the deep neural network model, acquiring the maximum absolute value of the weights and quantizing it into an exponential form with base 2, thereby obtaining the quantization upper limit of the codebook; and determining the codebook of the current model under this upper limit;
s2 quantization weight:
carrying out exponential quantization on the weights in the deep neural network model, quantizing each weight to the closest value in the codebook;
s3 retraining the network model:
inputting a sample image, training a quantized deep neural network model, obtaining cross entropy loss of the deep network in the forward process of training, and updating weight parameters in the network according to the cross entropy loss in the backward process;
s4 iteration and termination:
iteratively executing steps S1, S2 and S3, dynamically updating the quantization codebook during model training, until the deep neural network model converges or the set number of training iterations is reached, then terminating the iteration and obtaining the final classifier;
the online classification stage specifically comprises: sending the image to be classified into a classifier to obtain a classification result;
wherein the quantization upper limit of the codebook is represented as:
$2^{n_2}$

where $n_2 = \lfloor \log_2(\max(|W_l|)) \rfloor$, $\lfloor\cdot\rfloor$ is the round-down (floor) operation, $W_l$ is the weight of the $l$-th layer of the deep neural network, $\max(\cdot)$ represents the maximum value, and $|\cdot|$ represents the absolute value;
when quantized to $b$ bits, the codebook is expressed as $P_l = \{\pm 2^n\}$, $n \in [n_{11}, n_{12}]$, $n \in \mathbb{Z}$, where $l$ represents the $l$-th layer of the deep neural network, $n_{11}$ and $n_{12}$ are two integers satisfying $n_{11} < n_{12}$ with $n_{11} = n_{12} - 2^{b-1} + 1$, and $\mathbb{Z}$ denotes the integers; or it is expressed as $P_l = \{\pm 2^n, 0\}$, $n \in [n_{21}, n_{22}]$, $n \in \mathbb{Z}$, where $n_{21}$ and $n_{22}$ are two integers satisfying $n_{21} < n_{22}$ with $n_{21} = n_{22} - 2^{b-2} + 1$, and $\mathbb{Z}$ denotes the integers;
wherein, if the codebook is expressed as $P_l = \{\pm 2^n\}$, $n \in [n_{11}, n_{12}]$, $n \in \mathbb{Z}$, each layer of the deep neural network is quantized as follows:

$\hat{w} = I(w) \cdot 2^k, \quad w \in W_l$

where $\hat{w}$ is the quantized weight, $2^k$ is the codebook value closest to the absolute value $|w|$ of the weight $w$, and $I(w)$ is the indicator function;
if the codebook is expressed as $P_l = \{\pm 2^n, 0\}$, $n \in [n_{21}, n_{22}]$, $n \in \mathbb{Z}$, each layer of the deep neural network is quantized as follows:

$\hat{w} = \begin{cases} I(w) \cdot 2^k, & |w| \ge 2^{n_{21}} \\ 0, & |w| < 2^{n_{21}} \end{cases} \quad w \in W_l$

where $\hat{w}$ is the quantized weight, $2^k$ is the codebook value closest to the absolute value $|w|$, and $I(w)$ is the indicator function;
obtaining the cross-entropy loss of the deep network in the forward process of training, wherein the loss is expressed as:

$E(\widetilde{W}) = L(\widetilde{W}) + \lambda R(\widetilde{W})$

where $L(\widetilde{W})$ is the loss of the network, $R(\widetilde{W})$ is a regularization term (an $L_2$-norm regularization term is adopted), $\lambda$ is the coefficient of the regularization term, $\widetilde{W}$ is the network weight after model compression, and $E(\widetilde{W})$ is the total loss function;
updating the weight parameters in the network according to the cross-entropy loss in the backward process:

$W^{(k+1)} = W^{(k)} - \gamma \dfrac{\partial E}{\partial \widetilde{W}^{(k)}}$

where $W^{(k)}$ is the weight of the network at the $k$-th iteration, $\gamma$ is the learning rate, and $\partial E / \partial \widetilde{W}^{(k)}$ is the gradient of the loss function with respect to the network weights.
CN201811119618.4A 2018-09-25 2018-09-25 Image classification method based on mobile terminal Active CN109344893B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811119618.4A CN109344893B (en) 2018-09-25 2018-09-25 Image classification method based on mobile terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811119618.4A CN109344893B (en) 2018-09-25 2018-09-25 Image classification method based on mobile terminal

Publications (2)

Publication Number Publication Date
CN109344893A CN109344893A (en) 2019-02-15
CN109344893B true CN109344893B (en) 2021-01-01

Family

ID=65306861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811119618.4A Active CN109344893B (en) 2018-09-25 2018-09-25 Image classification method based on mobile terminal

Country Status (1)

Country Link
CN (1) CN109344893B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245753A (en) * 2019-05-27 2019-09-17 东南大学 A kind of neural network compression method based on power exponent quantization
CN110414630A (en) * 2019-08-12 2019-11-05 上海商汤临港智能科技有限公司 The training method of neural network, the accelerated method of convolutional calculation, device and equipment
CN110782021B (en) * 2019-10-25 2023-07-14 浪潮电子信息产业股份有限公司 Image classification method, device, equipment and computer readable storage medium
CN113298224A (en) * 2020-02-24 2021-08-24 上海商汤智能科技有限公司 Retraining method of neural network model and related product
CN111582377A (en) * 2020-05-09 2020-08-25 济南浪潮高新科技投资发展有限公司 Edge end target detection method and system based on model compression
CN112668630B (en) * 2020-12-24 2022-04-29 华中师范大学 Lightweight image classification method, system and equipment based on model pruning
CN114462592A (en) * 2021-12-24 2022-05-10 光子算数(北京)科技有限责任公司 Model training method and device, electronic equipment and computer readable storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7089178B2 (en) * 2002-04-30 2006-08-08 Qualcomm Inc. Multistream network feature processing for a distributed speech recognition system
CN105590116B (en) * 2015-12-18 2019-05-14 华南理工大学 A kind of birds image-recognizing method based on head piece alignment
US10218976B2 (en) * 2016-03-02 2019-02-26 MatrixView, Inc. Quantization matrices for compression of video
CN106203624B (en) * 2016-06-23 2019-06-21 上海交通大学 Vector Quantization and method based on deep neural network
CN106713929B (en) * 2017-02-16 2019-06-28 清华大学深圳研究生院 A kind of video inter-prediction Enhancement Method based on deep neural network
CN107239793B (en) * 2017-05-17 2020-01-17 清华大学 Multi-quantization depth binary feature learning method and device
CN107423814A (en) * 2017-07-31 2017-12-01 南昌航空大学 A kind of method that dynamic network model is established using depth convolutional neural networks
CN108229681A (en) * 2017-12-28 2018-06-29 郑州云海信息技术有限公司 A kind of neural network model compression method, system, device and readable storage medium storing program for executing

Also Published As

Publication number Publication date
CN109344893A (en) 2019-02-15

Similar Documents

Publication Publication Date Title
CN109344893B (en) Image classification method based on mobile terminal
CN112101190B (en) Remote sensing image classification method, storage medium and computing device
EP3735658A1 (en) Generating a compressed representation of a neural network with proficient inference speed and power consumption
US11604960B2 (en) Differential bit width neural architecture search
CN110119745B (en) Compression method, compression device, computer equipment and storage medium of deep learning model
WO2019155064A1 (en) Data compression using jointly trained encoder, decoder, and prior neural networks
CN106778910B (en) Deep learning system and method based on local training
CN113850272A (en) Local differential privacy-based federal learning image classification method
CN109377532B (en) Image processing method and device based on neural network
CN113869420B (en) Text recommendation method and related equipment based on contrast learning
CN111008694A (en) No-data model quantization compression method based on deep convolution countermeasure generation network
WO2020207410A1 (en) Data compression method, electronic device, and storage medium
CN112200296A (en) Network model quantification method and device, storage medium and electronic equipment
CN113947136A (en) Image compression and classification method and device and electronic equipment
CN111582284B (en) Privacy protection method and device for image recognition and electronic equipment
CN113743277A (en) Method, system, equipment and storage medium for short video frequency classification
KR102305981B1 (en) Method for Training to Compress Neural Network and Method for Using Compressed Neural Network
CN116797850A (en) Class increment image classification method based on knowledge distillation and consistency regularization
CN116468947A (en) Cutter image recognition method, cutter image recognition device, computer equipment and storage medium
CN112200275B (en) Artificial neural network quantification method and device
CN112070211B (en) Image recognition method based on computing unloading mechanism
CN114677535A (en) Training method of domain-adaptive image classification network, image classification method and device
CN113438482A (en) Region of interest based video coding
Benbarrad et al. Impact of standard image compression on the performance of image classification with deep learning
CN113221560B (en) Personality trait and emotion prediction method, personality trait and emotion prediction device, computer device, and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant