CN113392899B - Image classification method based on binary image classification network - Google Patents


Info

Publication number
CN113392899B
Authority
CN
China
Prior art keywords
output image
size
convolution kernel
image
binary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110650074.XA
Other languages
Chinese (zh)
Other versions
CN113392899A (en)
Inventor
刘启和
王钰涵
周世杰
张准
董婉祾
但毅
严张豹
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202110650074.XA
Publication of CN113392899A
Application granted
Publication of CN113392899B
Active legal status
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/048: Activation functions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Facsimile Image Signal Circuits (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an image classification method based on a binarized image classification network, which comprises the following steps: S1: collecting an original image and initializing it; S2: building an image classification network from the initialized original image; S3: carrying out image classification using the softmax classifier of the image classification network. The method binarizes the convolution kernels of the convolution modules, which account for the largest share of computation in conventional image classification, and uses four binarized convolution kernels of the same specification for a linear approximation, reducing the storage cost of the algorithm.

Description

Image classification method based on binary image classification network
Technical Field
The invention belongs to the technical field of image classification, and particularly relates to an image classification method based on a binarization image classification network.
Background
In recent years, deep neural networks (DNNs) have revolutionized machine learning and pattern recognition. However, most existing DNN models are computationally expensive and memory intensive, which hinders their deployment on devices with limited memory or in applications with strict latency requirements.
Taking image classification as an example, classical network structures such as LeNet, AlexNet, ResNet and VggNet have been proposed in succession. These structures mainly target the server side: both training and inference assume ample computing power and place high demands on GPU capacity and storage space. This makes deployment on mobile terminals difficult, for example on hardware-constrained devices such as unmanned aerial vehicles and smart cars.
Disclosure of Invention
The invention aims to solve the problem of image classification and provides an image classification method based on a binarization image classification network.
The technical scheme of the invention is as follows: an image classification method based on a binarization image classification network comprises the following steps:
s1: collecting an original image, and initializing the original image;
s2: building an image classification network according to the initialized original image;
S3: carrying out image classification by using the softmax classifier of the image classification network.
Further, step S1 includes the following sub-steps:
S11: collecting an original image with the size of 224 × 224 × 3, and padding each side of the original image with 0 elements of width 3 to obtain a first output image with the size of 230 × 230 × 3;
S12: performing a convolution operation on the first output image with a convolution kernel of size 7 × 7 and stride 1 to obtain a second output image of size 224 × 224 × 64, and performing batch normalization on the second output image to obtain a third output image of size 224 × 224 × 64;
S13: activating the third output image with the nonlinear activation function H(x), and max-pooling the activated third output image to obtain a fourth output image of size 112 × 112 × 64;
S14: binarizing the fourth output image with the sign function S(x′) to obtain a fifth output image of size 112 × 112 × 64, completing the initialization of the original image.
Further, in step S14, the expression of the sign function S(x′) is:
[formula given as an image in the original: a sign function shifted by the learnable parameter α; not reproduced here]
where x denotes the input image of the sign function and α denotes the first parameter to be learned.
Further, step S2 includes the following sub-steps:
S21: padding each side of the fifth output image with 0 elements of width 1 to obtain a sixth output image of size 114 × 114 × 64;
S22: performing a convolution operation on the sixth output image with a binarized convolution kernel of size 3 × 3 and stride 1 to obtain a seventh output image of size 112 × 112 × 128;
S23: activating the seventh output image with the nonlinear activation function H(x), and max-pooling the activated seventh output image to obtain an eighth output image of size 56 × 56 × 128;
S23: padding each side of the eighth output image with 0 elements of width 1 to obtain a ninth output image of size 58 × 58 × 128;
S24: performing a convolution operation on the ninth output image with the binarized convolution kernel of size 3 × 3 and stride 1 to obtain a tenth output image of size 56 × 56 × 256;
S25: activating the tenth output image with the nonlinear activation function H(x), and max-pooling the activated tenth output image to obtain an eleventh output image of size 28 × 28 × 256;
S26: padding each side of the eleventh output image with 0 elements of width 1 to obtain a twelfth output image of size 30 × 30 × 256;
S27: performing a convolution operation on the twelfth output image with the binarized convolution kernel of size 3 × 3 and stride 1 to obtain a thirteenth output image of size 28 × 28 × 512;
S28: activating the thirteenth output image with the nonlinear activation function H(x), and max-pooling the activated thirteenth output image to obtain a fourteenth output image of size 14 × 14 × 512;
S29: padding each side of the fourteenth output image with 0 elements of width 1 to obtain a fifteenth output image of size 16 × 16 × 512;
S210: performing a convolution operation on the fifteenth output image with a binarized convolution kernel of size 3 × 3 and stride 1 to obtain a sixteenth output image of size 14 × 14 × 512;
S211: activating the sixteenth output image with the nonlinear activation function H(x), and max-pooling the activated sixteenth output image to obtain a seventeenth output image of size 7 × 7 × 512;
S212: padding each side of the seventeenth output image with 0 elements of width 1 to obtain an eighteenth output image of size 9 × 9 × 512;
S213: performing a convolution operation on the eighteenth output image with a binarized convolution kernel of size 3 × 3 and stride 1 to obtain a nineteenth output image of size 7 × 7 × 512;
S214: activating the nineteenth output image with the nonlinear activation function H(x), and stretching the activated nineteenth output image into a twentieth output image of size 1 × 25088;
S215: inputting the twentieth output image into two fully connected layers of 4096 neurons each, completing the construction of the image classification network.
Further, in step S13, step S23, step S25, step S28, step S211, and step S214, the pooling window used for max pooling has size 2 × 2 and stride 2 × 2.
Further, in step S13, step S23, step S25, step S28, and step S211, the expression of the nonlinear activation function H(x) is:
[formula given as an image in the original: a shifted nonlinear activation with learnable parameters β, γ and τ; not reproduced here]
wherein x denotes the input image of the nonlinear activation function, β denotes the second parameter to be learned, γ the third parameter to be learned, and τ the fourth parameter to be learned.
Further, the binarization of the convolution kernel of size 3 × 3 in step S2 includes the following sub-steps:
A21: using a first binarized convolution kernel B_i1, a second binarized convolution kernel B_i2, a third binarized convolution kernel B_i3 and a fourth binarized convolution kernel B_i4, each of size 3 × 3 × C_in × C_out, to linearly approximate the convolution kernel of size 3 × 3, wherein C_in denotes the number of input channels and C_out denotes the number of output channels;
A22: normalizing each element of the linearly approximated convolution kernel matrix to obtain the normalized convolution kernel;
A23: setting the activation thresholds b_i1, b_i2, b_i3 and b_i4 corresponding to the first binarized convolution kernel B_i1, the second binarized convolution kernel B_i2, the third binarized convolution kernel B_i3 and the fourth binarized convolution kernel B_i4:
b_i1 = 0.2493, b_i2 = 0.4987, b_i3 = 0.7480, b_i4 = 0.9973.
A24: for each of B_i1, B_i2, B_i3 and B_i4, setting every element of the normalized convolution kernel matrix smaller than the corresponding activation threshold b_i1, b_i2, b_i3 or b_i4 to 0 and every element greater than it to 1, the four binarized convolution kernels being randomly initialized; this completes the binarization of the convolution kernel of size 3 × 3.
Further, in step A21, the calculation formula for the linear approximation is:
W_i ≈ α_i1·B_i1 + α_i2·B_i2 + α_i3·B_i3 + α_i4·B_i4
wherein W_i denotes the linearly approximated convolution kernel, and α_i1, α_i2, α_i3 and α_i4 denote the weights of the first, second, third and fourth binarized convolution kernels B_i1, B_i2, B_i3 and B_i4;
in step A22, each element a_ij of the convolution kernel matrix after linear approximation is normalized as:
a′_ij = (a_ij − min) / (max − min)
wherein a′_ij denotes the normalized element, min denotes the minimum of the elements a_ij, and max denotes their maximum.
The invention has the following beneficial effects:
(1) The image classification method binarizes the convolution kernels of the convolution modules that account for the largest share of computation in conventional image classification, using four binarized convolution kernels of the same specification for a linear approximation, which reduces the storage cost of the algorithm.
(2) The image classification method processes the relevant outputs with a shifted nonlinear activation function, which strengthens the representation capability of the extracted features under the binarization constraint, and with a shifted sign function, so that both sides of the convolution operation, the convolution kernel elements and the input elements, are binarized. Addition and subtraction, or even binary logic operations, then replace conventional floating-point multiplication, which greatly increases operation speed and reduces the algorithm's dependence on hardware.
Drawings
FIG. 1 is a flow chart of an image classification method;
FIG. 2 is a diagram of a network structure for initializing an image after binarization;
FIG. 3 is a diagram of a binarized feature extraction network architecture;
FIG. 4 is a diagram of the nonlinear activation function H(x).
Detailed Description
The embodiments of the present invention will be further described with reference to the accompanying drawings.
As shown in fig. 1, the present invention provides an image classification method based on a binarization image classification network, which comprises the following steps:
s1: collecting an original image, and initializing the original image;
s2: building an image classification network according to the initialized original image;
S3: carrying out image classification by using the softmax classifier of the image classification network.
In the embodiment of the present invention, as shown in fig. 2, step S1 includes the following sub-steps:
S11: collecting an original image with the size of 224 × 224 × 3, and padding each side of the original image with 0 elements of width 3 to obtain a first output image with the size of 230 × 230 × 3;
S12: performing a convolution operation on the first output image with a convolution kernel of size 7 × 7 and stride 1 to obtain a second output image of size 224 × 224 × 64, and performing batch normalization on the second output image to obtain a third output image of size 224 × 224 × 64;
S13: activating the third output image with the nonlinear activation function H(x), and max-pooling the activated third output image to obtain a fourth output image of size 112 × 112 × 64;
S14: binarizing the fourth output image with the sign function S(x′) to obtain a fifth output image of size 112 × 112 × 64, completing the initialization of the original image.
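The sizes in steps S11 to S14 follow from the standard padding/convolution/pooling arithmetic. A minimal check in plain Python (illustrative, not part of the patent):

```python
def pad(size, width):
    # Zero-padding of the given width on each side grows each spatial dimension by 2 * width.
    return size + 2 * width

def conv(size, kernel, stride=1):
    # Output size of a valid convolution: floor((size - kernel) / stride) + 1.
    return (size - kernel) // stride + 1

def max_pool(size, window=2, stride=2):
    # 2 x 2 max pooling with stride 2 halves each spatial dimension.
    return (size - window) // stride + 1

s = pad(224, 3)    # S11: 224 -> 230
s = conv(s, 7)     # S12: 230 -> 224 (7 x 7 kernel, stride 1)
s = max_pool(s)    # S13: 224 -> 112
print(s)           # S14 binarization keeps the size: the fifth output image is 112 x 112 x 64
```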
In the embodiment of the present invention, in step S14, the expression of the sign function S(x′) is:
[formula given as an image in the original: a sign function shifted by the learnable parameter α; not reproduced here]
where x denotes the input image of the sign function, and α denotes the first parameter to be learned.
The gradient with respect to this parameter is computed as:
[formula given as an image in the original; not reproduced here]
in the embodiment of the present invention, as shown in fig. 3, step S2 includes the following sub-steps:
S21: padding each side of the fifth output image with 0 elements of width 1 to obtain a sixth output image of size 114 × 114 × 64;
S22: performing a convolution operation on the sixth output image with a binarized convolution kernel of size 3 × 3 and stride 1 to obtain a seventh output image of size 112 × 112 × 128;
S23: activating the seventh output image with the nonlinear activation function H(x), and max-pooling the activated seventh output image to obtain an eighth output image of size 56 × 56 × 128;
S23: padding each side of the eighth output image with 0 elements of width 1 to obtain a ninth output image of size 58 × 58 × 128;
S24: performing a convolution operation on the ninth output image with the binarized convolution kernel of size 3 × 3 and stride 1 to obtain a tenth output image of size 56 × 56 × 256;
S25: activating the tenth output image with the nonlinear activation function H(x), and max-pooling the activated tenth output image to obtain an eleventh output image of size 28 × 28 × 256;
S26: padding each side of the eleventh output image with 0 elements of width 1 to obtain a twelfth output image of size 30 × 30 × 256;
S27: performing a convolution operation on the twelfth output image with the binarized convolution kernel of size 3 × 3 and stride 1 to obtain a thirteenth output image of size 28 × 28 × 512;
S28: activating the thirteenth output image with the nonlinear activation function H(x), and max-pooling the activated thirteenth output image to obtain a fourteenth output image of size 14 × 14 × 512;
S29: padding each side of the fourteenth output image with 0 elements of width 1 to obtain a fifteenth output image of size 16 × 16 × 512;
S210: performing a convolution operation on the fifteenth output image with a binarized convolution kernel of size 3 × 3 and stride 1 to obtain a sixteenth output image of size 14 × 14 × 512;
S211: activating the sixteenth output image with the nonlinear activation function H(x), and max-pooling the activated sixteenth output image to obtain a seventeenth output image of size 7 × 7 × 512;
S212: padding each side of the seventeenth output image with 0 elements of width 1 to obtain an eighteenth output image of size 9 × 9 × 512;
S213: performing a convolution operation on the eighteenth output image with a binarized convolution kernel of size 3 × 3 and stride 1 to obtain a nineteenth output image of size 7 × 7 × 512;
S214: activating the nineteenth output image with the nonlinear activation function H(x), and stretching the activated nineteenth output image into a twentieth output image of size 1 × 25088;
S215: inputting the twentieth output image into two fully connected layers of 4096 neurons each, completing the construction of the image classification network.
In the embodiment of the present invention, in step S13, step S23, step S25, step S28, step S211, and step S214, the pooling window used for max pooling has size 2 × 2 and stride 2 × 2.
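Chaining steps S21 to S215 reproduces the sizes listed above, ending at the 1 × 25088 vector of step S214. A quick walkthrough in plain Python (illustrative; the stage list and names are not from the patent text):

```python
def pad1(size):
    return size + 2          # zero-padding of width 1 on each side

def conv3(size):
    return size - 3 + 1      # 3 x 3 binarized convolution, stride 1

def max_pool(size):
    return size // 2         # 2 x 2 max pooling, stride 2

# (output channels, whether the stage ends with max pooling),
# matching steps S22/S24/S27/S210/S213 of the description.
stages = [(128, True), (256, True), (512, True), (512, True), (512, False)]

size, channels = 112, 64     # fifth output image: 112 x 112 x 64
for out_channels, pooled in stages:
    size = conv3(pad1(size)) # pad width 1, then 3 x 3 convolution keeps the spatial size
    channels = out_channels
    if pooled:
        size = max_pool(size)

flattened = size * size * channels   # stretch of step S214
print(size, channels, flattened)     # 7 512 25088
```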
In the embodiment of the present invention, as shown in fig. 4, in step S13, step S23, step S25, step S28, and step S211, the expression of the nonlinear activation function H(x) is:
[formula given as an image in the original: a shifted nonlinear activation with learnable parameters β, γ and τ; not reproduced here]
wherein x represents an input image of the nonlinear activation function, β represents a second parameter to be learned, γ represents a third parameter to be learned, and τ represents a fourth parameter to be learned.
The corresponding gradients are expressed with the indicator function I{·}, defined as I{condition} = 1 if the condition holds and 0 otherwise; the gradient formulas themselves are given as images in the original and are not reproduced here.
In the embodiment of the present invention, in step S2, the binarization of the convolution kernel of size 3 × 3 includes the following sub-steps:
A21: using a first binarized convolution kernel B_i1, a second binarized convolution kernel B_i2, a third binarized convolution kernel B_i3 and a fourth binarized convolution kernel B_i4, each of size 3 × 3 × C_in × C_out, to linearly approximate the convolution kernel of size 3 × 3, wherein C_in denotes the number of input channels and C_out denotes the number of output channels;
A22: normalizing each element of the linearly approximated convolution kernel matrix to obtain the normalized convolution kernel;
A23: setting the activation thresholds b_i1, b_i2, b_i3 and b_i4 corresponding to the first binarized convolution kernel B_i1, the second binarized convolution kernel B_i2, the third binarized convolution kernel B_i3 and the fourth binarized convolution kernel B_i4:
b_i1 = 0.2493, b_i2 = 0.4987, b_i3 = 0.7480, b_i4 = 0.9973.
A24: for each of B_i1, B_i2, B_i3 and B_i4, setting every element of the normalized convolution kernel matrix smaller than the corresponding activation threshold b_i1, b_i2, b_i3 or b_i4 to 0 and every element greater than it to 1, the four binarized convolution kernels being randomly initialized; this completes the binarization of the convolution kernel of size 3 × 3.
The forward-propagation output O of the approximated convolution kernel is:
O ≈ α_i1·(A ⊛ B_i1) + α_i2·(A ⊛ B_i2) + α_i3·(A ⊛ B_i3) + α_i4·(A ⊛ B_i4)
where A is the input to the convolution kernel and ⊛ denotes the convolution operation.
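The forward output relies on the linearity of convolution: convolving once with W_i ≈ Σ_j α_ij·B_ij gives the same result as the α-weighted sum of the four binary convolutions. A small 1-D check (illustrative Python; the α values, kernels and input are invented for the example):

```python
def conv1d(signal, kernel):
    # Valid 1-D convolution (correlation form) with stride 1.
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

alphas = [0.6, 0.3, 0.2, 0.1]                       # weights alpha_i1 .. alpha_i4 (made up)
binary_kernels = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
# Mix the binary kernels into one real-valued kernel W_i.
w = [sum(a * bk[j] for a, bk in zip(alphas, binary_kernels)) for j in range(3)]

a = [0.5, -1.0, 2.0, 1.5, -0.5]                     # input A to the convolution
direct = conv1d(a, w)                               # convolve once with W_i
# Convolve with each binary kernel, then take the alpha-weighted sum.
summed = [sum(al * o for al, o in zip(alphas, outs))
          for outs in zip(*(conv1d(a, bk) for bk in binary_kernels))]

same = [round(x, 6) for x in direct] == [round(x, 6) for x in summed]
print(same)  # True: both routes give the same output up to float rounding
```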
The backward propagation through the approximated convolution kernel is:
[formula given as an image in the original; not reproduced here]
By the straight-through estimator (STE), the gradient of the binarization step is treated as the identity, so that:
[formula given as an image in the original; not reproduced here]
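The straight-through estimator referred to here passes the gradient through the non-differentiable binarization as if it were the identity, often clipped to a bounded input range. A minimal sketch of the idea (illustrative Python, not the patent's training code; the threshold and clip range are assumptions):

```python
def binarize_forward(x, threshold=0.5):
    # Forward pass: hard 0/1 quantization, whose true gradient is zero almost everywhere.
    return 1.0 if x > threshold else 0.0

def binarize_backward(x, grad_out):
    # STE backward pass: pretend the quantizer was the identity, so the incoming
    # gradient flows straight through to the latent real-valued weight,
    # clipped to suppress updates far outside the quantization range.
    return grad_out if abs(x) <= 1.0 else 0.0

w = 0.7                      # latent real-valued weight
grad_from_loss = 0.25        # gradient arriving at the binarized weight
print(binarize_forward(w))                   # -> 1.0
print(binarize_backward(w, grad_from_loss))  # -> 0.25 (passed through)
print(binarize_backward(1.5, grad_from_loss))# -> 0.0 (clipped)
```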
In the embodiment of the present invention, in step A21, the calculation formula for the linear approximation is:
W_i ≈ α_i1·B_i1 + α_i2·B_i2 + α_i3·B_i3 + α_i4·B_i4
wherein W_i denotes the linearly approximated convolution kernel, and α_i1, α_i2, α_i3 and α_i4 denote the weights of the first, second, third and fourth binarized convolution kernels B_i1, B_i2, B_i3 and B_i4;
in step A22, each element a_ij of the convolution kernel matrix after linear approximation is normalized as:
a′_ij = (a_ij − min) / (max − min)
wherein a′_ij denotes the normalized element, min denotes the minimum of the elements a_ij, and max denotes their maximum.
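Steps A22 to A24, min-max normalization followed by per-threshold binarization, can be sketched as follows. The 3 × 3 element values are invented for illustration; only the thresholds come from step A23:

```python
THRESHOLDS = [0.2493, 0.4987, 0.7480, 0.9973]   # b_i1 .. b_i4 from step A23

def normalize(kernel):
    # Step A22: min-max normalization a'_ij = (a_ij - min) / (max - min).
    flat = [v for row in kernel for v in row]
    lo, hi = min(flat), max(flat)
    return [[(v - lo) / (hi - lo) for v in row] for row in kernel]

def binarize(kernel):
    # Step A24: one 0/1 kernel per activation threshold.
    norm = normalize(kernel)
    return [[[1 if v > b else 0 for v in row] for row in norm] for b in THRESHOLDS]

# Illustrative 3 x 3 kernel (values invented for the example).
w = [[-0.6, 0.1, 0.4],
     [0.9, -0.2, 0.7],
     [0.3, 0.0, -0.8]]
b1, b2, b3, b4 = binarize(w)
print(b1)  # -> [[0, 1, 1], [1, 1, 1], [1, 1, 0]]
```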
The working principle and process of the invention are as follows:
The invention studies a binarization method for image-classification neural networks, adopting binarized convolution kernels in place of conventional ones. This not only reduces storage cost but also turns conventional floating-point multiplication into binary addition and subtraction in the convolution computation; with matching hardware design it can even be replaced by binary logic operations, improving computational efficiency and reducing the algorithm's dependence on hardware computing power.
(1) The image classification method binarizes the convolution kernels of the convolution modules that account for the largest share of computation in conventional image classification, using four binarized convolution kernels of the same specification for a linear approximation, which reduces the storage cost of the algorithm.
(2) The image classification method processes the relevant outputs with a shifted nonlinear activation function, which strengthens the representation capability of the extracted features under the binarization constraint, and with a shifted sign function, so that both sides of the convolution operation, the convolution kernel elements and the input elements, are binarized. Addition and subtraction, or even binary logic operations, then replace conventional floating-point multiplication, which greatly increases operation speed and reduces the algorithm's dependence on hardware.
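The claim in (2) that binary logic can stand in for floating-point multiplication is the standard XNOR/popcount trick for operands in {−1, +1}; the patent's 0/1 kernels admit a similar reduction. A sketch in plain Python, illustrative only:

```python
def dot_float(xs, ws):
    # Conventional multiply-accumulate over {-1, +1} values.
    return sum(x * w for x, w in zip(xs, ws))

def dot_xnor(x_bits, w_bits, n):
    # Encode +1 as bit 1 and -1 as bit 0. XNOR counts matching positions,
    # and the dot product is matches - mismatches = 2 * popcount(XNOR) - n.
    mask = (1 << n) - 1
    matches = bin((~(x_bits ^ w_bits)) & mask).count("1")
    return 2 * matches - n

xs = [1, -1, -1, 1, 1, -1]
ws = [1, 1, -1, -1, 1, 1]
x_bits = sum(1 << i for i, v in enumerate(xs) if v == 1)
w_bits = sum(1 << i for i, v in enumerate(ws) if v == 1)
print(dot_float(xs, ws), dot_xnor(x_bits, w_bits, len(xs)))  # both are 0
```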
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention, which is not limited to the specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from its spirit, and these changes and combinations remain within the scope of the invention.

Claims (5)

1. An image classification method based on a binarization image classification network is characterized by comprising the following steps:
s1: collecting an original image, and initializing the original image;
s2: building an image classification network according to the initialized original image;
s3: utilizing a softmax classifier of an image classification network to classify images;
the step S1 includes the following sub-steps:
S11: collecting an original image with the size of 224 × 224 × 3, and padding each side of the original image with 0 elements of width 3 to obtain a first output image with the size of 230 × 230 × 3;
S12: performing a convolution operation on the first output image with a convolution kernel of size 7 × 7 and stride 1 to obtain a second output image of size 224 × 224 × 64, and performing batch normalization on the second output image to obtain a third output image of size 224 × 224 × 64;
S13: activating the third output image with the nonlinear activation function H(x), and max-pooling the activated third output image to obtain a fourth output image of size 112 × 112 × 64;
S14: binarizing the fourth output image with the sign function S(x′) to obtain a fifth output image of size 112 × 112 × 64, completing the initialization of the original image;
the step S2 includes the following sub-steps:
S21: padding each side of the fifth output image with 0 elements of width 1 to obtain a sixth output image of size 114 × 114 × 64;
S22: performing a convolution operation on the sixth output image with the binarized convolution kernel of size 3 × 3 and stride 1 to obtain a seventh output image of size 112 × 112 × 128;
S23: activating the seventh output image with the nonlinear activation function H(x), and max-pooling the activated seventh output image to obtain an eighth output image of size 56 × 56 × 128;
S23: padding each side of the eighth output image with 0 elements of width 1 to obtain a ninth output image of size 58 × 58 × 128;
S24: performing a convolution operation on the ninth output image with the binarized convolution kernel of size 3 × 3 and stride 1 to obtain a tenth output image of size 56 × 56 × 256;
S25: activating the tenth output image with the nonlinear activation function H(x), and max-pooling the activated tenth output image to obtain an eleventh output image of size 28 × 28 × 256;
S26: padding each side of the eleventh output image with 0 elements of width 1 to obtain a twelfth output image of size 30 × 30 × 256;
S27: performing a convolution operation on the twelfth output image with the binarized convolution kernel of size 3 × 3 and stride 1 to obtain a thirteenth output image of size 28 × 28 × 512;
S28: activating the thirteenth output image with the nonlinear activation function H(x), and max-pooling the activated thirteenth output image to obtain a fourteenth output image of size 14 × 14 × 512;
S29: padding each side of the fourteenth output image with 0 elements of width 1 to obtain a fifteenth output image of size 16 × 16 × 512;
S210: performing a convolution operation on the fifteenth output image with a binarized convolution kernel of size 3 × 3 and stride 1 to obtain a sixteenth output image of size 14 × 14 × 512;
S211: activating the sixteenth output image with the nonlinear activation function H(x), and max-pooling the activated sixteenth output image to obtain a seventeenth output image of size 7 × 7 × 512;
S212: padding each side of the seventeenth output image with 0 elements of width 1 to obtain an eighteenth output image of size 9 × 9 × 512;
S213: performing a convolution operation on the eighteenth output image with a binarized convolution kernel of size 3 × 3 and stride 1 to obtain a nineteenth output image of size 7 × 7 × 512;
S214: activating the nineteenth output image with the nonlinear activation function H(x), and stretching the activated nineteenth output image into a twentieth output image of size 1 × 25088;
S215: inputting the twentieth output image into two fully connected layers of 4096 neurons each, completing the construction of the image classification network;
in step S2, the binarization of the convolution kernel with a size of 3 × 3 includes the following sub-steps:
a21: using a size of 3 x Cin*CoutFirst binarized convolution kernel B ofi1And a second binary convolution kernel Bi2And the third binary convolution kernel Bi3And a fourth binary convolution kernel Bi4Performing linear approximation on convolution kernels with the size of 3 x 3 respectively, wherein CinIndicates the number of input channels, CoutRepresenting the number of output channels;
a22: carrying out normalization processing on each element in the convolution kernel matrix after linear approximation to obtain a convolution kernel after normalization processing;
a23: setting a first binary convolution kernel Bi1A second binary convolution kernel Bi2And the third binary convolution kernel Bi3And a fourth binary convolution kernel Bi4Corresponding activation thresholds, respectively bi1、bi2、bi3And bi4
A24: in each of the first binarized convolution kernel Bi1, the second binarized convolution kernel Bi2, the third binarized convolution kernel Bi3 and the fourth binarized convolution kernel Bi4, setting every element of the normalized convolution kernel matrix that is less than the corresponding activation threshold bi1, bi2, bi3 or bi4 to 0 and every element greater than the threshold to 1, and randomly initializing the first binarized convolution kernel Bi1, the second binarized convolution kernel Bi2, the third binarized convolution kernel Bi3 and the fourth binarized convolution kernel Bi4, thereby completing the binarization of the convolution kernel with a size of 3 × 3.
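As an illustrative sketch (not part of the claims), sub-steps A22 and A24 amount to min–max normalizing a real-valued kernel and thresholding it into {0, 1}; the toy channel counts Cin = 2, Cout = 4 and the threshold value below are assumptions:

```python
import numpy as np

def binarize_kernel(w, threshold):
    # A22: min-max normalize every element into [0, 1]
    w_norm = (w - w.min()) / (w.max() - w.min())
    # A24: elements below the activation threshold become 0, above become 1
    return (w_norm > threshold).astype(np.float32)

rng = np.random.default_rng(0)
w = rng.normal(size=(3, 3, 2, 4))      # 3 x 3 x Cin x Cout, toy Cin=2, Cout=4
b = binarize_kernel(w, threshold=0.5)  # hypothetical activation threshold
```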
2. The image classification method based on the binary image classification network according to claim 1, wherein in said step S14, the expression of the sign function S(x′) is:
S(x′) = +α, when x′ ≥ 0; S(x′) = −α, when x′ < 0
wherein x′ represents the input image of the sign function and α represents a first parameter to be learned.
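Assuming S(x′) takes the usual scaled-sign form used in binary networks, +α for non-negative inputs and −α otherwise (an assumption for illustration), it can be sketched as:

```python
import numpy as np

def scaled_sign(x, alpha):
    # assumed form: +alpha for non-negative inputs, -alpha otherwise
    return np.where(x >= 0, alpha, -alpha)

out = scaled_sign(np.array([-1.5, 0.0, 2.3]), alpha=0.8)
```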
3. The image classification method based on the binary image classification network according to claim 1, wherein in said step S13, step S23, step S25, step S28, step S211 and step S214, the pooling window size used for the maximum pooling is 2 × 2 and the step size is 2 × 2.
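The 2 × 2 maximum pooling with step size 2 of claim 3 can be expressed in a few lines of NumPy (a sketch assuming an H × W × C layout with even H and W):

```python
import numpy as np

def max_pool_2x2(img):
    # img: (H, W, C) with even H and W; 2x2 window, step size 2
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4, 1)
y = max_pool_2x2(x)   # 2x2 window maxima: [[5, 7], [13, 15]]
```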
4. The image classification method based on the binarization image classification network as claimed in claim 1, wherein in the step S13, the step S23, the step S25, the step S28 and the step S211, the expression of the nonlinear activation function H (x) is as follows:
H(x) = x − γ + τ, when x > γ; H(x) = β(x − γ) + τ, when x ≤ γ
wherein x represents an input image of the nonlinear activation function, β represents a second parameter to be learned, γ represents a third parameter to be learned, and τ represents a fourth parameter to be learned.
5. The image classification method based on the binary image classification network according to claim 1, characterized in that in said step a21, the calculation formula for linear approximation is:
Wi ≈ αi1·Bi1 + αi2·Bi2 + αi3·Bi3 + αi4·Bi4
wherein Wi represents the convolution kernel after the linear approximation, αi1 represents the weight of the first binarized convolution kernel Bi1, αi2 represents the weight of the second binarized convolution kernel Bi2, αi3 represents the weight of the third binarized convolution kernel Bi3, and αi4 represents the weight of the fourth binarized convolution kernel Bi4;
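The weights αi1…αi4 of the four binarized bases can be illustrated with a least-squares fit (the patent does not specify how the weights are obtained, so least squares and the random 0/1 bases below are assumptions; a single 3 × 3 slice is used for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 3))                                           # real-valued kernel slice
Bs = [(rng.normal(size=(3, 3)) > 0).astype(float) for _ in range(4)]  # toy 0/1 bases

# solve min ||A @ alpha - vec(W)|| for the four weights alpha_i1..alpha_i4
A = np.stack([b.ravel() for b in Bs], axis=1)                         # shape (9, 4)
alphas, *_ = np.linalg.lstsq(A, W.ravel(), rcond=None)
W_approx = sum(a * b for a, b in zip(alphas, Bs))
```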
in the step a22, the formula for normalizing each element aij of the convolution kernel matrix after the linear approximation is:
a′ij = (aij − min) / (max − min)
wherein a′ij represents each element after normalization, min represents the minimum value of the elements aij, and max represents the maximum value of the elements aij.
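The min–max normalization of claim 5 maps every element of the kernel matrix into [0, 1]; a minimal sketch:

```python
import numpy as np

def min_max_normalize(a):
    # a'_ij = (a_ij - min) / (max - min)
    return (a - a.min()) / (a.max() - a.min())

a = np.array([[2.0, 4.0], [6.0, 10.0]])
a_norm = min_max_normalize(a)   # element 4.0 maps to (4 - 2) / (10 - 2) = 0.25
```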
CN202110650074.XA 2021-06-10 2021-06-10 Image classification method based on binary image classification network Active CN113392899B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110650074.XA CN113392899B (en) 2021-06-10 2021-06-10 Image classification method based on binary image classification network

Publications (2)

Publication Number Publication Date
CN113392899A CN113392899A (en) 2021-09-14
CN113392899B true CN113392899B (en) 2022-05-10

Family

ID=77620361

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108723A (en) * 2018-01-19 2018-06-01 深圳市恩钛控股有限公司 A kind of face feature extraction method based on deep learning
CN110188795A (en) * 2019-04-24 2019-08-30 华为技术有限公司 Image classification method, data processing method and device
CN112784909A (en) * 2021-01-28 2021-05-11 哈尔滨工业大学 Image classification and identification method based on self-attention mechanism and self-adaptive sub-network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10360494B2 (en) * 2016-11-30 2019-07-23 Altumview Systems Inc. Convolutional neural network (CNN) system based on resolution-limited small-scale CNN modules

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Unknown Attack Detection Based on Zero-Shot Learning; Zhun Zhang et al.; IEEE Access; 20201026; 193981-193991 *
Research on Image Classification Algorithms Based on Convolutional Neural Networks; Guo Tianmei; China Master's Theses Full-text Database (Information Science and Technology); 20180315; I138-1268 *
Research on Key Technologies of Deep Neural Networks for Image Object Recognition and Detection; Li Yang; China Doctoral Dissertations Full-text Database (Information Science and Technology); 20190115; I138-97 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant