CN115861700A

CN115861700A - Image classification identification method, device and equipment

Info

Publication number: CN115861700A
Application number: CN202211600625.2A
Authority: CN
Inventors: Liu Zhaowei (刘兆伟); Wang Yingjie (王莹洁); Wang Zhanyu (王占宇); Zhao Yong (赵勇); Xu Jindong (徐金东); Yan Weiqing (阎维青)
Original assignee: Yantai University
Current assignee: Yantai University
Priority date: 2022-12-13
Filing date: 2022-12-13
Publication date: 2023-03-28

Abstract

A method, device and equipment for classifying and identifying images are provided. The image classification and identification method comprises: processing an image through a first convolution operation, a first pooling operation and a first fully connected operation to obtain a classification identification parameter; and processing the image through a second convolution operation, a second pooling operation and a second fully connected operation, based on the classification identification parameter, to obtain a classification identification result. A weight initialization operation is set in each of the first and second convolution, pooling and fully connected operations, and it keeps the convolution variance and the gradient variance consistent across these operations. The method effectively addresses gradient explosion and gradient vanishing during image processing, and the resulting classification identification result is highly accurate.

Description

Image classification identification method, device and equipment

Technical Field

The invention relates to the technical field of neural networks, and in particular to a method, device and equipment for classifying and identifying images.

Background

Data classification and recognition is an important, classical problem in natural language processing: it maps pieces of data carrying information on a computer to one or more given categories or subjects. Depending on the task type, it can be divided into a number of complex scene-classification problems. It is commonly used in digital dynamic scene construction, social interaction case analysis, pushing of dynamic real-time hotspot topics, spam (junk information or junk file) filtering and similar fields; it provides technical support for searching and researching dynamic semantic libraries and is currently one of the main research hotspots.

The CNN (Convolutional Neural Network) model, and in particular the Lenet-5 model among CNN models, is well suited to data classification and recognition. Lenet-5 is commonly used for handwritten-data classification and recognition because it can be trained with the back-propagation algorithm. During training, however, and especially when training the network on the MNIST handwritten character dataset, gradient vanishing or gradient explosion can occur, so that most of the gradients obtained by back-propagation become ineffective or even harmful and the accuracy of the final classification and recognition result is lowered.

Disclosure of Invention

The invention provides a method, a device and equipment for classifying and identifying images.

A classification recognition method of an image comprises the following steps:

processing the image through a first convolution operation, a first pooling operation and a first fully connected operation to obtain a classification identification parameter;

processing the image through a second convolution operation, a second pooling operation and a second fully connected operation, based on the classification identification parameter, to obtain a classification identification result;

a weight initialization operation is set in each of the first convolution operation, the first pooling operation, the first fully connected operation, the second convolution operation, the second pooling operation and the second fully connected operation;

the weight initialization operation keeps the convolution variance and the gradient variance consistent across the first convolution, first pooling and first fully connected operations and the second convolution, second pooling and second fully connected operations.

In the image classification and identification method, the weight initialization operation makes the convolution variance 1 when the image is propagated forward through the first convolution, first pooling, first fully connected, second convolution, second pooling and second fully connected operations, and makes the gradient variance 1 when the image is propagated backward.

In the image classification and identification method, the first convolution, first pooling and first fully connected operations are implemented by passing sequentially through convolutional layer 1, pooling layer 1, convolutional layer 2, pooling layer 2, fully connected layer 1 and fully connected layer 2;

the second convolution, second pooling and second fully connected operations are likewise implemented sequentially through convolutional layer 1, pooling layer 1, convolutional layer 2, pooling layer 2, fully connected layer 1 and fully connected layer 2.

In the image classification and identification method, a prediction operation is performed before the second convolution operation; the prediction operation predicts the accuracy (correct rate) for the image:

if the accuracy is greater than or equal to a first threshold, the result is output directly;

if the accuracy is less than the first threshold, the second convolution, second pooling and second fully connected operations are performed to process the image.

The prediction operation comprises a fromarray function, a cvtColor function, a softmax function and a squeeze function;

the fromarray and cvtColor functions convert the image format (style) before the accuracy for the image is predicted;

the softmax function computes the predicted accuracy for the image;

the squeeze function is used to expand and scale the output of the image.

A preprocessing operation is performed before the first convolution, first pooling, first fully connected, second convolution, second pooling and second fully connected operations; it performs data expansion (augmentation) of the image.

The preprocessing operation comprises processing the image with the center cropping, Resize, ToTensor and normalization operations of the transforms module.

The invention also provides a device for classifying and identifying the images, which comprises the following components:

the preprocessing module is used to preprocess the image and perform data expansion;

the convolution module is used to perform convolution on the image and extract features;

the pooling module is used to process the image by pooling and reduce the output size;

the fully connected module is used to process the image through the fully connected operation, classify it and output the recognition result;

the weight initialization module is used to control the convolution variance and gradient variance in the convolution module, the pooling module and the fully connected module;

and the prediction module is used to predict the accuracy for the image and decide whether to activate the convolution module, the pooling module and the fully connected module.

The invention also provides equipment for classifying and identifying images, comprising a processor and a memory, wherein the processor implements any one of the above image classification and identification methods when executing a computer program stored in the memory.

The present invention also provides a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements a method for classifying and identifying an image according to any one of the above aspects.

The invention has the beneficial effects that:

(1) In the image classification and identification method provided by the invention, a weight initialization operation is set in the convolution, pooling and fully connected operations and keeps the convolution variance and gradient variance consistent across them. As a result, the gradients and convolution outputs remain more stable during forward and backward propagation when the image is processed, the gradient explosion and gradient vanishing problems of conventional image processing are effectively addressed, and the accuracy of the classification and identification result is markedly improved while the processing speed is maintained;

(2) In the image classification and identification method provided by the invention, a prediction module is placed before the convolution, pooling and fully connected operations. It predicts the accuracy for the input image and then determines whether to continue with the convolution, pooling and fully connected operations, which shortens the classification and identification time and further improves the accuracy of the result;

(3) In the image classification and identification method provided by the invention, the image is preprocessed before formal processing, so that a small number of images can be expanded and class imbalance does not degrade the processing of subsequent images;

(4) When classifying and identifying the MNIST handwritten character dataset, the Arabic handwritten digit dataset and the clothing classification dataset, the image classification and identification method provided by the invention achieves low training loss, good suppression performance, low training time and memory requirements, and high training accuracy.

Drawings

The aspects and advantages of the present application will become apparent to those skilled in the art from a reading of the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention.

In the drawings:

FIG. 1 is a flow chart of a classification recognition method according to an embodiment;

FIG. 2 is a schematic structural diagram of the Lenet-5 model in the embodiment;

FIG. 3 is a schematic diagram of a pooling operation in an embodiment;

FIG. 4 is a schematic diagram of the Lenet-5 optimization model processing the MNIST handwritten character dataset in the embodiment;

FIG. 5 is a schematic diagram of the Lenet-5 optimization model processing the Arabic handwritten digit dataset in the embodiment;

FIG. 6 is a schematic diagram of the Lenet-5 optimization model processing the clothing classification dataset in the embodiment;

FIG. 7 is a schematic structural diagram of a classification recognition apparatus according to an embodiment;

FIG. 8 is a schematic structural diagram of the classification recognition apparatus in the embodiment.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings.

Examples

Referring to FIG. 1, the image classification and identification method provided by this embodiment specifically comprises the following steps:

1 Convolution, pooling and fully connected operations

The first and second convolution, pooling and fully connected operations all use the same three kinds of operation: the convolution operation extracts feature information, the pooling operation reduces the output size of the image data, and the fully connected operation classifies the image.

The first convolution, first pooling and first fully connected operations are implemented by passing sequentially through convolutional layer 1, pooling layer 1, convolutional layer 2, pooling layer 2, fully connected layer 1 and fully connected layer 2.

The second convolution, second pooling and second fully connected operations are implemented in the same sequence: convolutional layer 1, pooling layer 1, convolutional layer 2, pooling layer 2, fully connected layer 1 and fully connected layer 2.

In this embodiment, the applicant implemented the convolution, pooling and fully connected operations of both the first stage and the second stage with the Lenet-5 model, see FIG. 2.

The Lenet-5 model comprises 7 processing layers in total, which are, in order: convolutional layer 1, pooling layer 1, convolutional layer 2, pooling layer 2, fully connected layer 1, fully connected layer 2 and an output layer. Convolutional layers 1 and 2 perform the convolution operation and extract feature information; pooling layers 1 and 2 perform the pooling operation and reduce the output size of the image data; fully connected layers 1 and 2 perform the fully connected operation and classify the image; and the output layer outputs the classification recognition result.

Convolutional layer 1 has 5 convolution kernels of size 3 × 3 × 3. It extracts features of the input image by convolution, and a ReLU activation function increases the nonlinearity of the Lenet-5 model. The ReLU activation function is computed as in equation (1):

$$f(x) = \max(x, 0) \quad (1)$$

Here x is the abscissa, ranging from negative infinity to positive infinity, and the ordinate is the value mapped by f(x), ranging from 0 to positive infinity. Over the whole ReLU curve, f(x) is always 0 for x from negative infinity to 0, and for x from 0 to positive infinity f(x) increases proportionally from 0 along the line y = x; that is, the output of the ReLU function is only ever 0 or a positive number.
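Equation (1) and the ReLU derivative (which the weight initialization derivation later relies on, since it is always 0 or 1) can be sketched in a few lines of plain Python. The function names are illustrative only, and the derivative at exactly x = 0 is taken as 0 here by convention.

```python
def relu(x: float) -> float:
    """ReLU activation of equation (1): f(x) = max(x, 0)."""
    return max(x, 0.0)


def relu_derivative(x: float) -> float:
    """Slope of ReLU: 1 for x > 0, otherwise 0 (0 is chosen at x = 0)."""
    return 1.0 if x > 0 else 0.0


samples = [-3.0, -0.5, 0.0, 0.5, 3.0]
print([relu(v) for v in samples])             # [0.0, 0.0, 0.0, 0.5, 3.0]
print([relu_derivative(v) for v in samples])  # [0.0, 0.0, 0.0, 1.0, 1.0]
```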

The convolution operation uses the torch.nn.Conv2d() function, which is based on the maximum likelihood estimation principle and is similar to a minimum mean-square-error function; its calculation formula is shown in formula (2).

The feature map output by convolutional layer 1 has size 8 × 18 × 18 and is passed to pooling layer 1 for feature selection and information filtering.

Pooling layer 1 uses max pooling, which takes the maximum of the elements covered by the window. The pooling kernel size is 2 × 2 with stride 2, giving an output size of 8 × 9 × 9; the pooling operation reduces the amount of computation and the number of parameters.
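The feature-map sizes above follow from the output-size formulas in the PyTorch documentation for torch.nn.Conv2d and torch.nn.MaxPool2d. The sketch below reimplements those formulas in plain Python. It assumes a 20 × 20 input, which is the size that yields the stated 18 × 18 map for a 3 × 3 kernel with stride 1 and no padding; the input size itself is not stated in the text.

```python
import math
from typing import Optional


def conv2d_out(size: int, kernel: int, stride: int = 1,
               padding: int = 0, dilation: int = 1) -> int:
    """Output height or width of torch.nn.Conv2d (formula from the PyTorch docs)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def maxpool2d_out(size: int, kernel: int, stride: Optional[int] = None,
                  padding: int = 0, dilation: int = 1,
                  ceil_mode: bool = False) -> int:
    """Output height or width of torch.nn.MaxPool2d; stride defaults to kernel."""
    stride = kernel if stride is None else stride
    numerator = size + 2 * padding - dilation * (kernel - 1) - 1
    if not ceil_mode:
        return numerator // stride + 1
    out = math.ceil(numerator / stride) + 1
    # With ceil_mode, a last window that would start in the right padding is dropped
    if (out - 1) * stride >= size + padding:
        out -= 1
    return out


h_conv = conv2d_out(20, 3)            # convolutional layer 1: 20 -> 18
h_pool = maxpool2d_out(h_conv, 2, 2)  # pooling layer 1: 18 -> 9
print(h_conv, h_pool)                 # 18 9
```

Rounding only matters for odd sizes: with these formulas a 7 × 7 map pools to 3 × 3 by default and to 4 × 4 with ceil_mode=True.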

In the pooling operation (see FIG. 3), the torch.nn.MaxPool2d function is used, with its parameters set as (M, B, K, L): M is the max-pooling window size (kernel_size) and may be a single value or a tuple; B is the stride, a single value or a tuple; K is the padding, a single value or a tuple; and L (dilation) controls the stride between elements within the window. For an input of size (N, C, H_in, W_in), an output of size (N, C, H_out, W_out) and a pooling window size kernel_size = (kH, kW), the output is

$$\text{out}(N_i, C_j, h, w) = \max_{m=0,\dots,kH-1}\ \max_{n=0,\dots,kW-1} \text{input}(N_i, C_j, \text{stride}[0] \times h + m, \text{stride}[1] \times w + n)$$

where m and n index the positions inside the sliding window. kernel_size is not a convolution kernel in the conventional sense but the size of the sliding window, which can be specified freely: a single value such as 3 gives a 3 × 3 window, and a tuple such as (3, 2) gives a 3 × 2 window. The Boolean parameter return_indices, when True, also returns the indices of the maximum values; the Boolean parameter ceil_mode, when True, computes the output shape by rounding up, while the default is to round down.
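The max-pooling formula can be made concrete with a naive single-channel implementation in NumPy. This is an illustrative sketch, not PyTorch's implementation: it handles no padding or dilation, rounds down, and returns flat indices into the input plane, analogous to return_indices=True.

```python
from typing import Tuple

import numpy as np


def max_pool2d(x: np.ndarray, kernel: Tuple[int, int],
               stride: Tuple[int, int]) -> Tuple[np.ndarray, np.ndarray]:
    """out[h, w] = max over m < kH, n < kW of x[stride[0]*h + m, stride[1]*w + n]."""
    kh, kw = kernel
    sh, sw = stride
    out_h = (x.shape[0] - kh) // sh + 1
    out_w = (x.shape[1] - kw) // sw + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    idx = np.empty((out_h, out_w), dtype=np.int64)
    for h in range(out_h):
        for w in range(out_w):
            window = x[sh * h: sh * h + kh, sw * w: sw * w + kw]
            m, n = np.unravel_index(np.argmax(window), window.shape)
            out[h, w] = window[m, n]
            # Flat index of the maximum within the whole input plane
            idx[h, w] = (sh * h + m) * x.shape[1] + (sw * w + n)
    return out, idx


x = np.arange(16, dtype=np.float32).reshape(4, 4)
pooled, indices = max_pool2d(x, kernel=(2, 2), stride=(2, 2))
print(pooled)
print(indices)
```

For the 4 × 4 ramp input, each 2 × 2 window's maximum is its bottom-right element, so the pooled values are 5, 7, 13 and 15, and the returned indices equal those same numbers.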

After activation, convolution and pooling, the image data is reduced to 16 × 4 × 4. This 16 × 4 × 4 feature map is flattened into a 1-dimensional vector, and the result is obtained through 2 fully connected layers and 1 output layer. Each neuron in a fully connected layer is connected to all neurons in the previous layer, so the layer can integrate the class-discriminative local information from the convolutional or pooling layers.

The fully connected operation uses the torch.nn.Linear function, which applies the linear transformation $y = xF^T + G$ to its input, where F is the weight matrix, ^T denotes the transpose and G is the bias. The ReLU function is also used as the activation function of each neuron in the fully connected layers, and the two fully connected layers have 120 and 140 neurons respectively.
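The flatten-then-fully-connected step can be sketched in NumPy with the same affine form $y = xF^T + G$ that torch.nn.Linear uses. The random feature map and weights below are placeholders for trained parameters, and the output layer is left out because the number of classes is not stated at this point.

```python
import numpy as np

rng = np.random.default_rng(0)


def linear(x: np.ndarray, F: np.ndarray, G: np.ndarray) -> np.ndarray:
    """Affine map y = x F^T + G, with F of shape (out, in) and G of shape (out,)."""
    return x @ F.T + G


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


feature_map = rng.standard_normal((16, 4, 4))  # stand-in for the 16 x 4 x 4 features
x = feature_map.reshape(-1)                    # flatten to a 256-element vector

# Placeholder weights (small scale) and zero biases for the two hidden layers
F1, G1 = rng.standard_normal((120, 256)) * 0.05, np.zeros(120)
F2, G2 = rng.standard_normal((140, 120)) * 0.05, np.zeros(140)

h1 = relu(linear(x, F1, G1))   # fully connected layer 1: 256 -> 120
h2 = relu(linear(h1, F2, G2))  # fully connected layer 2: 120 -> 140
print(x.shape, h1.shape, h2.shape)
```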

The output layer outputs the probability that the classification identification of the image is correct; this output is the classification identification result. It is computed by the softmax function, as shown in formula (3):

$$\mathrm{softmax}(y_i) = \frac{e^{y_i}}{\sum_{c=1}^{C} e^{y_c}} \quad (3)$$

where $y_i$ is the output value of the i-th node and C is the number of output nodes; the softmax function maps the output values into the range [0, 1].
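A minimal pure-Python sketch of the softmax in formula (3), assuming the output-node values arrive as a plain list. Subtracting the maximum before exponentiating is a standard numerical-stability step; it does not change the result but prevents overflow for large outputs.

```python
import math
from typing import List


def softmax(values: List[float]) -> List[float]:
    """exp(y_i) / sum_c exp(y_c), computed with the maximum subtracted first."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
print(softmax([1000.0, 1000.0]))  # no overflow: [0.5, 0.5]
```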

2 Adding a weight initialization operation to the convolution, pooling and fully connected operations to obtain the Lenet-5 optimization model

To obtain a more accurate image processing result, the applicant sets a weight initialization operation in each of the first and second convolution, pooling and fully connected operations. The weight initialization operation keeps the convolution variance and the gradient variance consistent across these operations: it makes the convolution variance 1 when the image is propagated forward through them, and the gradient variance 1 when the image is propagated backward.

In this embodiment, the applicant adds a weight initialization module, which implements the weight initialization operation, to the Lenet-5 model, obtaining the Lenet-5 optimization model. The Lenet-5 optimization model processes images to implement the first convolution, first pooling and first fully connected operations described above.

The weight initialization module is applied throughout the Lenet-5 model structure and mainly acts on its convolution, pooling and fully connected operations. It is activated when the image is input and is closed once the whole classification and recognition of the image is finished. By keeping the convolution variance and gradient variance consistent in the convolution, pooling and fully connected operations, it keeps the convolution outputs and gradients stable when the image is propagated forward or backward through the Lenet-5 optimization model. This effectively addresses the gradient explosion and gradient vanishing problems of existing image processing and makes the final classification recognition result more accurate while maintaining the image processing speed.

The parameters of the weight initialization module still have a mean of 0, and the mean of the weights remains 0 throughout updating. When the image propagates forward through the module, the variance of each layer's convolution result is 1; when it propagates backward, the variance of the gradient passed back through each layer is 1.

In the weight initialization module, the calculation formula is shown in formula (4):

$$Y_l = W_l X_l + B_l \quad (4)$$

Here $Y_l$ is the output value at a given position, $B_l$ is the bias, and $X_l$ is the input to the convolution, of shape k × k × c, where k is the convolution kernel size and c is the number of input channels. Letting $n = k \times k \times c$, n is the number of input values from which one output value is computed. $W_l$ has shape d × n, where d is the number of output channels, and the subscript l denotes the layer number. $X_l = f(Y_{l-1})$, where f is the ReLU activation function, and $c_l = d_{l-1}$: the output of the previous layer becomes the input of the next layer through the activation function, so the number of input channels of a layer equals the number of output channels of the previous layer.
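The reading of equation (4), one convolution output position as a matrix-vector product with $n = k \times k \times c$, can be checked numerically. In this NumPy sketch the sizes c = 3, d = 5, k = 3 and the 8 × 8 input are arbitrary illustrative choices. The "direct" result is the cross-correlation that torch.nn.Conv2d computes, evaluated at one window.

```python
import numpy as np

rng = np.random.default_rng(0)
c, d, k = 3, 5, 3                     # input channels, output channels, kernel size
image = rng.standard_normal((c, 8, 8))
kernels = rng.standard_normal((d, c, k, k))
bias = rng.standard_normal(d)

i, j = 2, 4                           # top-left corner of one output window
patch = image[:, i:i + k, j:j + k]    # the k x k x c input window X_l

# Direct cross-correlation at (i, j), one value per output channel
direct = np.array([(kernels[o] * patch).sum() for o in range(d)]) + bias

# Equation (4): flatten X_l to length n and reshape the kernels into a d x n matrix W_l
n = k * k * c
X = patch.reshape(n)
W = kernels.reshape(d, n)
Y = W @ X + bias

print(n, W.shape, np.allclose(direct, Y))  # 27 (5, 27) True
```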

In this embodiment, for independent random variables, $\mathrm{var}(X_1 + \dots + X_k) = \mathrm{var}(X_1) + \dots + \mathrm{var}(X_k)$, where var() denotes the variance, X denotes a random variable and the subscript k denotes the k-th one. In the weight initialization module, the variance of each convolutional layer's output is 1 during forward propagation. One output y is obtained by multiplying the n inputs x by their n weights and summing; the weights are independent and identically distributed, the inputs are also independent and identically distributed, and inputs and weights are mutually independent, which gives equation (5):

$$\mathrm{var}(y_p) = n_p\,\mathrm{var}(x_p \cdot w_p) \quad (5)$$

Here y, w and x denote three different random variables. They can be treated this way because the weight initialization module is assumed to contain only convolution and the ReLU activation function, i.e., a VGG-style network by default, with no residual or concat structures and no BN layer. The subscript p denotes the layer in which the output lies, i.e., the p-th layer.

To help those skilled in the art understand how weight initialization works, consider the following example:

$$y = w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5 + w_6 x_6 \quad (6)$$

where each product $w_i x_i$ is regarded as a whole and the six products are mutually independent, which gives equation (7):

$$\mathrm{var}(y) = \mathrm{var}(w_1 x_1) + \mathrm{var}(w_2 x_2) + \mathrm{var}(w_3 x_3) + \mathrm{var}(w_4 x_4) + \mathrm{var}(w_5 x_5) + \mathrm{var}(w_6 x_6) \quad (7)$$

When the products $w_i x_i$ are identically distributed, their variances are equal, which gives equation (8):

$$\mathrm{var}(y) = 6\,\mathrm{var}(w x) \quad (8)$$

Further, because $w_l$ and $x_l$ are independent of each other, equation (9) is obtained:

$$\mathrm{var}(y_l) = n_l\left[\mathrm{var}(w_l)\,\mathrm{var}(x_l) + \mathrm{var}(w_l)(E x_l)^2 + (E w_l)^2\,\mathrm{var}(x_l)\right] \quad (9)$$

The mean of the weights is set to 0 at initialization and is assumed to remain 0 during updating, so $E(w_l) = 0$; however, $x_l$ is obtained from the previous layer through the ReLU activation function, so $E(x_l) \neq 0$. This gives equation (10):

$$\mathrm{var}(y_l) = n_l\left[\mathrm{var}(w_l)\,\mathrm{var}(x_l) + \mathrm{var}(w_l)(E x_l)^2\right] = n_l\,\mathrm{var}(w_l)\left(\mathrm{var}(x_l) + (E x_l)^2\right) \quad (10)$$
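Equations (8) and (10) can be checked with a quick Monte Carlo simulation. The distributions here are illustrative assumptions: zero-mean normal weights with variance 0.25, and non-negative inputs drawn uniformly from [0, 2], so that E(x) = 1 is non-zero as in the ReLU case. Equation (10) then predicts var(w x) = 0.25 × (1/3 + 1) = 1/3, and equation (8) predicts var(y) = 6 × 1/3 = 2.

```python
import numpy as np

rng = np.random.default_rng(42)
n_terms, n_samples = 6, 200_000

# Independent zero-mean weights and non-negative inputs with non-zero mean
w = rng.normal(0.0, 0.5, size=(n_samples, n_terms))   # var(w) = 0.25
x = rng.uniform(0.0, 2.0, size=(n_samples, n_terms))  # E(x) = 1, var(x) = 1/3
y = (w * x).sum(axis=1)                                # equation (6)

var_wx = (w * x).var()
var_y = y.var()
six_var_wx = n_terms * var_wx                          # equation (8)
predicted_var_wx = 0.25 * (1.0 / 3.0 + 1.0 ** 2)       # equation (10), per term

print(round(var_wx, 3), round(predicted_var_wx, 3))
print(round(var_y, 3), round(six_var_wx, 3))
```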

Since $\mathrm{var}(x_l) + (E x_l)^2 = E(x_l^2)$, equation (10) becomes:

$$\mathrm{var}(y_l) = n_l\,\mathrm{var}(w_l)\,E(x_l^2) \quad (11)$$

The expectation $E(x_l^2)$ is determined by the output of layer l-1, $x_l = f(y_{l-1})$, where f is the ReLU activation function and p is the probability density of $y_{l-1}$, giving equation (12):

$$E(x_l^2) = \int_{-\infty}^{+\infty} f^2(y_{l-1})\, p(y_{l-1})\, dy_{l-1} \quad (12)$$

For $y_{l-1} \in (-\infty, 0)$, $f(y_{l-1}) = 0$. Removing this interval and using $f(y_{l-1}) = y_{l-1}$ for $y_{l-1} > 0$ gives equation (13):

$$E(x_l^2) = \int_{0}^{+\infty} y_{l-1}^2\, p(y_{l-1})\, dy_{l-1} \quad (13)$$

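Assuming $w_{l-1}$ is symmetrically distributed around 0 with mean 0, $y_{l-1}$ is also symmetric around 0 with mean 0 (taking the bias as 0), so

$$\int_{0}^{+\infty} y_{l-1}^2\, p(y_{l-1})\, dy_{l-1} = \frac{1}{2}\int_{-\infty}^{+\infty} y_{l-1}^2\, p(y_{l-1})\, dy_{l-1}$$

which gives equation (14):

$$E(x_l^2) = \frac{1}{2} E(y_{l-1}^2) \quad (14)$$

Since the mean of $y_{l-1}$ is 0, $E(y_{l-1}^2) = \mathrm{var}(y_{l-1})$, and equation (14) becomes:

$$E(x_l^2) = \frac{1}{2}\,\mathrm{var}(y_{l-1}) \quad (15)$$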

Substituting equation (15) into equation (11) gives equation (16):

$$\mathrm{var}(y_l) = \frac{1}{2}\, n_l\,\mathrm{var}(w_l)\,\mathrm{var}(y_{l-1}) \quad (16)$$
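The two key relations of this step are $E(x_l^2) = \tfrac{1}{2}\mathrm{var}(y_{l-1})$ for $x_l = \mathrm{ReLU}(y_{l-1})$ and $\mathrm{var}(y_l) = \tfrac{1}{2} n_l\,\mathrm{var}(w_l)\,\mathrm{var}(y_{l-1})$. This NumPy Monte Carlo sketch checks both. The fan-in, the variances and the sample count are arbitrary test values, and each sample gets its own weight draw so that w behaves as the independent random variable the derivation assumes.

```python
import numpy as np

rng = np.random.default_rng(1)
n, samples = 100, 20_000      # fan-in n_l and Monte Carlo sample count
var_prev, var_w = 4.0, 0.01   # var(y_{l-1}) and var(w_l), arbitrary test values

# Previous pre-activations: symmetric around 0 with mean 0 (bias taken as 0)
y_prev = rng.normal(0.0, np.sqrt(var_prev), size=(samples, n))
x = np.maximum(y_prev, 0.0)   # x_l = f(y_{l-1}) with f = ReLU

# Zero-mean weights, drawn independently for every sample
w = rng.normal(0.0, np.sqrt(var_w), size=(samples, n))
y = (w * x).sum(axis=1)       # one output y_l per sample

e_x2 = float(np.mean(x ** 2))
var_y = float(y.var())
print(e_x2, 0.5 * var_prev)                # E(x_l^2) vs var(y_{l-1}) / 2
print(var_y, 0.5 * n * var_w * var_prev)   # var(y_l) vs n var(w) var(y_{l-1}) / 2
```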

Propagating forward from the first layer, the variance of layer L is:

$$\mathrm{var}(y_L) = \mathrm{var}(y_1) \prod_{l=2}^{L} \frac{1}{2}\, n_l\,\mathrm{var}(w_l) \quad (17)$$

Here $y_1$ corresponds to the input image sample; after normalization, $\mathrm{var}(y_1) = 1$. For the output variance of every layer to equal 1, each factor in equation (17) must equal 1, which gives equations (18) and (19):

$$\frac{1}{2}\, n_l\,\mathrm{var}(w_l) = 1 \quad (18)$$

$$\mathrm{std}(w_l) = \sqrt{2 / n_l} \quad (19)$$
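The practical effect of choosing a weight standard deviation of $\sqrt{2/n_l}$ can be seen by pushing unit-variance inputs through a stack of ReLU layers. This sketch uses square fully connected layers (width 256, depth 10, both arbitrary choices) instead of convolutions and zero biases, and compares against a fixed small standard deviation of 0.01. With $\sqrt{2/n_l}$ the output variance stays close to 1, while with 0.01 each layer multiplies it by about 0.0128, so it collapses towards 0.

```python
import numpy as np


def forward_variances(std: float, width: int = 256, depth: int = 10,
                      batch: int = 2000, seed: int = 0) -> list:
    """Return var(y_l) for each of `depth` fully connected ReLU layers.

    Weights are zero-mean normal with standard deviation `std`, biases are 0.
    """
    rng = np.random.default_rng(seed)
    y = rng.standard_normal((batch, width))   # normalized input, var(y_1) = 1
    variances = []
    for _ in range(depth):
        x = np.maximum(y, 0.0)                 # x_l = ReLU(y_{l-1})
        W = rng.normal(0.0, std, size=(width, width))
        y = x @ W.T                            # y_l = W_l x_l with zero bias
        variances.append(float(y.var()))
    return variances


he_forward = forward_variances(np.sqrt(2.0 / 256))  # std = sqrt(2 / n_l), n_l = 256
tiny_forward = forward_variances(0.01)              # fixed small std, for contrast
print([round(v, 2) for v in he_forward])
print(f"{tiny_forward[-1]:.1e}")
```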

In the weight initialization operation, for example, let the input of a convolutional layer be 32 × 16 × 16 (number of channels, height and width) and its convolution kernel be 64 × 32 × 3 × 3 (number of output channels, number of input channels, kernel height and kernel width). Then $n_l = 3 \times 3 \times 32 = 288$, and the weights of this layer are drawn from a zero-mean normal distribution with standard deviation $\sqrt{2/288} \approx 0.083$.

The bias is initialized to 0, and the layer has 64 × 32 × 3 × 3 = 18432 weights in total.
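The arithmetic for this example layer is short enough to script. The sketch below computes the forward fan-in $n_l = k \times k \times c$, the backward fan-out $k \times k \times d$ used by the backward-propagation derivation, the weight count, and the resulting standard deviations $\sqrt{2/\text{fan}}$. In PyTorch, torch.nn.init.kaiming_normal_ with nonlinearity='relu' draws weights with this standard deviation, using the fan-in when mode='fan_in' and the fan-out when mode='fan_out'. The sketch does not claim that the applicant used that function.

```python
import math

# Kernel shape from the example: (out_channels, in_channels, kH, kW)
out_ch, in_ch, kh, kw = 64, 32, 3, 3

fan_in = in_ch * kh * kw     # n_l = k * k * c, used in the forward pass
fan_out = out_ch * kh * kw   # k * k * d, used in the backward pass
num_weights = out_ch * in_ch * kh * kw

std_forward = math.sqrt(2.0 / fan_in)    # sqrt(2 / n_l)
std_backward = math.sqrt(2.0 / fan_out)  # backward counterpart
print(fan_in, fan_out, num_weights)                   # 288 576 18432
print(round(std_forward, 4), round(std_backward, 4))  # 0.0833 0.0589
```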

In this embodiment, the weight initialization also makes the gradient variance 1 during backward propagation. The backward counterpart of equation (4) is

$$\Delta X_l = \hat{W}_l\, \Delta Y_l$$

where Δ denotes the derivative of the loss function. Unlike the forward derivation, $\Delta Y_l$ covers d channels, each of size k × k, and is reshaped into a vector of length $\hat{n}_l = k \times k \times d$, while $\Delta X_l$ has c channels. $\hat{W}_l$ is a c × $\hat{n}_l$ matrix, so $\Delta X_l$ has shape c × 1, and $\hat{W}_l$ differs from W only by a rearrangement (transposition) of the filters. Likewise, one $\Delta x_l$ is obtained from many $\Delta y_l$, i.e., the variance of one variable (the gradient) is again obtained from several independent, identically distributed variables. Assume the random variables $\hat{w}_l$ and $\Delta y_l$ are independent and identically distributed and that $\hat{w}_l$ is symmetric around 0; then for every layer l, $\Delta x_l$ has mean 0, i.e., $E(\Delta x_l) = 0$. Because $x_{l+1} = f(y_l)$ in forward propagation, backward propagation gives $\Delta y_l = f'(y_l)\,\Delta x_{l+1}$. Since f is the ReLU activation function, its derivative is either 0 or 1; assume each occurs half of the time and that $f'(y_l)$ and $\Delta x_{l+1}$ are independent of each other, which gives equation (20):

$$E(\Delta y_l) = \frac{1}{2} E(\Delta x_{l+1}) = 0 \quad (20)$$

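Here the probability splits into two parts: in one the derivative of the ReLU activation function is 0, and in the other it is 1, each with probability 50%. Applying the same split to $(\Delta y_l)^2$, with half of the probability corresponding to a derivative of 0 and half to a derivative of 1, and using $E(\Delta y_l) = 0$ from equation (20), gives equation (21):

$$\mathrm{var}(\Delta y_l) = E\big((\Delta y_l)^2\big) = \frac{1}{2}\,\mathrm{var}(\Delta x_{l+1}) \quad (21)$$

In a manner similar to equation (7), the variance of the gradient is

$$\mathrm{var}(\Delta x_l) = \hat{n}_l\,\mathrm{var}(w_l)\,\mathrm{var}(\Delta y_l) \quad (22)$$

and substituting equation (21) gives equation (23):

$$\mathrm{var}(\Delta x_l) = \frac{1}{2}\,\hat{n}_l\,\mathrm{var}(w_l)\,\mathrm{var}(\Delta x_{l+1}) \quad (23)$$

Therefore, following the forward derivation, the final result is

$$\mathrm{var}(\Delta x_2) = \mathrm{var}(\Delta x_{L+1}) \prod_{l=2}^{L} \frac{1}{2}\,\hat{n}_l\,\mathrm{var}(w_l)$$

and requiring $\frac{1}{2}\,\hat{n}_l\,\mathrm{var}(w_l) = 1$ for every layer gives weights with standard deviation $\sqrt{2/\hat{n}_l}$.

Following the forward-propagation example above, here $\hat{n}_l = 3 \times 3 \times 64 = 576$, so the weights should have standard deviation $\sqrt{2/576} \approx 0.059$.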
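The backward result, that a weight standard deviation of $\sqrt{2/\hat{n}_l}$ keeps the gradient variance near 1, can be simulated the same way as the forward case. This sketch makes three simplifying assumptions: square fully connected layers, where the backward pass is a plain transpose and $\hat{n}_l$ equals the width; ReLU derivatives drawn as independent fair 0/1 coins, exactly as the derivation assumes; and arbitrary sizes (width 256, depth 10). A fixed standard deviation of 0.01 is included for contrast.

```python
import numpy as np


def backward_variances(std: float, width: int = 256, depth: int = 10,
                       batch: int = 2000, seed: int = 0) -> list:
    """Return the gradient variance after each of `depth` backward ReLU layers.

    For y = W x with W of shape d x c, the input gradient is W^T times the output
    gradient. The ReLU derivative is 0 or 1 with probability 1/2 each, drawn
    independently of the incoming gradient.
    """
    rng = np.random.default_rng(seed)
    grad_x = rng.standard_normal((batch, width))  # gradient at the top, variance 1
    variances = []
    for _ in range(depth):
        relu_grad = rng.integers(0, 2, size=(batch, width))  # f'(y_l) in {0, 1}
        grad_y = relu_grad * grad_x                          # f'(y_l) * grad_x_{l+1}
        W = rng.normal(0.0, std, size=(width, width))        # zero-mean weights
        grad_x = grad_y @ W                                  # row form of W^T grad_y
        variances.append(float(grad_x.var()))
    return variances


he_backward = backward_variances(np.sqrt(2.0 / 256))  # std = sqrt(2 / n^), n^ = 256
tiny_backward = backward_variances(0.01)              # fixed small std, for contrast
print([round(v, 2) for v in he_backward])
print(f"{tiny_backward[-1]:.1e}")
```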

3 Adding the prediction operation to the Lenet-5 optimization model to obtain the Lenet-5 preferred model

To shorten the image processing time and further improve the accuracy of the image processing result, the applicant sets a prediction operation before the second convolution operation. The prediction operation predicts the accuracy for the input image: if the accuracy is greater than or equal to a first threshold, the result is output directly; if it is less than the first threshold, the second convolution, second pooling and second fully connected operations are performed to process the image.

In this embodiment, the applicant adds a prediction module, which implements the prediction operation, in front of the Lenet-5 optimization model described above, obtaining the Lenet-5 preferred model. The Lenet-5 preferred model processes images to implement the second convolution, second pooling and second fully connected operations described above.

The first stage (convolution, pooling and fully connected operations) therefore differs from the second stage as follows: the first stage (the Lenet-5 optimization model) contains the weight initialization module but no prediction module, whereas the second stage (the Lenet-5 preferred model) contains both the weight initialization module and the prediction module.

In essence, the prediction module predicts the accuracy for the input image: if the accuracy requirement is met, processing steps are skipped and the result is output directly; otherwise, the image is passed on to the Lenet-5 optimization model for processing.

When an image enters the Lenet-5 preferred model, it first enters the prediction module. So that the image format meets the processing standard of the prediction module and can be processed conveniently, the input image first undergoes format (style) conversion, after which the accuracy prediction is performed.

Specifically, the prediction module reads an image, calls the cvtColor function of the OpenCV library to convert it between color spaces, and uses the fromarray function to convert the resulting array into an image. After this format conversion, the prediction module predicts the accuracy for the image using formula (3). If the predicted accuracy is greater than or equal to the first threshold, the weight initialization operation and the convolution, pooling and fully connected operations (i.e., the Lenet-5 optimization model) are skipped; in that case the NumPy library and the squeeze function are called to expand and scale the image output so that its size exactly matches the format required by the Lenet-5 optimization model, and the output layer then outputs the classification identification result. If the predicted accuracy is less than the first threshold, the image is processed by the subsequent weight initialization operation and convolution, pooling and fully connected operations (i.e., the Lenet-5 optimization model), and the classification identification result is then output by the output layer. To ensure the accuracy of the output result, the inventors set the first threshold to 99.8%. NumPy is a third-party Python library that supports a large number of high-dimensional array and matrix operations and also provides many mathematical functions for array operations.

The code for the prediction module is as follows:

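import os

import cv2
import matplotlib.pyplot as plt
import torch
from PIL import Image

# args, model, device, data_transform and class_names are defined earlier in the
# program; class_names maps str(class_index) to a label name.
model.eval()
for file_name in os.listdir(args.data_folder):
    # Read the image with OpenCV and convert it from BGR to grayscale
    image = cv2.imread(os.path.join(args.data_folder, file_name))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Convert the NumPy array into a PIL image and apply the preprocessing transforms
    image_IS = Image.fromarray(image)
    image_I = data_transform(image_IS).to(device)
    # Add a batch dimension: (C, H, W) -> (1, C, H, W)
    image_I = image_I.unsqueeze(0)
    with torch.no_grad():
        output = model(image_I)
    # Convert the scores to probabilities and remove the batch dimension
    output = torch.softmax(output, dim=1).squeeze()
    predict_idx = int(torch.argmax(output).cpu().numpy())
    name = class_names[str(predict_idx)]
    score = output[predict_idx].item()
    res = "category: {} probability: {:.3f}".format(name, score)
    # Show the image with its predicted category and probability as the title
    plt.imshow(image_IS, cmap="gray")
    plt.title(res)
    plt.show()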

4. Preprocessing the images

To prevent class imbalance from affecting the image processing result, the inventors preprocess the image before it is formally processed, that is, before it is processed by the first convolution, first pooling and first full-connection operations (the Lenet-5 optimization model) and by the second convolution, second pooling and second full-connection operations (the Lenet-5 better model). Preprocessing expands the image data so that the preprocessed images are not affected by class imbalance.

In this embodiment, the image is preprocessed with the transforms module of PyTorch: center cropping, Resize(), ToTensor() and normalization are applied to the image, thereby expanding the image data.

For center cropping, the crop size is set to (20, 12) for the Chinese data set and to (20, 15) for non-Chinese data sets.
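The preprocessing chain can be sketched in NumPy to make each step concrete. Only the crop sizes come from the description; the resize target `(32, 32)` and the normalization mean and standard deviation of 0.5 are assumptions for illustration. The resize uses nearest-neighbour sampling as a simplification (torchvision's Resize defaults to bilinear interpolation), and the result is a 2-D `(H, W)` array rather than a `(C, H, W)` tensor.

```python
import numpy as np

# Crop sizes (height, width) from the description; the rest are assumptions.
CROP_SIZES = {"chinese": (20, 12), "non_chinese": (20, 15)}
RESIZE_TO = (32, 32)
MEAN, STD = 0.5, 0.5


def center_crop(img, size):
    """Crop an (H, W) array around its center, rounding offsets as torchvision does."""
    crop_h, crop_w = size
    h, w = img.shape
    top = int(round((h - crop_h) / 2.0))
    left = int(round((w - crop_w) / 2.0))
    return img[top:top + crop_h, left:left + crop_w]


def resize_nearest(img, size):
    """Nearest-neighbour resize of an (H, W) array to the given (height, width)."""
    out_h, out_w = size
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols[None, :]]


def preprocess(img_uint8, dataset_kind):
    """Center crop, resize, scale to [0, 1] (like ToTensor), then normalize."""
    img = center_crop(img_uint8, CROP_SIZES[dataset_kind])
    img = resize_nearest(img, RESIZE_TO)
    x = img.astype(np.float32) / 255.0  # ToTensor-style scaling
    return (x - MEAN) / STD             # Normalize: (x - mean) / std
```

With these assumed statistics, a black pixel maps to -1.0 and a white pixel to 1.0, and a 28 x 28 input becomes a 32 x 32 array after cropping to 20 x 12 and resizing.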

5. Processing the preprocessed image with the Lenet-5 optimization model to obtain classification identification parameters

In this embodiment, the first convolution, first pooling and first full-connection operations are performed on the preprocessed image to obtain the classification identification parameters; specifically, the preprocessed image is processed by the Lenet-5 optimization model containing the weight initialization module.
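The spatial size of the feature maps through the layer order given in claim 3 (convolution layer 1, pooling layer 1, convolution layer 2, pooling layer 2, then the full-connection layers) follows from the standard output-size formula. The 5 x 5 convolution kernels and 2 x 2 pooling with stride 2 and no padding used below are the classic LeNet-5 values and are assumptions here; this section does not state the kernel sizes.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_feature_sizes(input_size, conv_kernel=5, pool_kernel=2):
    """Track the spatial size through conv1 -> pool1 -> conv2 -> pool2."""
    sizes = [input_size]
    size = input_size
    for _ in range(2):
        size = conv_out(size, conv_kernel)                      # convolution layer
        sizes.append(size)
        size = conv_out(size, pool_kernel, stride=pool_kernel)  # pooling layer
        sizes.append(size)
    return sizes
```

Under these assumptions a 32 x 32 input shrinks to 28, 14, 10 and finally 5; if the second convolution layer had the classic 16 output channels, full-connection layer 1 would receive 16 x 5 x 5 = 400 features.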

Because the Chinese data set and the non-Chinese data sets differ considerably, they are not suitable for joint training. To obtain a good classification recognition effect, the different image sets are trained separately in the Lenet-5 optimization model, yielding different classification identification parameters. The number of training epochs is set to 20 and the learning rate to 0.001, which yields better classification identification parameters.
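Training each data set separately with the stated settings can be expressed as a small driver that produces one parameter set per data set. The function `train_fn` is a hypothetical training routine standing in for the Lenet-5 optimization model's training loop, which this section does not give; only the 20 epochs and the 0.001 learning rate come from the text.

```python
# Hyperparameters stated in the description
TRAINING_CONFIG = {"epochs": 20, "learning_rate": 0.001}


def train_per_dataset(datasets, train_fn, config=None):
    """Train one parameter set per data set instead of pooling the data sets.

    datasets: mapping from a data set name to its (preprocessed) training data.
    train_fn: hypothetical routine train_fn(data, epochs, learning_rate) that
    returns the trained classification identification parameters.
    """
    config = TRAINING_CONFIG if config is None else config
    return {name: train_fn(data, **config) for name, data in datasets.items()}
```

Applied to three data sets, this yields three independent parameter sets, which matches the "model 1, model 2, model 3" arrangement used in the experiments.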

6. Combining the classification identification parameters with the Lenet-5 better model to obtain a classification identification model

In this embodiment, the inventors combine the classification recognition parameters with the second convolution, pooling and full-connection operations, which improves the accuracy of image classification and recognition. Specifically, each set of classification identification parameters is combined with the Lenet-5 better model, which contains the weight initialization module and the prediction module: the relevant parameters of the corresponding Lenet-5 better model are adjusted according to the obtained classification identification parameters, so that each parameter set is matched with its Lenet-5 better model. This yields different classification identification models and improves their recognition accuracy.
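One way to picture "adjusting the related parameters" of the second model is to copy every trained array whose name and shape match into the Lenet-5 better model, leaving parts it alone has (such as the prediction module) untouched. This is an assumed mechanism, similar in spirit to loading a PyTorch state dict with `strict=False`, except that the sketch also silently skips shape mismatches; the parameter names in the usage note are hypothetical.

```python
import numpy as np


def combine_parameters(trained_params, better_model_params):
    """Overwrite entries of the second model with trained arrays of matching name and shape.

    Both arguments map parameter names to NumPy arrays. Entries that exist only
    in the second model keep their current values; the inputs are not modified.
    """
    combined = dict(better_model_params)
    loaded = []
    for name, value in trained_params.items():
        target = better_model_params.get(name)
        if target is not None and target.shape == value.shape:
            combined[name] = value.copy()
            loaded.append(name)
    return combined, loaded
```

For instance, trained `conv1.weight` and `fc2.weight` arrays would be copied into the second model, while a hypothetical `predict.weight` entry that only the Lenet-5 better model has would keep its own values.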

7. Processing the preprocessed image with the classification recognition model to obtain a classification recognition result

To obtain better classification recognition results, each classification recognition model is used to recognize its corresponding images. All images processed by the classification recognition models are first preprocessed as described in Section 4, and the classification recognition result is finally obtained.

To verify the effect of the classification identification method provided in this embodiment, the inventors conducted experiments comparing it with other existing classification and identification methods.

The experiments use three known data sets: the MNIST handwritten character data set, an Arabic handwritten digit data set and a clothing classification data set. To make the data representative and distinguishable, the images were obtained from different organizations and individuals, including hospitals, factories, universities and government organizations.

First, each of the three preprocessed data sets was used to train the Lenet-5 optimization model, yielding three sets of classification identification parameters: model 1, model 2 and model 3. Then, for image classification recognition (see Figs. 4 to 6), model 1 was combined with the Lenet-5 better model to classify the preprocessed MNIST handwritten character data set, model 2 was combined with the Lenet-5 better model to classify the preprocessed Arabic handwritten digit data set, and model 3 was combined with the Lenet-5 better model to classify the preprocessed clothing classification data set. Other existing baseline models were also used to classify the three data sets; the experimental results are shown in Tables 1 to 3. Hereinafter, "Improved LeNet-5 model" denotes the classification recognition model obtained by the classification recognition method of this embodiment.

Table 1: Comparison of the Improved LeNet-5 model with other baseline models on the MNIST handwritten character data set


Table 2: Comparison of the Improved LeNet-5 model with other baseline models on the Arabic handwritten digit data set

Table 3: Comparison of the Improved LeNet-5 model with other baseline models on the clothing classification data set

The experimental results in Tables 1 to 3 show that, when classifying the MNIST handwritten character data set, the Arabic handwritten digit data set and the clothing classification data set, the method of this embodiment achieves low training loss, good suppression performance, low training time and memory requirements, and high training accuracy, and outperforms the other baseline models on each of these metrics. The method therefore offers both faster and more accurate classification recognition.

The present invention further discloses an image classification and recognition apparatus which, referring to Fig. 7, comprises: a preprocessing module for preprocessing the image and performing data expansion; a convolution module for performing the convolution operation on the image to extract features; a pooling module for processing the image through the pooling operation to reduce the output size; a full-connection module for fully connecting the features, classifying them and outputting the recognition result; a weight initialization module for controlling the convolution variance and the gradient variance in the convolution module, the pooling module and the full-connection module; and a prediction module for predicting the accuracy for the image and deciding whether to enable the convolution module, the pooling module and the full-connection module. The apparatus performs all steps of the image classification and recognition method disclosed in this embodiment and may be provided independently in hardware.

The invention also discloses a device for classifying and identifying images, which is shown in fig. 8 and comprises a processor and a memory, wherein the processor executes a computer program stored in the memory to realize the method for classifying and identifying images disclosed by the embodiment.

The invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the image classification and identification method disclosed in this embodiment.

In the image classification and identification method provided by this embodiment, the weight initialization operation is set in the convolution, pooling and full-connection operations and keeps the convolution variance and the gradient variance consistent across them. As a result, the convolution outputs in forward propagation and the gradients in backward propagation remain more stable while the image is processed, which effectively addresses the gradient explosion and gradient vanishing problems of existing image processing and markedly improves the accuracy of the classification identification result while maintaining processing speed. In addition, a prediction module is placed before the convolution, pooling and full-connection operations; it predicts the accuracy for the input image and then determines whether to continue with these operations, which shortens the classification recognition time and further improves the accuracy of the result. Furthermore, the images are preprocessed before formal processing, so that classes with few images can be expanded and class imbalance does not degrade subsequent processing. The classification identification parameters obtained from the preprocessed images are combined with the convolution, pooling and full-connection operations before the preprocessed images are formally processed, which further improves the accuracy of the classification identification result. Finally, when classifying the MNIST handwritten character data set, the Arabic handwritten digit data set and the clothing classification data set, the method achieves low training loss, good suppression performance, low training time and memory requirements, and high training accuracy.
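The variance argument can be checked numerically on a single fully connected layer. This section does not say which initialization rule the weight initialization module uses; the sketch below assumes Xavier (Glorot) normal initialization, Var(W) = 2 / (fan_in + fan_out), and a layer without an activation function, so it illustrates the principle rather than the patented module.

```python
import numpy as np


def xavier_normal(fan_in, fan_out, rng):
    """Xavier/Glorot normal initialization: Var(W) = 2 / (fan_in + fan_out)."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))


def propagation_variances(fan_in=256, fan_out=256, batch=4000, seed=0):
    """Return (forward output variance, backward input-gradient variance)."""
    rng = np.random.default_rng(seed)
    w = xavier_normal(fan_in, fan_out, rng)
    x = rng.normal(0.0, 1.0, size=(batch, fan_in))        # unit-variance inputs
    grad_y = rng.normal(0.0, 1.0, size=(batch, fan_out))  # unit-variance upstream gradients
    y = x @ w              # forward pass of a linear layer
    grad_x = grad_y @ w.T  # backward pass: gradient with respect to the input
    return y.var(), grad_x.var()
```

With fan_in equal to fan_out, both variances come out close to 1, which is the condition stated in claim 2. When the two differ (for example 128 inputs and 512 outputs), Xavier initialization is only a compromise: the forward variance is about fan_in x 2 / (fan_in + fan_out) = 0.4 and the backward variance about 1.6. For ReLU layers, He initialization (Var(W) = 2 / fan_in) is the usual choice instead.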

The above description covers only the preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims

1. An image classification recognition method, characterized by comprising the following steps:

the weight initialization operation keeps the convolution variance and the gradient variance consistent in the first convolution operation, the first pooling operation and the first full-connection operation and in the second convolution operation, the second pooling operation and the second full-connection operation.

2. The classification recognition method according to claim 1, wherein the weight initialization operation makes the convolution variance equal to 1 when the image is forward-propagated through the first convolution operation, the first pooling operation, the first full-connection operation, the second convolution operation, the second pooling operation and the second full-connection operation, and makes the gradient variance equal to 1 when the image is backward-propagated.

3. The classification recognition method according to claim 1, wherein the first convolution operation, the first pooling operation and the first full-connection operation are implemented sequentially by convolution layer 1, pooling layer 1, convolution layer 2, pooling layer 2, full-connection layer 1 and full-connection layer 2;

4. The classification recognition method according to claim 1, wherein the second convolution operation is preceded by a prediction operation for predicting the accuracy for the image;

5. The classification recognition method according to claim 4, wherein the prediction operation uses a fromarray function, a cvtColor function, a softmax function and a squeeze function;

the softmax function is used to perform the accuracy prediction operation for the image;

6. The classification recognition method according to claim 1, wherein the image is subjected to a preprocessing operation before the first convolution operation, the first pooling operation and the first full-connection operation and before the second convolution operation, the second pooling operation and the second full-connection operation, and the preprocessing operation is used to expand the image data.

7. The classification recognition method according to claim 6, wherein the preprocessing operation comprises processing the image with the center cropping, Resize, ToTensor and normalization operations of the transforms module.

8. An apparatus for classifying and recognizing an image, comprising:

the weight initialization module is used for controlling convolution variance and gradient variance in the convolution module, the pooling module and the full-connection module;

9. An image classification recognition device, comprising a processor and a memory, wherein the processor implements the image classification recognition method according to any one of claims 1 to 7 when executing a computer program stored in the memory.

10. A computer-readable storage medium for storing a computer program, wherein the computer program is adapted to implement the method for classifying and identifying an image according to any one of claims 1-7 when executed by a processor.