CN116824239A

CN116824239A - Image recognition method and system based on transfer learning and ResNet50 neural network

Info

Publication number: CN116824239A
Application number: CN202310722593.1A
Authority: CN
Inventors: 臧建东; 沈骞; 吴金花; 徐寅; 胡婷
Original assignee: Huaiyin Institute of Technology
Current assignee: Huaiyin Institute of Technology
Priority date: 2023-06-19
Filing date: 2023-06-19
Publication date: 2023-09-29

Abstract

The invention discloses an image recognition method and system based on transfer learning and a ResNet50 neural network, wherein the method comprises the steps of constructing the ResNet50 neural network as a reference model, and optimizing and improving the model; training the optimized and improved reference model by adopting a transfer learning mode, and establishing an image pre-recognition model; establishing a sample image data set, and preprocessing images in the data set; dividing a training set and a testing set for the preprocessed sample image data set by adopting a five-fold cross validation method based on an image pre-recognition model; training the image pre-recognition model by using a training set to obtain an image recognition model; testing the trained image recognition model by using a test set; the system comprises a data processing module, a model training module and a model analysis module. The invention solves the problems of poor recognition performance, low accuracy and difficult accurate detection and positioning of small target objects and the need of a large amount of data in model training in the prior art image recognition method.

Description

Image recognition method and system based on transfer learning and ResNet50 neural network

Technical Field

The invention relates to the technical field of image recognition, in particular to an image recognition method and system based on transfer learning and ResNet50 neural network.

Background

Image recognition refers to the analysis, processing and understanding of an input digital image by a computer in order to classify the image, detect objects, segment a scene, or the like. In the prior art, multi-layer convolutional neural networks (CNNs) such as GoogLeNet, VGGNet and ResNet are often used to extract and classify image features, and they achieve high-precision recognition on large data sets.

A CNN uses multi-layer convolution operations and nonlinear activation functions to extract features from the input data, learning more abstract, higher-level feature representations and thereby improving the classification and recognition accuracy of the model. In practical applications, however, the following drawbacks exist: a CNN is sensitive to deformation of the input image, which affects the classification and recognition accuracy of the model; for images with unclear boundaries or uneven illumination, the recognition performance of a CNN is poor; and a large amount of data is required to train a reliable network model. In addition, because a small target object occupies few pixels, its information in the image is sparse. A CNN usually detects small target objects through convolutions with small receptive fields in the shallow feature layers, and these shallow features struggle to capture such sparse information, so it is difficult for a CNN to accurately detect and locate small target objects.

Disclosure of Invention

Object of the invention: the invention aims to provide an image recognition method and system based on transfer learning and a ResNet50 neural network that recognize images with high accuracy and speed.

The technical scheme is as follows: in order to achieve the above purpose, the image recognition method based on the transfer learning and ResNet50 neural network of the present invention comprises the following steps:

step S1: constructing a ResNet50 neural network as a reference model, and optimizing and improving the model to obtain an optimized and improved ResNet50 neural network model;

step S2: training the optimized and improved ResNet50 neural network model by adopting a transfer learning mode, and establishing an image pre-recognition model;

step S3: establishing a sample image data set, and preprocessing images in the sample image data set;

step S4: dividing a training set and a testing set for the preprocessed sample image data set by adopting a five-fold cross validation method based on an image pre-recognition model;

step S5: training the image pre-recognition model by utilizing the training set in the step S4, and fine-tuning model parameters again to obtain the image recognition model;

step S6: testing the image recognition model trained in step S5 with the test set from step S4 to obtain an image recognition result.

In step S1, a ResNet50 neural network is selected as the reference model and is optimized and improved to obtain the optimized and improved ResNet50 neural network model; that is, the Huber loss function is improved, an ECA-Net attention mechanism is introduced into the ResNet50 neural network, and a bidirectional pyramid structure is constructed to improve the model, after which an optimizer is selected to optimize the improved ResNet50 neural network. Step S1 comprises the following sub-steps:

step S101: the ResNet50 neural network is constructed as a reference model, and comprises five stages, namely:

the first stage: the pixel values of the input image are sequentially output through a convolution layer, a BN layer, a ReLU activation function and a MaxPooling layer;

the second to fifth stages are composed of Bottleneck layers and comprise 3, 4, 6 and 3 Bottleneck layers respectively;

step S102: selecting the improved Huber loss function as a ResNet50 neural network loss function;

the expression of the improved Huber loss function is as follows:

where E(x) represents the improved loss function, δ represents the residual critical value, y represents the actual value, and f(x) represents the predicted value;

step S103: introducing an ECA-Net attention mechanism into a ResNet50 neural network to improve;

the ECA-Net attention mechanism generates weights for each channel by one-dimensional convolution of size k, namely:

$$\omega = \delta\left(\mathrm{C1D}_k(y)\right)$$

where $\mathrm{C1D}_k$ represents a one-dimensional convolution with kernel size k, y represents the channel, and δ represents the sigmoid activation function; k, the range of local cross-channel interaction, is related to the channel dimension: the larger the channel dimension, the larger k;

the value of k is determined adaptively from the channel dimension C, namely:

$$k = \left|\frac{\log_2 C}{\gamma} + \frac{b}{\gamma}\right|_{odd}$$

where $|t|_{odd}$ denotes the odd number nearest to t, γ and b are set to 2 and 1 respectively, and C is the channel dimension on which k adaptively depends;

step S104: the method comprises the steps that a constructed bidirectional pyramid structure is introduced into a ResNet50 neural network to improve the ResNet50 neural network, and high-resolution shallow features and deep features are fused in a ResNet50 neural network feature layer through the bidirectional pyramid structure;

step S105: selecting an optimizer to optimize the improved ResNet50 neural network, namely using the Ranger optimizer as the optimizer for training the improved ResNet50 model, so as to obtain the optimized and improved ResNet50 neural network model.
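The channel-attention computation of step S103 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the kernel-size rule k = |log2(C)/γ + b/γ|_odd with γ = 2 and b = 1, rounds an even result up to the next odd number (a common implementation choice, assumed here), and uses hypothetical function names and hypothetical shared convolution weights.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1-D kernel size k for a given channel dimension C."""
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 == 1 else t + 1  # force an odd kernel size (assumption)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def eca_channel_weights(y, kernel):
    """omega = sigmoid(C1D_k(y)), zero-padded so that len(omega) == len(y).

    y      : one descriptor per channel (for example a pooled channel value)
    kernel : the k shared 1-D convolution weights (hypothetical values)
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(y) + [0.0] * pad
    return [
        sigmoid(sum(kernel[j] * padded[i + j] for j in range(k)))
        for i in range(len(y))
    ]


k = eca_kernel_size(256)
weights = eca_channel_weights([0.5, -1.0, 2.0, 0.0], [1.0 / 3] * 3)
```

Each channel weight depends only on its k nearest neighbouring channels, which is what keeps the cross-channel interaction local.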

In the first stage of step S101, the convolution layer calculation on the input image pixel values is as follows:

where x represents the pixel value array of an input image sample; padding represents the number of layers of zeros added to each side of the input, so that the feature map size stays consistent before and after the convolution layer;

kernel represents the size of the convolution kernel, and stride represents the convolution step size;
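How padding, kernel and stride determine the feature map size can be illustrated with the commonly used output-size relation, floor((n + 2 × padding − kernel) / stride) + 1. The sketch below assumes that relation, since the source expression is not reproduced here. The 224-pixel input and the 7×7/stride-2 and 3×3/stride-2 settings are the usual ResNet50 defaults and are also assumptions; the function names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (standard relation)."""
    return (size + 2 * padding - kernel) // stride + 1


def same_padding(kernel):
    """Zero padding that keeps the size unchanged for stride 1 and an odd kernel."""
    return (kernel - 1) // 2


after_conv = conv_output_size(224, kernel=7, stride=2, padding=3)         # 112
after_pool = conv_output_size(after_conv, kernel=3, stride=2, padding=1)  # 56
unchanged = conv_output_size(32, kernel=3, stride=1, padding=same_padding(3))
```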

the BN layer calculates the mean of the feature map generated by the convolution layer as follows:

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x_i$$

where m represents the total number of input image samples and $x_i$ represents the pixel value array of a group of input image samples, i = 1, 2, ..., m;

the BN layer calculates the standard deviation of the feature map generated by the convolution layer as follows:

$$\sigma = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu\right)^2}$$

the BN layer normalizes the feature map generated by the convolution layer as follows:

$$\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \varepsilon}}$$

where ε represents the offset, a small constant that keeps the denominator away from zero;

the normalized feature map is then reconstructed:

$$y_i = \gamma \times \hat{x}_i + \beta,$$

where γ, $\hat{x}_i$ and β are respectively the scale factor, the normalized value and the shift.

The ReLU activation function formula is:

f(x)＝max(0,x)

MaxPooling layer: the whole image is divided into several non-overlapping blocks of the same size; only the maximum value within each block is kept and the other nodes are discarded, and the output retains the original planar structure.
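The BN, ReLU and MaxPooling operations of the first stage can be sketched in plain Python. This is an illustrative sketch under stated assumptions: batch normalization is written in its standard form (mean, variance, normalization, then scale γ and shift β), the pooling block size is fixed at 2×2, and all function names are hypothetical.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of values, then apply scale gamma and shift beta."""
    m = len(values)
    mean = sum(values) / m
    var = sum((v - mean) ** 2 for v in values) / m
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


def relu(values):
    """f(x) = max(0, x), applied element-wise."""
    return [max(0.0, v) for v in values]


def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling; assumes even height and width."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]


normalized = batch_norm([1.0, 2.0, 3.0, 4.0, 5.0])
activated = relu(normalized)
pooled = max_pool_2x2([[1, 3, 2, 0],
                       [4, 2, 1, 5],
                       [0, 1, 7, 2],
                       [3, 2, 1, 1]])
```

Here `pooled` keeps one value per 2×2 block, so a 4×4 map becomes a 2×2 map with the same planar layout.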

In the ResNet50 neural network feature layer described in step S104, the fusion process of the high resolution shallow features and the deep features is as follows:

a pooling operation is performed on the input image to obtain the feature layer Conv7-2; the feature map P_7-2 of Conv7-2 is upsampled to generate a feature map P′_7-2 with the same height and width as the Conv6-2 layer feature map, its dimension being 10 × 256; the number of channels of the Conv6-2 layer is adjusted to 256 by a 1×1 convolution to generate a feature map P′_6-2, so that the dimension remains unchanged when it is fused laterally with P′_7-2; the feature maps P′_6-2 and P′_7-2 are spliced by Concat feature fusion to generate a feature map P_6-2; after two such upsampling and lateral fusion processes, the output feature map P′_4-3 of the top-down pyramid is obtained at the feature layer Conv4-3; the channel numbers of the feature layers pool1, pool2 and pool3 are then changed by 1×1 convolutions, downsampling is performed by bilinear interpolation, and feature fusion is performed in Add mode, so that a feature map P″_4-3 containing position and detail information is obtained at the feature layer Conv4-3; finally, the corresponding elements of the feature maps P_4-3, P′_4-3 and P″_4-3 are summed to obtain the final fused feature map P‴_4-3.
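The two fusion modes named above, Concat (stacking channels) and Add (element-wise summation), can be illustrated on tiny feature maps. The sketch is a simplification under stated assumptions: a feature map is a list of 2-D channels, upsampling is nearest-neighbour by a factor of 2 (the source does not name the upsampling method), the 1×1 convolutions and bilinear downsampling are omitted, and all names are hypothetical.

```python
def upsample_nearest_2x(channel):
    """Double the height and width of one 2-D channel by repeating values."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def concat_channels(a, b):
    """Concat fusion: stack the channels of two same-sized feature maps."""
    return a + b


def add_maps(*maps):
    """Add fusion: element-wise sum of feature maps with identical shapes."""
    return [
        [[sum(vals) for vals in zip(*rows)] for rows in zip(*chans)]
        for chans in zip(*maps)
    ]


deep = [[[1, 2], [3, 4]]]                     # 1 channel, 2x2 (deep layer)
lateral = [[[1] * 4 for _ in range(4)]]       # 1 channel, 4x4 (shallow layer)
deep_up = [upsample_nearest_2x(c) for c in deep]
fused_concat = concat_channels(lateral, deep_up)   # 2 channels, 4x4
fused_add = add_maps(lateral, deep_up)             # 1 channel, 4x4
```

Concat grows the channel count and leaves the mixing to later layers, while Add keeps the channel count and requires the maps to already share a shape, which is why the channel numbers are aligned before fusion.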

In step S2, training the optimized and improved ResNet50 neural network model by transfer learning to establish an image pre-recognition model is specifically: a large number of pictures are randomly selected from the ImageNet dataset and divided into a training set and a test set at a ratio of 4:1;

the optimized and improved ResNet50 neural network model is pre-trained with the training set, that is, the convolution blocks close to the input end of the pre-trained model are frozen so that the weights of the initial layers stay unchanged, and the remaining convolution blocks close to the output end and the fully connected classifier are trained with the training set to obtain new weights. A new weight is obtained by subtracting the back-propagated error from the initial weight: when the back-propagated error is positive the weight is decreased, and when it is negative the weight is increased. The optimized and improved ResNet50 neural network model obtained after training and fine-tuning the weights serves as the image pre-recognition model. The performance of the image pre-recognition model, including the accuracy and the loss of image recognition, is then checked with the test set.
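The 4:1 split and the frozen-layer weight update can be sketched as below. This is a toy illustration, not the patented training procedure: each layer is reduced to a single scalar weight, the update is written as w − lr × gradient with a hypothetical learning rate `lr`, and the layer names and values are made up for the example.

```python
import random


def split_4_to_1(items, seed=0):
    """Shuffle and split the samples into training and test sets at 4:1."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


def fine_tune_step(weights, gradients, frozen, lr=0.1):
    """Update trainable layers; layers listed in `frozen` keep their weights."""
    return {
        name: w if name in frozen else w - lr * gradients[name]
        for name, w in weights.items()
    }


train, test = split_4_to_1(range(100))

# Hypothetical scalar weights and back-propagated gradients per layer.
weights = {"stage1": 0.50, "stage2": -0.20, "stage5": 0.30, "fc": 0.10}
grads = {"stage1": 0.40, "stage2": 0.10, "stage5": 0.20, "fc": -0.50}
updated = fine_tune_step(weights, grads, frozen={"stage1", "stage2"})
```

A positive gradient lowers the weight (`stage5`), a negative one raises it (`fc`), and the frozen input-side layers are returned unchanged.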

The step S3 of establishing a sample image dataset refers to acquiring a plurality of groups of sample images by using an image acquisition device; preprocessing operations are performed on images within the sample image dataset, including cropping, flipping, rotating, and color enhancement operations on the images.
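The preprocessing operations named above can be sketched on a small grayscale image stored as a nested list. The single-channel representation, the 90-degree rotation, the intensity scaling used as a stand-in for colour enhancement, and all function names are assumptions made for illustration.

```python
def crop(image, top, left, height, width):
    """Cut a height x width window whose upper-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]


def enhance(image, factor):
    """Scale intensities and clamp to 0..255 (simple enhancement stand-in)."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]

augmented = [
    crop(image, 0, 1, 2, 2),
    flip_horizontal(image),
    rotate_90_clockwise(image),
    enhance(image, 3),
]
```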

In step S4, the preprocessed sample image data set is divided into a training set and a test set by a five-fold cross-validation method based on the image pre-recognition model, so as to avoid over-fitting of the model on a specific data set, specifically:

dividing the preprocessed sample image data set into five subsets with the same size, sequentially taking one subset as a verification set, taking the other four subsets as training sets, cycling for five times, taking different subsets as the verification sets each time, and finally obtaining an average value of evaluation results of five recognition models; and comparing the data of each group, and taking the group with the best data as a final training set and a test set dividing standard.
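The five-fold procedure can be sketched as follows. The `evaluate` callback is hypothetical: it stands in for training and scoring the model on one fold, and the sketch assumes the number of samples is divisible by five so that the subsets have the same size.

```python
def five_fold_splits(samples, k=5):
    """Yield (training, validation) pairs, one per fold."""
    folds = [samples[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield training, validation


def cross_validate(samples, evaluate, k=5):
    """Return the mean score, the index of the best fold, and all scores."""
    scores = [evaluate(tr, va) for tr, va in five_fold_splits(samples, k)]
    best = max(range(k), key=lambda i: scores[i])
    return sum(scores) / k, best, scores


samples = list(range(10))
splits = list(five_fold_splits(samples))

# Hypothetical scorer: a real one would train on `tr` and score on `va`.
mean_score, best_fold, scores = cross_validate(
    samples, lambda tr, va: len(tr) / (len(tr) + len(va)))
```

The mean score gives the averaged evaluation result, and `best_fold` identifies the split that would be kept as the final training and test division.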

In step S5, training the image pre-recognition model with the training set from step S4 and fine-tuning the model parameters again to obtain the image recognition model means: the image values in the training set are input into the image pre-recognition model; the predicted value is obtained through the internal convolution layers, BN layers, ReLU activation functions and MaxPooling layer; the deviation between the predicted value and the true value is calculated with the Huber loss function; if the deviation is greater than the set threshold, the ResNet50 neural network weight parameters are fine-tuned, and this is iterated until the deviation is less than or equal to the threshold, which completes the training.
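The iterate-until-threshold loop can be sketched with a one-parameter model. Two assumptions are worth stating loudly: the loss is the standard Huber loss, not the improved variant whose expression is not reproduced in this text, and the linear model `w * x`, the learning rate, the threshold and the function names are hypothetical stand-ins for the network and its settings.

```python
def huber(residual, delta=1.0):
    """Standard Huber loss: quadratic near zero, linear beyond delta."""
    a = abs(residual)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def huber_grad(residual, delta=1.0):
    """Derivative of the Huber loss with respect to the residual."""
    return max(-delta, min(delta, residual))


def train_until_threshold(xs, ys, w=0.0, lr=0.1, delta=1.0,
                          threshold=1e-6, max_iters=1000):
    """Adjust w until the mean Huber loss is <= threshold (or max_iters)."""
    for step in range(max_iters):
        residuals = [w * x - y for x, y in zip(xs, ys)]
        loss = sum(huber(r, delta) for r in residuals) / len(xs)
        if loss <= threshold:
            break
        grad = sum(huber_grad(r, delta) * x
                   for r, x in zip(residuals, xs)) / len(xs)
        w -= lr * grad
    return w, loss, step


w, loss, steps = train_until_threshold([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Because the gradient is clipped at ±delta, large residuals move the weight at a bounded rate, which is the robustness property usually cited for Huber-type losses.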

In step S6, testing the image recognition model trained in step S5 with the test set from step S4 to obtain an image recognition result specifically includes: inputting the image values in the test set into the image recognition model trained in step S5, and adding a Softmax classifier to the model; the classifier yields the probability of each image classification result, such that each probability lies in [0,1] and all probabilities sum to 1, and the class with the maximum probability is the final image recognition result.
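The Softmax classification step can be sketched as follows. The logits and class labels are made-up example values, and the function names are hypothetical; subtracting the maximum logit before exponentiation is a standard numerical-stability measure rather than something the source specifies.

```python
import math


def softmax(logits):
    """Map raw scores to probabilities in [0, 1] that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, labels):
    """Return the label with the highest probability and all probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


label, probs = predict([2.0, 0.5, -1.0], ["cat", "dog", "bird"])
```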

The invention also provides an image recognition system based on the transfer learning and ResNet50 neural network, which comprises a data processing module, a model training module and a model analysis module;

the data processing module is used for acquiring a plurality of groups of sample images by using the image acquisition equipment and preprocessing the images;

the model training module is used for training and fine-tuning the ResNet-50 network structure and constructing an image recognition model on that basis;

the model analysis module is used for performing performance analysis on the image recognition model.

The beneficial effects are that: the invention has the following advantages:

1. the image recognition method takes the ResNet50 neural network as a reference model, and introduces the improved Huber loss function, ECA-Net attention mechanism, bidirectional pyramid structure and other technologies to improve and optimize the ResNet50 neural network, so that the performance and robustness of the model are improved, the classification capacity and recognition accuracy of the model are enhanced, and the model has higher recognition degree on small target objects;

2. according to the image recognition method, the Ranger optimizer is selected to optimize the improved reference model, so that the problems of gradient disappearance and gradient explosion can be effectively solved, and the accuracy and training speed of the model are further improved;

3. the image recognition method uses a transfer learning technology, namely, the existing data is utilized to train the model after improvement and optimization, an image pre-recognition model is established, the problem caused by initializing parameters when the model is trained from the beginning is avoided, meanwhile, the problem of insufficient sample number is solved, the model training amount is reduced, and the training efficiency is improved;

4. according to the invention, the performance evaluation is carried out on the image pre-recognition model by adopting a five-fold cross validation method, so that the preprocessed sample image dataset is divided into the test set and the training set, the overfitting of the model on a specific dataset is avoided, and the generalization capability of the model is improved.

Drawings

FIG. 1 is a schematic flow chart of the method of the present invention;

fig. 2 is a schematic flow chart of training an image pre-recognition model.

Detailed Description

The technical scheme of the present invention will be described in detail with reference to the following examples and the accompanying drawings.

As shown in fig. 1, the image recognition method based on transfer learning and the ResNet50 neural network of the present invention comprises steps S1 to S6 as set out above.

In step S5, training the image pre-recognition model with the training set from step S4 and fine-tuning the model parameters again to obtain the image recognition model means: the image values in the training set are input into the image pre-recognition model; the predicted value is obtained through the internal convolution layers, BN layers, ReLU activation functions and MaxPooling layer; the deviation between the predicted value and the true value is calculated with the Huber loss function; if the deviation is greater than the set threshold, the ResNet50 neural network weight parameters are fine-tuned, and this is iterated until the deviation is less than or equal to the threshold, which completes the training. Fig. 2 shows the flow chart for training the image recognition model.

The image recognition method takes the ResNet50 neural network as a reference model, and introduces the improved Huber loss function, ECA-Net attention mechanism, bidirectional pyramid structure and other technologies to improve and optimize the ResNet50 neural network, so that the performance and robustness of the model are improved, the classification capacity and recognition accuracy of the model are enhanced, and the model has higher recognition degree on small target objects; the Huber loss function can enable the model to be converged more smoothly, so that the training speed is increased, and noise and fluctuation in the training process are reduced; the ECA-Net attention mechanism can help the model to pay attention to important features better, and improves the accuracy and robustness of the model; the bidirectional pyramid structure can realize bidirectional fusion of high-low layer features, and further improve feature extraction and expression capability of the model.

The image recognition method uses a transfer learning technology, namely, the existing data is utilized to train the model after improvement and optimization, an image pre-recognition model is built, the problem caused by initializing parameters when the model is trained from the beginning is avoided, meanwhile, the problem of insufficient sample number is solved, the model training amount is reduced, and the training efficiency is improved.

According to the invention, the performance evaluation is carried out on the image pre-recognition model by adopting a five-fold cross validation method, so that the preprocessed sample image dataset is divided into the test set and the training set, the overfitting of the model on a specific dataset is avoided, and the generalization capability of the model is improved.

Claims

1. An image recognition method based on transfer learning and ResNet50 neural network is characterized by comprising the following steps:

2. The image recognition method based on transfer learning and ResNet50 neural network according to claim 1, wherein in step S1 the ResNet50 neural network is selected as the reference model and is optimized and improved to obtain the optimized and improved ResNet50 neural network model, that is, the Huber loss function is improved, an ECA-Net attention mechanism is introduced into the ResNet50 neural network, and a bidirectional pyramid structure is constructed to improve the model, after which an optimizer is selected to optimize the improved ResNet50 neural network, step S1 comprising the following sub-steps:

3. The image recognition method based on the transfer learning and the ResNet50 neural network according to claim 2, wherein in the first stage of step S101, the input image pixel value convolution layer calculation process is as follows:

4. The image recognition method based on transfer learning and ResNet50 neural network according to claim 2, wherein in the ResNet50 neural network feature layers in step S104, the fusion process of the high-resolution shallow features and the deep features is as follows:

5. The image recognition method based on transfer learning and ResNet50 neural network according to claim 1, wherein in step S2 the optimized and improved ResNet50 neural network model is trained by transfer learning to establish an image pre-recognition model, specifically: a large number of pictures are randomly selected from the ImageNet dataset and divided into a training set and a test set at a ratio of 4:1;

6. The image recognition method based on the transfer learning and the ResNet50 neural network according to claim 1, wherein the step S3 of creating the sample image data set is to acquire a plurality of groups of sample images by using an image acquisition device; preprocessing operations are performed on images within the sample image dataset, including cropping, flipping, rotating, and color enhancement operations on the images.

7. The image recognition method based on the transfer learning and the ResNet50 neural network according to claim 1, wherein in step S4, based on the image pre-recognition model, a five-fold cross-validation method is adopted to divide the preprocessed sample image data set into a training set and a test set so as to avoid over-fitting of the model on a specific data set, specifically comprising the following steps:

8. The image recognition method based on the transfer learning and the ResNet50 neural network according to claim 1, wherein in the step S5, the training set in the step S4 is utilized to train the image pre-recognition model, fine tuning is performed on model parameters again to obtain the image recognition model, namely, the image values in the training set are input into the image pre-recognition model, the image predicted value is obtained through an internal convolution layer, a BN layer, a ReLU activation function and a MaxPooling layer, the deviation between the predicted value and the true value is calculated through a Huber loss function, if the deviation is larger than a set threshold, the ResNet50 neural network weight parameter is fine-tuned, and repeated iteration is performed until the deviation is smaller than or equal to the threshold, so that training is completed.

9. The image recognition method based on the transfer learning and the ResNet50 neural network according to claim 1, wherein in step S6 the image recognition model trained in step S5 is tested with the test set from step S4 to obtain an image recognition result, specifically: the image values in the test set are input into the image recognition model trained in step S5, and a Softmax classifier is added to the model; the classifier yields the probability of each image classification result, such that each probability lies in [0,1] and all probabilities sum to 1, and the class with the maximum probability is the final image recognition result.

10. An image recognition system based on a transfer learning and ResNet50 neural network, which is suitable for the method of any one of claims 1-9, and is characterized by comprising a data processing module, a model training module and a model analysis module;