CN116824239A - Image recognition method and system based on transfer learning and ResNet50 neural network

Publication number
CN116824239A
Authority
CN
China
Prior art keywords
model
neural network
image
training
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310722593.1A
Other languages
Chinese (zh)
Inventor
臧建东
沈骞
吴金花
徐寅
胡婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaiyin Institute of Technology
Original Assignee
Huaiyin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaiyin Institute of Technology
Priority to CN202310722593.1A
Publication of CN116824239A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an image recognition method and system based on transfer learning and a ResNet50 neural network. The method comprises: constructing a ResNet50 neural network as a reference model, and optimizing and improving the model; training the optimized and improved reference model by transfer learning to establish an image pre-recognition model; establishing a sample image data set and preprocessing the images in it; dividing the preprocessed sample image data set into a training set and a test set by five-fold cross validation based on the image pre-recognition model; training the image pre-recognition model with the training set to obtain an image recognition model; and testing the trained image recognition model with the test set. The system comprises a data processing module, a model training module and a model analysis module. The invention addresses the following problems of prior-art image recognition methods: poor recognition performance, low accuracy, difficulty in accurately detecting and locating small target objects, and the need for a large amount of data in model training.

Description

Image recognition method and system based on transfer learning and ResNet50 neural network
Technical Field
The invention relates to the technical field of image recognition, in particular to an image recognition method and system based on transfer learning and ResNet50 neural network.
Background
Image recognition refers to the analysis, processing and understanding of an input digital image by a computer in order to classify the image, detect objects, segment a scene, and so on. In the prior art, multi-layer convolutional neural networks (CNNs) such as GoogLeNet, VGGNet and ResNet are commonly used to extract and classify image features, and can achieve high-precision recognition on large data sets.
A CNN uses multi-layer convolution operations and nonlinear activation functions to extract features from the input data, so that more abstract, higher-level feature representations are learned and the classification and recognition accuracy of the model is improved. In practical applications, however, the following drawbacks exist: a CNN is sensitive to deformation of the input image, which affects the classification and recognition accuracy of the model; for images with unclear boundaries or uneven illumination, the recognition performance of a CNN is poor; and a large data set is required to train a reliable network model. In addition, because a small target object occupies few pixels, its information in the image is sparse and is hard for the shallow features of a CNN to capture. A CNN usually detects small target objects through shallow feature layers whose convolutions have small receptive fields, so it is difficult for a CNN to detect and locate small target objects accurately.
Disclosure of Invention
Object of the invention: the invention aims to provide an image recognition method and system based on transfer learning and a ResNet50 neural network that recognize images with high accuracy and high speed.
The technical scheme is as follows: in order to achieve the above purpose, the image recognition method based on the transfer learning and ResNet50 neural network of the present invention comprises the following steps:
step S1: constructing a ResNet50 neural network as a reference model, and optimizing and improving the model to obtain an optimized and improved ResNet50 neural network model;
step S2: training the optimized and improved ResNet50 neural network model by adopting a transfer learning mode, and establishing an image pre-recognition model;
step S3: establishing a sample image data set, and preprocessing images in the sample image data set;
step S4: dividing a training set and a testing set for the preprocessed sample image data set by adopting a five-fold cross validation method based on an image pre-recognition model;
step S5: training the image pre-recognition model by utilizing the training set in the step S4, and fine-tuning model parameters again to obtain the image recognition model;
step S6: testing the image recognition model trained in step S5 by using the test set from step S4 to obtain an image recognition result.
In step S1, a ResNet50 neural network is selected as the reference model and is optimized and improved: the Huber loss function is improved, an ECA-Net attention mechanism is introduced into the ResNet50 neural network, and a bidirectional pyramid structure is constructed to improve the model; an optimizer is then selected to optimize the improved ResNet50 neural network, yielding the optimized and improved ResNet50 neural network model. Step S1 comprises the following sub-steps:
step S101: the ResNet50 neural network is constructed as a reference model, and comprises five stages, namely:
the first stage: the pixel values of the input image are sequentially output through a convolution layer, a BN layer, a ReLU activation function and a MaxPooling layer;
the second to fifth stages are composed of Bottleneck layers, comprising 3, 4, 6 and 3 Bottleneck layers respectively;
step S102: selecting the improved Huber loss function as a ResNet50 neural network loss function;
the expression of the improved Huber loss function is as follows:
wherein E(x) represents the improved loss function, δ represents the residual threshold, y represents the actual value, and f(x) represents the predicted value;
step S103: introducing an ECA-Net attention mechanism into a ResNet50 neural network to improve;
the ECA-Net attention mechanism generates a weight for each channel by a one-dimensional convolution of size k, namely:
$\omega = \delta(\mathrm{C1D}_k(y))$
wherein $\mathrm{C1D}_k$ represents a one-dimensional convolution with kernel size k, y represents the channel, and δ represents the sigmoid activation function; k is related to the channel dimension: the larger the channel dimension, the larger the range k of local cross-channel interaction;
the k value is determined by an adaptive function C related to the channel dimension, namely:
wherein $|\cdot|_{odd}$ denotes the nearest odd number, the values of γ and b are set to 2 and 1 respectively, and C is the adaptive function;
step S104: introducing the constructed bidirectional pyramid structure into the ResNet50 neural network to improve it, whereby high-resolution shallow features and deep features are fused in the ResNet50 feature layers through the bidirectional pyramid structure;
step S105: selecting an optimizer to optimize the improved ResNet50 neural network, namely using the Ranger optimizer as the optimizer for training the improved ResNet50 model, so as to obtain the optimized and improved ResNet50 neural network model.
In the first stage of step S101, the convolution layer computes its output from the input image pixel values as follows:
wherein x represents the input image sample pixel value array; padding represents the number of layers of zeros added to each side of the input, in order to keep the feature map size consistent before and after the convolution layer; kernel represents the size of the convolution kernel; and stride represents the convolution step size;
the BN layer calculates the mean of the feature map generated by the convolution layer as follows:
wherein m represents the total number of input image samples, and $x_i$ represents the pixel value array of the i-th set of input image samples, i = 1, 2, ..., m;
the BN layer calculates the standard deviation of the feature map generated by the convolution layer as follows:
the BN layer normalizes the feature map generated by the convolution layer as follows:
wherein ε represents the offset;
the normalized feature map is then reconstructed (scaled and shifted):
$y_i = \gamma \times x_i + \beta$,
wherein γ, $x_i$ and β are respectively
The ReLU activation function formula is:
f(x)=max(0,x)
MaxPooling layer: the whole image is divided without overlap into several small blocks of the same size; only the maximum value in each block is kept and the other nodes are discarded, and the original planar structure is preserved to obtain the output.
In the ResNet50 neural network feature layer described in step S104, the fusion process of the high resolution shallow features and the deep features is as follows:
A pooling operation is performed on the input image to obtain the feature layer Conv7-2. The feature map $P_{7-2}$ of Conv7-2 is upsampled to generate a feature map $P'_{7-2}$ with the same height and width as the Conv6-2 layer feature map, with dimension 10 × 256. The number of channels of the Conv6-2 layer is adjusted to 256 by a 1 × 1 convolution to generate a feature map $P'_{6-2}$, so that the dimensions of $P'_{7-2}$ and of the lateral connection remain consistent during fusion. The feature maps $P'_{6-2}$ and $P'_{7-2}$ are spliced by Concat feature fusion to generate the feature map $P_{6-2}$. After two such upsampling and lateral fusion steps, the output feature map $P'_{4-3}$ of the top-down pyramid is obtained at the feature layer Conv4-3. The channel numbers of the feature layers pool1, pool2 and pool3 are then changed by 1 × 1 convolutions, bilinear interpolation is used for downsampling, and feature fusion is performed in Add mode, so that a feature map $P''_{4-3}$ containing position and detail information is obtained at the feature layer Conv4-3. Finally, the corresponding elements of the feature maps $P_{4-3}$, $P'_{4-3}$ and $P''_{4-3}$ are summed to obtain the final fused feature map $P'''_{4-3}$.
In step S2, training the optimized and improved ResNet50 neural network model by transfer learning to establish the image pre-recognition model specifically comprises: randomly selecting a large number of pictures from the ImageNet dataset, and dividing them into a training set and a test set at a ratio of 4:1;
pre-training the optimized and improved ResNet50 neural network model with the training set, namely: freezing the convolution blocks close to the input end of the pre-trained model so that the weights of the initial layers remain unchanged, and training the remaining convolution blocks close to the output end and the fully connected classifier with the training set to obtain new weights. A new weight is obtained by subtracting the back-propagated error from the initial weight: when the back-propagated error is positive the weight is decreased, and when it is negative the weight is increased. The optimized and improved ResNet50 neural network model obtained after training and fine-tuning the weights serves as the image pre-recognition model. The performance of the image pre-recognition model, including the accuracy and the loss of image recognition, is then checked with the test set.
In step S3, establishing a sample image dataset means acquiring multiple groups of sample images with an image acquisition device; preprocessing the images in the sample image dataset includes cropping, flipping, rotating and color enhancement.
In step S4, the preprocessed sample image data set is divided into a training set and a test set by five-fold cross validation based on the image pre-recognition model, in order to avoid over-fitting of the model to a specific data set, specifically:
dividing the preprocessed sample image data set into five subsets of equal size; taking each subset in turn as the validation set and the other four subsets as the training set; repeating five times with a different validation subset each time; and finally averaging the evaluation results of the five recognition models. The results of the groups are then compared, and the group with the best results is taken as the final division into training set and test set.
In step S5, training the image pre-recognition model with the training set from step S4 and fine-tuning the model parameters again to obtain the image recognition model means: inputting the image values in the training set into the image pre-recognition model; obtaining the image predicted value through the internal convolution layer, BN layer, ReLU activation function and MaxPooling layer; calculating the deviation between the predicted value and the true value with the Huber loss function; and, if the deviation is greater than the set threshold, fine-tuning the weight parameters of the ResNet50 neural network and iterating until the deviation is less than or equal to the threshold, at which point training is complete.
In step S6, testing the image recognition model trained in step S5 with the test set from step S4 to obtain the image recognition result specifically comprises: inputting the image values in the test set into the image recognition model trained in step S5; a Softmax classifier added to the model outputs the probability of each image classification result, such that each probability lies in [0, 1] and all probabilities sum to 1; the class with the maximum probability is the final image recognition result.
The invention also provides an image recognition system based on the transfer learning and ResNet50 neural network, which comprises a data processing module, a model training module and a model analysis module;
the data processing module is used for acquiring a plurality of groups of sample images by using the image acquisition equipment and preprocessing the images;
the model training module is used for training and fine-tuning the ResNet-50 network structure and constructing the image recognition model on that basis;
the model analysis module is used for performing performance analysis on the image recognition model.
The beneficial effects are that: the invention has the following advantages:
1. the image recognition method takes the ResNet50 neural network as a reference model, and introduces the improved Huber loss function, ECA-Net attention mechanism, bidirectional pyramid structure and other technologies to improve and optimize the ResNet50 neural network, so that the performance and robustness of the model are improved, the classification capacity and recognition accuracy of the model are enhanced, and the model has higher recognition degree on small target objects;
2. according to the image recognition method, the Ranger optimizer is selected to optimize the improved reference model, so that the problems of gradient disappearance and gradient explosion can be effectively solved, and the accuracy and training speed of the model are further improved;
3. the image recognition method uses a transfer learning technology, namely, the existing data is utilized to train the model after improvement and optimization, an image pre-recognition model is established, the problem caused by initializing parameters when the model is trained from the beginning is avoided, meanwhile, the problem of insufficient sample number is solved, the model training amount is reduced, and the training efficiency is improved;
4. according to the invention, the performance evaluation is carried out on the image pre-recognition model by adopting a five-fold cross validation method, so that the preprocessed sample image dataset is divided into the test set and the training set, the overfitting of the model on a specific dataset is avoided, and the generalization capability of the model is improved.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
fig. 2 is a schematic flow chart of training an image pre-recognition model.
Detailed Description
The technical scheme of the present invention will be described in detail with reference to the following examples and the accompanying drawings.
As shown in FIG. 1, the image recognition method based on transfer learning and a ResNet50 neural network of the present invention includes the following steps:
step S1: constructing a ResNet50 neural network as a reference model, and optimizing and improving the model to obtain an optimized and improved ResNet50 neural network model;
step S2: training the optimized and improved ResNet50 neural network model by adopting a transfer learning mode, and establishing an image pre-recognition model;
step S3: establishing a sample image data set, and preprocessing images in the sample image data set;
step S4: dividing a training set and a testing set for the preprocessed sample image data set by adopting a five-fold cross validation method based on an image pre-recognition model;
step S5: training the image pre-recognition model by utilizing the training set in the step S4, and fine-tuning model parameters again to obtain the image recognition model;
step S6: testing the image recognition model trained in step S5 by using the test set from step S4 to obtain an image recognition result.
In step S1, a ResNet50 neural network is selected as the reference model and is optimized and improved: the Huber loss function is improved, an ECA-Net attention mechanism is introduced into the ResNet50 neural network, and a bidirectional pyramid structure is constructed to improve the model; an optimizer is then selected to optimize the improved ResNet50 neural network, yielding the optimized and improved ResNet50 neural network model. Step S1 comprises the following sub-steps:
step S101: the ResNet50 neural network is constructed as a reference model, and comprises five stages, namely:
the first stage: the pixel values of the input image are sequentially output through a convolution layer, a BN layer, a ReLU activation function and a MaxPooling layer;
the second to fifth stages are composed of Bottleneck layers, comprising 3, 4, 6 and 3 Bottleneck layers respectively;
step S102: selecting the improved Huber loss function as a ResNet50 neural network loss function;
the expression of the improved Huber loss function is as follows:
wherein E(x) represents the improved loss function, δ represents the residual threshold, y represents the actual value, and f(x) represents the predicted value;
step S103: introducing an ECA-Net attention mechanism into a ResNet50 neural network to improve;
the ECA-Net attention mechanism generates a weight for each channel by a one-dimensional convolution of size k, namely:
$\omega = \delta(\mathrm{C1D}_k(y))$
wherein $\mathrm{C1D}_k$ represents a one-dimensional convolution with kernel size k, y represents the channel, and δ represents the sigmoid activation function; k is related to the channel dimension: the larger the channel dimension, the larger the range k of local cross-channel interaction;
the k value is determined by an adaptive function C related to the channel dimension, namely:
wherein $|\cdot|_{odd}$ denotes the nearest odd number, the values of γ and b are set to 2 and 1 respectively, and C is the adaptive function;
step S104: introducing the constructed bidirectional pyramid structure into the ResNet50 neural network to improve it, whereby high-resolution shallow features and deep features are fused in the ResNet50 feature layers through the bidirectional pyramid structure;
step S105: selecting an optimizer to optimize the improved ResNet50 neural network, namely using the Ranger optimizer as the optimizer for training the improved ResNet50 model, so as to obtain the optimized and improved ResNet50 neural network model.
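As an illustration of the ECA-Net attention of step S103, the following Python sketch computes a kernel size from the channel count and then per-channel weights. It is a minimal sketch under stated assumptions: the kernel-size rule is the one commonly published for ECA-Net (truncate (log2(C) + b) / γ and move up to the next odd number), which may differ from the formula omitted above; the uniform convolution kernel stands in for learned weights; and the names `eca_kernel_size` and `eca_weights` are hypothetical.

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    """Kernel size k from the channel count (assumed ECA-Net rule, gamma=2, b=1)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1  # force an odd kernel size

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def eca_weights(pooled, k):
    """Sigmoid of a size-k 1-D convolution over the pooled channel descriptors."""
    pad = k // 2
    padded = [0.0] * pad + list(pooled) + [0.0] * pad
    kernel = [1.0 / k] * k  # illustrative only; the real kernel is learned
    return [sigmoid(sum(kernel[j] * padded[i + j] for j in range(k)))
            for i in range(len(pooled))]

# Example: eight channel descriptors, as produced by global average pooling
k = eca_kernel_size(256)
weights = eca_weights([0.2, 1.5, -0.3, 0.8, 0.0, 2.1, -1.0, 0.4], k)
```

Each weight lies strictly between 0 and 1 and would be multiplied onto the corresponding channel of the feature map.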
In the first stage of step S101, the convolution layer computes its output from the input image pixel values as follows:
wherein x represents the input image sample pixel value array; padding represents the number of layers of zeros added to each side of the input, in order to keep the feature map size consistent before and after the convolution layer; kernel represents the size of the convolution kernel; and stride represents the convolution step size;
the BN layer calculates the mean of the feature map generated by the convolution layer as follows:
wherein m represents the total number of input image samples, and $x_i$ represents the pixel value array of the i-th set of input image samples, i = 1, 2, ..., m;
the BN layer calculates the standard deviation of the feature map generated by the convolution layer as follows:
the BN layer normalizes the feature map generated by the convolution layer as follows:
wherein ε represents the offset;
the normalized feature map is then reconstructed (scaled and shifted):
$y_i = \gamma \times x_i + \beta$,
wherein γ, $x_i$ and β are respectively
The ReLU activation function formula is:
f(x)=max(0,x)
MaxPooling layer: the whole image is divided without overlap into several small blocks of the same size; only the maximum value in each block is kept and the other nodes are discarded, and the original planar structure is preserved to obtain the output.
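The first-stage operations described above can be sketched in plain Python. This is a minimal sketch, not the patented implementation: it assumes the standard textbook forms of the convolution output size, batch normalization (mean, variance, normalization, then scaling by γ and shifting by β), ReLU, and non-overlapping max pooling; all function names are hypothetical.

```python
import math

def conv_output_size(n, kernel, stride, padding):
    """Spatial size after a convolution: floor((n + 2*padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of values, then scale by gamma and shift by beta."""
    m = len(xs)
    mean = sum(xs) / m
    var = sum((x - mean) ** 2 for x in xs) / m
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]

def relu(x):
    """f(x) = max(0, x)."""
    return max(0.0, x)

def max_pool(grid, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    rows, cols = len(grid), len(grid[0])
    return [[max(grid[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, cols - size + 1, size)]
            for r in range(0, rows - size + 1, size)]

# A 7 x 7 convolution with stride 2 and padding 3 on a 224-pixel-wide input
width_after_conv = conv_output_size(224, kernel=7, stride=2, padding=3)
pooled = max_pool([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
```

The 7 × 7, stride 2, padding 3 values are those of the usual ResNet50 stem and are used here only as an example input.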
In the ResNet50 neural network feature layer described in step S104, the fusion process of the high resolution shallow features and the deep features is as follows:
A pooling operation is performed on the input image to obtain the feature layer Conv7-2. The feature map $P_{7-2}$ of Conv7-2 is upsampled to generate a feature map $P'_{7-2}$ with the same height and width as the Conv6-2 layer feature map, with dimension 10 × 256. The number of channels of the Conv6-2 layer is adjusted to 256 by a 1 × 1 convolution to generate a feature map $P'_{6-2}$, so that the dimensions of $P'_{7-2}$ and of the lateral connection remain consistent during fusion. The feature maps $P'_{6-2}$ and $P'_{7-2}$ are spliced by Concat feature fusion to generate the feature map $P_{6-2}$. After two such upsampling and lateral fusion steps, the output feature map $P'_{4-3}$ of the top-down pyramid is obtained at the feature layer Conv4-3. The channel numbers of the feature layers pool1, pool2 and pool3 are then changed by 1 × 1 convolutions, bilinear interpolation is used for downsampling, and feature fusion is performed in Add mode, so that a feature map $P''_{4-3}$ containing position and detail information is obtained at the feature layer Conv4-3. Finally, the corresponding elements of the feature maps $P_{4-3}$, $P'_{4-3}$ and $P''_{4-3}$ are summed to obtain the final fused feature map $P'''_{4-3}$.
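The two fusion modes named above, Concat and Add, can be illustrated on toy feature maps. This is only a sketch under simplifying assumptions: a feature map is a list of 2-D channels, nearest-neighbour upsampling stands in for whatever resampling the method actually uses, the 1 × 1 convolutions are omitted, and all names are hypothetical.

```python
def upsample_nearest(channel, factor=2):
    """Repeat every value `factor` times along both axes of one 2-D channel."""
    return [[v for v in row for _ in range(factor)]
            for row in channel for _ in range(factor)]

def concat_channels(a, b):
    """Concat fusion: stack the channels of two maps of equal height and width."""
    return a + b

def add_maps(a, b):
    """Add fusion: element-wise sum of two maps with identical shape."""
    return [[[x + y for x, y in zip(row_a, row_b)]
             for row_a, row_b in zip(ch_a, ch_b)]
            for ch_a, ch_b in zip(a, b)]

deep = [[[1, 2], [3, 4]]]                  # deep feature map: 1 channel, 2 x 2
shallow = [[[10] * 4 for _ in range(4)]]   # shallow feature map: 1 channel, 4 x 4
deep_up = [upsample_nearest(ch) for ch in deep]
concat_fused = concat_channels(shallow, deep_up)  # channel count grows to 2
add_fused = add_maps(shallow, deep_up)            # channel count stays at 1
```

Concat keeps both sources side by side at the cost of more channels, while Add requires matching channel counts, which is why a 1 × 1 convolution precedes it in the description.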
In step S2, training the optimized and improved ResNet50 neural network model by transfer learning to establish the image pre-recognition model specifically comprises: randomly selecting a large number of pictures from the ImageNet dataset, and dividing them into a training set and a test set at a ratio of 4:1;
pre-training the optimized and improved ResNet50 neural network model with the training set, namely: freezing the convolution blocks close to the input end of the pre-trained model so that the weights of the initial layers remain unchanged, and training the remaining convolution blocks close to the output end and the fully connected classifier with the training set to obtain new weights. A new weight is obtained by subtracting the back-propagated error from the initial weight: when the back-propagated error is positive the weight is decreased, and when it is negative the weight is increased. The optimized and improved ResNet50 neural network model obtained after training and fine-tuning the weights serves as the image pre-recognition model. The performance of the image pre-recognition model, including the accuracy and the loss of image recognition, is then checked with the test set.
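The 4:1 split and the frozen-layer weight update described above can be sketched as follows. This is a simplified illustration, not the patented training procedure: weights are plain floats, the back-propagated error is represented by a gradient scaled by an assumed learning rate `lr`, and `split_4_to_1` and `update_weights` are hypothetical names.

```python
import random

def split_4_to_1(items, seed=0):
    """Shuffle and split into a training set and a test set at a ratio of 4:1."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]

def update_weights(weights, grads, frozen, lr=0.1):
    """Subtract the scaled error from each trainable weight; frozen weights stay put."""
    return [w if is_frozen else w - lr * g
            for w, g, is_frozen in zip(weights, grads, frozen)]

train_set, test_set = split_4_to_1(range(10))
# First weight is frozen (near the input); the other two are trainable.
new_weights = update_weights([1.0, 1.0, 1.0], [0.5, 0.5, -0.5],
                             frozen=[True, False, False])
```

A positive error lowers the second weight and a negative error raises the third, while the frozen first weight is unchanged.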
In step S3, establishing a sample image dataset means acquiring multiple groups of sample images with an image acquisition device; preprocessing the images in the sample image dataset includes cropping, flipping, rotating and color enhancement.
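The preprocessing operations of step S3 can be illustrated on a small grayscale image stored as a nested list. This is a minimal sketch: the specific crop window, the 90-degree clockwise rotation, and the brightness scaling used for color enhancement are assumptions chosen for illustration, and the function names are hypothetical.

```python
def crop(img, top, left, height, width):
    """Cut out a height x width window starting at (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def enhance(img, factor):
    """Scale pixel intensities and clip to the 0..255 range."""
    return [[max(0, min(255, int(round(v * factor)))) for v in row] for row in img]

image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
patch = crop(image, top=0, left=1, height=2, width=2)
augmented = enhance(rotate90(flip_horizontal(patch)), factor=1.5)
```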
In step S4, the preprocessed sample image data set is divided into a training set and a test set by five-fold cross validation based on the image pre-recognition model, in order to avoid over-fitting of the model to a specific data set, specifically:
dividing the preprocessed sample image data set into five subsets of equal size; taking each subset in turn as the validation set and the other four subsets as the training set; repeating five times with a different validation subset each time; and finally averaging the evaluation results of the five recognition models. The results of the groups are then compared, and the group with the best results is taken as the final division into training set and test set.
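The five-fold procedure above can be sketched with index lists. This is a minimal sketch that assumes the number of samples is divisible by five; the per-fold scores are made-up placeholder numbers used only to show the averaging and best-fold selection, and the function names are hypothetical.

```python
def five_fold_splits(n_samples, k=5):
    """Return k (train_indices, validation_indices) pairs over equal-size folds."""
    indices = list(range(n_samples))
    fold = n_samples // k
    splits = []
    for i in range(k):
        validation = indices[i * fold:(i + 1) * fold]
        train = indices[:i * fold] + indices[(i + 1) * fold:]
        splits.append((train, validation))
    return splits

def summarize(scores):
    """Average the fold scores and report which fold scored best."""
    mean = sum(scores) / len(scores)
    best_fold = max(range(len(scores)), key=scores.__getitem__)
    return mean, best_fold

splits = five_fold_splits(10)
# Placeholder scores, one per fold; real values would come from evaluating each model.
mean_score, best_fold = summarize([0.90, 0.92, 0.88, 0.95, 0.91])
final_train, final_test = splits[best_fold]
```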
In step S5, training the image pre-recognition model with the training set from step S4 and fine-tuning the model parameters again to obtain the image recognition model means: inputting the image values in the training set into the image pre-recognition model; obtaining the image predicted value through the internal convolution layer, BN layer, ReLU activation function and MaxPooling layer; calculating the deviation between the predicted value and the true value with the Huber loss function; and, if the deviation is greater than the set threshold, fine-tuning the weight parameters of the ResNet50 neural network and iterating until the deviation is less than or equal to the threshold, at which point training is complete. FIG. 2 shows the flow chart for training the image recognition model.
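The iterate-until-threshold training of step S5 can be sketched on a toy problem. Two assumptions are made loudly here: the loss is the standard Huber loss (quadratic for residuals up to δ, linear beyond), not the improved variant of step S102 whose expression is not reproduced in the text; and the "model" is a single weight w with prediction w·x, standing in for the network. All names are hypothetical.

```python
def huber(y, pred, delta=1.0):
    """Standard Huber loss for one sample."""
    r = abs(y - pred)
    return 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)

def huber_grad(y, pred, delta=1.0):
    """Derivative of the standard Huber loss with respect to the prediction."""
    r = pred - y
    return r if abs(r) <= delta else delta * (1.0 if r > 0 else -1.0)

def fit(xs, ys, threshold=1e-4, lr=0.05, max_iters=10000):
    """Adjust w until the mean loss is at or below the threshold."""
    w, loss = 0.0, float("inf")
    for _ in range(max_iters):
        loss = sum(huber(y, w * x) for x, y in zip(xs, ys)) / len(xs)
        if loss <= threshold:
            break  # deviation is small enough: training is complete
        grad = sum(huber_grad(y, w * x) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w, loss

w, final_loss = fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

The loop stops as soon as the mean deviation falls to the threshold, mirroring the stopping rule in the description.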
In step S6, testing the image recognition model trained in step S5 with the test set from step S4 to obtain the image recognition result specifically comprises: inputting the image values in the test set into the image recognition model trained in step S5; a Softmax classifier added to the model outputs the probability of each image classification result, such that each probability lies in [0, 1] and all probabilities sum to 1; the class with the maximum probability is the final image recognition result.
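The Softmax classification of step S6 can be sketched as follows. The input scores and the class labels are made-up examples, and `softmax` and `predict` are hypothetical names; subtracting the maximum score before exponentiation is a common numerical-stability measure and does not change the result.

```python
import math

def softmax(scores):
    """Map raw scores to probabilities in [0, 1] that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(scores, labels):
    """Return the label with the highest probability, and that probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

label, probability = predict([0.1, 2.0, -1.0], ["cat", "dog", "bird"])
```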
The invention also provides an image recognition system based on the transfer learning and ResNet50 neural network, which comprises a data processing module, a model training module and a model analysis module;
the data processing module is used for acquiring a plurality of groups of sample images by using the image acquisition equipment and preprocessing the images;
the model training module is used for training and fine-tuning the ResNet-50 network structure and constructing the image recognition model on that basis;
the model analysis module is used for performing performance analysis on the image recognition model.
The image recognition method takes the ResNet50 neural network as a reference model, and introduces the improved Huber loss function, ECA-Net attention mechanism, bidirectional pyramid structure and other technologies to improve and optimize the ResNet50 neural network, so that the performance and robustness of the model are improved, the classification capacity and recognition accuracy of the model are enhanced, and the model has higher recognition degree on small target objects; the Huber loss function can enable the model to be converged more smoothly, so that the training speed is increased, and noise and fluctuation in the training process are reduced; the ECA-Net attention mechanism can help the model to pay attention to important features better, and improves the accuracy and robustness of the model; the bidirectional pyramid structure can realize bidirectional fusion of high-low layer features, and further improve feature extraction and expression capability of the model.
The image recognition method uses a transfer learning technology, namely, the existing data is utilized to train the model after improvement and optimization, an image pre-recognition model is built, the problem caused by initializing parameters when the model is trained from the beginning is avoided, meanwhile, the problem of insufficient sample number is solved, the model training amount is reduced, and the training efficiency is improved.
The invention uses five-fold cross-validation, based on the image pre-recognition model, to divide the preprocessed sample image dataset into a training set and a test set and to evaluate performance, which avoids overfitting of the model to a specific dataset and improves its generalization capability.

Claims (10)

1. An image recognition method based on transfer learning and ResNet50 neural network is characterized by comprising the following steps:
step S1: constructing a ResNet50 neural network as a reference model, and optimizing and improving the model to obtain an optimized and improved ResNet50 neural network model;
step S2: training the optimized and improved ResNet50 neural network model by adopting a transfer learning mode, and establishing an image pre-recognition model;
step S3: establishing a sample image data set, and preprocessing images in the sample image data set;
step S4: dividing a training set and a testing set for the preprocessed sample image data set by adopting a five-fold cross validation method based on an image pre-recognition model;
step S5: training the image pre-recognition model by utilizing the training set in the step S4, and fine-tuning model parameters again to obtain the image recognition model;
step S6: testing the image recognition model trained in step S5 by using the test set from step S4 to obtain an image recognition result.
2. The image recognition method based on transfer learning and ResNet50 neural network according to claim 1, wherein in step S1 the ResNet50 neural network is constructed as a reference model and is optimized and improved to obtain the optimized and improved ResNet50 neural network model, namely: the Huber loss function is improved, an ECA-Net attention mechanism is introduced into the ResNet50 neural network, and a bidirectional pyramid structure is constructed to improve the model, after which an optimizer is selected to optimize the improved ResNet50 neural network; step S1 comprises the following sub-steps:
step S101: the ResNet50 neural network is constructed as a reference model, and comprises five stages, namely:
the first stage: the pixel values of the input image pass sequentially through a convolution layer, a BN layer, a ReLU activation function and a MaxPooling layer to produce the output;
the second to fifth stages are composed of Bottleneck layers, and comprise 3, 4, 6 and 3 Bottleneck layers respectively;
step S102: selecting the improved Huber loss function as a ResNet50 neural network loss function;
the expression of the improved Huber loss function is as follows:
wherein E(x) represents the improved loss function, δ represents the residual threshold, y represents the actual value, and f(x) represents the predicted value;
step S103: introducing an ECA-Net attention mechanism into a ResNet50 neural network to improve;
the ECA-Net attention mechanism generates weights for each channel by one-dimensional convolution of size k, namely:
$$\omega = \delta\left(\mathrm{C1D}_k(y)\right)$$
wherein $\mathrm{C1D}_k$ represents a one-dimensional convolution with kernel size k, y represents the channel, and δ represents the sigmoid activation function; k is related to the channel dimension: the larger the channel dimension, the larger the range k of local cross-channel interaction;
the k value is determined by an adaptive function of the channel dimension C, namely:
$$k = \psi(C) = \left|\frac{\log_2 C}{\gamma} + \frac{b}{\gamma}\right|_{odd}$$
wherein $|t|_{odd}$ denotes the odd number nearest to t; the values of γ and b are set to 2 and 1, respectively; and ψ(C) is the adaptive function;
step S104: introducing the constructed bidirectional pyramid structure into the ResNet50 neural network to improve it, whereby high-resolution shallow features and deep features are fused in the ResNet50 neural network feature layers through the bidirectional pyramid structure;
step S105: selecting an optimizer to optimize the improved ResNet50 neural network, namely using the novel Ranger optimizer as the optimizer for training the improved ResNet50 model, thereby obtaining the optimized and improved ResNet50 neural network model.
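The channel-weighting step S103 of claim 2 can be sketched as follows. This is a hedged, pure-Python illustration rather than the implementation of the invention: the rounding to an odd kernel size follows the convention commonly used in public ECA-Net code (truncate, then move an even value up to the next odd one), which is an assumption about how "nearest odd number" is realised, and the function names are hypothetical.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size k for a given channel dimension (gamma=2, b=1 as in claim 2)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    # Assumed rounding convention: keep odd values, bump even values to the next odd one.
    return t if t % 2 == 1 else t + 1


def eca_weights(pooled, kernel):
    """Per-channel weights: sigmoid of a 1D convolution over neighbouring channels.

    pooled: globally average-pooled value of each channel.
    kernel: list of k convolution weights, with k odd.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(pooled) + [0.0] * pad  # zero padding keeps the channel count
    weights = []
    for c in range(len(pooled)):
        s = sum(kernel[j] * padded[c + j] for j in range(len(kernel)))
        weights.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid
    return weights


# Kernel sizes for the usual ResNet50 stage widths (256, 512, 1024, 2048 channels).
stage_kernel_sizes = [eca_kernel_size(c) for c in (256, 512, 1024, 2048)]  # [5, 5, 5, 7]
```

Each channel's feature map would then be multiplied by its weight, so only k neighbouring channels interact and no fully connected layer is needed.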
3. The image recognition method based on transfer learning and ResNet50 neural network according to claim 2, wherein in the first stage of step S101 the convolution layer processes the input image pixel values as follows:
wherein x represents the array of input image sample pixel values; padding represents the number of layers of zeros added to each side of the input, used to keep the feature map size consistent before and after the convolution layer; kernel represents the size of the convolution kernel; and stride represents the convolution step size;
the BN layer calculates the mean of the feature map generated by the convolution layer as follows:
$$\mu = \frac{1}{m}\sum_{i=1}^{m} x_i$$
wherein m represents the total number of input image samples, and $x_i$ represents the pixel value array of the i-th input image sample, i = 1, 2, ..., m;
the BN layer calculates the standard deviation of the feature map generated by the convolution layer as follows:
$$\sigma = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu\right)^2}$$
the BN layer normalizes the feature map generated by the convolution layer as follows:
$$\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \varepsilon}}$$
wherein ε represents the offset, a small constant that prevents division by zero;
the normalized feature map is then reconstructed by scaling and shifting:
$$y_i = \gamma \times \hat{x}_i + \beta$$
wherein γ, $\hat{x}_i$ and β are respectively the scale parameter, the normalized value and the shift parameter;
The ReLU activation function formula is:
f(x)=max(0,x)
MaxPooling layer: the whole image is divided without overlap into several small blocks of the same size; only the maximum value in each block is kept and the other nodes are discarded, while the original planar structure is maintained, to obtain the output result.
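The first-stage operations of claim 3 can be illustrated with a small pure-Python sketch. It assumes the usual output-size relation for a convolution, batch statistics computed over a flat list of values, and 2 × 2 non-overlapping pooling blocks as described in the claim; the function names, the 224-pixel input and the 7 × 7, stride 2, padding 3 configuration are illustrative assumptions rather than values stated in the claims.

```python
import math


def conv_output_size(size, kernel, stride, padding):
    """Spatial size after a convolution, using the standard output-size relation."""
    return (size + 2 * padding - kernel) // stride + 1


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise values to zero mean and unit variance, then scale by gamma and shift by beta."""
    m = len(xs)
    mean = sum(xs) / m
    var = sum((x - mean) ** 2 for x in xs) / m
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]


def relu(x):
    """f(x) = max(0, x)."""
    return max(0.0, x)


def max_pool_2x2(grid):
    """Keep the maximum of each non-overlapping 2x2 block (height and width assumed even)."""
    return [
        [max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
         for c in range(0, len(grid[0]), 2)]
        for r in range(0, len(grid), 2)
    ]


# A 7x7 convolution with stride 2 and padding 3 halves a 224-pixel side.
first_conv_side = conv_output_size(224, kernel=7, stride=2, padding=3)  # 112
```

Chaining these functions in the order convolution, BN, ReLU, MaxPooling mirrors the data flow of the first stage.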
4. The image recognition method based on transfer learning and ResNet50 neural network according to claim 2, wherein in step S104 the fusion of high-resolution shallow features and deep features in the ResNet50 neural network feature layers proceeds as follows:
performing pooling operations on the input image to obtain the feature layer Conv7-2; upsampling the feature map $P_{7-2}$ of Conv7-2 to generate a feature map $P'_{7-2}$ with the same height and width as the Conv6-2 layer feature map, with dimension 10 × 256; adjusting the number of channels of the Conv6-2 layer to 256 by a 1 × 1 convolution to generate a feature map $P'_{6-2}$, so that the dimensions of $P'_{7-2}$ and of the lateral branch it is fused with remain consistent; splicing the feature maps $P'_{6-2}$ and $P'_{7-2}$ by Concat feature fusion to generate a feature map $P_{6-2}$; after two such upsampling and lateral fusion passes, obtaining the output feature map $P'_{4-3}$ of the top-down pyramid at the feature layer Conv4-3; then changing the number of channels of the feature layers pool1, pool2 and pool3 by 1 × 1 convolution, downsampling by bilinear interpolation, and performing feature fusion in Add mode, so that a feature map $P''_{4-3}$ containing position and detail information is obtained at the feature layer Conv4-3; and then summing the corresponding elements of the feature maps $P_{4-3}$, $P'_{4-3}$ and $P''_{4-3}$ to obtain the final fused feature map $P'''_{4-3}$.
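The two fusion modes named in claim 4, Concat (channel splicing) and Add (element-wise summation), can be sketched on tiny feature maps. This is an illustration under simplifying assumptions: feature maps are nested lists indexed as [channel][row][column], upsampling is nearest-neighbour by a factor of 2 (the claim does not fix the upsampling method), and all function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a [C][H][W] feature map."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column twice
            rows.append(wide)
            rows.append(list(wide))  # repeat each row twice
        out.append(rows)
    return out


def conv1x1(fmap, weights):
    """1x1 convolution: weights[o][i] mixes input channel i into output channel o."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(weights[o][i] * fmap[i][r][c] for i in range(len(fmap)))
          for c in range(w)] for r in range(h)]
        for o in range(len(weights))
    ]


def concat(a, b):
    """Concat fusion: stack the channels of two maps with equal height and width."""
    return a + b


def add(a, b):
    """Add fusion: element-wise sum of two maps with identical shape."""
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


deep = [[[1.0]]]                                                 # 1 channel, 1x1 (deep layer)
shallow = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]   # 2 channels, 2x2 (shallow layer)
lateral = conv1x1(shallow, [[1.0, 1.0]])   # lateral branch reduced to 1 channel
top_down = upsample2x(deep)                # deep map brought to 2x2
fused_concat = concat(lateral, top_down)   # 2 channels, 2x2
fused_add = add(lateral, top_down)         # 1 channel, 2x2
```

The 1 × 1 convolution exists only to make channel counts agree before fusion; Concat then increases the channel count, while Add keeps it fixed.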
5. The image recognition method based on transfer learning and ResNet50 neural network according to claim 1, wherein in step S2 the optimized and improved ResNet50 neural network model is trained by transfer learning to establish an image pre-recognition model, specifically: randomly selecting a large number of pictures from the ImageNet dataset and dividing them into a training set and a test set at a ratio of 4:1;
pre-training the optimized and improved ResNet50 neural network model with the training set, namely: freezing the convolution blocks close to the input end of the pre-trained model so that the weights of the initial layers remain unchanged, and training the remaining convolution blocks close to the output end together with the fully connected classifier on the training set to obtain new weights, wherein a new weight is obtained by subtracting the back-propagated error from the initial weight, so that the weight decreases when the back-propagated error is positive and increases when it is negative; the optimized and improved ResNet50 neural network model obtained after training and fine-tuning of the weight values serves as the image pre-recognition model; and checking the performance of the image pre-recognition model with the test set, the performance comprising the accuracy and the loss of image recognition.
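The 4:1 split and the frozen-layer weight update of claim 5 can be sketched as below. The learning rate `lr`, the fixed random seed and the function names are assumptions added for illustration; the claim itself only states that the new weight is the initial weight minus the back-propagated error and that layers near the input stay frozen.

```python
import random


def split_4_to_1(items, seed=0):
    """Shuffle and split samples into training and test sets at a 4:1 ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = len(items) * 4 // 5
    return items[:cut], items[cut:]


def update_weights(weights, errors, trainable, lr=0.5):
    """Subtract the (scaled) back-propagated error from each trainable weight.

    Frozen weights (trainable[i] is False) are returned unchanged, which models
    the frozen convolution blocks near the input end.
    """
    return [w - lr * e if t else w for w, e, t in zip(weights, errors, trainable)]


weights = [1.0, 2.0, 3.0]
errors = [4.0, 2.0, -2.0]          # positive error lowers a weight, negative error raises it
trainable = [False, True, True]    # first weight belongs to a frozen block
new_weights = update_weights(weights, errors, trainable)  # [1.0, 1.0, 4.0]
```

Only the second and third weights move, in opposite directions according to the sign of their error, while the frozen first weight keeps its pre-trained value.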
6. The image recognition method based on transfer learning and ResNet50 neural network according to claim 1, wherein in step S3 establishing the sample image data set comprises acquiring a plurality of groups of sample images with an image acquisition device, and preprocessing the images in the sample image data set comprises cropping, flipping, rotation and color enhancement of the images.
7. The image recognition method based on transfer learning and ResNet50 neural network according to claim 1, wherein in step S4, based on the image pre-recognition model, a five-fold cross-validation method is used to divide the preprocessed sample image data set into a training set and a test set so as to avoid overfitting of the model to a specific data set, specifically:
dividing the preprocessed sample image data set into five subsets of the same size; taking each subset in turn as the validation set and the other four subsets as the training set, for five cycles with a different validation subset each time; finally averaging the evaluation results of the five recognition models; and comparing the results of the five groups and taking the group with the best results as the final division into training set and test set.
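The five-fold procedure of claim 7 can be sketched in a few lines. The `evaluate` callback stands in for training and scoring a model on one split and is purely hypothetical, as are the function names; equal fold sizes hold only when the sample count is divisible by five.

```python
def five_fold_splits(samples, k=5):
    """Yield (training set, validation set) pairs, one per fold."""
    samples = list(samples)
    folds = [samples[i::k] for i in range(k)]  # k interleaved subsets
    for i in range(k):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, folds[i]


def cross_validate(samples, evaluate, k=5):
    """Return the mean score, the index of the best fold, and all fold scores."""
    scores = [evaluate(train, val) for train, val in five_fold_splits(samples, k)]
    best = max(range(k), key=scores.__getitem__)
    return sum(scores) / k, best, scores
```

A call such as `cross_validate(dataset, evaluate)` yields the averaged evaluation result and identifies the split that scored best, which corresponds to the group taken as the final training and test division.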
8. The image recognition method based on transfer learning and ResNet50 neural network according to claim 1, wherein in step S5 the image pre-recognition model is trained with the training set from step S4 and the model parameters are fine-tuned again to obtain the image recognition model, namely: the image values in the training set are input into the image pre-recognition model; predicted values are obtained through the internal convolution layers, BN layers, ReLU activation functions and MaxPooling layer; the deviation between the predicted value and the true value is calculated by the Huber loss function; if the deviation is greater than a set threshold, the weight parameters of the ResNet50 neural network are fine-tuned; and this is iterated until the deviation is less than or equal to the threshold, at which point training is complete.
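The stopping rule of claim 8 (keep fine-tuning until the Huber deviation is at or below a threshold) can be illustrated with a toy model. The sketch below fits a single weight `w` in `pred = w * x` by gradient descent on the classical Huber loss; the one-parameter model, the learning rate, the threshold value and the iteration cap are all assumptions made for illustration and stand in for the full network.

```python
def huber(r, delta=1.0):
    """Classical Huber loss of a residual r."""
    a = abs(r)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def huber_grad(r, delta=1.0):
    """Derivative of the Huber loss with respect to the residual (clipped residual)."""
    return max(-delta, min(delta, r))


def train_until_threshold(xs, ys, threshold=1e-4, lr=0.1, max_iters=1000):
    """Update w until the mean Huber loss is <= threshold or max_iters is reached."""
    w = 0.0
    loss = float("inf")
    for step in range(max_iters):
        residuals = [w * x - y for x, y in zip(xs, ys)]
        loss = sum(huber(r) for r in residuals) / len(xs)
        if loss <= threshold:
            return w, loss, step  # deviation small enough: training is complete
        grad = sum(huber_grad(r) * x for r, x in zip(residuals, xs)) / len(xs)
        w -= lr * grad  # fine-tune the weight
    return w, loss, max_iters


w, loss, steps = train_until_threshold([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

On this data the true weight is 2, and the loop stops as soon as the measured deviation falls to the threshold rather than after a fixed number of epochs.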
9. The image recognition method based on transfer learning and ResNet50 neural network according to claim 1, wherein in step S6 the image recognition model trained in step S5 is tested with the test set from step S4 to obtain an image recognition result, specifically: the image values in the test set are input into the image recognition model trained in step S5; a Softmax (normalized exponential function) classifier is added to the model; the classifier outputs the probability of each image classification result, each probability lying in the range [0, 1] and all probabilities summing to 1; and the class with the maximum probability is the final image recognition result.
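The Softmax classification of claim 9 can be sketched as follows. The class labels and raw scores are hypothetical, and subtracting the maximum score before exponentiation is a standard numerical-stability measure that is not stated in the claim.

```python
import math


def softmax(scores):
    """Map raw scores to probabilities in [0, 1] that sum to 1."""
    m = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores, labels):
    """Return the label with the highest probability together with all probabilities."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


label, probs = predict([0.1, 2.0, -1.0], ["cat", "dog", "bird"])  # label == "dog"
```

Because the exponential is monotonic, the class with the largest raw score is always the class with the largest probability.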
10. An image recognition system based on a transfer learning and ResNet50 neural network, which is suitable for the method of any one of claims 1-9, and is characterized by comprising a data processing module, a model training module and a model analysis module;
the data processing module is used for acquiring a plurality of groups of sample images by using the image acquisition equipment and preprocessing the images;
the model training module is used for training and fine-tuning the ResNet50 network structure and constructing an image recognition model on the basis of that structure;
the model analysis module is used for performing performance analysis on the image recognition model.
CN202310722593.1A 2023-06-19 2023-06-19 Image recognition method and system based on transfer learning and ResNet50 neural network Pending CN116824239A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310722593.1A CN116824239A (en) 2023-06-19 2023-06-19 Image recognition method and system based on transfer learning and ResNet50 neural network

Publications (1)

Publication Number Publication Date
CN116824239A true CN116824239A (en) 2023-09-29

Family

ID=88123494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310722593.1A Pending CN116824239A (en) 2023-06-19 2023-06-19 Image recognition method and system based on transfer learning and ResNet50 neural network

Country Status (1)

Country Link
CN (1) CN116824239A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination