Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a fine-grained fungus phenotype identification method based on transfer learning and bilinear InceptionResNetV2, which can train a model on a fine-grained fungus phenotype data set and identify different classes of fine-grained fungus phenotype images.
To achieve this purpose, the invention adopts the following technical scheme: a fine-grained fungus phenotype identification method based on transfer learning and bilinear InceptionResNetV2, comprising the following steps:
step 1, establishing a fine-grained fungus phenotype identification model based on transfer learning and bilinearity;
step 2, performing transfer learning and training based on the recognition model;
step 3, inputting the image into the recognition model and then preprocessing the image;
step 4, extracting features from the preprocessed image data: feature vectors are extracted from the image by an InceptionResNetV2 feature extraction network with a symmetrical structure; the extracted feature vectors then undergo a bilinear fusion operation with their own transpose to obtain a bilinear feature matrix for each position of the image; the bilinear feature matrices are converted into a bilinear feature vector; finally, the bilinear feature vector is passed through a fully connected layer and a softmax layer for multi-class classification to obtain the probability of each class.
Further, the preprocessing in step 3 includes centering, normalization, scaling, random cropping, and random horizontal flipping.
Further, after image data of any size is input into the network identification model, the mean of the whole data set is first subtracted and the result is divided by the standard deviation of the whole data set for centering and normalization; the image is then scaled so that its short side is 448 pixels, a 448 × 448 square region is cut out of the image by random cropping, and finally the image is randomly horizontally flipped.
Further, in step 4, the InceptionResNetV2 network from the Inception series of network models is used for feature extraction, and residual blocks are added to the InceptionResNetV2 feature extraction network.
Furthermore, the first 7 layers of the InceptionResNetV2 network consist of three convolutional layers, one max pooling layer, two convolutional layers, and one max pooling layer; these are followed by 10 repetitions of a three-branch residual Inception module, a simpler Inception module, 20 repetitions of a two-branch residual Inception module, a 4-branch Inception module, and 10 repetitions of a two-branch residual Inception module, and finally one convolutional layer that produces the output.
Further, the bilinear model B consists of a quadruple, as shown in formula (1):
B = (f_A, f_B, P, C)  (1)
where f_A and f_B are feature functions, P is the pooling function of the model, and C is the classification function for the fungi;
the output features are combined from the features at each location using the matrix outer product, as shown in formula (2):
bilinear(L, I, f_A, f_B) = f_A(L, I)^T f_B(L, I)  (2)
where L denotes position and scale, and I denotes the picture; if the dimensions of the features extracted by the two feature functions are (K, M) and (K, N) respectively, the dimension becomes (M, N) after the bilinear fusion operation; sum pooling is used to integrate the features of all positions, as shown in formula (3):
φ(I) = Σ_{l∈L} bilinear(l, I, f_A, f_B)  (3)
where φ(I) represents the global picture feature representation;
finally, the bilinear feature vector x = φ(I) is passed through a signed square-root transform, y = sign(x)·√|x|, followed by l2 normalization, z = y/‖y‖₂, and then input into the classifier to obtain the final classification result.
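As a worked illustration of formulas (2) and (3) in the symmetric case f_A = f_B, using the dimensions that arise later in this description (12 × 12 = 144 spatial positions of depth 1536, so K = 144 and M = N = 1536): stacking the per-position features into a matrix X ∈ R^(144×1536) gives
φ(I) = X^T X ∈ R^(1536×1536),
and flattening φ(I) yields a bilinear feature vector of length 1536 × 1536 = 2,359,296.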
Further, the training process is divided into two steps:
(1) first, the pre-training parameters of the InceptionResNetV2 feature extraction network, obtained on the ImageNet data set and loaded into the network, are fixed, and only the randomly initialized parameters of the final fully connected layer are trained;
(2) after the network converges, the parameters of the InceptionResNetV2 feature extraction network are unfrozen and fine-tuned with a smaller learning rate.
Further, the overall training process is as follows:
(1) construct the fine-grained fungus phenotype identification model based on transfer learning and bilinearity, which contains InceptionResNetV2 as the feature extraction network;
(2) initialize the InceptionResNetV2 feature extraction network with the ImageNet pre-trained model, and initialize the parameters of the fully connected layer with a Glorot normal initializer;
(3) fix the parameters of the InceptionResNetV2 feature extraction network so that these parameter values cannot be updated by back propagation during subsequent training;
(4) obtain training samples after image preprocessing from the input pipeline, with a batch size of 8 and an image size of 448 × 448;
(5) input the batch of training samples obtained in step (4) into the network model, perform feature extraction, the bilinear fusion operation, and the fully connected layer, and finally calculate the probability of each category through softmax;
(6) calculate the loss value of the network model using the categorical cross-entropy loss function;
(7) calculate the gradients and use an SGD optimizer with an initial learning rate of 1.0, a learning rate decay of 1e-8, and a momentum of 0.9; back-propagate the error through the network and update the parameters of the fully connected layer;
(8) judge whether the specified number of iterations (100) has been reached or the early-stopping condition is met (the validation loss changes by no more than 0.001 over 10 iterations); if so, the network is considered converged and the process proceeds to step (9); otherwise, return to step (4);
(9) change the learning rate of the SGD optimizer to 0.001;
(10) release the fixed InceptionResNetV2 pre-training parameters so that the network can update these parameter values through back propagation;
(11) obtain training samples after image preprocessing from the input pipeline, with a batch size of 8 and an image size of 448 × 448;
(12) input the batch of training samples obtained in step (11) into the network model, perform feature extraction, the bilinear fusion operation, and the fully connected layer, and finally calculate the probability of each category through softmax;
(13) calculate the loss value of the network model using the categorical cross-entropy loss function;
(14) calculate the gradients and use the SGD optimizer with an initial learning rate of 0.001, a learning rate decay of 1e-8, and a momentum of 0.9; back-propagate the error through the whole network and update the parameters of every layer;
(15) judge whether the specified number of iterations (70) has been reached or the early-stopping condition is met (the validation loss changes by no more than 0.001 over 10 iterations); if so, the network is considered converged and the process proceeds to step (16); otherwise, return to step (11);
(16) calculate the accuracy, precision, recall, and F1 value of the network model on the test set (a sketch of this metric computation follows this list).
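Step (16)'s metrics could be computed from the test-set predictions, for example with scikit-learn. This is a sketch: y_true and y_pred are placeholder arrays of true and predicted class indices, and macro averaging is an assumption, since the description does not specify the averaging mode.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# y_true, y_pred: integer class indices for the test set (placeholders).
acc  = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average="macro")
rec  = recall_score(y_true, y_pred, average="macro")
f1   = f1_score(y_true, y_pred, average="macro")
```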
The invention has the following beneficial effects: (1) by using bilinear pooling, the features extracted by the two symmetric InceptionResNetV2 feature extraction networks are combined to obtain finer-grained features, so the recognition effect is better; (2) by using a model-based transfer learning training method, the weights of the feature extraction network pre-trained on the ImageNet data set are transferred to the fine-grained fungus phenotype data set, so better convergence can be reached in a shorter training time and the recognition results are better.
The method is compared, on the fine-grained fungus phenotype data set, with symmetric VGG16 and symmetric VGG19 models in terms of accuracy, precision, recall, and F1 value, as shown in Table 1.
Table 1 Comparison results
As can be seen from the table, the fine-grained fungus phenotype identification model based on transfer learning and bilinearity using the symmetric InceptionResNetV2 network proposed by the invention performs best, achieving an accuracy of 0.90, a precision of 0.91, a recall of 0.90, and an F1 value of 0.90; each index is about 2-6% higher than those of the other methods.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments.
I. Network model
The invention selects the InceptionResNetV2 network as the feature extraction network in the Bilinear CNN, with the expectation that the stronger feature extraction capability brought by a deeper network improves the performance of the whole network model.
This yields the fine-grained fungus phenotype identification model based on transfer learning and bilinearity; the overall network structure is shown in FIG. 1. After an image is input into the network model, it first goes through the preprocessing steps of centering, normalization, random cropping, and random horizontal flipping. Feature vectors are then extracted from the image by an InceptionResNetV2 feature extraction network with a symmetrical structure, and the extracted feature vectors undergo a bilinear fusion operation with their own transpose to obtain a bilinear feature matrix for each position of the image. The bilinear feature matrix is converted into a bilinear feature vector, which is finally passed through a fully connected layer and a softmax layer for multi-class classification to obtain the probability of each category. The names of the 14 fungus categories involved in the present invention are: Amanita varia, Xerocomus subtomentosus, Conocybe antibodies, Cortinarius rubellus, Helvella crispa, Cuphophyllus flavipes, Hygrocybe reidii, Inocybe antibodies, Lyophyllum fumosum, Russulotictoides, Tricholoma fulvum, Tricholoma sciodes, Lycoperdon utriforme, Rhodocollybia butyracea f.
1. Image input and pre-processing
After image data of any size is input into the network model, the mean of the whole data set is first subtracted and the result is divided by the standard deviation of the whole data set for centering and normalization. The aim is to bring the data close to zero without changing its distribution, which reduces the differences between samples when computing gradients and accelerates network convergence.
The image is then scaled so that its short side is 448 pixels, a 448 × 448 square region is cut out of the image by random cropping, and finally the image is randomly horizontally flipped. Preprocessing steps such as random cropping and random horizontal flipping increase the diversity of the data set and give the network model better generalization. Because of a characteristic of the fungus data set — fungi grow from bottom to top — only horizontal flipping is used, not vertical flipping.
The image color is represented using the three RGB channels, so the preprocessed image data size is 448 × 448 × 3; the data is then fed into the feature extraction network for processing. The overall preprocessing is shown in FIG. 2.
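A minimal sketch of this preprocessing pipeline, assuming TensorFlow and data-set-wide statistics computed beforehand (dataset_mean and dataset_std are placeholder values, not named in the original):

```python
import tensorflow as tf

def preprocess(image, dataset_mean, dataset_std, training=True):
    """Center/normalize, resize short side to 448, random 448x448 crop, random horizontal flip."""
    image = tf.cast(image, tf.float32)
    image = (image - dataset_mean) / dataset_std  # centering and normalization

    # Scale so that the short side is 448 pixels, preserving the aspect ratio.
    shape = tf.cast(tf.shape(image)[:2], tf.float32)
    scale = 448.0 / tf.reduce_min(shape)
    new_size = tf.cast(tf.round(shape * scale), tf.int32)
    image = tf.image.resize(image, new_size)

    if training:
        image = tf.image.random_crop(image, size=[448, 448, 3])  # random square crop
        image = tf.image.random_flip_left_right(image)           # horizontal flip only
    return image
```

In a tf.data input pipeline this function would be mapped over the training images before batching (batch size 8, per the training procedure below).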
2. Feature extraction network
The feature extraction network of the fine-grained fungus phenotype identification model based on transfer learning and bilinearity is constructed using the InceptionResNetV2 network. As shown in FIG. 3, a bottleneck-layer structure that processes and recombines multiple convolution kernels in parallel across 4 branches increases the width of the network and its adaptability to different sizes and scales, alleviating problems such as the excessive parameter count, high computational complexity, and gradient diffusion of deep neural networks. Factorized convolution operations carry out a reasonable dimensionality decomposition: without losing much fine detail, this saves a large number of parameters, reduces computational cost, and accelerates network convergence, while further deepening the network and increasing its nonlinearity.
The overall structure of the InceptionResNetV2 feature extraction network is shown in FIG. 4. Borrowing the design of Microsoft's residual network, residual blocks are added so that parameters can skip layers through shortcut connections. The first 7 layers of the InceptionResNetV2 network consist of three convolutional layers, one max pooling layer, two convolutional layers, and one max pooling layer; the data then passes through 10 repetitions of a three-branch residual Inception module, a simpler Inception module, 20 repetitions of a two-branch residual Inception module, a 4-branch Inception module, and 10 repetitions of a two-branch residual Inception module, after which one convolutional layer produces the output.
The parameters of the main layers of the InceptionResNetV2 feature extraction network are shown in Table 1; only the first 7 convolutional and max pooling layers are listed, together with the merge, convolutional, and residual layers of each residual Inception module and the final convolutional layer, each followed by Batch Normalization and ReLU layers. Starting from the 448 × 448 × 3 input, successive convolutions increase the depth of the feature map and each max pooling layer halves the spatial dimensions; the residual Inception modules keep the spatial size unchanged, while the length and width decrease and the depth increases between module groups. The final output dimension is 12 × 12 × 1536 and the total number of parameters is 54,336,736.
Table 1 InceptionResNetV2 feature extraction network main layer parameters
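This backbone is available as a published Keras application, so the dimensions and parameter count stated above can be checked directly; a sketch, assuming TensorFlow/Keras as used in the transfer-learning section below:

```python
from tensorflow.keras.applications import InceptionResNetV2

# Headless InceptionResNetV2: the convolutional feature extractor only.
backbone = InceptionResNetV2(include_top=False, weights="imagenet",
                             input_shape=(448, 448, 3))
print(backbone.output_shape)    # (None, 12, 12, 1536), as stated above
print(backbone.count_params())  # 54336736 total parameters
```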
3. Bilinear fusion and classification
Bilinear means that for a function f(x, y), when one of the parameters, e.g. x, is fixed, the function f(x, y) is linear in the other parameter y. In the present invention, the bilinear model B consists of a quadruple, as shown in formula (1):
B = (f_A, f_B, P, C)  (1)
where f_A and f_B are feature functions, P is the pooling function of the model, and C is the classification function for the fungi.
A feature function f, i.e. a feature extraction network in the present invention, maps an input picture and a position to features of size c × D, where D is the depth. The output features in the present invention are combined from the features at each position using the matrix outer product, as shown in formula (2):
bilinear(L, I, f_A, f_B) = f_A(L, I)^T f_B(L, I)  (2)
where L denotes position and scale, and I denotes the picture. If the dimensions of the features extracted by the two feature functions are (K, M) and (K, N) respectively, the dimension becomes (M, N) after the bilinear fusion operation. Sum pooling is used to integrate the features of all positions, as shown in formula (3):
φ(I) = Σ_{l∈L} bilinear(l, I, f_A, f_B)  (3)
where φ(I) represents the global picture feature representation.
Finally, the bilinear feature vector x = φ(I) is passed through a signed square-root transform, y = sign(x)·√|x|, followed by l2 normalization, z = y/‖y‖₂, and then input into the classifier to obtain the final classification result.
In the invention, the features extracted by the InceptionResNetV2 feature extraction network have length and width 12 and depth 1536. Before bilinear fusion, the three-dimensional feature map is first reshaped into a two-dimensional matrix, giving a 144 × 1536 feature matrix. This matrix is transposed to obtain a 1536 × 144 matrix, and the matrix outer product of the original and transposed matrices, i.e. the bilinear fusion operation, yields a 1536 × 1536 bilinear feature matrix. The bilinear feature matrix is flattened into a one-dimensional bilinear feature vector of size 2,359,296, passed through a signed square-root transform and an l2 normalization layer, and then through a fully connected layer with 33,030,158 parameters, with multi-class classification finally performed by softmax.
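A minimal sketch of this bilinear fusion and classification head in Keras, under the assumptions already used above (448 × 448 × 3 inputs, a 12 × 12 × 1536 backbone output, and the 14 fungus categories listed earlier); the layer arrangement is illustrative, not the exact implementation:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import InceptionResNetV2

NUM_CLASSES = 14  # fungus categories listed earlier in this description

# Backbone as in the previous sketch: output (None, 12, 12, 1536).
backbone = InceptionResNetV2(include_top=False, weights="imagenet",
                             input_shape=(448, 448, 3))

x = layers.Reshape((144, 1536))(backbone.output)  # 12 x 12 positions -> K = 144
# Outer products summed over positions: X^T X, shape (None, 1536, 1536).
phi = layers.Lambda(lambda t: tf.matmul(t, t, transpose_a=True))(x)
phi = layers.Reshape((1536 * 1536,))(phi)         # flatten: 2,359,296-d bilinear vector
phi = layers.Lambda(lambda t: tf.sign(t) * tf.sqrt(tf.abs(t) + 1e-12))(phi)  # signed sqrt
phi = layers.Lambda(lambda t: tf.math.l2_normalize(t, axis=-1))(phi)         # l2 normalization
out = layers.Dense(NUM_CLASSES, activation="softmax")(phi)
model = Model(backbone.input, out)
```

For 14 classes the final Dense layer has 2,359,296 × 14 + 14 = 33,030,158 parameters, matching the figure given above.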
II. Transfer learning
In the invention, model-based transfer learning is used. The ImageNet data set, with about 14.19 million pictures, serves as the source domain; it contains many categories, including plants, mushrooms, and other categories similar to the fungus target task of the invention. Model weights pre-trained on the ImageNet data set are transferred to the fungus data set of the invention, as shown in FIG. 5, which reduces the amount of data required and yields higher initial performance, faster training, and better convergence.
The pre-trained model is obtained from the Keras pre-trained model library and loaded into the fine-grained fungus phenotype identification model based on transfer learning and bilinearity. The training process is divided into two steps:
(1) First, the pre-training parameters of the InceptionResNetV2 feature extraction network, obtained on the ImageNet data set and loaded into the network, are fixed, and only the randomly initialized parameters of the final fully connected layer are trained.
(2) After the network converges, the parameters of the InceptionResNetV2 feature extraction network are unfrozen and fine-tuned with a smaller learning rate.
The reason for fixing the InceptionResNetV2 pre-training parameters in the first step is that the newly added fully connected layer is randomly initialized and produces large loss values and gradients at the start of training, which would easily damage the pre-trained parameters; therefore the whole model is fine-tuned with a small learning rate only after the fully connected layer has converged.
The transfer learning of the invention uses the Stochastic Gradient Descent (SGD) algorithm as the optimizer of the pre-trained model.
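Continuing the Keras sketch from the previous section, the two phases might be configured as follows. This is a sketch, not the exact implementation; the decay argument follows the legacy Keras SGD signature, which newer TensorFlow releases replace with learning-rate schedules.

```python
from tensorflow.keras.optimizers import SGD

# Phase 1: freeze the pre-trained backbone; only the new head is trainable.
backbone.trainable = False
model.compile(optimizer=SGD(learning_rate=1.0, momentum=0.9, decay=1e-8),
              loss="categorical_crossentropy", metrics=["accuracy"])

# Phase 2 (after the head has converged): unfreeze the backbone and
# fine-tune the whole network at a small learning rate.
backbone.trainable = True
model.compile(optimizer=SGD(learning_rate=0.001, momentum=0.9, decay=1e-8),
              loss="categorical_crossentropy", metrics=["accuracy"])
```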
the general training process is shown in fig. 6, and includes the following specific steps:
(1) constructing a fine-grained fungus phenotype identification model based on transfer learning and bilinearity, wherein the fine-grained fungus phenotype identification model contains InceptionResNetV2 as a feature extraction network;
(2) initializing an InceptionResNet 2 feature extraction network by using an ImageNet pre-training model, and initializing parameters of a full connection layer by using a Glorot normal initializer;
(3) fixing the parameters of the InceptionResNet V2 feature extraction network, so that the parameter values of the part cannot be updated through back propagation in the subsequent training process;
(4) obtaining training samples after image preprocessing from an input pipeline, wherein the batch size is 8, and the image size is 448 x 448;
(5) inputting the batch training samples obtained in the step (4) into a network model, performing feature extraction and bilinear fusion operation and a full connection layer, and finally calculating the probability of each category through softmax;
(6) calculating a loss value of the network model by using a class cross entropy loss function;
(7) by calculating the gradient value, an SGD optimizer is used, the initial learning rate is set to be 1.0, the learning rate attenuation is 1e-8, the Momentum is set to be 0.9, the error is reversely propagated back to the whole network, and the parameters of the full connection layer are updated;
(8) judging whether the specified iteration times are reached to 100 or the 10 early-stop conditions that the iteration change of the verification loss value is not more than 0.001 are met, if so, determining that the network is converged, and entering the step (9), otherwise, entering the step (4) again;
(9) changing the learning rate of the SGD optimizer to 0.001;
(10) the fixation of the InceptionResNet V2 feature extraction network pre-training parameters is released, so that the network can update the parameter values of the part through back propagation;
(11) obtaining training samples after image preprocessing from an input pipeline, wherein the batch size is 8, and the image size is 448 x 448;
(12) inputting the batch training samples obtained in the step (11) into a network model, performing feature extraction, bilinear fusion operation and a full connection layer, and finally calculating the probability of each category through softmax;
(13) calculating a loss value of the network model by using a class cross entropy loss function;
(14) by calculating the gradient value, an SGD optimizer is used, the initial learning rate is set to be 0.001, the learning rate attenuation is 1e-8, the Momentum is set to be 0.9, the error is reversely propagated back to the whole network, and the parameters of each layer of the network are updated;
(15) judging whether the specified iteration times are 70 or the 10 early-stop conditions that the iteration change of the verification loss value is not more than 0.001 are met, if so, determining that the network is converged, and entering the step (16), otherwise, entering the step (11) again;
(16) and calculating the accuracy, precision, recall rate and F1 value of the network model through the test set.
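Putting these steps together, a sketch of the two-phase schedule with Keras callbacks, using the model compiled as in the earlier sketches (train_ds, val_ds, and test_ds are placeholder tf.data pipelines; restore_best_weights is an added convenience, not part of the described procedure):

```python
from tensorflow.keras.callbacks import EarlyStopping

# Early stopping as in steps (8) and (15): stop when the validation loss
# changes by no more than 0.001 for 10 consecutive epochs.
early_stop = EarlyStopping(monitor="val_loss", min_delta=0.001,
                           patience=10, restore_best_weights=True)

# Phase 1: backbone frozen, head trained at lr = 1.0, up to 100 epochs.
model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[early_stop])

# Phase 2: backbone unfrozen, whole network fine-tuned at lr = 0.001, up to 70 epochs.
model.fit(train_ds, validation_data=val_ds, epochs=70, callbacks=[early_stop])

# Step (16): evaluate on the held-out test set (precision/recall/F1 as sketched earlier).
loss, acc = model.evaluate(test_ds)
```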
The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It should be understood by those skilled in the art that the above embodiments do not limit the scope of the present invention in any way, and all technical solutions obtained by using equivalent substitution methods fall within the scope of the present invention.
Parts not described in detail in the present invention are the same as the prior art or can be implemented using the prior art.