Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a fine-grained fungus phenotype identification method based on transfer learning and bilinear InceptionResNetV2, which can train a model on a fine-grained fungus phenotype data set and identify different classes of fine-grained fungus phenotype images.
To achieve this purpose, the invention adopts the following technical scheme: a fine-grained fungus phenotype identification method based on transfer learning and bilinear InceptionResNetV2, comprising the following steps:
step 1, establishing a fine-grained fungus phenotype identification model based on transfer learning and bilinearity;
step 2, performing transfer learning and training based on the recognition model;
step 3, inputting the image into the recognition model and then preprocessing the image;
step 4, extracting features from the preprocessed image data: feature vectors are extracted from the image by an InceptionResNetV2 feature extraction network with a symmetrical structure; the extracted feature vectors then undergo a bilinear fusion operation with their own transpose to obtain a bilinear feature matrix for each position of the image; the bilinear feature matrices are converted into a bilinear feature vector; finally, the bilinear feature vector is passed through a fully connected layer and a softmax layer for multi-class classification to obtain the probability of each class.
Further, the preprocessing in step 3 includes centering, normalization, scaling, random cropping, and random horizontal flipping.
Further, after image data of any size is input into the network identification model, the mean of the whole data set is first subtracted and the result is divided by the standard deviation of the whole data set for centering and normalization; the image is then scaled so that its short side is 448 pixels, a 448 × 448 square region is cut out of the image by random cropping, and finally the image is randomly horizontally flipped.
Further, in step 4, the InceptionResNetV2 network from the Inception series of network models is used for feature extraction, and residual blocks are added to the InceptionResNetV2 feature extraction network.
Furthermore, the first 7 layers of the InceptionResNetV2 network consist of three convolutional layers, one max pooling layer, two convolutional layers, and one max pooling layer; these are followed by 10 repetitions of a three-branch residual Inception module, a simpler Inception module, 20 repetitions of a two-branch residual Inception module, a 4-branch Inception module, and 10 repetitions of a two-branch residual Inception module, and finally one convolutional layer that produces the output.
Further, the bilinear model B consists of a quadruple, as shown in formula (1):
B = (f_A, f_B, P, C)  (1)
where f_A and f_B are feature functions, P is the pooling function of the model, and C is the classification function for the fungi;
the output features are combined from the features at each location using the matrix outer product, as shown in formula (2):
bilinear(L, I, f_A, f_B) = f_A(L, I)^T f_B(L, I)  (2)
where L denotes position and scale, and I denotes the picture; if the dimensions of the features extracted by the two feature functions are (K, M) and (K, N) respectively, the dimension becomes (M, N) after the bilinear fusion operation; sum pooling is used to integrate the features of all positions, as shown in formula (3):
φ(I) = Σ_{l∈L} bilinear(l, I, f_A, f_B)  (3)
where φ(I) represents the global picture feature representation;
finally, the bilinear feature vector x = φ(I) is passed through a signed square-root transform, y = sign(x)·√|x|, followed by l2 normalization, z = y/‖y‖₂, and then input into the classifier to obtain the final classification result.
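As a worked illustration of formulas (2) and (3) in the symmetric case f_A = f_B, using the dimensions that arise later in this description (12 × 12 = 144 spatial positions of depth 1536, so K = 144 and M = N = 1536): stacking the per-position features into a matrix X ∈ R^(144×1536) gives
φ(I) = X^T X ∈ R^(1536×1536),
and flattening φ(I) yields a bilinear feature vector of length 1536 × 1536 = 2,359,296.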
Further, the training process is divided into two steps:
(1) first, the pre-training parameters of the InceptionResNetV2 feature extraction network, obtained on the ImageNet data set and loaded into the network, are fixed, and only the randomly initialized parameters of the final fully connected layer are trained;
(2) after the network converges, the parameters of the InceptionResNetV2 feature extraction network are unfrozen and fine-tuned with a smaller learning rate.
Further, the overall training process is as follows:
(1) construct the fine-grained fungus phenotype identification model based on transfer learning and bilinearity, which contains InceptionResNetV2 as the feature extraction network;
(2) initialize the InceptionResNetV2 feature extraction network with the ImageNet pre-trained model, and initialize the parameters of the fully connected layer with a Glorot normal initializer;
(3) fix the parameters of the InceptionResNetV2 feature extraction network so that these parameter values cannot be updated by back propagation during subsequent training;
(4) obtain training samples after image preprocessing from the input pipeline, with a batch size of 8 and an image size of 448 × 448;
(5) input the batch of training samples obtained in step (4) into the network model, perform feature extraction, the bilinear fusion operation, and the fully connected layer, and finally calculate the probability of each category through softmax;
(6) calculate the loss value of the network model using the categorical cross-entropy loss function;
(7) calculate the gradients and use an SGD optimizer with an initial learning rate of 1.0, a learning rate decay of 1e-8, and a momentum of 0.9; back-propagate the error through the network and update the parameters of the fully connected layer;
(8) judge whether the specified number of iterations (100) has been reached or the early-stopping condition is met (the validation loss changes by no more than 0.001 over 10 iterations); if so, the network is considered converged and the process proceeds to step (9); otherwise, return to step (4);
(9) change the learning rate of the SGD optimizer to 0.001;
(10) release the fixed InceptionResNetV2 pre-training parameters so that the network can update these parameter values through back propagation;
(11) obtain training samples after image preprocessing from the input pipeline, with a batch size of 8 and an image size of 448 × 448;
(12) input the batch of training samples obtained in step (11) into the network model, perform feature extraction, the bilinear fusion operation, and the fully connected layer, and finally calculate the probability of each category through softmax;
(13) calculate the loss value of the network model using the categorical cross-entropy loss function;
(14) calculate the gradients and use the SGD optimizer with an initial learning rate of 0.001, a learning rate decay of 1e-8, and a momentum of 0.9; back-propagate the error through the whole network and update the parameters of every layer;
(15) judge whether the specified number of iterations (70) has been reached or the early-stopping condition is met (the validation loss changes by no more than 0.001 over 10 iterations); if so, the network is considered converged and the process proceeds to step (16); otherwise, return to step (11);
(16) calculate the accuracy, precision, recall, and F1 value of the network model on the test set (a sketch of this metric computation follows this list).
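Step (16)'s metrics could be computed from the test-set predictions, for example with scikit-learn. This is a sketch: y_true and y_pred are placeholder arrays of true and predicted class indices, and macro averaging is an assumption, since the description does not specify the averaging mode.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# y_true, y_pred: integer class indices for the test set (placeholders).
acc  = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average="macro")
rec  = recall_score(y_true, y_pred, average="macro")
f1   = f1_score(y_true, y_pred, average="macro")
```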
The invention has the following beneficial effects: (1) by using bilinear pooling, the features extracted by the two symmetric InceptionResNetV2 feature extraction networks are combined to obtain finer-grained features, so the recognition effect is better; (2) by using a model-based transfer learning training method, the weights of the feature extraction network pre-trained on the ImageNet data set are transferred to the fine-grained fungus phenotype data set, so better convergence can be reached in a shorter training time and the recognition results are better.
The method is compared, on the fine-grained fungus phenotype data set, with symmetric VGG16 and symmetric VGG19 models in terms of accuracy, precision, recall, and F1 value, as shown in Table 1.
Table 1 Comparison results
As can be seen from the table, the fine-grained fungus phenotype identification model based on transfer learning and bilinearity using the symmetric InceptionResNetV2 network proposed by the invention performs best, achieving an accuracy of 0.90, a precision of 0.91, a recall of 0.90, and an F1 value of 0.90; each index is about 2-6% higher than those of the other methods.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments.
I. Network model
The invention selects the InceptionResNetV2 network as the feature extraction network in the Bilinear CNN, with the expectation that the stronger feature extraction capability brought by a deeper network improves the performance of the whole network model.
This yields the fine-grained fungus phenotype identification model based on transfer learning and bilinearity; the overall network structure is shown in FIG. 1. After an image is input into the network model, it first goes through the preprocessing steps of centering, normalization, random cropping, and random horizontal flipping. Feature vectors are then extracted from the image by an InceptionResNetV2 feature extraction network with a symmetrical structure, and the extracted feature vectors undergo a bilinear fusion operation with their own transpose to obtain a bilinear feature matrix for each position of the image. The bilinear feature matrix is converted into a bilinear feature vector, which is finally passed through a fully connected layer and a softmax layer for multi-class classification to obtain the probability of each category. The names of the 14 fungus categories involved in the present invention are: Amanita varia, Xerocomus subtomentosus, Conocybe antibodies, Cortinarius rubellus, Helvella crispa, Cuphophyllus flavipes, Hygrocybe reidii, Inocybe antibodies, Lyophyllum fumosum, Russulotictoides, Tricholoma fulvum, Tricholoma sciodes, Lycoperdon utriforme, Rhodocollybia butyracea f.
1. Image input and pre-processing
After image data of any size is input into the network model, the mean of the whole data set is first subtracted and the result is divided by the standard deviation of the whole data set for centering and normalization. The aim is to bring the data close to zero without changing its distribution, which reduces the differences between samples when computing gradients and accelerates network convergence.
The image is then scaled so that its short side is 448 pixels, a 448 × 448 square region is cut out of the image by random cropping, and finally the image is randomly horizontally flipped. Preprocessing steps such as random cropping and random horizontal flipping increase the diversity of the data set and give the network model better generalization. Because of a characteristic of the fungus data set — fungi grow from bottom to top — only horizontal flipping is used, not vertical flipping.
The image color is represented using the three RGB channels, so the preprocessed image data size is 448 × 448 × 3; the data is then fed into the feature extraction network for processing. The overall preprocessing is shown in FIG. 2.
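A minimal sketch of this preprocessing pipeline, assuming TensorFlow and data-set-wide statistics computed beforehand (dataset_mean and dataset_std are placeholder values, not named in the original):

```python
import tensorflow as tf

def preprocess(image, dataset_mean, dataset_std, training=True):
    """Center/normalize, resize short side to 448, random 448x448 crop, random horizontal flip."""
    image = tf.cast(image, tf.float32)
    image = (image - dataset_mean) / dataset_std  # centering and normalization

    # Scale so that the short side is 448 pixels, preserving the aspect ratio.
    shape = tf.cast(tf.shape(image)[:2], tf.float32)
    scale = 448.0 / tf.reduce_min(shape)
    new_size = tf.cast(tf.round(shape * scale), tf.int32)
    image = tf.image.resize(image, new_size)

    if training:
        image = tf.image.random_crop(image, size=[448, 448, 3])  # random square crop
        image = tf.image.random_flip_left_right(image)           # horizontal flip only
    return image
```

In a tf.data input pipeline this function would be mapped over the training images before batching (batch size 8, per the training procedure below).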
2. Feature extraction network
The feature extraction network of the fine-grained fungus phenotype identification model based on transfer learning and bilinearity is constructed using the InceptionResNetV2 network. As shown in FIG. 3, a bottleneck-layer structure that processes and recombines multiple convolution kernels in parallel across 4 branches increases the width of the network and its adaptability to different sizes and scales, alleviating problems such as the excessive parameter count, high computational complexity, and gradient diffusion of deep neural networks. Factorized convolution operations carry out a reasonable dimensionality decomposition: without losing much fine detail, this saves a large number of parameters, reduces computational cost, and accelerates network convergence, while further deepening the network and increasing its nonlinearity.
The overall structure of the InceptionResNetV2 feature extraction network is shown in FIG. 4. Borrowing the design of Microsoft's residual network, residual blocks are added so that parameters can skip layers through shortcut connections. The first 7 layers of the InceptionResNetV2 network consist of three convolutional layers, one max pooling layer, two convolutional layers, and one max pooling layer; the data then passes through 10 repetitions of a three-branch residual Inception module, a simpler Inception module, 20 repetitions of a two-branch residual Inception module, a 4-branch Inception module, and 10 repetitions of a two-branch residual Inception module, after which one convolutional layer produces the output.
The parameters of the main layers of the InceptionResNetV2 feature extraction network are shown in Table 1; only the first 7 convolutional and max pooling layers are listed, together with the merge, convolutional, and residual layers of each residual Inception module and the final convolutional layer, each followed by Batch Normalization and ReLU layers. Starting from the 448 × 448 × 3 input, successive convolutions increase the depth of the feature map and each max pooling layer halves the spatial dimensions; the residual Inception modules keep the spatial size unchanged, while the length and width decrease and the depth increases between module groups. The final output dimension is 12 × 12 × 1536 and the total number of parameters is 54,336,736.
Table 1 InceptionResNetV2 feature extraction network main layer parameters
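This backbone is available as a published Keras application, so the dimensions and parameter count stated above can be checked directly; a sketch, assuming TensorFlow/Keras as used in the transfer-learning section below:

```python
from tensorflow.keras.applications import InceptionResNetV2

# Headless InceptionResNetV2: the convolutional feature extractor only.
backbone = InceptionResNetV2(include_top=False, weights="imagenet",
                             input_shape=(448, 448, 3))
print(backbone.output_shape)    # (None, 12, 12, 1536), as stated above
print(backbone.count_params())  # 54336736 total parameters
```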
3. Bilinear fusion and classification
Bilinear means that for a function f(x, y), when one of the parameters, e.g. x, is fixed, the function f(x, y) is linear in the other parameter y. In the present invention, the bilinear model B consists of a quadruple, as shown in formula (1):
B = (f_A, f_B, P, C)  (1)
where f_A and f_B are feature functions, P is the pooling function of the model, and C is the classification function for the fungi.
A feature function f, i.e. a feature extraction network in the present invention, maps an input picture and a position to features of size c × D, where D is the depth. The output features in the present invention are combined from the features at each position using the matrix outer product, as shown in formula (2):
bilinear(L, I, f_A, f_B) = f_A(L, I)^T f_B(L, I)  (2)
where L denotes position and scale, and I denotes the picture. If the dimensions of the features extracted by the two feature functions are (K, M) and (K, N) respectively, the dimension becomes (M, N) after the bilinear fusion operation. Sum pooling is used to integrate the features of all positions, as shown in formula (3):
φ(I) = Σ_{l∈L} bilinear(l, I, f_A, f_B)  (3)
where φ(I) represents the global picture feature representation.
Finally, the bilinear feature vector x = φ(I) is passed through a signed square-root transform, y = sign(x)·√|x|, followed by l2 normalization, z = y/‖y‖₂, and then input into the classifier to obtain the final classification result.
In the invention, the features extracted by the InceptionResNetV2 feature extraction network have length and width 12 and depth 1536. Before bilinear fusion, the three-dimensional feature map is first reshaped into a two-dimensional matrix, giving a 144 × 1536 feature matrix. This matrix is transposed to obtain a 1536 × 144 matrix, and the matrix outer product of the original and transposed matrices, i.e. the bilinear fusion operation, yields a 1536 × 1536 bilinear feature matrix. The bilinear feature matrix is flattened into a one-dimensional bilinear feature vector of size 2,359,296, passed through a signed square-root transform and an l2 normalization layer, and then through a fully connected layer with 33,030,158 parameters, with multi-class classification finally performed by softmax.
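A minimal sketch of this bilinear fusion and classification head in Keras, under the assumptions already used above (448 × 448 × 3 inputs, a 12 × 12 × 1536 backbone output, and the 14 fungus categories listed earlier); the layer arrangement is illustrative, not the exact implementation:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import InceptionResNetV2

NUM_CLASSES = 14  # fungus categories listed earlier in this description

# Backbone as in the previous sketch: output (None, 12, 12, 1536).
backbone = InceptionResNetV2(include_top=False, weights="imagenet",
                             input_shape=(448, 448, 3))

x = layers.Reshape((144, 1536))(backbone.output)  # 12 x 12 positions -> K = 144
# Outer products summed over positions: X^T X, shape (None, 1536, 1536).
phi = layers.Lambda(lambda t: tf.matmul(t, t, transpose_a=True))(x)
phi = layers.Reshape((1536 * 1536,))(phi)         # flatten: 2,359,296-d bilinear vector
phi = layers.Lambda(lambda t: tf.sign(t) * tf.sqrt(tf.abs(t) + 1e-12))(phi)  # signed sqrt
phi = layers.Lambda(lambda t: tf.math.l2_normalize(t, axis=-1))(phi)         # l2 normalization
out = layers.Dense(NUM_CLASSES, activation="softmax")(phi)
model = Model(backbone.input, out)
```

For 14 classes the final Dense layer has 2,359,296 × 14 + 14 = 33,030,158 parameters, matching the figure given above.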
II. Transfer learning
In the invention, model-based transfer learning is used. The ImageNet data set, with about 14.19 million pictures, serves as the source domain; it contains many categories, including plants, mushrooms, and other categories similar to the fungus target task of the invention. Model weights pre-trained on the ImageNet data set are transferred to the fungus data set of the invention, as shown in FIG. 5, which reduces the amount of data required and yields higher initial performance, faster training, and better convergence.
The pre-trained model is obtained from the Keras pre-trained model library and loaded into the fine-grained fungus phenotype identification model based on transfer learning and bilinearity. The training process is divided into two steps:
(1) First, the pre-training parameters of the InceptionResNetV2 feature extraction network, obtained on the ImageNet data set and loaded into the network, are fixed, and only the randomly initialized parameters of the final fully connected layer are trained.
(2) After the network converges, the parameters of the InceptionResNetV2 feature extraction network are unfrozen and fine-tuned with a smaller learning rate.
The reason for fixing the InceptionResNetV2 pre-training parameters in the first step is that the newly added fully connected layer is randomly initialized and produces large loss values and gradients at the start of training, which would easily damage the pre-trained parameters; therefore the whole model is fine-tuned with a small learning rate only after the fully connected layer has converged.
The transfer learning of the invention uses the Stochastic Gradient Descent (SGD) algorithm as the optimizer of the pre-trained model.
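Continuing the Keras sketch from the previous section, the two phases might be configured as follows. This is a sketch, not the exact implementation; the decay argument follows the legacy Keras SGD signature, which newer TensorFlow releases replace with learning-rate schedules.

```python
from tensorflow.keras.optimizers import SGD

# Phase 1: freeze the pre-trained backbone; only the new head is trainable.
backbone.trainable = False
model.compile(optimizer=SGD(learning_rate=1.0, momentum=0.9, decay=1e-8),
              loss="categorical_crossentropy", metrics=["accuracy"])

# Phase 2 (after the head has converged): unfreeze the backbone and
# fine-tune the whole network at a small learning rate.
backbone.trainable = True
model.compile(optimizer=SGD(learning_rate=0.001, momentum=0.9, decay=1e-8),
              loss="categorical_crossentropy", metrics=["accuracy"])
```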
the general training process is shown in fig. 6, and includes the following specific steps:
(1) constructing a fine-grained fungus phenotype identification model based on transfer learning and bilinearity, wherein the fine-grained fungus phenotype identification model contains InceptionResNetV2 as a feature extraction network;
(2) initializing an InceptionResNet 2 feature extraction network by using an ImageNet pre-training model, and initializing parameters of a full connection layer by using a Glorot normal initializer;
(3) fixing the parameters of the InceptionResNet V2 feature extraction network, so that the parameter values of the part cannot be updated through back propagation in the subsequent training process;
(4) obtaining training samples after image preprocessing from an input pipeline, wherein the batch size is 8, and the image size is 448 x 448;
(5) inputting the batch training samples obtained in the step (4) into a network model, performing feature extraction and bilinear fusion operation and a full connection layer, and finally calculating the probability of each category through softmax;
(6) calculating a loss value of the network model by using a class cross entropy loss function;
(7) by calculating the gradient value, an SGD optimizer is used, the initial learning rate is set to be 1.0, the learning rate attenuation is 1e-8, the Momentum is set to be 0.9, the error is reversely propagated back to the whole network, and the parameters of the full connection layer are updated;
(8) judging whether the specified iteration times are reached to 100 or the 10 early-stop conditions that the iteration change of the verification loss value is not more than 0.001 are met, if so, determining that the network is converged, and entering the step (9), otherwise, entering the step (4) again;
(9) changing the learning rate of the SGD optimizer to 0.001;
(10) the fixation of the InceptionResNet V2 feature extraction network pre-training parameters is released, so that the network can update the parameter values of the part through back propagation;
(11) obtaining training samples after image preprocessing from an input pipeline, wherein the batch size is 8, and the image size is 448 x 448;
(12) inputting the batch training samples obtained in the step (11) into a network model, performing feature extraction, bilinear fusion operation and a full connection layer, and finally calculating the probability of each category through softmax;
(13) calculating a loss value of the network model by using a class cross entropy loss function;
(14) by calculating the gradient value, an SGD optimizer is used, the initial learning rate is set to be 0.001, the learning rate attenuation is 1e-8, the Momentum is set to be 0.9, the error is reversely propagated back to the whole network, and the parameters of each layer of the network are updated;
(15) judging whether the specified iteration times are 70 or the 10 early-stop conditions that the iteration change of the verification loss value is not more than 0.001 are met, if so, determining that the network is converged, and entering the step (16), otherwise, entering the step (11) again;
(16) and calculating the accuracy, precision, recall rate and F1 value of the network model through the test set.
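Putting these steps together, a sketch of the two-phase schedule with Keras callbacks, using the model compiled as in the earlier sketches (train_ds, val_ds, and test_ds are placeholder tf.data pipelines; restore_best_weights is an added convenience, not part of the described procedure):

```python
from tensorflow.keras.callbacks import EarlyStopping

# Early stopping as in steps (8) and (15): stop when the validation loss
# changes by no more than 0.001 for 10 consecutive epochs.
early_stop = EarlyStopping(monitor="val_loss", min_delta=0.001,
                           patience=10, restore_best_weights=True)

# Phase 1: backbone frozen, head trained at lr = 1.0, up to 100 epochs.
model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[early_stop])

# Phase 2: backbone unfrozen, whole network fine-tuned at lr = 0.001, up to 70 epochs.
model.fit(train_ds, validation_data=val_ds, epochs=70, callbacks=[early_stop])

# Step (16): evaluate on the held-out test set (precision/recall/F1 as sketched earlier).
loss, acc = model.evaluate(test_ds)
```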
The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It should be understood by those skilled in the art that the above embodiments do not limit the scope of the present invention in any way, and all technical solutions obtained by using equivalent substitution methods fall within the scope of the present invention.
Parts not described in detail in the present invention are the same as the prior art or can be implemented using the prior art.