CN117011611A - Seed identification method, device and medium based on layered bilinear pooling model - Google Patents
- Publication number
- CN117011611A (application number CN202311030530.6A)
- Authority
- CN
- China
- Prior art keywords
- model
- seed
- pooling model
- hierarchical
- image data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The application relates to the technical field of seed identification, and in particular to a seed identification method, device and medium based on a hierarchical bilinear pooling model. The method comprises the following steps: collecting image data of invasive plant seeds of various categories; acquiring the network structure of ResNet-50 and constructing a hierarchical bilinear pooling model with ResNet-50 as the backbone network, wherein the input of the model is image data of invasive plant seeds and the output is a seed identification result; training the hierarchical bilinear pooling model with the image data, and fine-tuning the trained model to obtain a seed identification model; and inputting the seed image data to be identified into the seed identification model to obtain the corresponding seed identification result. Using ResNet-50 as the backbone improves the accuracy and generalization capability of the model, and applying the constructed hierarchical bilinear pooling model to seed images improves both the identification efficiency and the classification precision for invasive plant seeds.
Description
Technical Field
The application relates to the technical field of seed identification, in particular to a seed identification method based on a hierarchical bilinear pooling model.
Background
With the rapid development of economic globalization, invasion by exotic plants is becoming increasingly severe. Inspection and quarantine at customs ports is the first line of defense against such invasions and is the most critical link in controlling and managing invasive alien plants. The main task of customs ports in this work is to identify and classify invasive plant seeds. However, unlike common fine-grained image recognition, plant seeds may show great diversity and variability between species and between individuals. Seeds of the same species may differ significantly in morphology, while seeds of different species may look very similar, which makes it harder for customs staff to classify and identify invasive plant seeds.
Traditional plant seed classification and identification falls into three main types. First, experts classify seeds visually using a microscope or a scanning electron microscope; this depends on the expert's subjective experience and suffers from high identification cost and low identification speed. Second, physical methods sort invasive plant seeds by measuring properties such as volume and weight; these require high-precision measuring equipment, and their screening effect is relatively poor and their accuracy low. Third, when seeds cannot be classified effectively by appearance, professional researchers either run molecular experiments, extracting species DNA by chemical, biological and other methods to classify at the genetic level, or run germination experiments and classify by the leaves and inflorescences of the resulting plants; these methods are highly accurate, but their identification cost is too high and their efficiency low. In general, conventional detection and identification of invasive plant seeds is complex and time-consuming, and a method is needed that helps customs staff identify invasive plant seeds quickly and efficiently in practical application scenarios.
For this reason, invasive plant seed identification techniques based on hyperspectral technology have been developed. These methods combine spectral information with image processing: a hyperspectral sensor or spectral camera images the invasive plant seeds, and their spectral characteristics are obtained and used for classification and identification. This improves the efficiency and accuracy of invasive plant seed identification to a certain extent. However, existing methods that identify invasive plant seeds by computer vision cover only a few seed types, and their identification and classification accuracy is low for seeds of highly similar plant genera such as Amaranthus and Euphorbia.
Disclosure of Invention
In view of the above technical problems, the application provides a seed identification method, device and medium based on a hierarchical bilinear pooling model, with the aim of improving the identification efficiency and classification precision for invasive plant seeds.
The application adopts the following technical scheme: the seed identification method based on the hierarchical bilinear pooling model comprises the following steps:
step 101, collecting image data of invasive plant seeds of various categories, and preprocessing the image data;
step 102, acquiring the network structure of ResNet-50, and constructing a hierarchical bilinear pooling model with ResNet-50 as the backbone network, wherein the input of the hierarchical bilinear pooling model is image data of invasive plant seeds, and the output of the hierarchical bilinear pooling model is a seed identification result;
step 103, training the hierarchical bilinear pooling model with the preprocessed image data, and fine-tuning the trained hierarchical bilinear pooling model to obtain a seed identification model;
step 104, inputting the seed image data to be identified into the seed identification model to obtain the corresponding seed identification result.
ResNet-50 is a deep residual network (Residual Network). Specifically, ResNet-50 is a deep neural network with 50 layers that contains a plurality of residual blocks (Residual Block), each consisting of several convolution layers, and it alleviates the vanishing-gradient and exploding-gradient problems in deep network training by introducing residual connections (Shortcut Connection). Compared with conventional convolutional neural networks, the innovation of ResNet-50 is the skip connection, also called the shortcut connection. This connection adds the input of a block directly to its output, so the block learns a residual function and the network can preserve the original features while learning the residual. This design makes the network easier to train and allows deeper network structures to be built. ResNet-50 has strong feature extraction and representation capability and can learn rich image features.
The application combines deep learning with fine-grained image recognition, creatively uses ResNet-50 as a backbone network to construct a layered bilinear pooling model, improves the accuracy and generalization capability of the model, and applies the constructed layered bilinear pooling model to recognition of seed images, thereby greatly improving classification precision while improving recognition efficiency of invasive plant seeds.
Preferably, in step 102, the method for constructing the hierarchical bilinear pooling model by taking ResNet-50 as a backbone network comprises the following steps:
removing the fully connected layer of ResNet-50, and taking the resulting ResNet-50 as a feature extraction network;
connecting two parallel feature extraction networks to the input end of a bilinear pooling layer, and connecting the output end of the bilinear pooling layer to an output layer, to form the network structure of the hierarchical bilinear pooling model.
The network structure of ResNet-50 can be divided into a plurality of stages, each containing a plurality of residual blocks. Specifically, it includes an input convolutional layer, 4 stages (each containing multiple residual blocks), a global average pooling layer, and a fully connected layer. From one stage to the next, the number of output channels of the residual blocks increases while the spatial size is halved, so that progressively higher-level features are extracted.
Preferably, in step 103, the method for training the hierarchical bilinear pooling model by using the preprocessed image data includes:
step 201, feeding the preprocessed image data, in sequence, into the two parallel feature extraction networks, each of which outputs the features of its last three convolution layers;
step 202, expanding, by the bilinear pooling layer, the features output by the two feature extraction networks into high-dimensional features through linear mapping, to obtain two groups of high-dimensional features with the same dimension;
step 203, integrating the two groups of high-dimensional features by the Hadamard product method to obtain a plurality of integrated high-dimensional features, and splicing the plurality of integrated high-dimensional features to generate a feature vector;
step 204, taking the feature vector as the input of the output layer, classifying the feature vector in the output layer with a softmax activation function, and outputting the classification result of the feature vector.
Preferably, the activation function of the ResNet-50 used as the feature extraction network is the SiLU function.
Preferably, the features of the last three convolution layers output by the two feature extraction networks are denoted siluA1, siluA2, siluA3, siluB1, siluB2 and siluB3, respectively, and in step 203, the method of integrating the two groups of high-dimensional features by the Hadamard product method to obtain a plurality of integrated high-dimensional features, and splicing the plurality of integrated high-dimensional features to generate a feature vector, is specifically:
C = siluA′3 ⊙ siluB′2 + siluA′3 ⊙ siluB′1 + siluA′2 ⊙ siluB′1
wherein C represents the generated feature vector, ⊙ denotes the Hadamard product operation, and siluA′1, siluA′2, siluA′3, siluB′1, siluB′2 and siluB′3 respectively represent the high-dimensional features obtained by expanding, through linear mapping, the features output by the two feature extraction networks.
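As a minimal sketch of the formula above, the following pure-Python snippet computes C from four already-expanded feature vectors. It follows the formula as written and sums the three Hadamard products element-wise; the function names and the toy values are illustrative assumptions and do not come from the application.

```python
def hadamard(u, v):
    """Element-wise (Hadamard) product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("Hadamard product needs vectors of the same dimension")
    return [a * b for a, b in zip(u, v)]


def fuse_features(a2, a3, b1, b2):
    """C = A'3 (.) B'2 + A'3 (.) B'1 + A'2 (.) B'1, summed element-wise."""
    terms = [hadamard(a3, b2), hadamard(a3, b1), hadamard(a2, b1)]
    return [sum(values) for values in zip(*terms)]


# Toy expanded features (hypothetical values, dimension 2 for readability).
silu_a2, silu_a3 = [1, 2], [3, 4]
silu_b1, silu_b2 = [5, 6], [7, 8]
fused = fuse_features(silu_a2, silu_a3, silu_b1, silu_b2)
```

Because every term is a Hadamard product, all expanded features must share one dimension, which is why step 202 maps them to the same size first.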
Preferably, the image data of each type of invasive plant seeds is divided into a training set and a test set, and step 103 specifically includes:
training the hierarchical bilinear pooling model with the preprocessed training set, and fine-tuning the trained hierarchical bilinear pooling model;
testing the fine-tuned hierarchical bilinear pooling model with the preprocessed test set to obtain the seed identification model.
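A per-class split of this kind could be sketched as follows. The application does not state a split ratio, so the 80/20 fraction, the fixed seed, and all names below are assumptions made only for illustration.

```python
import random


def split_per_class(samples_by_class, train_fraction=0.8, seed=0):
    """Split every class separately so each one appears in both sets."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in sorted(samples_by_class.items()):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train += [(sample, label) for sample in shuffled[:cut]]
        test += [(sample, label) for sample in shuffled[cut:]]
    return train, test


# Hypothetical file names standing in for seed images of two classes.
dataset = {
    "class_a": [f"a{i}.png" for i in range(10)],
    "class_b": [f"b{i}.png" for i in range(5)],
}
train_set, test_set = split_per_class(dataset)
```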
Preferably, the method for fine tuning the trained hierarchical bilinear pooling model comprises the following steps:
setting the initial value of the learning rate to 0.01 and, with all other parameters kept unchanged during training of the model, adjusting the learning rate so that it is reduced by a factor of 10 after every 40 training iterations;
repeating the above step until the training result converges, and then stopping the fine-tuning.
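The schedule described above can be read as a step decay. The sketch below assumes that "reduced by 10 times" means division by 10 and that the count of 40 refers to whole training iterations; the function and parameter names are illustrative.

```python
def step_decay_lr(iteration, initial_lr=0.01, drop_every=40, factor=10.0):
    """Learning rate after `iteration` completed training iterations."""
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    return initial_lr / (factor ** (iteration // drop_every))


# Learning rate at a few checkpoints of a hypothetical 120-iteration run.
schedule = {it: step_decay_lr(it) for it in (0, 39, 40, 80)}
```

Under these assumptions the rate stays at 0.01 for iterations 0 to 39, drops to about 0.001 at iteration 40, and to about 0.0001 at iteration 80.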
A computer device, comprising:
a processor;
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform a seed identification method based on a hierarchical bilinear pooling model as described above via execution of the executable instructions.
A computer-readable storage medium,
wherein the computer-readable storage medium stores a computer program which, when executed by a processor, implements the seed identification method based on a hierarchical bilinear pooling model as described above.
One of the beneficial technical effects of the application is as follows: by combining deep learning with fine-grained image recognition, the method uses ResNet-50 as the backbone network to construct a hierarchical bilinear pooling model, which improves the accuracy and generalization capability of the model, and applies the constructed hierarchical bilinear pooling model to the recognition of seed images, thereby improving both the identification efficiency and the classification precision for invasive plant seeds.
Other features and advantages of the present application will be disclosed in the following detailed description of the application and the accompanying drawings.
Drawings
The application is further described with reference to the accompanying drawings:
FIG. 1 is a flow chart of a seed identification method based on a hierarchical bilinear pooling model in an embodiment of the application.
FIG. 2 is a flow chart of a method for training a hierarchical bilinear pooling model in accordance with an embodiment of the present application.
FIG. 3 is a schematic diagram of training a hierarchical bilinear pooling model according to an embodiment of the present application.
FIG. 4 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Wherein: 1. processor, 2, memory.
Detailed Description
The technical solutions of the embodiments of the present application will be explained and illustrated below with reference to the drawings of the embodiments of the present application, but the following embodiments are only preferred embodiments of the present application, and not all embodiments. Based on the examples in the implementation manner, other examples obtained by a person skilled in the art without making creative efforts fall within the protection scope of the present application.
In the following description, directional or positional relationships such as the terms "inner", "outer", "upper", "lower", "left", "right", etc., are presented for convenience in describing the embodiments and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and therefore should not be construed as limiting the application.
The embodiment of the application provides a seed identification method based on a hierarchical bilinear pooling model, referring to fig. 1, comprising the following steps:
Step 101, collecting image data of invasive plant seeds of various categories, and preprocessing the image data.
The image data of the seeds may be affected by light, shadow, noise and other environmental factors, so the image data must be preprocessed before effective features can be extracted. Preprocessing includes removing noise, adjusting illumination and enhancing contrast, which improves the accuracy of subsequent feature extraction.
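The application does not name specific preprocessing algorithms. As one possible illustration of contrast enhancement, the sketch below applies a simple min-max contrast stretch to a tiny grayscale image stored as nested lists; the technique choice, the function name and the pixel values are all assumptions.

```python
def stretch_contrast(image, low=0, high=255):
    """Linearly rescale pixel values so they span the range [low, high]."""
    flat = [pixel for row in image for pixel in row]
    darkest, brightest = min(flat), max(flat)
    if brightest == darkest:
        # A flat image has no contrast to stretch.
        return [[low for _ in row] for row in image]
    span = brightest - darkest
    return [
        [round(low + (pixel - darkest) * (high - low) / span) for pixel in row]
        for row in image
    ]


# Hypothetical low-contrast 2x2 grayscale patch.
patch = [[50, 100], [150, 200]]
enhanced = stretch_contrast(patch)
```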
Step 102, acquiring the network structure of ResNet-50, and constructing a hierarchical bilinear pooling model with ResNet-50 as the backbone network, wherein the input of the hierarchical bilinear pooling model is image data of invasive plant seeds and the output is a seed identification result.
In another aspect, in this embodiment, the network structure of ResNet-50 may be obtained by loading it through an interface provided by an open-source deep learning library (such as TensorFlow or PyTorch).
ResNet-50 is a deep residual network (Residual Network). Specifically, ResNet-50 is a deep neural network with 50 layers that contains a plurality of residual blocks (Residual Block), each consisting of several convolution layers, and it alleviates the vanishing-gradient and exploding-gradient problems in deep network training by introducing residual connections (Shortcut Connection). Compared with conventional convolutional neural networks, the innovation of ResNet-50 is the skip connection, also called the shortcut connection. This connection adds the input of a block directly to its output, so the block learns a residual function and the network can preserve the original features while learning the residual. This design makes the network easier to train and allows deeper network structures to be built. ResNet-50 has strong feature extraction and representation capability and can learn rich image features.
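The shortcut connection can be illustrated with a small pure-Python sketch, where a block returns F(x) + x. The two toy branches below stand in for the convolution layers of a real residual block and are purely hypothetical.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x: the shortcut adds the input to the learned residual."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("residual branch must preserve the feature size")
    return [f + xi for f, xi in zip(fx, x)]


def zero_branch(x):
    """A branch that has learned nothing: the block reduces to the identity."""
    return [0.0 for _ in x]


def scale_branch(x):
    """A toy residual that adds half of the input back onto itself."""
    return [0.5 * value for value in x]


features = [1.0, -2.0, 3.0]
identity_out = residual_block(features, zero_branch)
scaled_out = residual_block(features, scale_branch)
```

With a zero residual the input passes through unchanged, which is the sense in which the network preserves the original features while learning the residual.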
Step 103, training the hierarchical bilinear pooling model with the preprocessed image data, and fine-tuning the trained hierarchical bilinear pooling model to obtain a seed identification model.
Step 104, inputting the seed image data to be identified into the seed identification model to obtain the corresponding seed identification result.
The purpose of a fine-grained image recognition algorithm is to distinguish finer sub-classes within a coarse-grained class. The classes are more precise and the differences between them more subtle, so different classes can often be told apart only by small local differences. Known application fields of fine-grained image recognition include, but are not limited to, species classification of animals such as cats, dogs and birds, classification of flowers and plants, and classification of retail goods. However, unlike common fine-grained image recognition, plant seeds may show great diversity and variability between species and between individuals. Seeds of the same species may differ significantly in morphology, while seeds of different species may look very similar, which raises the identification accuracy and classification precision required of invasive plant seed detection and identification. Therefore, this embodiment combines deep learning with fine-grained image recognition and constructs a hierarchical bilinear pooling model with ResNet-50 as the backbone network, which improves the accuracy and generalization capability of the model; applying the constructed hierarchical bilinear pooling model to seed images improves both the identification efficiency and the classification precision for invasive plant seeds.
In another aspect, in this embodiment, in step 102, the method for constructing the hierarchical bilinear pooling model with ResNet-50 as the backbone network includes:
removing the fully connected layer of ResNet-50, and taking the resulting ResNet-50 as a feature extraction network;
connecting two parallel feature extraction networks to the input end of a bilinear pooling layer, and connecting the output end of the bilinear pooling layer to an output layer, to form the network structure of the hierarchical bilinear pooling model.
The network structure of ResNet-50 can be divided into a plurality of stages, each containing a plurality of residual blocks. Specifically, it includes one input convolutional layer, four stages (each containing multiple residual blocks), one global average pooling layer, and one fully connected layer. From one stage to the next, the number of output channels of the residual blocks increases while the spatial size is halved, so that progressively higher-level features are extracted. The fully connected layer in the ResNet-50 model is typically used for the final classification task; because ResNet-50 serves here as the backbone network of the hierarchical bilinear pooling model, this fully connected layer is removed first so that the subsequent bilinear pooling operation can be applied.
In another aspect, in this embodiment, the output layer of the hierarchical bilinear pooling model includes a fully connected layer and a softmax activation function, which map the final feature vector to the corresponding class or label.
In another aspect, referring to fig. 2, in step 103, a method for training a hierarchical bilinear pooling model using preprocessed image data includes:
Step 201, feeding the preprocessed image data, in sequence, into the two parallel feature extraction networks, each of which outputs the features of its last three convolution layers.
Using the features of the last three convolution layers of the two feature extraction networks yields a more discriminative feature representation: these features have passed through many convolution layers and are therefore more abstract, which helps to improve the accuracy and robustness of subsequent classification. Feeding the same image data into two separate feature extraction networks lets each network learn a more robust feature representation. Specifically, the two feature extraction networks can learn features through different paths and strategies, and each can learn features at different levels, such as edge, texture and color features. Combining the features of the two networks provides richer information and enhances the diversity and robustness of the features. This parallel feature learning improves the representation and generalization capability of the model.
Step 202, the bilinear pooling layer expands the features output by the two feature extraction networks into high-dimensional features through linear mapping, and two groups of high-dimensional features with the same dimension are obtained.
The linear mapping is an operation that multiplies the feature vector by a weight matrix and can increase the dimension of the feature. It may be represented as Y = X·W + b, where X is the feature representation, W is the weight matrix, b is the bias vector, and Y is the expanded high-dimensional feature.
Expanding the features into high-dimensional features through linear mapping provides more feature information.
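The mapping Y = X·W + b can be sketched in pure Python as follows. The 2-to-3 dimension expansion and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def linear_map(x, weights, bias):
    """Compute Y = X.W + b for a row vector X.

    x:       list of d_in values
    weights: d_in rows, each a list of d_out values
    bias:    list of d_out values
    """
    if len(weights) != len(x) or any(len(row) != len(bias) for row in weights):
        raise ValueError("shape mismatch between x, weights and bias")
    return [
        sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]


# Expand a 2-dimensional feature to 3 dimensions (toy values).
x = [1, 2]
w = [[1, 0, 2],
     [0, 1, 3]]
b = [0, 0, 1]
y = linear_map(x, w, b)
```

Here the third output mixes both inputs (1·2 + 2·3 + 1 = 9), which shows how the expanded feature can carry combinations that the original two values do not expose directly.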
Step 203, integrating the two groups of high-dimensional features by a Hadamard product method to obtain a plurality of integrated high-dimensional features, and splicing the plurality of integrated high-dimensional features to generate a feature vector;
Integrating the two groups of high-dimensional features by the Hadamard product method models the inter-layer interaction of local attributes. Modeling inter-layer interaction of local attributes means designing mechanisms or layers that promote information exchange between different layers of a neural network, so that the correlations and contextual information between features are captured better. Where local attributes are concerned, each layer may extract features of a different scale or level of abstraction. These features may nevertheless be correlated or dependent on one another; for example, lower-level features may contain local details, while higher-level features may capture more global information. Inter-layer interaction such as the Hadamard product therefore lets features from different layers interact and provides a richer representation.
In the embodiment, the features output by the two feature extraction networks are mapped and combined in a deeper level through the bilinear pooling layer.
Step 204, the feature vector is used as the input of the output layer, the output layer classifies the feature vector by using the softmax activation function, and the classification result of the feature vector is output.
The softmax function is typically used for multi-class classification problems; it converts an input vector into an output vector that represents the probability of each class.
The specific implementation manner of classifying the feature vectors by the output layer through the softmax activation function is as follows:
the input feature vector is forward-propagated through the network to obtain the raw output values of the output layer;
the softmax function converts the raw output values of the output layer into a probability distribution that gives the probability of each category;
from the probability distribution output by softmax, the category with the highest probability is selected as the final seed identification result.
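The three steps above can be sketched in a few lines of Python. The raw output values and the category names below are invented placeholders, not values from the embodiment; subtracting the maximum before exponentiation is a common numerical-stability measure and is an addition of this sketch.

```python
import math

def softmax(raw_values):
    """Convert raw output-layer values into a probability distribution."""
    m = max(raw_values)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in raw_values]
    total = sum(exps)
    return [e / total for e in exps]

def identify(raw_values, categories):
    """Return the category with the highest softmax probability."""
    probs = softmax(raw_values)
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs[best]

# Hypothetical raw values for three hypothetical seed categories.
categories = ["category_a", "category_b", "category_c"]
raw_values = [2.0, 1.0, 0.1]

label, prob = identify(raw_values, categories)
print(label, round(prob, 3))
```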
On the other hand, in this embodiment, the activation function in the ResNet-50 used as the feature extraction network is the SiLU function.
The SiLU function is smooth and is not monotonically increasing. Combining it with the residual network structure can effectively alleviate the vanishing gradient problem during model training, thereby improving the classification accuracy of the model to a certain extent.
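For reference, SiLU is commonly defined as silu(x) = x · sigmoid(x). The sketch below, written under that standard definition, evaluates the function and its derivative at a few points to show that it is not monotonic for negative inputs and that its gradient at zero is 0.5. The function names are illustrative.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x * sigmoid(x)

def silu_grad(x):
    """Derivative of SiLU: sigmoid(x) * (1 + x * (1 - sigmoid(x)))."""
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))

# Non-monotonic: the value falls from x = -5 to x = -1, then rises again towards x = 0.
for x in (-5.0, -1.0, 0.0, 1.0):
    print(x, round(silu(x), 4), round(silu_grad(x), 4))
```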
On the other hand, in this embodiment, the features output by the last three convolution layers of the two feature extraction networks are defined as siluA1, siluA2, siluA3, siluB1, siluB2 and siluB3 respectively. In step 203, the two groups of high-dimensional features are integrated by the Hadamard product method to obtain a plurality of integrated high-dimensional features, and the plurality of integrated high-dimensional features are spliced to generate the feature vector, specifically as follows:
C=siluA′3⊙siluB′2+siluA′3⊙siluB′1+siluA′2⊙siluB′1
wherein C represents the generated feature vector, ⊙ denotes the Hadamard product operation, + denotes the splicing of the three integrated high-dimensional features, and siluA′1, siluA′2, siluA′3, siluB′1, siluB′2 and siluB′3 respectively represent the high-dimensional features obtained by expanding, through linear mapping, the features output by the two feature extraction networks;
For example, referring to Fig. 3, a specific implementation of training the hierarchical bilinear pooling model using the preprocessed image data is as follows:
the preprocessed image data are input in sequence into the two parallel feature extraction networks. Taking one image as an example, the image is input into each of the two feature extraction networks, and the features siluA1, siluA2, siluA3, siluB1, siluB2 and siluB3 of the last three convolution layers of the two ResNet-50 feature extraction networks, each of dimension 512, are expanded to a high dimension of 8192 through independent linear mappings, yielding two groups of high-dimensional features siluA′1, siluA′2, siluA′3, siluB′1, siluB′2 and siluB′3 of dimension 8192;
the two groups of high-dimensional features are integrated by the Hadamard product method. Specifically, the Hadamard product of siluA′3 and siluB′2, the Hadamard product of siluA′3 and siluB′1, and the Hadamard product of siluA′2 and siluB′1 each give one integrated high-dimensional feature, so that three integrated high-dimensional features are obtained in total, each with the same dimension and shape as the high-dimensional features before the Hadamard product;
the three integrated high-dimensional features are spliced to generate a feature vector, that is, the high-dimensional features are compressed into a compact feature, and the dimension of the generated feature vector is 24576;
the output layer classifies the feature vectors by using a softmax activation function and outputs classification results of the feature vectors.
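The fusion described above can be sketched end to end in plain Python. This is a simplified illustration, not the embodiment's implementation: each feature is treated as a single vector, the bias term is omitted, the weights are random, and toy dimensions (4 expanded to 8) stand in for 512 and 8192. The names `project` and `hbp_fuse` are hypothetical.

```python
import random

def project(x, W):
    """Linear mapping of a length-c vector x by a c x d weight matrix W (bias omitted)."""
    return [sum(xi * row[j] for xi, row in zip(x, W)) for j in range(len(W[0]))]

def hbp_fuse(feats_a, feats_b, proj_a, proj_b):
    """Expand the three features of each branch with independent linear mappings,
    form the three Hadamard products of the formula, and splice them together."""
    a1, a2, a3 = (project(f, W) for f, W in zip(feats_a, proj_a))
    b1, b2, b3 = (project(f, W) for f, W in zip(feats_b, proj_b))
    # All six features are expanded, but the formula only uses these three pairs.
    fused = []
    for u, v in ((a3, b2), (a3, b1), (a2, b1)):
        fused.extend(p * q for p, q in zip(u, v))
    return fused

rng = random.Random(0)
c, d = 4, 8  # toy stand-ins for the original and expanded dimensions

def rand_vec(n):
    return [rng.uniform(-1, 1) for _ in range(n)]

feats_a = [rand_vec(c) for _ in range(3)]
feats_b = [rand_vec(c) for _ in range(3)]
proj_a = [[rand_vec(d) for _ in range(c)] for _ in range(3)]
proj_b = [[rand_vec(d) for _ in range(c)] for _ in range(3)]

C = hbp_fuse(feats_a, feats_b, proj_a, proj_b)
print(len(C))    # three spliced products of dimension d
print(3 * 8192)  # dimension of the spliced vector in the embodiment
```

Splicing three products of dimension 8192 gives 3 × 8192 = 24576, which matches the stated dimension of the feature vector.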
On the other hand, in this embodiment, the image data of each type of invasive plant seeds is divided into a training set and a testing set, and step 103 specifically includes:
training the hierarchical bilinear pooling model by using the preprocessed training set, and fine-tuning the trained hierarchical bilinear pooling model;
and testing the fine-tuned hierarchical bilinear pooling model by using the preprocessed test set to obtain the seed identification model.
Illustratively, the image data of invasive plant seeds of each category are divided into a training set and a test set at a ratio of 3:1. In this embodiment, testing the fine-tuned hierarchical bilinear pooling model with the preprocessed test set proceeds in the same way as training the model with the preprocessed image data, after which the output of the hierarchical bilinear pooling model is compared with the true labels of the test set to evaluate the performance of the model. This helps in understanding the generalization ability and accuracy of the hierarchical bilinear pooling model and in making any necessary adjustments, improvements or comparisons.
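A per-category 3:1 split could be sketched as follows. The function name, the fixed random seed, the file names and the category names are all hypothetical; the embodiment does not state how images are assigned to the two sets beyond the ratio.

```python
import random

def split_per_category(images_by_category, train_parts=3, test_parts=1, seed=0):
    """Split the images of every category into training and test sets at train_parts:test_parts."""
    rng = random.Random(seed)
    train, test = [], []
    for category, images in sorted(images_by_category.items()):
        shuffled = list(images)
        rng.shuffle(shuffled)
        cut = len(shuffled) * train_parts // (train_parts + test_parts)
        train += [(image, category) for image in shuffled[:cut]]
        test += [(image, category) for image in shuffled[cut:]]
    return train, test

# Hypothetical file names for two hypothetical categories of 200 and 208 images.
images_by_category = {
    "category_a": [f"a_{i:03d}.jpg" for i in range(200)],
    "category_b": [f"b_{i:03d}.jpg" for i in range(208)],
}

train_set, test_set = split_per_category(images_by_category)
print(len(train_set), len(test_set))  # 306 102
```

Splitting inside each category, rather than over the pooled data, keeps the 3:1 ratio for every category.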
On the other hand, in this embodiment, the method for fine tuning the trained hierarchical bilinear pooling model includes:
setting the initial value of the learning rate to 0.01, and adjusting the learning rate during training of the model while the other parameters are kept unchanged, the adjustment being that the learning rate is reduced by a factor of 10 after every 40 training iterations;
and repeating the above step until the training result converges, at which point the fine-tuning is stopped.
Illustratively, a specific implementation of fine-tuning the trained hierarchical bilinear pooling model is as follows: the projection dimension is set to d = 8192, the momentum to 0.9, the weight decay to 0.0001 and the initial learning rate to 0.01, and during training of the model the learning rate is reduced by a factor of 10 after every 40 training iterations while the other parameters are kept unchanged.
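The learning-rate schedule described here is a step decay. A minimal sketch, assuming iterations are counted from 0 and using the hypothetical function name `learning_rate`:

```python
def learning_rate(iteration, base_lr=0.01, step=40, factor=0.1):
    """Step decay: multiply the learning rate by `factor` after every `step` iterations.

    `iteration` is assumed to be counted from 0.
    """
    return base_lr * factor ** (iteration // step)

for it in (0, 39, 40, 80, 120):
    print(it, learning_rate(it))
```

Under this assumption the learning rate stays at 0.01 for iterations 0 to 39, drops to about 0.001 at iteration 40, to about 0.0001 at iteration 80, and so on until training converges.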
To further verify the performance of the seed identification method based on the hierarchical bilinear pooling model provided by this embodiment, a large invasive plant seed dataset was constructed, and comparison experiments against classical models in the field of fine-grained recognition were carried out on this dataset in three respects: the overall benchmark performance of the hierarchical bilinear pooling model, its classification performance on similar species, and its classification performance on species of different sizes.
The invasive plant seed dataset contains 33844 images of 168 species from 91 genera in 33 families. The images were taken with a Nikon D850 camera and a LAOWA lens (LW-FF 25mm f/2.8 2.5-5.0X ULTRA MACRO), with a Godox AD200Pro flash providing auxiliary lighting. The image resolution is 8256 × 5504. To save working time, multiple targets were photographed at once and each image was then cropped into sub-images of single targets. To achieve a better shooting effect, the species selected for the dataset were restricted to seed lengths of 0.7 to 10 mm, a range that covers 95% of the invasive plant seeds collected in the laboratory. Each category in the dataset contains 200 to 210 images to keep the data evenly distributed, and the data of each category are divided into a training set and a test set at a ratio of 3:1.
Comparative experiment one:
Five pooling models were selected to test overall benchmark performance: a fully connected pooling (FCP, Fully Connected Pooling) model, a global average pooling (GAP, Global Average Pooling) model, a bilinear pooling (BP, Bilinear Pooling) model, a compact bilinear pooling (CBP, Compact Bilinear Pooling) model, and a hierarchical bilinear pooling (HBP, Hierarchical Bilinear Pooling) model. VGG-16 and ResNet-50 were each used as backbone networks of the pooling models, and the performance of the five pooling models was tested on the plant seed dataset.
The experimental results are shown in Table 1 below. The hierarchical bilinear pooling (HBP) model with ResNet-50 as the backbone network achieves the best accuracy, 99.12%, which is 0.79% higher than that of the hierarchical bilinear pooling model with VGG-16 as the backbone network.
Table 1 comparative experiment of overall benchmark Performance for five pooling models
Comparison experiment II:
Image data of the 5 genera with the most species in the dataset (amaranth, euphorbia, eggplant, tiger palm vine and evening primrose) were selected for a classification performance experiment on similar species within a genus, and the classification error rate of each model was calculated on the data of each genus. The experimental results are shown in Table 2 below. The classification error rates of the pooling models on amaranth and euphorbia are generally higher than on the other genera: the FCP model reaches error rates of 24.20% and 15.45% on amaranth and euphorbia respectively, and even the best-performing HBP (ResNet-50) model has error rates of 2.06% and 3.09% on these two genera. Of the five genera, the eggplant genus has the lowest error rate, with an error rate of 0 in four models: BP (VGG-16), CBP (VGG-16), HBP (VGG-16) and HBP (ResNet-50). The error rate on the tiger palm vine genus is generally about 2% to 3% across the models, but the FCP model has an error rate of 8%. Because its seeds are generally smaller, evening primrose has a high error rate with the standard pooling methods FCP and GAP, but a relatively low error rate is achieved with the bilinear pooling models, with HBP (ResNet-50) at only 0.8%.
In general, the hierarchical bilinear pooling model with ResNet-50 as the backbone network has the lowest classification error rate among all four genera.
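The per-genus classification error rate used in this experiment can be computed as the share of misclassified test images within each genus. The sketch below is a generic illustration; the function name and the records are invented and do not reproduce the experimental data.

```python
from collections import defaultdict

def error_rate_by_group(records):
    """records: iterable of (group, true_label, predicted_label).
    Returns the classification error rate of each group in percent."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        if truth != predicted:
            errors[group] += 1
    return {group: 100.0 * errors[group] / totals[group] for group in totals}

# Invented toy records: (genus, true species, predicted species).
records = [
    ("genus_x", "species_1", "species_1"),
    ("genus_x", "species_1", "species_2"),
    ("genus_x", "species_2", "species_2"),
    ("genus_x", "species_2", "species_2"),
    ("genus_y", "species_3", "species_3"),
    ("genus_y", "species_4", "species_4"),
]

rates = error_rate_by_group(records)
print(rates)  # {'genus_x': 25.0, 'genus_y': 0.0}
```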
TABLE 2 comparative experiments of classification performance of five pooling models on similar species in genus
Comparison experiment three:
The length of the longest side of a seed is defined as the seed size. Based on the distribution of seed sizes in the data, seeds with a length of 1 mm or less are defined as Small seeds, seeds with a length greater than 5 mm as Large seeds, and seeds with a length between 1 mm and 5 mm (including 5 mm) as Medium seeds. Small and Large seeds are relatively few, and the Medium size covers 73.21% of the data. The error rate of each model at the three sizes was calculated to test classification performance on species of different sizes. The experimental results are shown in Table 3 below. In the invasive plant seed dataset, the error rates of almost all models decrease as seed size increases. On large seeds, the error rates of the other 6 models are all below 1%, the exception being the FCP model with an error rate of 6.9%. This is probably because the FCP model requires an input image size of 224 whereas the other models use an input image size of 448, and the smaller input image impairs feature extraction from the seeds. Among the other models, BP (VGG-16) classifies all large seeds correctly, and the HBP (VGG-16) and HBP (ResNet-50) models have an error rate of only 0.1%. On small seeds, the HBP (ResNet-50) model achieves the best accuracy with an error rate of 1.98%. On medium seeds, the HBP (ResNet-50) model is again the most accurate, with an error rate of only 0.82%; BP (VGG-16) and CBP (VGG-16) also perform well, with error rates of 1.10% and 1.11% respectively.
Overall, the hierarchical bilinear pooling model with ResNet-50 as the backbone network achieves the lowest classification error rates on small and medium seeds, only 1.98% and 0.82% respectively, and its error rate on large seeds is also only 0.1%.
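The size grouping used in this experiment can be written as a small function. The thresholds follow the definitions above (1 mm or less is Small, greater than 5 mm is Large, otherwise Medium); the function name and the sample lengths are illustrative.

```python
def size_category(length_mm):
    """Assign a seed to Small, Medium or Large by the length of its longest side in mm."""
    if length_mm <= 1.0:
        return "Small"
    if length_mm <= 5.0:  # between 1 mm and 5 mm, including 5 mm
        return "Medium"
    return "Large"

# Illustrative lengths within the 0.7 to 10 mm range of the dataset.
for length in (0.7, 1.0, 1.2, 5.0, 5.1, 10.0):
    print(length, size_category(length))
```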
Table 3 comparative experiments on the classification performance of five pooling models for different sized species
The results of the three comparative experiments demonstrate the superior performance of the seed identification method based on the hierarchical bilinear pooling model proposed in this embodiment. In general, the method provided by this embodiment can help customs staff effectively intercept species subject to inspection and quarantine, and can help field workers nationwide identify invasive plants outdoors and prevent their spread, which is of great significance to ecological and economic security.
In another aspect, an embodiment of the present application further provides a computer device, referring to fig. 4, including:
a processor 1;
a memory 2 for storing executable instructions of the processor;
wherein the processor is configured to perform a seed identification method based on a hierarchical bilinear pooling model as described above via execution of the executable instructions.
It should be noted that the computer device described above may also include other implementations in accordance with the description of the method embodiments. For specific implementations, reference may be made to the descriptions of the related method embodiments, which are not repeated here.
The computer device provided by the specification can also be applied to various data analysis processing systems. The computer device may be a separate server, or may include a server cluster, a system (including a distributed system), software (application), an actual operating device, a logic gate device, a quantum computer, or the like, which uses the method of the embodiment of the present specification, and a terminal device which incorporates necessary implementation hardware.
The processor 1 may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
Wherein the memory 2 stores program code executable by the processor 1 such that the processor 1 performs the seed identification method based on the hierarchical bilinear pooling model of any of the above-described embodiments of the present specification. The memory 2 may in some embodiments be an internal storage unit of a computer device, such as a hard disk or a memory of the computer device. The memory 2 may in other embodiments also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the computer device. Further, the memory 2 may also include both internal storage units and external storage devices of the computer device.
In another aspect, embodiments of the present application also provide a computer-readable storage medium,
the computer readable storage medium stores a computer program which when executed by a processor implements a seed identification method based on a hierarchical bilinear pooling model as described above.
The computer readable storage medium of the present disclosure may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In some implementations, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable storage medium may be contained in the computer device; or may exist alone without being fitted into the computer device.
The foregoing is merely a description of the preferred embodiments of the present disclosure and of the technical principles applied. Those skilled in the art will understand that the scope of the disclosure is not limited to technical solutions formed by the specific combination of the technical features described above, but also covers other technical solutions formed by any combination of those technical features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Claims (9)
1. The seed identification method based on the hierarchical bilinear pooling model is characterized by comprising the following steps of:
step 101, collecting image data of invasive plant seeds of various categories, and preprocessing the image data;
step 102, acquiring a network structure of ResNet-50, and constructing a hierarchical bilinear pooling model by taking ResNet-50 as a backbone network, wherein the input of the hierarchical bilinear pooling model is image data of invasive plant seeds, and the output of the hierarchical bilinear pooling model is an identification result of the seeds;
step 103, training the hierarchical bilinear pooling model by using the preprocessed image data, and fine-tuning the trained hierarchical bilinear pooling model to obtain a seed identification model;
step 104, inputting the seed image data to be identified into the seed identification model to obtain a corresponding seed identification result.
2. The seed identification method based on a hierarchical bilinear pooling model of claim 1,
in step 102, the method for constructing the hierarchical bilinear pooling model by taking ResNet-50 as a backbone network comprises the following steps:
removing the fully connected layer of the ResNet-50, and taking the resulting ResNet-50 as a feature extraction network;
and connecting the two parallel feature extraction networks to the input end of the bilinear pooling layer, and connecting the output end of the bilinear pooling layer to the output layer to form the network structure of the hierarchical bilinear pooling model.
3. The seed identification method based on a hierarchical bilinear pooling model of claim 2,
in step 103, the method for training the hierarchical bilinear pooling model by using the preprocessed image data includes:
step 201, inputting the preprocessed image data into two parallel feature extraction networks in sequence, and outputting features of last three convolution layers of the two feature extraction networks;
step 202, expanding the features output by two feature extraction networks into high-dimensional features through linear mapping by a bilinear pooling layer to obtain two groups of high-dimensional features with the same dimension;
step 203, integrating the two groups of high-dimensional features by a Hadamard product method to obtain a plurality of integrated high-dimensional features, and splicing the plurality of integrated high-dimensional features to generate a feature vector;
in step 204, the feature vector is used as an input of an output layer, the output layer classifies the feature vector by using a softmax activation function, and a classification result of the feature vector is output.
4. The seed identification method based on a hierarchical bilinear pooling model of claim 3,
the activation function in ResNet-50 as the feature extraction network is the SiLU function.
5. The seed identification method based on a hierarchical bilinear pooling model of claim 4,
the features output by the last three convolution layers of the two feature extraction networks are defined as siluA1, siluA2, siluA3, siluB1, siluB2 and siluB3 respectively, and in step 203 the method of integrating the two groups of high-dimensional features by the Hadamard product method to obtain a plurality of integrated high-dimensional features and splicing the plurality of integrated high-dimensional features to generate a feature vector comprises:
C=siluA′3⊙siluB′2+siluA′3⊙siluB′1+siluA′2⊙siluB′1
wherein C represents the generated feature vector, ⊙ denotes the Hadamard product operation, and siluA′1, siluA′2, siluA′3, siluB′1, siluB′2 and siluB′3 respectively represent the high-dimensional features obtained by expanding, through linear mapping, the features output by the two feature extraction networks.
6. The seed identification method based on a hierarchical bilinear pooling model of claim 1,
the image data of each type of invasive plant seeds is divided into a training set and a test set, and step 103 specifically includes:
training the hierarchical bilinear pooling model by using the preprocessed training set, and fine-tuning the trained hierarchical bilinear pooling model;
and testing the fine-tuned hierarchical bilinear pooling model by using the preprocessed test set to obtain the seed identification model.
7. The seed identification method based on a hierarchical bilinear pooling model of claim 6,
the method for fine tuning the trained hierarchical bilinear pooling model comprises the following steps:
setting the initial value of the learning rate to 0.01, and adjusting the learning rate during training of the model while the other parameters are kept unchanged, the adjustment being that the learning rate is reduced by a factor of 10 after every 40 training iterations;
and repeating the above step until the training result converges, at which point the fine-tuning is stopped.
8. A computer device, comprising:
a processor;
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the hierarchical bilinear pooling model-based seed identification method of any one of claims 1 to 7 via execution of the executable instructions.
9. A computer-readable storage medium comprising,
the computer readable storage medium stores a computer program which, when executed by a processor, implements the hierarchical bilinear pooling model-based seed identification method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311030530.6A CN117011611A (en) | 2023-08-16 | 2023-08-16 | Seed identification method, device and medium based on layered bilinear pooling model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311030530.6A CN117011611A (en) | 2023-08-16 | 2023-08-16 | Seed identification method, device and medium based on layered bilinear pooling model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117011611A (en) | 2023-11-07 |
Family
ID=88563426
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311030530.6A Pending CN117011611A (en) | 2023-08-16 | 2023-08-16 | Seed identification method, device and medium based on layered bilinear pooling model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117011611A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||