CN117011611A - Seed identification method, device and medium based on layered bilinear pooling model - Google Patents

Info

Publication number
CN117011611A
Authority
CN
China
Prior art keywords
model
seed
pooling model
hierarchical
image data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311030530.6A
Other languages
Chinese (zh)
Inventor
祁哲晨
杨良海
王瑞红
鹿启祥
刘永辉
闫小玲
严靖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI CHENSHAN BOTANICAL GARDEN
Zhejiang Sci Tech University ZSTU
Original Assignee
SHANGHAI CHENSHAN BOTANICAL GARDEN
Zhejiang Sci Tech University ZSTU
Application filed by SHANGHAI CHENSHAN BOTANICAL GARDEN and Zhejiang Sci Tech University ZSTU
Priority to CN202311030530.6A
Publication of CN117011611A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of seed identification, in particular to a seed identification method, device and medium based on a hierarchical bilinear pooling model. The method comprises the following steps: collecting image data of invasive plant seeds of various categories; acquiring the network structure of ResNet-50 and constructing a hierarchical bilinear pooling model with ResNet-50 as the backbone network, wherein the input of the model is image data of invasive plant seeds and the output is the seed identification result; training the hierarchical bilinear pooling model with the image data, and fine-tuning the trained model to obtain a seed identification model; and inputting the seed image data to be identified into the seed identification model to obtain the corresponding seed identification result. The accuracy and generalization capability of the model are improved, and applying the constructed hierarchical bilinear pooling model to the identification of seed images improves the identification efficiency for invasive plant seeds while greatly improving the classification precision.

Description

Seed identification method, device and medium based on layered bilinear pooling model
Technical Field
The application relates to the technical field of seed identification, in particular to a seed identification method based on a hierarchical bilinear pooling model.
Background
With the rapid development of economic globalization, the invasion of alien plants is increasingly severe. Inspection and quarantine at customs ports is the first gateway for preventing such invasions and is the most critical link in the control and management of alien invasive plants. The main task of customs ports in this work is to identify and classify invasive plant seeds. However, unlike common fine-grained image recognition targets, plant seeds may show greater diversity and variability between species and between individuals. Seeds of the same species may differ significantly in morphology, while seeds of different species may be very similar, which makes it harder for customs workers to classify and identify invasive plant seeds.
Traditional plant seed classification and identification falls into three types. First, experts classify seeds visually with a microscope or scanning electron microscope; this relies on the subjective experience of the expert, so identification is costly and slow. Second, physical methods sort invasive plant seeds by measuring their volume, weight and similar properties; these require high-precision measuring equipment, and the screening effect and accuracy are relatively poor. Third, when seeds cannot be classified effectively by appearance, professional researchers either set up molecular experiments that extract species DNA by chemical or biological methods and classify at the genetic level, or carry out germination experiments and classify by the leaves and inflorescences of the resulting plants; these methods are accurate, but their cost is too high and their efficiency is low. In general, the conventional detection and identification process for invasive plant seeds is complex and time-consuming, and a method is needed that helps customs staff identify invasive plant seeds rapidly and efficiently in practical application scenarios.
For this reason, invasive plant seed identification techniques based on hyperspectral technology have been developed. These methods combine spectral information with image processing: a hyperspectral sensor or spectral camera images the seeds, and the spectral characteristics obtained are used for classification, which improves identification efficiency and accuracy to a certain extent. However, such computer-vision-based methods cover few types of invasive plant seeds, and their identification and classification accuracy is insufficient for highly similar genera such as Amaranthus and Euphorbia.
Disclosure of Invention
In view of the above technical problems, the application provides a seed identification method, device and medium based on a hierarchical bilinear pooling model, with the aim of improving the identification efficiency and classification precision for invasive plant seeds.
The application adopts the following technical scheme: the seed identification method based on the hierarchical bilinear pooling model comprises the following steps:
step 101, collecting image data of invasive plant seeds of various categories, and preprocessing the image data;
step 102, acquiring the network structure of ResNet-50, and constructing a hierarchical bilinear pooling model with ResNet-50 as the backbone network, wherein the input of the hierarchical bilinear pooling model is image data of invasive plant seeds, and the output of the hierarchical bilinear pooling model is the identification result of the seeds;
step 103, training the hierarchical bilinear pooling model by using the preprocessed image data, and fine-tuning the trained hierarchical bilinear pooling model to obtain a seed identification model;
step 104, inputting the seed image data to be identified into the seed identification model to obtain the corresponding seed identification result.
Here, ResNet-50 is a deep residual network (Residual Network). Specifically, ResNet-50 is a 50-layer deep neural network that contains a plurality of residual blocks (Residual Block); each residual block consists of several convolution layers, and the vanishing-gradient and exploding-gradient problems in deep network training are mitigated by introducing residual connections (Shortcut Connection). Compared with conventional convolutional neural networks, the innovation of ResNet-50 lies in these skip, or shortcut, connections. A shortcut connection adds the input of a block directly to its output, so the block only needs to learn a residual function and the original features are preserved while the residual is learned. This design makes the network easier to train and allows deeper network structures to be built. ResNet-50 has strong feature extraction and representation capability and can learn rich image features.
The application combines deep learning with fine-grained image recognition, creatively uses ResNet-50 as a backbone network to construct a layered bilinear pooling model, improves the accuracy and generalization capability of the model, and applies the constructed layered bilinear pooling model to recognition of seed images, thereby greatly improving classification precision while improving recognition efficiency of invasive plant seeds.
Preferably, in step 102, the method for constructing the hierarchical bilinear pooling model by taking ResNet-50 as a backbone network comprises the following steps:
removing the fully connected layer of ResNet-50, and taking the modified ResNet-50 as a feature extraction network;
connecting two parallel feature extraction networks to the input end of the bilinear pooling layer, and connecting the output end of the bilinear pooling layer to the output layer, to form the network structure of the hierarchical bilinear pooling model.
Here, the network structure of ResNet-50 can be divided into a plurality of stages, each stage containing a plurality of residual blocks. Specifically, it includes an input convolution layer, 4 stages (each containing multiple residual blocks), a global average pooling layer, and a fully connected layer. From stage to stage, the number of output channels of the residual blocks increases while the spatial size is halved, so that progressively higher-level features are extracted.
Preferably, in step 103, the method for training the hierarchical bilinear pooling model by using the preprocessed image data includes:
step 201, inputting the preprocessed image data into the two parallel feature extraction networks in sequence, and outputting the features of the last three convolution layers of each of the two feature extraction networks;
step 202, expanding, by the bilinear pooling layer, the features output by the two feature extraction networks into high-dimensional features through linear mapping, to obtain two groups of high-dimensional features with the same dimension;
step 203, integrating the two groups of high-dimensional features by the Hadamard product method to obtain a plurality of integrated high-dimensional features, and splicing the plurality of integrated high-dimensional features to generate a feature vector;
step 204, taking the feature vector as the input of the output layer, wherein the output layer classifies the feature vector by using a softmax activation function and outputs the classification result of the feature vector.
Preferably, the activation function in ResNet-50 as the feature extraction network is a SiLU function.
Preferably, the features of the last three convolution layers output by the two feature extraction networks are denoted siluA1, siluA2, siluA3 and siluB1, siluB2, siluB3 respectively. In step 203, the two groups of high-dimensional features are integrated by the Hadamard product method to obtain a plurality of integrated high-dimensional features, and the plurality of integrated high-dimensional features are spliced to generate a feature vector, specifically:
C = siluA′3 ⊙ siluB′2 + siluA′3 ⊙ siluB′1 + siluA′2 ⊙ siluB′1
wherein C denotes the generated feature vector, ⊙ denotes the Hadamard product operation, + denotes the splicing of the integrated high-dimensional features, and siluA′1, siluA′2, siluA′3, siluB′1, siluB′2 and siluB′3 denote the high-dimensional features obtained by expanding, through linear mapping, the features output by the two feature extraction networks;
Preferably, the image data of each category of invasive plant seeds is divided into a training set and a test set, and step 103 specifically includes:
training the hierarchical bilinear pooling model by using the preprocessed training set, and fine-tuning the trained hierarchical bilinear pooling model;
testing the fine-tuned hierarchical bilinear pooling model by using the preprocessed test set to obtain the seed identification model.
Preferably, the method for fine-tuning the trained hierarchical bilinear pooling model comprises the following steps:
setting the initial value of the learning rate to 0.01, and adjusting the learning rate during model training while all other parameters are kept unchanged, wherein the learning rate is reduced by a factor of 10 after every 40 training iterations;
repeating the above step until the training result converges, and then stopping the fine-tuning.
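The learning-rate rule above can be sketched as a simple step schedule. This is a minimal illustration only: the function name `learning_rate` and its parameter names are hypothetical, and whether one "training iteration" corresponds to an epoch or to a single batch update is not stated in the application, so the sketch just counts iterations.

```python
def learning_rate(iteration, initial=0.01, step=40, factor=10.0):
    """Step schedule: divide the initial rate by `factor` once per `step` iterations."""
    completed_steps = iteration // step  # how many full blocks of 40 have elapsed
    return initial / (factor ** completed_steps)


# Print the rate at a few iteration counts around the step boundaries.
for it in (0, 39, 40, 79, 80):
    print(it, learning_rate(it))
```

Under this reading, the rate stays at 0.01 for the first 40 iterations, drops to 0.001 for the next 40, and so on, until training is judged to have converged.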
A computer device, comprising:
a processor;
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform a seed identification method based on a hierarchical bilinear pooling model as described above via execution of the executable instructions.
A computer-readable storage medium, wherein
the computer-readable storage medium stores a computer program which, when executed by a processor, implements the seed identification method based on a hierarchical bilinear pooling model as described above.
One of the beneficial technical effects of the application is as follows: by combining deep learning with fine-grained image recognition, the method creatively uses ResNet-50 as the backbone network to construct a hierarchical bilinear pooling model, which improves the accuracy and generalization capability of the model; applying the constructed hierarchical bilinear pooling model to the recognition of seed images improves the recognition efficiency for invasive plant seeds while greatly improving the classification precision.
Other features and advantages of the present application will be disclosed in the following detailed description of the application and the accompanying drawings.
Drawings
The application is further described with reference to the accompanying drawings:
FIG. 1 is a flow chart of a seed identification method based on a hierarchical bilinear pooling model in an embodiment of the application.
FIG. 2 is a flow chart of a method for training a hierarchical bilinear pooling model in accordance with an embodiment of the present application.
FIG. 3 is a schematic diagram of training a hierarchical bilinear pooling model according to an embodiment of the present application.
Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Wherein: 1, processor; 2, memory.
Detailed Description
The technical solutions of the embodiments of the present application will be explained and illustrated below with reference to the drawings of the embodiments of the present application, but the following embodiments are only preferred embodiments of the present application, and not all embodiments. Based on the examples in the implementation manner, other examples obtained by a person skilled in the art without making creative efforts fall within the protection scope of the present application.
In the following description, directional or positional relationships such as the terms "inner", "outer", "upper", "lower", "left", "right", etc., are presented for convenience in describing the embodiments and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and therefore should not be construed as limiting the application.
The embodiment of the application provides a seed identification method based on a hierarchical bilinear pooling model, referring to fig. 1, comprising the following steps:
Step 101, collecting image data of invasive plant seeds of various categories, and preprocessing the image data.
The image data of the seeds may be affected by light, shadow, noise and other environmental factors, so the image data needs to be preprocessed before effective features can be extracted. Preprocessing includes removing noise, adjusting illumination and enhancing contrast, which improves the accuracy of subsequent feature extraction.
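The application does not name specific preprocessing algorithms. As one hedged illustration, noise removal can be sketched with a 3x3 median filter and contrast enhancement with a linear contrast stretch. Both choices, and the function names `median_filter` and `stretch_contrast`, are assumptions made for this example and are not taken from the application.

```python
from statistics import median


def median_filter(image):
    """Replace each pixel with the median of its 3x3 neighbourhood (clipped at borders)."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(median(window))
        out.append(row)
    return out


def stretch_contrast(image):
    """Linearly rescale pixel values so that they span the full 0..255 range."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in image]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in image]


# A tiny grayscale patch with one noisy pixel (200) on a dim background.
patch = [[50, 52, 51],
         [53, 200, 50],
         [52, 51, 53]]
denoised = median_filter(patch)        # the outlier at the centre is suppressed
enhanced = stretch_contrast(denoised)  # remaining values are spread over 0..255
```

In practice such operations would usually come from an image-processing library; the pure-Python version only shows the arithmetic involved.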
Step 102, acquiring the network structure of ResNet-50, and constructing a hierarchical bilinear pooling model with ResNet-50 as the backbone network, wherein the input of the hierarchical bilinear pooling model is image data of invasive plant seeds, and the output is the seed identification result.
In another aspect, in this embodiment, the network structure of ResNet-50 may be obtained by loading it through an interface provided by an open-source deep learning library (such as TensorFlow or PyTorch).
Here, ResNet-50 is a deep residual network (Residual Network). Specifically, ResNet-50 is a 50-layer deep neural network that contains a plurality of residual blocks (Residual Block); each residual block consists of several convolution layers, and the vanishing-gradient and exploding-gradient problems in deep network training are mitigated by introducing residual connections (Shortcut Connection). Compared with conventional convolutional neural networks, the innovation of ResNet-50 lies in these skip, or shortcut, connections. A shortcut connection adds the input of a block directly to its output, so the block only needs to learn a residual function and the original features are preserved while the residual is learned. This design makes the network easier to train and allows deeper network structures to be built. ResNet-50 has strong feature extraction and representation capability and can learn rich image features.
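The shortcut connection described above can be sketched in a few lines. This is a toy illustration on plain lists, not the convolutional block used in ResNet-50; `residual_block` and `tiny_residual` are hypothetical names, and `tiny_residual` merely stands in for the stacked convolution layers.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x: the learned residual plus the unchanged input."""
    fx = residual_fn(x)
    return [f + xi for f, xi in zip(fx, x)]


def tiny_residual(x):
    # Hypothetical stand-in for the convolution layers inside a residual block.
    return [0.1 * xi for xi in x]


features = [1.0, -2.0, 3.0]
out = residual_block(features, tiny_residual)

# If the residual function outputs zeros, the block reduces to the identity mapping,
# so the original features pass through unchanged.
identity = residual_block(features, lambda x: [0.0] * len(x))
```

Because the identity path is always available, the block can fall back to passing its input through, which is one common explanation of why deeper residual networks remain trainable.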
Step 103, training the hierarchical bilinear pooling model by using the preprocessed image data, and fine-tuning the trained hierarchical bilinear pooling model to obtain a seed identification model.
Step 104, inputting the seed image data to be identified into the seed identification model to obtain the corresponding seed identification result.
The purpose of a fine-grained image recognition algorithm is to draw finer class distinctions within a coarse-grained category. The classes are more specific and the differences between them are more subtle, so that different classes can often be distinguished only by small local differences. Known application fields of fine-grained image recognition include, but are not limited to, species classification of animals such as cats, dogs and birds, classification of flowers and plants, and classification of retail goods. However, unlike common fine-grained image recognition targets, plant seeds may show greater diversity and variability between species and between individuals. Seeds of the same species may differ significantly in morphology, while seeds of different species may be very similar, which raises the requirements on identification accuracy and classification precision in invasive plant seed detection. Therefore, this embodiment combines deep learning with fine-grained image recognition and creatively constructs a hierarchical bilinear pooling model with ResNet-50 as the backbone network, which improves the accuracy and generalization capability of the model; applying the constructed hierarchical bilinear pooling model to the recognition of seed images improves the recognition efficiency for invasive plant seeds while greatly improving the classification precision.
In another aspect, in this embodiment, in step 102, the method for constructing the hierarchical bilinear pooling model with ResNet-50 as the backbone network includes:
removing the fully connected layer of ResNet-50, and taking the modified ResNet-50 as a feature extraction network;
connecting two parallel feature extraction networks to the input end of the bilinear pooling layer, and connecting the output end of the bilinear pooling layer to the output layer, to form the network structure of the hierarchical bilinear pooling model.
Here, the network structure of ResNet-50 can be divided into a plurality of stages, each stage containing a plurality of residual blocks. Specifically, it includes one input convolution layer, four stages (each containing multiple residual blocks), one global average pooling layer, and one fully connected layer. From stage to stage, the number of output channels of the residual blocks increases while the spatial size is halved, so that progressively higher-level features are extracted. The fully connected layer in the ResNet-50 model is normally used for the final classification task; because ResNet-50 serves here as the backbone network of the hierarchical bilinear pooling model, this fully connected layer must first be removed so that the subsequent bilinear pooling operation can be applied.
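The stage structure above can be tied to the number 50 in the model name with a short count. The block counts per stage (3, 4, 6, 3) and the three convolutions per bottleneck block are those of the commonly published ResNet-50 layout; they are stated here as an assumption, since the application does not list them. Convolutions on the shortcut path are, by the usual convention, not counted.

```python
# Commonly published ResNet-50 layout (assumed; not listed in the application).
BLOCKS_PER_STAGE = [3, 4, 6, 3]   # residual (bottleneck) blocks in the four stages
CONVS_PER_BLOCK = 3               # 1x1, 3x3 and 1x1 convolutions in each block


def count_weighted_layers(blocks_per_stage, convs_per_block, keep_fc=True):
    """Count the input convolution, the stage convolutions and, optionally, the FC layer."""
    stem = 1
    stage_convs = sum(blocks_per_stage) * convs_per_block
    return stem + stage_convs + (1 if keep_fc else 0)


full_network = count_weighted_layers(BLOCKS_PER_STAGE, CONVS_PER_BLOCK)
feature_extractor = count_weighted_layers(BLOCKS_PER_STAGE, CONVS_PER_BLOCK, keep_fc=False)
print(full_network, feature_extractor)
```

With the fully connected layer removed, as in the feature extraction network described above, only the convolutional part of the count remains.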
In another aspect, in this embodiment, the output layer of the hierarchical bilinear pooling model includes a fully connected layer and a softmax activation function, which map the final feature vector to the corresponding class or label.
In another aspect, referring to fig. 2, in step 103, the method for training the hierarchical bilinear pooling model by using the preprocessed image data includes:
step 201, inputting the preprocessed image data into the two parallel feature extraction networks in sequence, and outputting the features of the last three convolution layers of each of the two feature extraction networks;
By using the features of the last three convolution layers of the two feature extraction networks, a more discriminative feature representation can be obtained: these features have passed through many convolution layers and are therefore more abstract, which helps to improve the accuracy and robustness of subsequent classification. Inputting each piece of image data into both feature extraction networks allows each network to learn a more robust feature representation. Specifically, the two feature extraction networks can learn features through different paths and strategies, and each can learn features at different levels, such as edge features, texture features and color features. Combining the features of the two networks yields richer information, which enhances the diversity and robustness of the features. This parallel feature learning improves the representation capability and generalization capability of the model.
Step 202, the bilinear pooling layer expands the features output by the two feature extraction networks into high-dimensional features through linear mapping, obtaining two groups of high-dimensional features with the same dimension.
Here, a linear mapping is an operation that multiplies the feature vector by a weight matrix, and it can increase the dimension of the feature. The linear mapping may be written as Y = X·W + b, where X is the feature representation, W is the weight matrix, b is the bias vector, and Y is the expanded high-dimensional feature.
Expanding the features into high-dimensional features through linear mapping provides more feature information.
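The mapping Y = X·W + b can be illustrated with a small pure-Python sketch. The sizes used here (4 inputs, 16 outputs) are toy values chosen for readability; `linear_map` is a hypothetical helper name, and the random weights merely stand in for parameters that would be learned during training.

```python
import random


def linear_map(x, weights, bias):
    """Compute y = x·W + b for a row vector x and a (len(x) x len(bias)) matrix W."""
    return [sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
            for j in range(len(bias))]


rng = random.Random(0)
in_dim, out_dim = 4, 16   # toy sizes; the output dimension is larger than the input
W = [[rng.gauss(0.0, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]
b = [0.0] * out_dim

x = [0.5, -1.0, 2.0, 0.0]
y = linear_map(x, W, b)   # a 4-dimensional feature expanded to 16 dimensions
```

The output dimension is set entirely by the shape of W, which is how a feature can be expanded to a much higher dimension.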
Step 203, integrating the two groups of high-dimensional features by the Hadamard product method to obtain a plurality of integrated high-dimensional features, and splicing the plurality of integrated high-dimensional features to generate a feature vector;
Integrating the two groups of high-dimensional features by the Hadamard product models the inter-layer interaction of local attributes. Modeling inter-layer interaction of local attributes means designing mechanisms or layers that promote information exchange between different layers of a neural network, so that correlations and contextual information between features are captured better. Where local attributes are concerned, each layer may extract features at a different scale or level of abstraction, yet there may be correlations or dependencies between these features: for example, lower-level features may contain local details, while higher-level features may capture more global information. Inter-layer interactions such as the Hadamard product therefore let features from different layers interact and provide a richer representation.
In this embodiment, the bilinear pooling layer maps and combines the features output by the two feature extraction networks at a deeper level.
Step 204, the feature vector is taken as the input of the output layer; the output layer classifies the feature vector by using the softmax activation function and outputs the classification result of the feature vector.
Here, the softmax function is typically used for multi-class classification problems; it converts an input vector into an output vector representing the probability of each class.
The specific implementation manner of classifying the feature vectors by the output layer through the softmax activation function is as follows:
the input feature vector is subjected to a forward propagation process of the network to obtain an original value of an output layer;
the softmax function converts the original numerical value of the output layer into probability distribution representing the probability of each category;
from the probability distribution output by softmax, the category with the highest probability may be selected as the final seed identification result.
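The three steps above (raw output values, softmax, selection of the most probable category) can be sketched as follows. The raw values and the class names are invented for illustration, and `predict` is a hypothetical helper name.

```python
import math


def softmax(logits):
    """Convert raw output-layer values into a probability distribution."""
    m = max(logits)                       # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, class_names):
    """Return the most probable class and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


class_names = ["class_a", "class_b", "class_c"]    # hypothetical seed categories
label, prob = predict([1.0, 3.0, 0.5], class_names)
```

The probabilities always sum to 1, and the category with the largest raw value is also the one with the largest probability.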
In another aspect, in this embodiment, the activation function in the ResNet-50 serving as the feature extraction network is the SiLU function.
The SiLU function is smooth and not monotonically increasing. Combining the smooth SiLU function with the residual network structure effectively alleviates the vanishing-gradient problem during model training, which improves the classification accuracy of the model to a certain extent.
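For reference, SiLU is defined as SiLU(x) = x · sigmoid(x). The short sketch below evaluates the function and its derivative at a few points to show the non-monotonic dip for negative inputs; the function names are chosen for this example only.

```python
import math


def silu(x):
    """SiLU(x) = x * sigmoid(x), written as x / (1 + e^(-x))."""
    return x / (1.0 + math.exp(-x))


def silu_grad(x):
    """Derivative of SiLU: s * (1 + x * (1 - s)), where s = sigmoid(x)."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 + x * (1.0 - s))


# The function decreases to a minimum at a moderately negative input and then rises again.
for x in (-5.0, -1.278, 0.0, 2.0):
    print(x, round(silu(x), 4), round(silu_grad(x), 4))
```

Unlike ReLU, the gradient is non-zero for most negative inputs, which is one reason smooth activations of this kind are often paired with deep networks.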
In another aspect, in this embodiment, the features of the last three convolution layers output by the two feature extraction networks are denoted siluA1, siluA2, siluA3 and siluB1, siluB2, siluB3 respectively. In step 203, the two groups of high-dimensional features are integrated by the Hadamard product method to obtain a plurality of integrated high-dimensional features, and the plurality of integrated high-dimensional features are spliced to generate a feature vector, specifically:
C = siluA′3 ⊙ siluB′2 + siluA′3 ⊙ siluB′1 + siluA′2 ⊙ siluB′1
wherein C denotes the generated feature vector, ⊙ denotes the Hadamard product operation, + denotes the splicing of the integrated high-dimensional features, and siluA′1, siluA′2, siluA′3, siluB′1, siluB′2 and siluB′3 denote the high-dimensional features obtained by expanding, through linear mapping, the features output by the two feature extraction networks;
for example, referring to fig. 3, a specific implementation manner of training the hierarchical bilinear pooling model using the preprocessed image data is:
inputting the preprocessed image data into two parallel feature extraction networks in sequence, taking one piece of image data as an example, inputting one piece of image data into the two feature extraction networks respectively, and expanding the features siluA1, siluA2, siluA3, siluB1, siluB2 and siluB3 with the last three convolution layer dimensions of 512 of the two feature extraction networks ResNet-50 to high latitude 8192 through independent linear mapping to obtain two groups of high-dimensional features siluA '1, siluA'2, siluA '3, siluB'1, siluB '2 and siluB'3 with dimensions of 8192;
the two groups of high-dimensional features are integrated through a Hadamard product method, specifically, the Hadamard product operation is carried out on the siluA '3 and the siluB'2 to obtain an integrated high-dimensional feature, the Hadamard product operation is carried out on the siluA '3 and the siluB'1 to obtain an integrated high-dimensional feature, the Hadamard product operation is carried out on the siluA '2 and the siluB'1 to obtain an integrated high-dimensional feature, and three integrated high-dimensional features are obtained in total and have the same dimension and shape as the two groups of high-dimensional features before the Hadamard product;
splicing the three integrated high-dimensional features to generate a feature vector, namely compressing the high-dimensional features into compact features, wherein the dimension of the generated feature vector is 24576;
the output layer classifies the feature vectors by using a softmax activation function and outputs classification results of the feature vectors.
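The fusion steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: each convolutional feature is treated as a single flat vector (spatial pooling is omitted), the projection and classifier weights are random placeholders, the toy sizes `D_IN = 4` and `D_OUT = 8` stand in for 512 and 8192, and the "+" in the formula for C is read as splicing (concatenation), which is the reading consistent with a 3 × 8192 = 24576-dimensional feature vector. All function and variable names are hypothetical.

```python
import math
import random


def linear_map(x, weights):
    """Apply a linear mapping: weights is a list of rows, one per output dimension."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def hadamard(u, v):
    """Element-wise (Hadamard) product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("Hadamard product needs vectors of equal length")
    return [a * b for a, b in zip(u, v)]


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hbp_feature(feats_a, feats_b, maps_a, maps_b):
    """Build C from [siluA1, siluA2, siluA3] and [siluB1, siluB2, siluB3]."""
    a = [linear_map(f, w) for f, w in zip(feats_a, maps_a)]  # siluA'1..siluA'3
    b = [linear_map(f, w) for f, w in zip(feats_b, maps_b)]  # siluB'1..siluB'3
    # The three pairings named in the text: (A'3, B'2), (A'3, B'1), (A'2, B'1).
    pairs = [(a[2], b[1]), (a[2], b[0]), (a[1], b[0])]
    fused = [hadamard(u, v) for u, v in pairs]
    # Splice (concatenate) the three integrated features into one vector.
    return [value for part in fused for value in part]


def random_matrix(rows, cols, rng):
    """Placeholder weights; a trained model would supply learned values."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes standing in for the 512 -> 8192 expansion (assumption for brevity).
D_IN, D_OUT, N_CLASSES = 4, 8, 3
rng = random.Random(0)
feats_a = [[rng.random() for _ in range(D_IN)] for _ in range(3)]
feats_b = [[rng.random() for _ in range(D_IN)] for _ in range(3)]
maps_a = [random_matrix(D_OUT, D_IN, rng) for _ in range(3)]
maps_b = [random_matrix(D_OUT, D_IN, rng) for _ in range(3)]

c = hbp_feature(feats_a, feats_b, maps_a, maps_b)
classifier = random_matrix(N_CLASSES, 3 * D_OUT, rng)
probabilities = softmax(linear_map(c, classifier))
print(len(c), round(sum(probabilities), 6))
```

With the sizes given in the embodiment (`D_OUT = 8192`), the spliced vector would have 3 × 8192 = 24576 entries, matching the stated dimension of the feature vector.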
On the other hand, in this embodiment, the image data of each type of invasive plant seeds is divided into a training set and a testing set, and step 103 specifically includes:
training the hierarchical bilinear pooling model using the preprocessed training set, and fine-tuning the trained hierarchical bilinear pooling model;
and testing the fine-tuned hierarchical bilinear pooling model using the preprocessed test set to obtain the seed identification model.
Illustratively, the image data of each category of invasive plant seeds is divided into a training set and a test set at a ratio of 3:1. In this embodiment, testing the fine-tuned hierarchical bilinear pooling model with the preprocessed test set proceeds in the same way as training the model with the preprocessed image data, except that the output of the hierarchical bilinear pooling model is then compared with the true labels of the test set to evaluate its performance. This helps in understanding the generalization ability and accuracy of the hierarchical bilinear pooling model and in making any necessary adjustments, improvements or comparisons.
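As a rough illustration of the 3:1 division and of comparing model outputs with true labels, the following Python sketch splits each category separately and computes accuracy. The function names, the shuffling with a fixed seed, and the placeholder file names are assumptions made for the example; the embodiment does not state how samples are assigned to the two sets.

```python
import random


def split_per_category(samples_by_category, train_parts=3, test_parts=1, seed=0):
    """Split every category into training and test samples at train_parts:test_parts."""
    rng = random.Random(seed)
    train, test = [], []
    for category, samples in sorted(samples_by_category.items()):
        shuffled = list(samples)
        rng.shuffle(shuffled)  # assumption: random assignment within a category
        n_train = len(shuffled) * train_parts // (train_parts + test_parts)
        train += [(sample, category) for sample in shuffled[:n_train]]
        test += [(sample, category) for sample in shuffled[n_train:]]
    return train, test


def accuracy(predicted_labels, true_labels):
    """Fraction of test samples whose predicted label equals the true label."""
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Hypothetical toy data: two categories with 200 placeholder image names each.
toy_data = {
    "species_a": [f"a_{i}.jpg" for i in range(200)],
    "species_b": [f"b_{i}.jpg" for i in range(200)],
}
train_set, test_set = split_per_category(toy_data)
print(len(train_set), len(test_set))
```

Splitting inside each category keeps the 3:1 ratio per class, which fits a dataset whose categories are of nearly equal size.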
On the other hand, in this embodiment, the method for fine tuning the trained hierarchical bilinear pooling model includes:
setting the initial value of the learning rate to 0.01, and fine-tuning the learning rate during model training while keeping the other parameters unchanged, wherein the learning rate is reduced by a factor of 10 after every 40 training iterations;
and repeating the above step until the training result converges, and then stopping the fine-tuning.
Illustratively, one specific implementation of fine-tuning the trained hierarchical bilinear pooling model is as follows: the projection dimension is set to d=8192, the momentum to 0.9, the weight decay to 0.0001 and the initial learning rate to 0.01; during model training, with the other parameters kept unchanged, the learning rate is reduced by a factor of 10 after every 40 training iterations.
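The step schedule described above amounts to lr = 0.01 × 0.1^⌊t/40⌋ for training iteration t. A small Python sketch follows; the names `HYPERPARAMETERS` and `learning_rate` are hypothetical, and whether an "iteration" corresponds to one pass over the training set or to one batch is not stated in the text, so the counter here is simply whatever unit the training loop counts.

```python
# Values taken from the embodiment; the dictionary layout itself is illustrative.
HYPERPARAMETERS = {
    "projection_dim": 8192,
    "momentum": 0.9,
    "weight_decay": 0.0001,
    "initial_learning_rate": 0.01,
    "decay_every": 40,      # training iterations between reductions
    "decay_factor": 0.1,    # "reduced by a factor of 10"
}


def learning_rate(iteration, params=HYPERPARAMETERS):
    """Learning rate in effect at a given zero-based training iteration."""
    steps = iteration // params["decay_every"]
    return params["initial_learning_rate"] * params["decay_factor"] ** steps


for t in (0, 39, 40, 80):
    print(t, learning_rate(t))
```

Only the learning rate changes over time in this sketch; momentum and weight decay stay fixed, mirroring the statement that the other parameters are kept unchanged.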
To further verify the performance of the seed identification method based on the hierarchical bilinear pooling model provided by this embodiment, a large invasive plant seed dataset was constructed, and comparative experiments against classical models in the field of fine-grained recognition were carried out on this dataset in three respects: the overall benchmark performance of the hierarchical bilinear pooling model, its classification performance on similar species, and its classification performance on species of different sizes.
The invasive plant seed dataset contains 33,844 images from 168 species in 91 genera of 33 families. The photographs were taken with a Nikon D850 camera and a LAOWA lens (LW-FF 25mm f/2.8 2.5-5.0X ULTRA MACRO), with a Godox AD200Pro flash providing auxiliary light. The image resolution is 8256 × 5504. To save working time, multiple targets were photographed simultaneously, and each image was then cropped into sub-images containing a single target. To achieve a better shooting effect, the species selected for the dataset are limited to seed lengths of 0.7-10 mm, a range that covers 95% of the invasive plant seeds collected in the laboratory. Each category in the dataset contains 200-210 images to ensure an even distribution of the data, and the data of each category is then divided into a training set and a test set at a ratio of 3:1.
Comparative experiment one:
We selected five pooling models for the overall benchmark performance experiment: the fully connected pooling (FCP, Fully Connected Pooling) model, the global average pooling (GAP, Global Average Pooling) model, the bilinear pooling (BP, Bilinear Pooling) model, the compact bilinear pooling (CBP, Compact Bilinear Pooling) model and the hierarchical bilinear pooling (HBP, Hierarchical Bilinear Pooling) model. VGG-16 and ResNet-50 were used as the backbone networks of the pooling models, and the performance of the five pooling models was tested on the invasive plant seed dataset.
The experimental results are shown in Table 1 below. The hierarchical bilinear pooling (HBP) model using ResNet-50 as the backbone network achieves the best accuracy, 99.12%, which is 0.79% higher than that of the hierarchical bilinear pooling model using VGG-16 as the backbone network.
Table 1 Comparative experiment on the overall benchmark performance of the five pooling models
Comparative experiment two:
We selected the image data of the five genera with the most species in the dataset (Amaranthus, Euphorbia, the eggplant genus, the tiger palm vine genus and the evening primrose genus), carried out classification performance experiments on similar species within each genus, and calculated the classification error rate of each model on the data of each genus. The experimental results are shown in Table 2 below. The classification error rates of the pooling models on Amaranthus and Euphorbia are generally higher than on the other genera: the FCP model reaches error rates of 24.20% and 15.45% on Amaranthus and Euphorbia, and even the best-performing HBP (ResNet-50) model has error rates of 2.06% and 3.09% on these two genera. The eggplant genus has the lowest error rate of the five genera, with an error rate of 0 for four models: BP (VGG-16), CBP (VGG-16), HBP (VGG-16) and HBP (ResNet-50). The error rate on the tiger palm vine genus is generally about 2%-3% across models, except for the FCP model, whose error rate is 8%. Because its seeds are generally small, the evening primrose genus has a high error rate with the standard pooling methods FCP and GAP, but a relatively low error rate with the five bilinear pooling models, with HBP (ResNet-50) at only 0.8%.
In general, the hierarchical bilinear pooling model with ResNet-50 as the backbone network has the lowest classification error rate among all four genera.
Table 2 Comparative experiment on the classification performance of the five pooling models on similar species within a genus
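The per-genus classification error rate used in comparative experiment two is the share of misclassified test samples within each genus. A minimal Python sketch of that calculation is given below; the record layout, the function name and the toy records are assumptions for illustration and are not data from the experiment.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Percentage of misclassified samples per group.

    records: iterable of (group, true_label, predicted_label) tuples.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, true_label, predicted_label in records:
        totals[group] += 1
        if true_label != predicted_label:
            errors[group] += 1
    return {group: 100.0 * errors[group] / totals[group] for group in totals}


# Hypothetical toy predictions, not experimental results.
toy_records = [
    ("genus_x", "species_1", "species_1"),
    ("genus_x", "species_1", "species_2"),  # confusion between similar species
    ("genus_x", "species_2", "species_2"),
    ("genus_x", "species_2", "species_2"),
    ("genus_y", "species_3", "species_3"),
    ("genus_y", "species_4", "species_4"),
]
rates = error_rate_by_group(toy_records)
print(rates)
```

Grouping by genus rather than by species is what exposes confusion between closely related species, since a wrong prediction counts against its genus regardless of which species it was confused with.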
Comparative experiment three:
We define the length of the longest side of a seed as the seed size. Based on the distribution of seed sizes, seeds with a length of 1 mm or less are defined as Small, seeds with a length greater than 5 mm as Large, and seeds with a length between 1 mm and 5 mm (including 5 mm) as Medium. Relatively few seeds fall into the Small and Large categories, and the Medium category covers 73.21% of the data. The error rate of each model at the three sizes was calculated to evaluate classification performance on species of different sizes. The experimental results are shown in Table 3 below. In the invasive plant seed dataset, the error rates of almost all models decrease as seed size increases. On the Large seed data, the FCP model has an error rate of 6.9%, while the other six models all have error rates below 1%. Among them, BP (VGG-16) classifies all samples correctly, and the HBP (VGG-16) and HBP (ResNet-50) models each have an error rate of only 0.1%. The higher error rate of the FCP model is probably because it requires an input image size of 224, whereas the other models use an input image size of 448; the smaller input image affects the model's feature extraction on the seeds. On the Small seed data, the HBP (ResNet-50) model achieves the best accuracy, with an error rate of 1.98%. On the Medium seed data, the HBP (ResNet-50) model is again the most accurate, with an error rate of only 0.82%; BP (VGG-16) and CBP (VGG-16) also give good results, with error rates of 1.10% and 1.11%, respectively.
Overall, the hierarchical bilinear pooling model with ResNet-50 as the backbone network achieves the lowest classification error rates on Small and Medium seeds, only 1.98% and 0.82%, respectively. Its error rate on Large seeds is also only 0.1%.
Table 3 Comparative experiment on the classification performance of the five pooling models on species of different sizes
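The size categories of comparative experiment three follow directly from the stated thresholds: at most 1 mm is Small, more than 1 mm and up to 5 mm is Medium, and more than 5 mm is Large. The Python sketch below encodes those boundaries; the function names and the sample lengths are hypothetical.

```python
from collections import Counter


def size_category(length_mm):
    """Map the longest-side length of a seed (in mm) to its size category."""
    if length_mm <= 1.0:
        return "Small"
    if length_mm <= 5.0:   # the interval includes 5 mm
        return "Medium"
    return "Large"


def size_distribution(lengths_mm):
    """Percentage of seeds falling into each size category."""
    counts = Counter(size_category(length) for length in lengths_mm)
    total = sum(counts.values())
    return {name: 100.0 * count / total for name, count in counts.items()}


# Hypothetical lengths inside the 0.7-10 mm range covered by the dataset.
toy_lengths = [0.7, 1.0, 1.2, 2.5, 3.0, 5.0, 5.1, 10.0]
print(size_distribution(toy_lengths))
```

The same category labels can be used as the grouping key when computing per-size error rates for each model.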
The results of the three comparative experiments demonstrate the superior performance of the seed identification method based on the hierarchical bilinear pooling model proposed in this embodiment. In general, the method provided by this embodiment can help customs workers effectively intercept species subject to inspection and quarantine, and can help field workers nationwide identify invasive plants outdoors and prevent their spread, which is of great significance for ecological and economic security.
In another aspect, an embodiment of the present application further provides a computer device, referring to fig. 4, including:
a processor 1;
a memory 2 for storing executable instructions of the processor;
wherein the processor is configured to perform a seed identification method based on a hierarchical bilinear pooling model as described above via execution of the executable instructions.
It should be noted that, according to the description of the method embodiments, the computer device described above may also include other implementations. For specific implementations, reference may be made to the description of the related method embodiments, which is not repeated here.
The computer device provided by the specification can also be applied to various data analysis processing systems. The computer device may be a separate server, or may include a server cluster, a system (including a distributed system), software (application), an actual operating device, a logic gate device, a quantum computer, or the like, which uses the method of the embodiment of the present specification, and a terminal device which incorporates necessary implementation hardware.
The processor 1 may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
Wherein the memory 2 stores program code executable by the processor 1 such that the processor 1 performs the seed identification method based on the hierarchical bilinear pooling model of any of the above-described embodiments of the present specification. The memory 2 may in some embodiments be an internal storage unit of a computer device, such as a hard disk or a memory of the computer device. The memory 2 may in other embodiments also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the computer device. Further, the memory 2 may also include both internal storage units and external storage devices of the computer device.
In another aspect, embodiments of the present application also provide a computer-readable storage medium,
the computer readable storage medium stores a computer program which when executed by a processor implements a seed identification method based on a hierarchical bilinear pooling model as described above.
The computer readable storage medium of the present disclosure may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In some implementations, clients and servers may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
The computer readable storage medium may be contained in the computer device; or may exist alone without being fitted into the computer device.
The foregoing is merely a description of the preferred embodiments of the present disclosure and of the technical principles applied. Those skilled in the art will understand that the scope of the disclosure is not limited to technical solutions formed by the specific combination of the technical features described above, but also encompasses other technical solutions formed by any combination of the above technical features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by mutually substituting the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Claims (9)

1. The seed identification method based on the hierarchical bilinear pooling model is characterized by comprising the following steps of:
step 101, collecting image data of invasive plant seeds of various categories, and preprocessing the image data;
step 102, acquiring the network structure of ResNet-50, and constructing a hierarchical bilinear pooling model with ResNet-50 as the backbone network, wherein the input of the hierarchical bilinear pooling model is image data of invasive plant seeds, and the output of the hierarchical bilinear pooling model is the identification result of the seeds;
step 103, training the hierarchical bilinear pooling model using the preprocessed image data, and fine-tuning the trained hierarchical bilinear pooling model to obtain a seed identification model;
step 104, inputting the seed image data to be identified into the seed identification model to obtain the corresponding seed identification result.
2. The seed identification method based on a hierarchical bilinear pooling model of claim 1,
in step 102, the method for constructing the hierarchical bilinear pooling model by taking ResNet-50 as a backbone network comprises the following steps:
removing the full connection layer of the ResNet-50, and taking the treated ResNet-50 as a characteristic extraction network;
and connecting two parallel feature extraction networks to the input end of a bilinear pooling layer, and connecting the output end of the bilinear pooling layer to an output layer, to form the network structure of the hierarchical bilinear pooling model.
3. The seed identification method based on a hierarchical bilinear pooling model of claim 2,
in step 103, the method for training the hierarchical bilinear pooling model by using the preprocessed image data includes:
step 201, inputting the preprocessed image data into two parallel feature extraction networks in sequence, and outputting features of last three convolution layers of the two feature extraction networks;
step 202, expanding the features output by two feature extraction networks into high-dimensional features through linear mapping by a bilinear pooling layer to obtain two groups of high-dimensional features with the same dimension;
step 203, integrating the two groups of high-dimensional features by a Hadamard product method to obtain a plurality of integrated high-dimensional features, and splicing the plurality of integrated high-dimensional features to generate a feature vector;
in step 204, the feature vector is used as an input of an output layer, the output layer classifies the feature vector by using a softmax activation function, and a classification result of the feature vector is output.
4. The seed identification method based on a hierarchical bilinear pooling model of claim 3,
the activation function in ResNet-50 as the feature extraction network is the SiLU function.
5. The seed identification method based on a hierarchical bilinear pooling model of claim 4,
the features output by the last three convolutional layers of the two feature extraction networks are defined as siluA1, siluA2, siluA3 and siluB1, siluB2, siluB3, respectively, and in step 203, the method of integrating the two groups of high-dimensional features by the Hadamard product to obtain a plurality of integrated high-dimensional features, and splicing the plurality of integrated high-dimensional features to generate a feature vector, is specifically:
C=siluA′3⊙siluB′2+siluA′3⊙siluB′1+siluA′2⊙siluB′1
wherein C represents the generated feature vector, ⊙ represents the Hadamard product operation, and siluA′1, siluA′2, siluA′3, siluB′1, siluB′2 and siluB′3 respectively represent the high-dimensional features obtained by expanding, through linear mapping, the features output by the two feature extraction networks.
6. The seed identification method based on a hierarchical bilinear pooling model of claim 1,
the image data of each type of invasive plant seeds is divided into a training set and a test set, and step 103 specifically includes:
training the hierarchical bilinear pooling model using the preprocessed training set, and fine-tuning the trained hierarchical bilinear pooling model;
and testing the fine-tuned hierarchical bilinear pooling model using the preprocessed test set to obtain the seed identification model.
7. The seed identification method based on a hierarchical bilinear pooling model of claim 6,
the method for fine tuning the trained hierarchical bilinear pooling model comprises the following steps:
setting the initial value of the learning rate to 0.01, and fine-tuning the learning rate during model training while keeping the other parameters unchanged, wherein the learning rate is reduced by a factor of 10 after every 40 training iterations;
and repeating the above step until the training result converges, and then stopping the fine-tuning.
8. A computer device, comprising:
a processor;
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the hierarchical bilinear pooling model-based seed identification method of any one of claims 1 to 7 via execution of the executable instructions.
9. A computer-readable storage medium, characterized in that
the computer readable storage medium stores a computer program which, when executed by a processor, implements the hierarchical bilinear pooling model-based seed identification method according to any one of claims 1 to 7.
CN202311030530.6A 2023-08-16 2023-08-16 Seed identification method, device and medium based on layered bilinear pooling model Pending CN117011611A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311030530.6A CN117011611A (en) 2023-08-16 2023-08-16 Seed identification method, device and medium based on layered bilinear pooling model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311030530.6A CN117011611A (en) 2023-08-16 2023-08-16 Seed identification method, device and medium based on layered bilinear pooling model

Publications (1)

Publication Number Publication Date
CN117011611A true CN117011611A (en) 2023-11-07

Family

ID=88563426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311030530.6A Pending CN117011611A (en) 2023-08-16 2023-08-16 Seed identification method, device and medium based on layered bilinear pooling model

Country Status (1)

Country Link
CN (1) CN117011611A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination