CN115359353A - Flower identification and classification method and device - Google Patents


Info

Publication number
CN115359353A
Authority
CN
China
Prior art keywords
flower
classification
image
model
image block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210998890.4A
Other languages
Chinese (zh)
Inventor
刘怡俊
陈少真
叶武剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN202210998890.4A
Publication of CN115359353A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/188Vegetation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/42Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a flower identification and classification method and device. In the provided scheme, an image is obtained by photographing flowers or selected directly from an album; the image is then preprocessed to obtain the flower object; finally, the preprocessed flower image is passed through a flower recognition model to obtain the final classification result. The flower recognition model is designed on a Transformer architecture: its self-attention mechanism extracts features from the whole image, focusing attention on the flower region and ignoring the complex background, so that flower features are extracted accurately and accurate classification is achieved. This solves the technical problem of existing classification methods, which extract local image features by convolution, find it difficult to attend to local and global key features at the same time, and therefore have incomplete feature-extraction capability and inaccurate classification.

Description

Flower identification and classification method and device
Technical Field
The application relates to the technical field of image recognition, in particular to a flower recognition and classification method and device.
Background
In the floriculture field, automated cultivation first requires identifying and classifying flowers so that their growth can be further monitored. Relying on professionals to perform large amounts of repetitive flower-classification work consumes considerable manpower and material cost. Automatic flower identification and classification based on artificial-intelligence technology therefore has great demand and practical application value.
In the field of flower image classification, most traditional methods extract features with a specific image-processing algorithm and then apply a classifier to perform mathematical analysis on those features to obtain a classification result. However, most existing methods extract local image features by convolution; it is difficult to attend to local and global key features at the same time, the feature-extraction capability is incomplete, and accurate classification is hard to achieve.
Disclosure of Invention
The application provides a flower identification and classification method and device to solve the technical problem that existing classification methods extract local image features by convolution, find it difficult to attend to local and global key features at the same time, and therefore have incomplete feature-extraction capability and inaccurate classification.
In order to solve the above technical problem, a first aspect of the present application provides a flower identification and classification method, including:
collecting a flower image to be identified;
preprocessing the flower image;
inputting the preprocessed flower image into a preset flower recognition model for recognition and classification to obtain a classification result, wherein the flower recognition model is a machine learning model based on the Transformer structure and is composed of a linear mapping layer, a plurality of Conv-Trans modules, a plurality of ResMLP modules and a classifier;
the Conv-Trans module is used for performing spatial-domain feature fusion on the image block sequence through a multi-head self-attention mechanism and then performing channel-domain feature fusion on the sequence through a convolution operation;
the ResMLP module is used for integrating the channel-domain and spatial-domain features of the image block sequence in a ResMLP processing manner;
the classifier is constructed on the basis of a student network model obtained by a knowledge distillation training mode.
Preferably, the formula of the convolution processing is specifically:
Z_i = X_i + (W_2·σ(W_1·X_i^T))^T, i = 1, 2, …, n
where Z_i denotes the output of the image block sequence through the Conv-Trans module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, T denotes matrix transposition, W_1 denotes a convolution operation based on a first convolution kernel, and W_2 denotes a convolution operation based on a second convolution kernel.
Preferably, the formula defining the ResMLP module is specifically:
Y_i = X_i + W_3·σ(W_4·LayerNorm(X)_i), i = 1, 2, 3, …, n
where Y_i denotes the output of the image block sequence through the ResMLP module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, W_3 denotes a convolution operation based on a third convolution kernel, and W_4 denotes a convolution operation based on a fourth convolution kernel.
Preferably, the knowledge distillation training mode is a soft distillation training mode.
Preferably, the objective function of the classifier is specifically:
L_total = (1−λ)·L_CE(ψ(z_s), y) + λ·T²·L_KL(ψ(z_s, T), ψ(z_t, T))
where L_total is the total loss; L_CE(·) is the cross-entropy loss function; L_KL(·) is the KL-divergence loss function; ψ(·) is the soft target function; z_s and z_t are the class probabilities output by the student model and the teacher model respectively; T is the temperature coefficient, λ is the distillation coefficient, and y is the classification label;
the soft target function is specifically:
q_i = exp(z_i/T) / Σ_j exp(z_j/T)
where q_i is the soft-target output of the function and z_i is the class probability output by the student model or the teacher model.
The second aspect of the present application provides a flower recognition and classification device, including:
the image acquisition unit is used for acquiring a flower image to be identified;
the preprocessing unit is used for preprocessing the flower image;
the model classification processing unit is used for inputting the preprocessed flower image into a preset flower recognition model for recognition and classification to obtain a classification result, wherein the flower recognition model is a machine learning model based on the Transformer structure and is composed of a linear mapping layer, a plurality of Conv-Trans modules, a plurality of ResMLP modules and a classifier;
the Conv-Trans module is used for performing spatial-domain feature fusion on the image block sequence through a multi-head self-attention mechanism and then performing channel-domain feature fusion on the sequence through a convolution operation;
the ResMLP module is used for integrating the channel-domain and spatial-domain features of the image block sequence in a ResMLP processing manner;
the classifier is constructed on the basis of a student network model obtained by a knowledge distillation training mode.
Preferably, the formula of the convolution processing is specifically:
Z_i = X_i + (W_2·σ(W_1·X_i^T))^T, i = 1, 2, …, n
where Z_i denotes the output of the image block sequence through the Conv-Trans module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, T denotes matrix transposition, W_1 denotes a convolution operation based on a first convolution kernel, and W_2 denotes a convolution operation based on a second convolution kernel.
Preferably, the formula defining the ResMLP module is specifically:
Y_i = X_i + W_3·σ(W_4·LayerNorm(X)_i), i = 1, 2, 3, …, n
where Y_i denotes the output of the image block sequence through the ResMLP module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, W_3 denotes a convolution operation based on a third convolution kernel, and W_4 denotes a convolution operation based on a fourth convolution kernel.
Preferably, the knowledge distillation training mode is a soft distillation training mode.
Preferably, the objective function of the classifier is specifically:
L_total = (1−λ)·L_CE(ψ(z_s), y) + λ·T²·L_KL(ψ(z_s, T), ψ(z_t, T))
where L_total is the total loss; L_CE(·) is the cross-entropy loss function; L_KL(·) is the KL-divergence loss function; ψ(·) is the soft target function; z_s and z_t are the class probabilities output by the student model and the teacher model respectively; T is the temperature coefficient, λ is the distillation coefficient, and y is the classification label;
the soft target function is specifically:
q_i = exp(z_i/T) / Σ_j exp(z_j/T)
where q_i is the soft-target output of the function and z_i is the class probability output by the student model or the teacher model.
According to the technical scheme, the embodiment of the application has the following advantages:
according to the scheme, flowers are shot to obtain images, or the images are directly selected to obtain through an album; then, preprocessing is carried out, and the flower object is obtained after the image is preprocessed; finally, the flower image after pretreatment is subjected to flower recognition model to obtain a final classification result, the flower recognition model adopts a Transformer architecture design model, the feature is extracted from the image overall by utilizing the self-attention mechanism of the flower recognition model, the attention is focused on the flower part, and the complex background is ignored, so that the flower feature is accurately extracted, the accurate classification is realized, and the technical problem that the classification is inaccurate because the local feature of the image is extracted by adopting a convolution mode, the local and overall key features are difficult to be concerned at the same time, and the feature extraction capability is incomplete in the conventional classification method is solved.
Drawings
To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of an overall system framework of a flower identification and classification method provided in the present application.
Fig. 2 is a schematic flowchart of an embodiment of a flower identification and classification method provided by the present application.
FIG. 3 is a diagram of a knowledge distillation framework.
Fig. 4 is a schematic structural diagram of an embodiment of a flower identification and classification device provided by the present application.
Detailed Description
The embodiment of the application provides a flower identification and classification method and device to solve the technical problem that existing classification methods extract local image features by convolution, find it difficult to attend to local and global key features at the same time, and therefore have incomplete feature-extraction capability and inaccurate classification.
To make the objects, features and advantages of the present invention more apparent and understandable, the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the embodiments described below are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the protection scope of the present application.
As shown in fig. 1, based on the flower recognition model provided below, the flower recognition method of this embodiment can be implemented by a mobile flower recognition system. The system is divided into a mobile client and a cloud server and adopts a client-server (C/S) model. The mobile client device can be a mobile phone or an embedded device such as a single-chip microcomputer.
The mobile client is mainly responsible for flower image acquisition, image preprocessing, and flower image classification by running a small-scale network model. The specific operation flow is as follows: first, a flower image is captured with the camera of the mobile phone, or selected directly from the album; next, preprocessing such as cropping or flipping is applied to the image, and a square region containing the flower object is selected with an interactive box to obtain the flower object; finally, the preprocessed flower image is passed through the network model to obtain the final classification result.
The server side is mainly responsible for training the network model and exchanging information with the mobile side.
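The client-side preprocessing flow described above (select a square region containing the flower, then resize it for the model) can be sketched as follows. The 224 × 224 target size and the nearest-neighbour resizing are illustrative assumptions; the patent does not specify them.

```python
import numpy as np

def crop_and_resize(img, box, out_size=224):
    """Crop a square region (y, x, side) from an H x W x C image and
    resize it to out_size x out_size by nearest-neighbour sampling.
    The square box stands in for the patent's interactive selection frame."""
    y, x, side = box
    patch = img[y:y + side, x:x + side]
    # nearest-neighbour index maps for the resize
    rows = np.arange(out_size) * side // out_size
    cols = np.arange(out_size) * side // out_size
    return patch[rows][:, cols]

img = np.random.rand(480, 640, 3)            # simulated camera frame
flower = crop_and_resize(img, (100, 200, 300))
print(flower.shape)                          # (224, 224, 3)
```

In a deployed client the cropping box would come from the interactive frame and the resize from the platform's image library; the sketch only fixes the tensor shapes the model expects.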
Referring to fig. 2, a first embodiment of the present application provides a flower identification and classification method, including:
step 101, collecting a flower image to be identified.
Step 102, preprocessing the flower image.
Step 103, inputting the preprocessed flower image into a preset flower recognition model for recognition and classification to obtain a classification result, wherein the flower recognition model is a machine learning model based on the Transformer structure and is composed of a linear mapping layer, a plurality of Conv-Trans modules, a plurality of ResMLP modules and a classifier.
The system model design provided by the embodiment utilizes the 1 × 1 convolution kernel to replace a linear layer, which not only can increase the flexibility of the network, but also can enhance the nonlinear expression capability of the network. The linear fully-connected layer requires the input tensor to be of fixed size, while the convolution can arbitrarily adjust the input tensor size. Moreover, the linear full-connection layer can destroy the spatial structure of the characteristic diagram, and the convolution operation reserves the spatial characteristic of the two-dimensional characteristic diagram. The model is different from the convolutional neural network pyramid architecture, and the input of each layer is fixed, which is consistent with ViT. The difference is that the ViT adds a classification mark block in an input image block sequence as the basis of the final classification output, but the classification mark block is not added in the model, and the basis of the final classification output is the output average value of the image block sequence.
It should be noted that the Conv-Trans module is configured to perform spatial-domain feature fusion on the image block sequence through a multi-head self-attention mechanism, and then perform channel-domain feature fusion on the sequence through a convolution operation.
In the Conv-Trans module design, the module includes multi-head self-attention, two convolutional layers, and a nonlinear layer; skip connections and layer normalization are also added inside the module.
The module applies a convolution operation after multi-head self-attention. An input image X of size c × h × w (where h is the height, w the width, and c the number of channels) is passed through the linear layer to obtain image blocks, forming an image-block sequence X = (x_1, x_2, …, x_n) of length n = h·w/p², where p is the side length of each image block; the block size is typically chosen as 16 × 16 or 32 × 32, and the smaller the block, the longer the sequence. The size of the sequence is unchanged after the multi-head self-attention mechanism, but its dimensions must be transposed before the convolution operation is applied. The convolution part is defined by the following formula:
Z_i = X_i + (W_2·σ(W_1·X_i^T))^T, i = 1, 2, …, n
where Z_i denotes the output of the image block sequence through the Conv-Trans module, X_i is the input, σ is the GELU activation function, n is the image block sequence length, T denotes the transpose, W denotes a convolution operation and its subscript a different convolution kernel, i.e. W_1 denotes a convolution operation based on a first convolution kernel and W_2 a convolution operation based on a second convolution kernel.
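The channel-domain fusion step just described, which transposes the patch sequence, applies two convolutions with a GELU between them, and transposes back, can be sketched as follows. Treating the 1 × 1 convolutions as plain matrices and keeping a residual connection (the module's skip connection) are assumptions based on the module description, not the patent's exact formula.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def conv_trans_mixing(x, w1, w2):
    """Channel-domain mixing after self-attention: transpose the (n, d)
    patch sequence so each column of x.T is one patch's channel vector,
    mix channels with the two maps w1, w2 (standing in for the 1x1
    convolutions), transpose back, and add the skip connection."""
    return x + (w2 @ gelu(w1 @ x.T)).T

rng = np.random.default_rng(1)
x = rng.standard_normal((196, 64))      # n = 196 patches, d = 64 channels
w1 = rng.standard_normal((128, 64))     # first convolution: expand channels
w2 = rng.standard_normal((64, 128))     # second convolution: project back
z = conv_trans_mixing(x, w1, w2)
print(z.shape)                           # (196, 64)
```

Because attention has already mixed across patches (the spatial domain), the transpose lets the convolutions mix along the remaining channel axis, matching the two-stage fusion the module describes.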
The ResMLP (Residual Multi-Layer Perceptron) module is used for integrating the channel-domain and spatial-domain features of the image block sequence in a ResMLP processing manner.
It should be noted that the ResMLP module of this embodiment includes two fully-connected layers and one nonlinear layer, with a Dropout layer added between each fully-connected layer and the nonlinear layer. To improve network performance, layer normalization is added to each ResMLP module, the idea of residual networks is introduced, and skip connections are added between ResMLP modules. The module is defined by the following formula:
Y_i = X_i + W_3·σ(W_4·LayerNorm(X)_i), i = 1, 2, 3, …, n
where Y_i denotes the output of the image block sequence through the ResMLP module, X_i is the input, σ is the GELU activation function, n is the image block sequence length, W_3 denotes a convolution operation based on a third convolution kernel, and W_4 denotes a convolution operation based on a fourth convolution kernel.
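A minimal sketch of the ResMLP block formula above, with the two fully-connected layers realized as plain matrices; the Dropout layers of the described module are omitted here, and the hidden width is an illustrative assumption.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-6):
    # normalize each token's channel vector to zero mean, unit variance
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def resmlp_block(x, w3, w4):
    """Y_i = X_i + W_3 . gelu(W_4 . LayerNorm(X)_i), applied row-wise:
    layer norm, two linear maps with a GELU between them, and a skip
    connection. Dropout is omitted in this sketch."""
    h = gelu(layer_norm(x) @ w4.T)      # first fully-connected layer
    return x + h @ w3.T                 # second layer plus residual

rng = np.random.default_rng(2)
x = rng.standard_normal((196, 64))
w4 = rng.standard_normal((128, 64))     # expand 64 -> 128
w3 = rng.standard_normal((64, 128))     # project 128 -> 64
y = resmlp_block(x, w3, w4)
print(y.shape)                           # (196, 64)
```

The skip connection keeps the block's input and output shapes identical, which is what lets several ResMLP modules be stacked.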
The classifier is constructed based on a student network model obtained by a knowledge distillation training mode.
It should be noted that knowledge distillation transfers the knowledge learned by a large-scale pre-trained model to a smaller network model. The flower datasets used in this embodiment are all small; training a large-scale network directly on them makes the model prone to overfitting and poor generalization. Flower image classification is a fine-grained classification task in which different flowers share certain similarities; knowledge-distillation training makes better use of the soft target, which has higher entropy than the hard target and carries more information, including the relationships between different flower classes.
Moreover, knowledge distillation can transfer the knowledge learned by a large-scale network to a small-scale network with weaker learning capacity. The small lightweight network is easier to deploy on edge embedded devices, so that artificial-intelligence automation can truly be put into practice.
The soft target is given by the following formula:
q_i = exp(z_i/T) / Σ_j exp(z_j/T)
where T is the temperature parameter, which controls the softening degree of the output probability: when T = 1 the output is the SoftMax class probability, and as T tends to infinity the formula is equivalent to the logits output by the network; z_i is the class probability output to the SoftMax function; q_i is the soft-target output of the function.
Knowledge distillation can be divided into soft distillation and hard distillation; the difference lies in whether the soft targets or the hard targets output by the teacher network are used, the hard targets being the prediction labels output by the teacher network. This embodiment preferably employs soft distillation, whose framework is shown in fig. 3. Its purpose is to minimize the KL divergence (Kullback-Leibler divergence) between the SoftMax output of the teacher model and that of the student model. The loss function of soft knowledge distillation is as follows:
L_total = (1−λ)·L_CE(ψ(z_s), y) + λ·T²·L_KL(ψ(z_s, T), ψ(z_t, T))
where L_total is the total loss; L_CE(·) is the cross-entropy loss function; L_KL(·) is the KL-divergence loss function; ψ(·) is the soft target function; z_s and z_t are the class probabilities output by the student model and the teacher model respectively; T is the temperature coefficient.
The scheme provided by the application builds on the existing Vision Transformer (ViT) network: the self-attention mechanism is used to extract flower features accurately, and a ResMLP (Residual Multi-Layer Perceptron) fully-connected layer with a double convolutional layer and a residual structure is introduced, improving the model's ability to extract and identify flower features. The double convolutional layers make the attention mechanism of the model more focused and accurate, further strengthening its feature-extraction capability. On this basis, to further improve classification accuracy, the ResMLP module is introduced into the proposed model. Second, to evaluate the model, a self-made, more complex fine-grained dataset is used in addition to the public dataset, overcoming the limitations of existing datasets. Finally, a knowledge-distillation method is used to compress the large-scale network model: the large network has stronger learning capacity, and knowledge distillation transfers what it has learned to a small-scale network with weaker learning capacity. The small lightweight network is easier to deploy on edge embedded devices, so that artificial-intelligence automation can truly be put into practice.
The above is a detailed description of an embodiment of the flower identification and classification method provided by the present application; an embodiment of the flower identification and classification device provided by the present application is described in detail below.
Referring to fig. 4, a second aspect of the present application provides a flower recognition and classification device, including:
the image acquisition unit 201 is used for acquiring a flower image to be identified;
the preprocessing unit 202 is used for preprocessing the flower image;
the model classification processing unit 203 is used for inputting the preprocessed flower image into a preset flower recognition model for recognition and classification to obtain a classification result, wherein the flower recognition model is a machine learning model based on the Transformer structure and is composed of a linear mapping layer, a plurality of Conv-Trans modules, a plurality of ResMLP modules and a classifier;
the Conv-Trans module is used for performing spatial-domain feature fusion on the image block sequence through a multi-head self-attention mechanism and then performing channel-domain feature fusion on the sequence through a convolution operation;
the ResMLP module is used for integrating the channel-domain and spatial-domain features of the image block sequence in a ResMLP processing manner;
the classifier is constructed based on a student network model obtained by a knowledge distillation training mode.
Further, the formula of the convolution processing is specifically as follows:
Z_i = X_i + (W_2·σ(W_1·X_i^T))^T, i = 1, 2, …, n
where Z_i denotes the output of the image block sequence through the Conv-Trans module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, and T denotes matrix transposition.
Further, the formula defining the ResMLP module is specifically:
Y_i = X_i + W_3·σ(W_4·LayerNorm(X)_i), i = 1, 2, 3, …, n
where Y_i denotes the output of the image block sequence through the ResMLP module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, W_3 denotes a convolution operation based on a third convolution kernel, and W_4 denotes a convolution operation based on a fourth convolution kernel.
Further, the knowledge distillation training mode is specifically a soft distillation training mode.
Further, the objective function of the classifier is specifically:
L_total = (1−λ)·L_CE(ψ(z_s), y) + λ·T²·L_KL(ψ(z_s, T), ψ(z_t, T))
where L_total is the total loss; L_CE(·) is the cross-entropy loss function; L_KL(·) is the KL-divergence loss function; ψ(·) is the soft target function; z_s and z_t are the class probabilities output by the student model and the teacher model respectively; T is the temperature coefficient, λ is the distillation coefficient, and y is the classification label;
the soft target function is specifically:
q_i = exp(z_i/T) / Σ_j exp(z_j/T)
where q_i is the soft-target output of the function and z_i is the class probability output by the student model or the teacher model.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the terminal, the device and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed terminal, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part thereof that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A flower identification and classification method, comprising:
collecting a flower image to be identified;
preprocessing the flower image;
inputting the preprocessed flower image into a preset flower recognition model for recognition and classification to obtain a classification result, wherein the flower recognition model is a machine learning model based on a Transformer structure and specifically comprises a linear mapping layer, a plurality of Conv-Trans modules, a plurality of ResMLP modules and a classifier;
the Conv-Trans module is used for performing space domain feature fusion on the image block sequence through a multi-head self-attention mechanism and performing channel domain feature fusion on the image block sequence through a convolution operation mode;
the ResMLP module is used for integrating the channel domain characteristics and the space domain characteristics of the image block sequence in a ResMLP processing mode;
the classifier is constructed on the basis of a student network model obtained by a knowledge distillation training mode.
2. A flower identification and classification method according to claim 1, wherein the formula of the convolution process is specifically:
[The convolution processing formula of claim 2 is reproduced only as an image in the original publication: Figure FDA0003806761120000011]
In the formula, Z_i denotes the output of the image block sequence after passing through the Conv-Trans module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, T denotes matrix transposition, W_1 denotes a convolution operation based on a first convolution kernel, and W_2 denotes a convolution operation based on a second convolution kernel.
3. A flower recognition and classification method according to claim 1, wherein the formula definition of the ResMLP module is Y_i = X_i + W_3·σ(W_4·LayerNorm(X)_i)
i=1,2,3,…,n
In the formula, Y_i denotes the output of the image block sequence after passing through the ResMLP module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, W_3 denotes a convolution operation based on a third convolution kernel, and W_4 denotes a convolution operation based on a fourth convolution kernel.
4. The flower recognition and classification method according to claim 1, wherein the knowledge distillation training mode is a soft distillation training mode.
5. A flower identification and classification method according to claim 4, wherein the objective function of the classifier is specifically:
L_total = (1 - λ)·L_CE(ψ(z_s), y) + λT²·L_KL(ψ(z_s, T), ψ(z_t, T))
In the formula, L_total is the total loss; L_CE(·) is the cross-entropy loss function; L_KL(·) is the KL-divergence loss function; ψ(·) is the soft objective function; z_s and z_t are the class classification probabilities output by the student model and the teacher model, respectively; T is the temperature coefficient, λ is the distillation coefficient, and y is the classification label;
the soft objective function is specifically:
q_i = exp(z_i / T) / Σ_j exp(z_j / T)
In the formula, q_i is the soft target output of the function, and z_i is the classification probability result output by the student model or the teacher model.
6. A flower recognition and classification device, comprising:
the image acquisition unit is used for acquiring a flower image to be identified;
the preprocessing unit is used for preprocessing the flower image;
the model classification processing unit is used for inputting the preprocessed flower images into a preset flower recognition model for recognition and classification so as to obtain a classification result, wherein the flower recognition model is a machine learning model based on a Transformer structure, and specifically consists of a linear mapping layer, a plurality of Conv-Trans modules, a plurality of ResMLP modules and a classifier;
the Conv-Trans module is used for performing space domain feature fusion on the image block sequence through a multi-head self-attention mechanism and performing channel domain feature fusion on the image block sequence through a convolution operation mode;
the ResMLP module is used for integrating the channel domain characteristics and the spatial domain characteristics of the image block sequences in a ResMLP processing mode;
the classifier is constructed on the basis of a student network model obtained by a knowledge distillation training mode.
7. A flower recognition and classification device according to claim 6, wherein the formula of the convolution process is specifically:
[The convolution processing formula of claim 7 is reproduced only as an image in the original publication: Figure FDA0003806761120000022]
In the formula, Z_i denotes the output of the image block sequence after passing through the Conv-Trans module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, T denotes matrix transposition, W_1 denotes a convolution operation based on a first convolution kernel, and W_2 denotes a convolution operation based on a second convolution kernel.
8. A flower recognition and classification device according to claim 6, wherein the formula definition of the ResMLP module is Y_i = X_i + W_3·σ(W_4·LayerNorm(X)_i)
i=1,2,3,…,n
In the formula, Y_i denotes the output of the image block sequence after passing through the ResMLP module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, W_3 denotes a convolution operation based on a third convolution kernel, and W_4 denotes a convolution operation based on a fourth convolution kernel.
9. The flower recognition and classification device according to claim 6, wherein the knowledge distillation training mode is a soft distillation training mode.
10. A flower recognition and classification device according to claim 9, wherein the objective function of the classifier is specifically:
L_total = (1 - λ)·L_CE(ψ(z_s), y) + λT²·L_KL(ψ(z_s, T), ψ(z_t, T))
In the formula, L_total is the total loss; L_CE(·) is the cross-entropy loss function; L_KL(·) is the KL-divergence loss function; ψ(·) is the soft objective function; z_s and z_t are the class classification probabilities output by the student model and the teacher model, respectively; T is the temperature coefficient, λ is the distillation coefficient, and y is the classification label;
the soft objective function is specifically:
q_i = exp(z_i / T) / Σ_j exp(z_j / T)
In the formula, q_i is the soft target output of the function, and z_i is the class classification probability output by the student model or the teacher model.
CN202210998890.4A 2022-08-19 2022-08-19 Flower identification and classification method and device Pending CN115359353A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210998890.4A CN115359353A (en) 2022-08-19 2022-08-19 Flower identification and classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210998890.4A CN115359353A (en) 2022-08-19 2022-08-19 Flower identification and classification method and device

Publications (1)

Publication Number Publication Date
CN115359353A true CN115359353A (en) 2022-11-18

Family

ID=84003055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210998890.4A Pending CN115359353A (en) 2022-08-19 2022-08-19 Flower identification and classification method and device

Country Status (1)

Country Link
CN (1) CN115359353A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116645716A (en) * 2023-05-31 2023-08-25 南京林业大学 Expression Recognition Method Based on Local Features and Global Features
CN116645716B (en) * 2023-05-31 2024-01-19 南京林业大学 Expression recognition method based on local features and global features
CN117058437A (en) * 2023-06-16 2023-11-14 江苏大学 Flower classification method, system, equipment and medium based on knowledge distillation
CN117058437B (en) * 2023-06-16 2024-03-08 江苏大学 Flower classification method, system, equipment and medium based on knowledge distillation
CN117114053A (en) * 2023-08-24 2023-11-24 之江实验室 Convolutional neural network model compression method and device based on structure search and knowledge distillation
CN117253122A (en) * 2023-11-17 2023-12-19 云南大学 Corn seed approximate variety screening method, device, equipment and storage medium
CN117253122B (en) * 2023-11-17 2024-01-23 云南大学 Corn seed approximate variety screening method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN115359353A (en) Flower identification and classification method and device
CN111695467B (en) Spatial spectrum full convolution hyperspectral image classification method based on super-pixel sample expansion
Romero et al. Unsupervised deep feature extraction for remote sensing image classification
CN110245709B (en) 3D point cloud data semantic segmentation method based on deep learning and self-attention
CN105740894B (en) Semantic annotation method for hyperspectral remote sensing image
CN105303198B (en) A kind of remote sensing image semisupervised classification method learnt from fixed step size
CN108090472B (en) Pedestrian re-identification method and system based on multi-channel consistency characteristics
CN109871830A (en) Spatial-spectral fusion hyperspectral image classification method based on three-dimensional depth residual error network
CN108090447A (en) Hyperspectral image classification method and device under double branch's deep structures
CN112347970B (en) Remote sensing image ground object identification method based on graph convolution neural network
CN108537115B (en) Image recognition method and device and electronic equipment
US11941865B2 (en) Hyperspectral image classification method based on context-rich networks
CN110222592A (en) A kind of construction method of the timing behavioral value network model generated based on complementary timing behavior motion
CN115457006B (en) Unmanned aerial vehicle inspection defect classification method and device based on similarity consistency self-distillation
CN112464766A (en) Farmland automatic identification method and system
CN113435254A (en) Sentinel second image-based farmland deep learning extraction method
CN114676769A (en) Visual transform-based small sample insect image identification method
CN114972208A (en) YOLOv 4-based lightweight wheat scab detection method
CN113111716A (en) Remote sensing image semi-automatic labeling method and device based on deep learning
CN114332482A (en) Lightweight target detection method based on feature fusion
CN115953621A (en) Semi-supervised hyperspectral image classification method based on unreliable pseudo-label learning
Farooq et al. Transferable convolutional neural network for weed mapping with multisensor imagery
CN115457311A (en) Hyperspectral remote sensing image band selection method based on self-expression transfer learning
CN116843952A (en) Small sample learning classification method for fruit and vegetable disease identification
CN116630700A (en) Remote sensing image classification method based on introduction channel-space attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination