CN115359353A - Flower identification and classification method and device - Google Patents


Info

Publication number
CN115359353A
Authority
CN
China
Prior art keywords
flower
classification
image
model
image block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210998890.4A
Other languages
Chinese (zh)
Inventor
刘怡俊
陈少真
叶武剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN202210998890.4A
Publication of CN115359353A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/188Vegetation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/42Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a flower identification and classification method and device. In the provided scheme, an image is obtained by photographing flowers or selected directly from an album; the image is then preprocessed to obtain the flower object; finally, the preprocessed flower image is passed through a flower recognition model to obtain the final classification result. The flower recognition model is designed on a Transformer architecture: its self-attention mechanism extracts features from the whole image, focusing attention on the flower region and ignoring the complex background, so that flower features are extracted accurately and accurate classification is achieved. This solves the technical problem of existing classification methods, which extract local image features by convolution, find it difficult to attend to local and global key features at the same time, and therefore have incomplete feature-extraction capability and inaccurate classification.

Description

Flower identification and classification method and device
Technical Field
The application relates to the technical field of image recognition, in particular to a flower recognition and classification method and device.
Background
In the floriculture field, automated cultivation first requires identifying and classifying flowers so that their growth can be further monitored. Relying on professionals to perform large amounts of repetitive flower-classification work consumes considerable manpower and material cost. Automatic flower identification and classification based on artificial-intelligence technology therefore has great demand and practical application value.
In the field of flower image classification, most traditional methods extract features with a specific image-processing algorithm and then apply a classifier to perform mathematical analysis on those features to obtain a classification result. However, most existing methods extract local image features by convolution; it is difficult to attend to local and global key features at the same time, the feature-extraction capability is incomplete, and accurate classification is hard to achieve.
Disclosure of Invention
The application provides a flower identification and classification method and device to solve the technical problem that existing classification methods extract local image features by convolution, find it difficult to attend to local and global key features at the same time, and therefore have incomplete feature-extraction capability and inaccurate classification.
In order to solve the above technical problem, a first aspect of the present application provides a flower identification and classification method, including:
collecting a flower image to be identified;
preprocessing the flower image;
inputting the preprocessed flower image into a preset flower recognition model for recognition and classification to obtain a classification result, wherein the flower recognition model is a machine learning model based on the Transformer structure and is composed of a linear mapping layer, a plurality of Conv-Trans modules, a plurality of ResMLP modules and a classifier;
the Conv-Trans module is used for performing spatial-domain feature fusion on the image block sequence through a multi-head self-attention mechanism and then performing channel-domain feature fusion on the sequence through a convolution operation;
the ResMLP module is used for integrating the channel-domain and spatial-domain features of the image block sequence in a ResMLP processing manner;
the classifier is constructed on the basis of a student network model obtained by a knowledge distillation training mode.
Preferably, the formula of the convolution processing is specifically:
Z_i = X_i + (W_2·σ(W_1·X_i^T))^T, i = 1, 2, …, n
where Z_i denotes the output of the image block sequence through the Conv-Trans module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, T denotes matrix transposition, W_1 denotes a convolution operation based on a first convolution kernel, and W_2 denotes a convolution operation based on a second convolution kernel.
Preferably, the formula defining the ResMLP module is specifically:
Y_i = X_i + W_3·σ(W_4·LayerNorm(X)_i), i = 1, 2, 3, …, n
where Y_i denotes the output of the image block sequence through the ResMLP module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, W_3 denotes a convolution operation based on a third convolution kernel, and W_4 denotes a convolution operation based on a fourth convolution kernel.
Preferably, the knowledge distillation training mode is a soft distillation training mode.
Preferably, the objective function of the classifier is specifically:
L_total = (1−λ)·L_CE(ψ(z_s), y) + λ·T²·L_KL(ψ(z_s, T), ψ(z_t, T))
where L_total is the total loss; L_CE(·) is the cross-entropy loss function; L_KL(·) is the KL-divergence loss function; ψ(·) is the soft target function; z_s and z_t are the class probabilities output by the student model and the teacher model respectively; T is the temperature coefficient, λ is the distillation coefficient, and y is the classification label;
the soft target function is specifically:
q_i = exp(z_i/T) / Σ_j exp(z_j/T)
where q_i is the soft-target output of the function and z_i is the class probability output by the student model or the teacher model.
The second aspect of the present application provides a flower recognition and classification device, including:
the image acquisition unit is used for acquiring a flower image to be identified;
the preprocessing unit is used for preprocessing the flower image;
the model classification processing unit is used for inputting the preprocessed flower image into a preset flower recognition model for recognition and classification to obtain a classification result, wherein the flower recognition model is a machine learning model based on the Transformer structure and is composed of a linear mapping layer, a plurality of Conv-Trans modules, a plurality of ResMLP modules and a classifier;
the Conv-Trans module is used for performing spatial-domain feature fusion on the image block sequence through a multi-head self-attention mechanism and then performing channel-domain feature fusion on the sequence through a convolution operation;
the ResMLP module is used for integrating the channel-domain and spatial-domain features of the image block sequence in a ResMLP processing manner;
the classifier is constructed on the basis of a student network model obtained by a knowledge distillation training mode.
Preferably, the formula of the convolution processing is specifically:
Z_i = X_i + (W_2·σ(W_1·X_i^T))^T, i = 1, 2, …, n
where Z_i denotes the output of the image block sequence through the Conv-Trans module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, T denotes matrix transposition, W_1 denotes a convolution operation based on a first convolution kernel, and W_2 denotes a convolution operation based on a second convolution kernel.
Preferably, the formula defining the ResMLP module is specifically:
Y_i = X_i + W_3·σ(W_4·LayerNorm(X)_i), i = 1, 2, 3, …, n
where Y_i denotes the output of the image block sequence through the ResMLP module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, W_3 denotes a convolution operation based on a third convolution kernel, and W_4 denotes a convolution operation based on a fourth convolution kernel.
Preferably, the knowledge distillation training mode is a soft distillation training mode.
Preferably, the objective function of the classifier is specifically:
L_total = (1−λ)·L_CE(ψ(z_s), y) + λ·T²·L_KL(ψ(z_s, T), ψ(z_t, T))
where L_total is the total loss; L_CE(·) is the cross-entropy loss function; L_KL(·) is the KL-divergence loss function; ψ(·) is the soft target function; z_s and z_t are the class probabilities output by the student model and the teacher model respectively; T is the temperature coefficient, λ is the distillation coefficient, and y is the classification label;
the soft target function is specifically:
q_i = exp(z_i/T) / Σ_j exp(z_j/T)
where q_i is the soft-target output of the function and z_i is the class probability output by the student model or the teacher model.
According to the technical scheme, the embodiment of the application has the following advantages:
according to the scheme, flowers are shot to obtain images, or the images are directly selected to obtain through an album; then, preprocessing is carried out, and the flower object is obtained after the image is preprocessed; finally, the flower image after pretreatment is subjected to flower recognition model to obtain a final classification result, the flower recognition model adopts a Transformer architecture design model, the feature is extracted from the image overall by utilizing the self-attention mechanism of the flower recognition model, the attention is focused on the flower part, and the complex background is ignored, so that the flower feature is accurately extracted, the accurate classification is realized, and the technical problem that the classification is inaccurate because the local feature of the image is extracted by adopting a convolution mode, the local and overall key features are difficult to be concerned at the same time, and the feature extraction capability is incomplete in the conventional classification method is solved.
Drawings
To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of an overall system framework of a flower identification and classification method provided in the present application.
Fig. 2 is a schematic flowchart of an embodiment of a flower identification and classification method provided by the present application.
FIG. 3 is a diagram of a knowledge distillation framework.
Fig. 4 is a schematic structural diagram of an embodiment of a flower identification and classification device provided by the present application.
Detailed Description
The embodiment of the application provides a flower identification and classification method and device to solve the technical problem that existing classification methods extract local image features by convolution, find it difficult to attend to local and global key features at the same time, and therefore have incomplete feature-extraction capability and inaccurate classification.
To make the objects, features and advantages of the present invention more apparent and understandable, the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the embodiments described below are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the protection scope of the present application.
As shown in fig. 1, based on the flower recognition model provided below, the flower recognition method of this embodiment can be implemented by a mobile flower recognition system. The system is divided into a mobile client and a cloud server and adopts a client-server (C/S) model. The mobile client device can be a mobile phone or an embedded device such as a single-chip microcomputer.
The mobile client is mainly responsible for flower image acquisition, image preprocessing, and flower image classification by running a small-scale network model. The specific operation flow is as follows: first, a flower image is captured with the camera of the mobile phone, or selected directly from the album; next, preprocessing such as cropping or flipping is applied to the image, and a square region containing the flower object is selected with an interactive box to obtain the flower object; finally, the preprocessed flower image is passed through the network model to obtain the final classification result.
The server side is mainly responsible for training the network model and exchanging information with the mobile side.
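The client-side preprocessing flow described above (select a square region containing the flower, then resize it for the model) can be sketched as follows. The 224 × 224 target size and the nearest-neighbour resizing are illustrative assumptions; the patent does not specify them.

```python
import numpy as np

def crop_and_resize(img, box, out_size=224):
    """Crop a square region (y, x, side) from an H x W x C image and
    resize it to out_size x out_size by nearest-neighbour sampling.
    The square box stands in for the patent's interactive selection frame."""
    y, x, side = box
    patch = img[y:y + side, x:x + side]
    # nearest-neighbour index maps for the resize
    rows = np.arange(out_size) * side // out_size
    cols = np.arange(out_size) * side // out_size
    return patch[rows][:, cols]

img = np.random.rand(480, 640, 3)            # simulated camera frame
flower = crop_and_resize(img, (100, 200, 300))
print(flower.shape)                          # (224, 224, 3)
```

In a deployed client the cropping box would come from the interactive frame and the resize from the platform's image library; the sketch only fixes the tensor shapes the model expects.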
Referring to fig. 2, a first embodiment of the present application provides a flower identification and classification method, including:
step 101, collecting a flower image to be identified.
Step 102, preprocessing the flower image.
Step 103, inputting the preprocessed flower image into a preset flower recognition model for recognition and classification to obtain a classification result, wherein the flower recognition model is a machine learning model based on the Transformer structure and is composed of a linear mapping layer, a plurality of Conv-Trans modules, a plurality of ResMLP modules and a classifier.
The system model design provided by the embodiment utilizes the 1 × 1 convolution kernel to replace a linear layer, which not only can increase the flexibility of the network, but also can enhance the nonlinear expression capability of the network. The linear fully-connected layer requires the input tensor to be of fixed size, while the convolution can arbitrarily adjust the input tensor size. Moreover, the linear full-connection layer can destroy the spatial structure of the characteristic diagram, and the convolution operation reserves the spatial characteristic of the two-dimensional characteristic diagram. The model is different from the convolutional neural network pyramid architecture, and the input of each layer is fixed, which is consistent with ViT. The difference is that the ViT adds a classification mark block in an input image block sequence as the basis of the final classification output, but the classification mark block is not added in the model, and the basis of the final classification output is the output average value of the image block sequence.
It should be noted that the Conv-Trans module is configured to perform spatial-domain feature fusion on the image block sequence through a multi-head self-attention mechanism, and then perform channel-domain feature fusion on the sequence through a convolution operation.
In the Conv-Trans module design, the module includes multi-head self-attention, two convolutional layers, and a nonlinear layer; skip connections and layer normalization are also added inside the module.
The module applies a convolution operation after multi-head self-attention. An input image X of size c × h × w (where h is the height, w the width, and c the number of channels) is passed through the linear layer to obtain image blocks, forming an image-block sequence X = (x_1, x_2, …, x_n) of length n = h·w/p², where p is the side length of each image block; the block size is typically chosen as 16 × 16 or 32 × 32, and the smaller the block, the longer the sequence. The size of the sequence is unchanged after the multi-head self-attention mechanism, but its dimensions must be transposed before the convolution operation is applied. The convolution part is defined by the following formula:
Z_i = X_i + (W_2·σ(W_1·X_i^T))^T, i = 1, 2, …, n
where Z_i denotes the output of the image block sequence through the Conv-Trans module, X_i is the input, σ is the GELU activation function, n is the image block sequence length, T denotes the transpose, W denotes a convolution operation and its subscript a different convolution kernel, i.e. W_1 denotes a convolution operation based on a first convolution kernel and W_2 a convolution operation based on a second convolution kernel.
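The channel-domain fusion step just described, which transposes the patch sequence, applies two convolutions with a GELU between them, and transposes back, can be sketched as follows. Treating the 1 × 1 convolutions as plain matrices and keeping a residual connection (the module's skip connection) are assumptions based on the module description, not the patent's exact formula.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def conv_trans_mixing(x, w1, w2):
    """Channel-domain mixing after self-attention: transpose the (n, d)
    patch sequence so each column of x.T is one patch's channel vector,
    mix channels with the two maps w1, w2 (standing in for the 1x1
    convolutions), transpose back, and add the skip connection."""
    return x + (w2 @ gelu(w1 @ x.T)).T

rng = np.random.default_rng(1)
x = rng.standard_normal((196, 64))      # n = 196 patches, d = 64 channels
w1 = rng.standard_normal((128, 64))     # first convolution: expand channels
w2 = rng.standard_normal((64, 128))     # second convolution: project back
z = conv_trans_mixing(x, w1, w2)
print(z.shape)                           # (196, 64)
```

Because attention has already mixed across patches (the spatial domain), the transpose lets the convolutions mix along the remaining channel axis, matching the two-stage fusion the module describes.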
The ResMLP (Residual Multi-Layer Perceptron) module is used for integrating the channel-domain and spatial-domain features of the image block sequence in a ResMLP processing manner.
It should be noted that the ResMLP module of this embodiment includes two fully-connected layers and one nonlinear layer, with a Dropout layer added between each fully-connected layer and the nonlinear layer. To improve network performance, layer normalization is added to each ResMLP module, the idea of residual networks is introduced, and skip connections are added between ResMLP modules. The module is defined by the following formula:
Y_i = X_i + W_3·σ(W_4·LayerNorm(X)_i), i = 1, 2, 3, …, n
where Y_i denotes the output of the image block sequence through the ResMLP module, X_i is the input, σ is the GELU activation function, n is the image block sequence length, W_3 denotes a convolution operation based on a third convolution kernel, and W_4 denotes a convolution operation based on a fourth convolution kernel.
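A minimal sketch of the ResMLP block formula above, with the two fully-connected layers realized as plain matrices; the Dropout layers of the described module are omitted here, and the hidden width is an illustrative assumption.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-6):
    # normalize each token's channel vector to zero mean, unit variance
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def resmlp_block(x, w3, w4):
    """Y_i = X_i + W_3 . gelu(W_4 . LayerNorm(X)_i), applied row-wise:
    layer norm, two linear maps with a GELU between them, and a skip
    connection. Dropout is omitted in this sketch."""
    h = gelu(layer_norm(x) @ w4.T)      # first fully-connected layer
    return x + h @ w3.T                 # second layer plus residual

rng = np.random.default_rng(2)
x = rng.standard_normal((196, 64))
w4 = rng.standard_normal((128, 64))     # expand 64 -> 128
w3 = rng.standard_normal((64, 128))     # project 128 -> 64
y = resmlp_block(x, w3, w4)
print(y.shape)                           # (196, 64)
```

The skip connection keeps the block's input and output shapes identical, which is what lets several ResMLP modules be stacked.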
The classifier is constructed based on a student network model obtained by a knowledge distillation training mode.
It should be noted that knowledge distillation transfers the knowledge learned by a large-scale pre-trained model to a smaller network model. The flower datasets used in this embodiment are all small; training a large-scale network directly on them makes the model prone to overfitting and poor generalization. Flower image classification is a fine-grained classification task in which different flowers share certain similarities; knowledge-distillation training makes better use of the soft target, which has higher entropy than the hard target and carries more information, including the relationships between different flower classes.
Moreover, knowledge distillation can transfer the knowledge learned by a large-scale network to a small-scale network with weaker learning capacity. The small lightweight network is easier to deploy on edge embedded devices, so that artificial-intelligence automation can truly be put into practice.
The soft target is given by the following formula:
q_i = exp(z_i/T) / Σ_j exp(z_j/T)
where T is the temperature parameter, which controls the softening degree of the output probability: when T = 1 the output is the SoftMax class probability, and as T tends to infinity the formula is equivalent to the logits output by the network; z_i is the class probability output to the SoftMax function; q_i is the soft-target output of the function.
Knowledge distillation can be divided into soft distillation and hard distillation; the difference lies in whether the soft targets or the hard targets output by the teacher network are used, the hard targets being the prediction labels output by the teacher network. This embodiment preferably employs soft distillation, whose framework is shown in fig. 3. Its purpose is to minimize the KL divergence (Kullback-Leibler divergence) between the SoftMax output of the teacher model and that of the student model. The loss function of soft knowledge distillation is as follows:
L_total = (1−λ)·L_CE(ψ(z_s), y) + λ·T²·L_KL(ψ(z_s, T), ψ(z_t, T))
where L_total is the total loss; L_CE(·) is the cross-entropy loss function; L_KL(·) is the KL-divergence loss function; ψ(·) is the soft target function; z_s and z_t are the class probabilities output by the student model and the teacher model respectively; T is the temperature coefficient.
The scheme provided by the application builds on the existing Vision Transformer (ViT) network: the self-attention mechanism is used to extract flower features accurately, and a ResMLP (Residual Multi-Layer Perceptron) fully-connected layer with a double convolutional layer and a residual structure is introduced, improving the model's ability to extract and identify flower features. The double convolutional layers make the attention mechanism of the model more focused and accurate, further strengthening its feature-extraction capability. On this basis, to further improve classification accuracy, the ResMLP module is introduced into the proposed model. Second, to evaluate the model, a self-made, more complex fine-grained dataset is used in addition to the public dataset, overcoming the limitations of existing datasets. Finally, a knowledge-distillation method is used to compress the large-scale network model: the large network has stronger learning capacity, and knowledge distillation transfers what it has learned to a small-scale network with weaker learning capacity. The small lightweight network is easier to deploy on edge embedded devices, so that artificial-intelligence automation can truly be put into practice.
The above is a detailed description of an embodiment of the flower identification and classification method provided by the present application; an embodiment of the flower identification and classification device provided by the present application is described in detail below.
Referring to fig. 4, a second aspect of the present application provides a flower recognition and classification device, including:
the image acquisition unit 201 is used for acquiring a flower image to be identified;
the preprocessing unit 202 is used for preprocessing the flower image;
the model classification processing unit 203 is used for inputting the preprocessed flower image into a preset flower recognition model for recognition and classification to obtain a classification result, wherein the flower recognition model is a machine learning model based on the Transformer structure and is composed of a linear mapping layer, a plurality of Conv-Trans modules, a plurality of ResMLP modules and a classifier;
the Conv-Trans module is used for performing spatial-domain feature fusion on the image block sequence through a multi-head self-attention mechanism and then performing channel-domain feature fusion on the sequence through a convolution operation;
the ResMLP module is used for integrating the channel-domain and spatial-domain features of the image block sequence in a ResMLP processing manner;
the classifier is constructed based on a student network model obtained by a knowledge distillation training mode.
Further, the formula of the convolution processing is specifically as follows:
Z_i = X_i + (W_2·σ(W_1·X_i^T))^T, i = 1, 2, …, n
where Z_i denotes the output of the image block sequence through the Conv-Trans module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, and T denotes matrix transposition.
Further, the formula defining the ResMLP module is specifically:
Y_i = X_i + W_3·σ(W_4·LayerNorm(X)_i), i = 1, 2, 3, …, n
where Y_i denotes the output of the image block sequence through the ResMLP module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, W_3 denotes a convolution operation based on a third convolution kernel, and W_4 denotes a convolution operation based on a fourth convolution kernel.
Further, the knowledge distillation training mode is specifically a soft distillation training mode.
Further, the objective function of the classifier is specifically:
L_total = (1−λ)·L_CE(ψ(z_s), y) + λ·T²·L_KL(ψ(z_s, T), ψ(z_t, T))
where L_total is the total loss; L_CE(·) is the cross-entropy loss function; L_KL(·) is the KL-divergence loss function; ψ(·) is the soft target function; z_s and z_t are the class probabilities output by the student model and the teacher model respectively; T is the temperature coefficient, λ is the distillation coefficient, and y is the classification label;
the soft target function is specifically:
q_i = exp(z_i/T) / Σ_j exp(z_j/T)
where q_i is the soft-target output of the function and z_i is the class probability output by the student model or the teacher model.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the terminal, the device and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed terminal, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part thereof that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A flower identification and classification method, comprising:
collecting a flower image to be identified;
preprocessing the flower image;
inputting the preprocessed flower image into a preset flower recognition model for recognition and classification to obtain a classification result, wherein the flower recognition model is a machine learning model based on a Transformer structure and specifically comprises a linear mapping layer, a plurality of Conv-Trans modules, a plurality of ResMLP modules and a classifier;
the Conv-Trans module is used for performing space domain feature fusion on the image block sequence through a multi-head self-attention mechanism and performing channel domain feature fusion on the image block sequence through a convolution operation mode;
the ResMLP module is used for integrating the channel domain characteristics and the space domain characteristics of the image block sequence in a ResMLP processing mode;
the classifier is constructed on the basis of a student network model obtained by a knowledge distillation training mode.
2. A flower identification and classification method according to claim 1, wherein the formula of the convolution process is specifically:
[The convolution processing formula of claim 2 is reproduced only as an image in the original publication: Figure FDA0003806761120000011]
In the formula, Z_i denotes the output of the image block sequence after passing through the Conv-Trans module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, T denotes matrix transposition, W_1 denotes a convolution operation based on a first convolution kernel, and W_2 denotes a convolution operation based on a second convolution kernel.
3. A flower recognition and classification method according to claim 1, wherein the formula definition of the ResMLP module is Y_i = X_i + W_3·σ(W_4·LayerNorm(X)_i)
i=1,2,3,…,n
In the formula, Y_i denotes the output of the image block sequence after passing through the ResMLP module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, W_3 denotes a convolution operation based on a third convolution kernel, and W_4 denotes a convolution operation based on a fourth convolution kernel.
4. The flower recognition and classification method according to claim 1, wherein the knowledge distillation training mode is a soft distillation training mode.
5. A flower identification and classification method according to claim 4, wherein the objective function of the classifier is specifically:
L_total = (1 - λ)·L_CE(ψ(z_s), y) + λT²·L_KL(ψ(z_s, T), ψ(z_t, T))
In the formula, L_total is the total loss; L_CE(·) is the cross-entropy loss function; L_KL(·) is the KL-divergence loss function; ψ(·) is the soft objective function; z_s and z_t are the class classification probabilities output by the student model and the teacher model, respectively; T is the temperature coefficient, λ is the distillation coefficient, and y is the classification label;
the soft objective function is specifically:
q_i = exp(z_i / T) / Σ_j exp(z_j / T)
In the formula, q_i is the soft target output of the function, and z_i is the classification probability result output by the student model or the teacher model.
6. A flower recognition and classification device, comprising:
the image acquisition unit is used for acquiring a flower image to be identified;
the preprocessing unit is used for preprocessing the flower image;
the model classification processing unit is used for inputting the preprocessed flower images into a preset flower recognition model for recognition and classification so as to obtain a classification result, wherein the flower recognition model is a machine learning model based on a Transformer structure, and specifically consists of a linear mapping layer, a plurality of Conv-Trans modules, a plurality of ResMLP modules and a classifier;
the Conv-Trans module is used for performing space domain feature fusion on the image block sequence through a multi-head self-attention mechanism and performing channel domain feature fusion on the image block sequence through a convolution operation mode;
the ResMLP module is used for integrating the channel domain characteristics and the spatial domain characteristics of the image block sequences in a ResMLP processing mode;
the classifier is constructed on the basis of a student network model obtained by a knowledge distillation training mode.
7. A flower recognition and classification device according to claim 6, wherein the formula of the convolution process is specifically:
[The convolution processing formula of claim 7 is reproduced only as an image in the original publication: Figure FDA0003806761120000022]
In the formula, Z_i denotes the output of the image block sequence after passing through the Conv-Trans module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, T denotes matrix transposition, W_1 denotes a convolution operation based on a first convolution kernel, and W_2 denotes a convolution operation based on a second convolution kernel.
8. A flower recognition and classification device according to claim 6, wherein the formula definition of the ResMLP module is Y_i = X_i + W_3·σ(W_4·LayerNorm(X)_i)
i=1,2,3,…,n
In the formula, Y_i denotes the output of the image block sequence after passing through the ResMLP module, X_i is the input image block sequence, σ is the GELU activation function, n is the image block sequence length, W_3 denotes a convolution operation based on a third convolution kernel, and W_4 denotes a convolution operation based on a fourth convolution kernel.
9. The flower recognition and classification device according to claim 6, wherein the knowledge distillation training mode is a soft distillation training mode.
10. A flower recognition and classification device according to claim 9, wherein the objective function of the classifier is specifically:
L_total = (1 - λ)·L_CE(ψ(z_s), y) + λT²·L_KL(ψ(z_s, T), ψ(z_t, T))
In the formula, L_total is the total loss; L_CE(·) is the cross-entropy loss function; L_KL(·) is the KL-divergence loss function; ψ(·) is the soft objective function; z_s and z_t are the class classification probabilities output by the student model and the teacher model, respectively; T is the temperature coefficient, λ is the distillation coefficient, and y is the classification label;
the soft objective function is specifically:
q_i = exp(z_i / T) / Σ_j exp(z_j / T)
In the formula, q_i is the soft target output of the function, and z_i is the class classification probability output by the student model or the teacher model.
CN202210998890.4A 2022-08-19 2022-08-19 Flower identification and classification method and device Pending CN115359353A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210998890.4A CN115359353A (en) 2022-08-19 2022-08-19 Flower identification and classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210998890.4A CN115359353A (en) 2022-08-19 2022-08-19 Flower identification and classification method and device

Publications (1)

Publication Number Publication Date
CN115359353A true CN115359353A (en) 2022-11-18

Family

ID=84003055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210998890.4A Pending CN115359353A (en) 2022-08-19 2022-08-19 Flower identification and classification method and device

Country Status (1)

Country Link
CN (1) CN115359353A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116645716A (en) * 2023-05-31 2023-08-25 南京林业大学 Expression Recognition Method Based on Local Features and Global Features
CN116645716B (en) * 2023-05-31 2024-01-19 南京林业大学 Expression recognition method based on local features and global features
CN117058437A (en) * 2023-06-16 2023-11-14 江苏大学 Flower classification method, system, equipment and medium based on knowledge distillation
CN117058437B (en) * 2023-06-16 2024-03-08 江苏大学 Flower classification method, system, equipment and medium based on knowledge distillation
CN117114053A (en) * 2023-08-24 2023-11-24 之江实验室 Convolutional neural network model compression method and device based on structure search and knowledge distillation
CN117253122A (en) * 2023-11-17 2023-12-19 云南大学 Corn seed approximate variety screening method, device, equipment and storage medium
CN117253122B (en) * 2023-11-17 2024-01-23 云南大学 Corn seed approximate variety screening method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN115359353A (en) Flower identification and classification method and device
CN111695467B (en) Spatial spectrum full convolution hyperspectral image classification method based on super-pixel sample expansion
Romero et al. Unsupervised deep feature extraction for remote sensing image classification
CN110245709B (en) 3D point cloud data semantic segmentation method based on deep learning and self-attention
CN105740894B (en) Semantic annotation method for hyperspectral remote sensing image
CN105303198B (en) A kind of remote sensing image semisupervised classification method learnt from fixed step size
CN108090472B (en) Pedestrian re-identification method and system based on multi-channel consistency characteristics
CN109871830A (en) Spatial-spectral fusion hyperspectral image classification method based on three-dimensional depth residual error network
CN108090447A (en) Hyperspectral image classification method and device under double branch's deep structures
CN112347970B (en) Remote sensing image ground object identification method based on graph convolution neural network
CN108537115B (en) Image recognition method and device and electronic equipment
US11941865B2 (en) Hyperspectral image classification method based on context-rich networks
CN110222592A (en) A kind of construction method of the timing behavioral value network model generated based on complementary timing behavior motion
CN115457006B (en) Unmanned aerial vehicle inspection defect classification method and device based on similarity consistency self-distillation
CN112464766A (en) Farmland automatic identification method and system
CN113435254A (en) Sentinel second image-based farmland deep learning extraction method
CN114676769A (en) Visual transform-based small sample insect image identification method
CN114972208A (en) YOLOv 4-based lightweight wheat scab detection method
CN113111716A (en) Remote sensing image semi-automatic labeling method and device based on deep learning
CN114332482A (en) Lightweight target detection method based on feature fusion
CN115953621A (en) Semi-supervised hyperspectral image classification method based on unreliable pseudo-label learning
Farooq et al. Transferable convolutional neural network for weed mapping with multisensor imagery
CN115457311A (en) Hyperspectral remote sensing image band selection method based on self-expression transfer learning
CN116843952A (en) Small sample learning classification method for fruit and vegetable disease identification
CN116630700A (en) Remote sensing image classification method based on introduction channel-space attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination