CN117058437A - Flower classification method, system, equipment and medium based on knowledge distillation - Google Patents

Flower classification method, system, equipment and medium based on knowledge distillation Download PDF

Info

Publication number
CN117058437A
CN117058437A
Authority
CN
China
Prior art keywords
network model
neighbor
loss function
logits
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310721513.0A
Other languages
Chinese (zh)
Other versions
CN117058437B (en)
Inventor
苟建平
辛晓梦
宋和平
马忠臣
陈雯柏
陈潇君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University filed Critical Jiangsu University
Priority to CN202310721513.0A priority Critical patent/CN117058437B/en
Publication of CN117058437A publication Critical patent/CN117058437A/en
Application granted granted Critical
Publication of CN117058437B publication Critical patent/CN117058437B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures

Abstract

The invention discloses a flower classification method, system, equipment and medium based on knowledge distillation, belongs to the field of artificial intelligence, and aims to solve the technical problem that knowledge distillation network models in the prior art perform poorly on flower identification and classification. The method adopts neighbor domain relation knowledge distillation, treating the neighbor domain structure as a new form of relation knowledge: a similarity matrix is calculated from the logits features output by the teacher network and used to select neighbor samples, where each element of the similarity matrix represents the similarity of the feature representations of two samples; the neighbor samples are then used to calculate a neighbor domain feature relation distillation loss and a neighbor domain logits relation distillation loss, which are introduced into the total loss function for training the student network model. By establishing neighbor domain structure knowledge for knowledge distillation, better knowledge transfer is completed, thereby improving the effect of flower identification and classification.

Description

Flower classification method, system, equipment and medium based on knowledge distillation
Technical Field
The invention belongs to the technical field of artificial intelligence, relates to flower classification, and in particular relates to a flower classification method, system, equipment and medium based on knowledge distillation.
Background
Deep neural networks have achieved great success in many areas, such as computer vision and natural language processing. As deep neural networks become deeper and wider, their computation and memory requirements keep increasing, and the trade-off between model performance and model size remains an important issue. There is therefore a general need for compact networks that can be easily deployed on edge computing devices with limited computing power and memory.
Knowledge distillation is a simple and efficient model compression method and has been widely applied to different tasks, including visual recognition tasks such as image classification, image retrieval, semantic segmentation, target detection and action prediction, as well as natural language processing tasks such as machine translation. In general, knowledge distillation works by transferring knowledge from a large teacher network to a small student network, with the teacher network providing additional supervision. To date, the knowledge used for distillation can be divided into three categories: response-based, feature-based and relationship-based knowledge. Response-based knowledge distillation mimics the prediction results of the teacher model; feature-based knowledge distillation encourages the student to learn the intermediate feature representations of the teacher model; relationship-based methods make the student network learn the structure between different samples.
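For orientation, the response-based variant mentioned above is commonly implemented as a KL divergence between temperature-softened teacher and student logits. The sketch below is a generic illustration of that baseline, not the method of this disclosure; the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def response_kd_loss(student_logits, teacher_logits, T=4.0):
    # Soften both distributions with a temperature T, then match them with KL divergence.
    # T=4.0 is an illustrative choice, not a value specified in this disclosure.
    p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    # "batchmean" averages the KL over the batch; T*T rescales gradients to the usual magnitude.
    return F.kl_div(p_s, p_t, reduction="batchmean") * (T * T)
```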
Recent relational distillation methods mainly build knowledge over all randomly selected samples, for example the similarity matrix of a mini-batch, and the local relations among samples are not well studied. SP transfers the pairwise similarities of activations, RKD penalizes structural differences through distance and angle metrics, CRD uses a contrastive loss to make positive pairs more similar and negative pairs more different, and ICKD transfers knowledge of channel relationships.
The invention patent with application number 202210998890.4 discloses a flower identification and classification method and device. An image is obtained by photographing flowers or selected directly from an album; the image is then preprocessed to obtain the flower object; finally, the preprocessed flower image is passed through a flower recognition model to obtain the final classification result. The preprocessed flower image is input into a preset flower recognition model for recognition and classification, where the flower recognition model is a machine learning model based on a Transformer structure and specifically comprises a linear mapping layer, several Conv-Trans modules, several ResMLP modules and a fully connected layer; the fully connected layer is built on a student network model obtained through knowledge distillation training. The flower identification model adopts a Transformer architecture, uses a self-attention mechanism to extract features globally from the image, and focuses attention on the flower parts while ignoring the complex background, thereby extracting flower features accurately and achieving accurate classification. It addresses the technical problem that existing classification methods extract local image features by convolution, have difficulty attending to local and global key features at the same time, and therefore have imperfect feature extraction capability and inaccurate classification.
The invention patent application with application number 202210412189.X also discloses a lightweight flower identification method based on knowledge distillation, which comprises the following steps: S1, constructing a flower data set and dividing it into a training set and a test set; S2, selecting a teacher network and a student network; S3, initializing and training the teacher network to obtain a mature teacher network; S4, initializing the student network; S5, training the initialized student network on the flower data set with the aid of the teacher network to obtain a mature student neural network; S6, setting the mature student neural network to eval mode without back propagation, inputting the flower picture to be identified into the mature student neural network, computing the forward pass and outputting the identification result, which completes the flower identification. The identification method uses a knowledge distillation algorithm and a heavyweight network to assist in training a lightweight network, so that the loss in accuracy is reduced as much as possible while the model is greatly compressed, yielding a lightweight flower recognition model that is strongly compressed yet retains high accuracy.
As in the above patents, most knowledge distillation methods in the prior art construct a similarity or correlation matrix over all samples in a mini-batch, and this complex relation structure increases the difficulty of student network learning; in addition, local structure knowledge is easily ignored, so rich feature representation information cannot be captured; as a result, the trained knowledge distillation network model performs poorly on flower identification and classification.
Disclosure of Invention
In order to solve the technical problem that knowledge distillation network models in the prior art perform poorly on flower identification and classification, the invention provides a flower classification method, system, equipment and medium based on knowledge distillation, which establish neighbor domain structure knowledge for knowledge distillation to complete better knowledge transfer and improve the effect of flower identification and classification.
In order to solve the technical problems, the invention adopts the following technical scheme:
a flower classification method based on knowledge distillation comprises the following steps:
step S1, obtaining flower image sample data
Obtaining flower image sample data and label data;
s2, constructing a knowledge distillation network model
The knowledge distillation network model comprises a teacher network model and a student network model, wherein the teacher network model and the student network model both adopt convolutional neural networks, and pretraining is carried out on the teacher network model;
step S3, extracting sample characteristics
Respectively inputting the flower image sample data into a pre-trained teacher network model and an untrained student network model, wherein a convolution layer of the teacher network model outputs a first middle feature map, a full-connection layer of the teacher network model outputs a first logits feature, a convolution layer of the student network model outputs a second middle feature map, and a full-connection layer of the student network model outputs a second logits feature;
S4, selecting a neighbor sample
Based on the first logits characteristics, constructing a similarity matrix between the flower image sample data, and selecting K nearest neighbor samples which are the most similar to each flower image sample data according to the similarity matrix;
s5, constructing a neighbor domain relation knowledge distillation loss function
The neighbor domain relation knowledge distillation loss function comprises a neighbor domain characteristic relation distillation loss function and a neighbor domain logits relation distillation loss function;
constructing a neighbor domain feature relation distillation loss function based on the first intermediate feature map and the second intermediate feature map of the K nearest neighbor samples of the selected flower image sample data;
constructing a neighbor domain logits relation distillation loss function based on the first logits features and the second logits features of the K nearest neighbor samples of each selected flower image sample data;
step S6, constructing a total loss function
The total loss function comprises a cross entropy loss function between a prediction result of the student network model and label data, an alignment loss function of softening logits and a neighbor domain relation knowledge distillation loss function;
step S7, training a student network model
Training the student network model with the total loss function, back-propagating, and updating the parameters of the student network model to obtain a mature student network model;
Step S8, real-time classification of flower images
And acquiring a real-time flower image, inputting the flower image into a student network model, and outputting a classification result by the student network model.
Further, in step S1, the obtained flower image sample data is preprocessed, where the preprocessing includes clipping, scaling, and rotation.
Further, in step S4, the similarity matrix between the flower image sample data is constructed in the following manner:
for samples x_i and x_j, the similarity matrix is calculated as:
D_ij = ⟨z_i^T, z_j^T⟩ / (‖z_i^T‖_2 · ‖z_j^T‖_2)
where ⟨·,·⟩ denotes the dot product operation, ‖·‖_2 denotes the L2 norm, z_i^T denotes the logits feature output by the teacher network model for sample x_i, and z_j^T denotes the logits feature output by the teacher network model for sample x_j.
Further, in step S5, the neighbor domain feature relation distillation loss function L_NFR is constructed in the following manner:
Step S5-1-1, calculating spatial similarity
For sample x_i and its neighbor sample x_j, the spatial similarity is calculated for the teacher network model and for the student network model respectively;
the spatial similarity of the teacher network model is calculated from the first intermediate feature maps f_i^T and f_j^T output by the teacher network model;
the spatial similarity of the student network model is calculated from the second intermediate feature maps f_i^S and f_j^S output by the student network model;
where f_i^T denotes the first intermediate feature map output by the teacher network model for sample x_i, and f_i^S denotes the second intermediate feature map output by the student network model for sample x_i;
s5-1-2, constructing the feature similarity of the neighboring domain
After pooling and L2 normalization (‖·‖_2 denotes L2 normalization), the neighbor domain feature similarity is constructed for sample x_i and its neighbor samples, for the teacher network model and the student network model respectively;
the neighbor domain feature similarity of the teacher network model collects the spatial similarities, computed from the first intermediate feature maps output by the teacher network model, between sample x_i and each of its 1st to K-th neighbor samples, where K denotes the number of neighbor samples;
the neighbor domain feature similarity of the student network model collects the spatial similarities, computed from the second intermediate feature maps output by the student network model, between sample x_i and each of its 1st to K-th neighbor samples, where K denotes the number of neighbor samples;
Step S5-1-3, constructing the neighbor domain feature relation distillation loss function L_NFR
The neighbor domain feature relation distillation loss function L_NFR compares, over the N samples and their K neighbor samples, the spatial similarities of the teacher network model with those of the student network model, where N denotes the number of samples and K denotes the number of neighbor samples.
Further, in step S5, the neighbor domain logits relation distillation loss function L_NLR is constructed in the following manner:
Step S5-2-1, constructing the similarity between sample x_i and its neighbor sample x_j, and converting it into a similarity distribution with the softmax function, where the similarity distribution ρ_ij is expressed as:
ρ_ij = softmax(z_i − z_j), ρ_ij ∈ R^(1×M)
where z_i denotes the logits feature output by the network model for sample x_i, and M denotes the number of categories;
Step S5-2-2, for sample x_i and its K neighbor samples, constructing the logits similarity distribution for the teacher network model and for the student network model respectively:
the logits similarity distribution of the teacher network model collects the similarity distributions ρ_i1^T, …, ρ_iK^T, where ρ_ij^T denotes the similarity distribution, calculated from the first logits features output by the teacher network model, between sample x_i and its j-th neighbor sample;
the logits similarity distribution of the student network model collects the similarity distributions ρ_i1^S, …, ρ_iK^S, where ρ_ij^S denotes the similarity distribution, calculated from the second logits features output by the student network model, between sample x_i and its j-th neighbor sample; K denotes the number of neighbor samples;
Step S5-2-3, constructing the neighbor domain logits relation distillation loss function L_NLR
The neighbor domain logits relation distillation loss function L_NLR measures, over the N samples and their K neighbor samples, the JS divergence between the logits similarity distributions of the teacher network model and those of the student network model, where N denotes the number of samples, K denotes the number of neighbor samples, and JS(·,·) denotes the JS divergence.
Further, in step S6, the total loss function L_total is constructed as:
L_total = L_CE + L_KL + α·L_NFR + β·L_NLR
where α and β denote hyper-parameters that balance the loss terms, L_CE denotes the cross entropy loss function between the prediction result of the student network model and the label data, L_KL denotes the alignment loss function of the softened logits, L_NFR denotes the neighbor domain feature relation distillation loss function, and L_NLR denotes the neighbor domain logits relation distillation loss function;
the cross entropy loss function L_CE is:
L_CE = H(z^S, y_true)
where H(·,·) denotes the cross entropy loss, z^S denotes the second logits feature output by the student network model, and y_true denotes the label data of the sample;
the alignment loss function L_KL is:
L_KL = KL(softmax(z^S), softmax(z^T))
where KL(·,·) denotes the KL divergence, softmax(·) denotes the softmax function, z^T denotes the first logits feature output by the teacher network model, and z^S denotes the second logits feature output by the student network model.
further, in step S7, when updating parameters of the student network model, the specific manner is as follows:
training a student network model using step total loss function, back propagation computationGradient and update network parameters, parameter θ s The method comprises the following steps:
wherein θ S Representing parameters of the student network model, gamma representing learning rate,representing the total loss function L tatcl Gradients of student network parameters are calculated.
A knowledge distillation based flower classification system comprising:
the flower image sample data acquisition module is used for acquiring flower image sample data and tag data;
the knowledge distillation network model construction module is used for constructing a knowledge distillation network model, wherein the knowledge distillation network model comprises a teacher network model and a student network model, the teacher network model and the student network model both adopt convolutional neural networks, and the teacher network model is pre-trained;
the sample feature extraction module is used for respectively inputting the flower image sample data into a pre-trained teacher network model and an untrained student network model, wherein a convolution layer of the teacher network model outputs a first middle feature map, a full-connection layer of the teacher network model outputs first logits features, a convolution layer of the student network model outputs a second middle feature map, and a full-connection layer of the student network model outputs second logits features;
the neighbor sample selection module is used for constructing a similarity matrix between the flower image sample data based on the first logits characteristics, and selecting K nearest neighbor samples which are the most similar to each flower image sample data according to the similarity matrix;
The neighbor domain relation construction module is used for constructing a neighbor domain relation knowledge distillation loss function, wherein the neighbor domain relation knowledge distillation loss function comprises a neighbor domain characteristic relation distillation loss function and a neighbor domain logits relation distillation loss function;
constructing a neighbor domain feature relation distillation loss function based on the first intermediate feature map and the second intermediate feature map of the K nearest neighbor samples of the selected flower image sample data;
constructing a near neighborhood logits relation distillation loss function based on the first logits characteristic and the second logits characteristic of the K nearest neighbor samples of the selected flower image sample data;
the total loss function construction module is used for constructing a total loss function, wherein the total loss function comprises a cross entropy loss function between a prediction result of the student network model and label data, an alignment loss function of softening logits and a neighbor domain relation knowledge distillation loss function;
the student network model training module is used for training the student network model with the total loss function, back-propagating, and updating the parameters of the student network model to obtain a mature student network model;
the flower image real-time classification module is used for acquiring real-time flower images, inputting the flower images into the student network model, and outputting classification results by the student network model.
A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method described above.
A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method described above.
Compared with the prior art, the invention has the beneficial effects that:
in the invention, neighbor domain relation knowledge distillation is adopted, treating the neighbor domain structure as new relation knowledge: neighbor samples are selected after a similarity matrix is calculated from the logits features output by the network, where each element of the similarity matrix represents the similarity of the feature representations of two samples; the neighbor samples are used to calculate a neighbor domain feature relation distillation loss and a neighbor domain logits relation distillation loss, which are introduced into the total loss function for training the student network model. By establishing neighbor domain structure knowledge for knowledge distillation, better knowledge transfer is completed, thereby improving the effect of flower identification and classification.
Drawings
Fig. 1 is a schematic flow chart of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. Embodiments of the present invention include, but are not limited to, the following examples.
Example 1
The present embodiment provides a flower classification method based on knowledge distillation, as shown in fig. 1, which includes the following steps:
step S1, obtaining flower image sample data
And obtaining flower image sample data and label data.
In addition, the acquired flower image sample data are preprocessed to augment the sample data.
The preprocessing includes operations such as cropping, scaling and rotation.
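As an illustration only, the cropping, scaling and rotation operations above could be assembled with torchvision transforms as follows; the image sizes and rotation angle are assumed values, not parameters specified in this disclosure.

```python
from torchvision import transforms

# Illustrative augmentation pipeline for flower image samples (sizes and angle are assumptions).
train_transform = transforms.Compose([
    transforms.Resize(256),                  # scaling
    transforms.RandomCrop(224),              # cropping
    transforms.RandomRotation(degrees=15),   # rotation
    transforms.ToTensor(),
])
```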
S2, constructing a knowledge distillation network model
The knowledge distillation network model comprises a teacher network model and a student network model, wherein the teacher network model and the student network model both adopt convolutional neural networks, and the teacher network model is pre-trained.
The convolutional neural network in the teacher network model and the student network model comprises a convolutional layer, a batch normalization layer, a ReLU layer, a pooling layer and a full connection layer.
Step S3, extracting sample characteristics
The flower image sample data are respectively input into a pre-trained teacher network model and an untrained student network model, a convolution layer of the teacher network model outputs a first middle feature map, a full-connection layer of the teacher network model outputs a first logits feature, a convolution layer of the student network model outputs a second middle feature map, and a full-connection layer of the student network model outputs a second logits feature.
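The following sketch illustrates how a convolutional network of the kind described in step S2 (convolution, batch normalization, ReLU, pooling and a fully connected layer) can return both the intermediate feature map and the logits feature in one forward pass. The layer sizes are assumptions for illustration; this is not the specific teacher or student architecture of the disclosure.

```python
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    """Illustrative backbone; channel counts and depth are assumptions."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x):
        feat = self.features(x)                        # intermediate feature map (conv layers)
        logits = self.fc(self.pool(feat).flatten(1))   # logits feature (fully connected layer)
        return feat, logits
```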
S4, selecting a neighbor sample
Based on the first logits characteristics, a similarity matrix between the flower image sample data is constructed, and K nearest neighbor samples of each flower image sample data are selected according to the similarity matrix.
Neighbor samples are selected by constructing a similarity matrix D between the flower image sample data, and the K nearest neighbors are selected according to the similarity matrix D within each mini-batch. The specific method is as follows:
for samples x_i and x_j, the similarity matrix is calculated as:
D_ij = ⟨z_i^T, z_j^T⟩ / (‖z_i^T‖_2 · ‖z_j^T‖_2)
where ⟨·,·⟩ denotes the dot product operation, ‖·‖_2 denotes the L2 norm, z_i^T denotes the logits output obtained by inputting sample x_i into the teacher network model, and z_j^T denotes the logits output obtained by inputting sample x_j into the teacher network model.
The cosine value of two samples is calculated; the larger the cosine value, the more similar the two samples are. For each sample, this embodiment selects the samples corresponding to the K largest values in the similarity matrix D as its K nearest neighbors.
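A minimal sketch of this neighbor selection step follows: cosine similarities are computed between the first logits features of a mini-batch, and the K largest entries per row are taken as neighbors. Excluding a sample from its own neighbor set is an assumption about the intended behavior.

```python
import torch
import torch.nn.functional as F

def select_neighbors(teacher_logits, k):
    # teacher_logits: (N, M) first logits features of a mini-batch.
    z = F.normalize(teacher_logits, dim=1)       # L2-normalize so dot products are cosine values
    d = z @ z.t()                                # similarity matrix D, D[i, j] = cos(z_i, z_j)
    d.fill_diagonal_(float("-inf"))              # assumption: a sample is not its own neighbor
    return d.topk(k, dim=1).indices              # (N, K) indices of the K nearest neighbors
```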
S5, constructing a neighbor domain relation knowledge distillation loss function
The neighbor domain relationship knowledge distillation loss function comprises a neighbor domain feature relationship distillation loss function and a neighbor domain logits relationship distillation loss function:
Constructing a neighbor domain feature relation distillation loss function based on the first intermediate feature map and the second intermediate feature map of the K nearest neighbor samples of the selected flower image sample data;
a near neighborhood logits relationship distillation loss function is constructed based on the first logits features and the second logits features of the K nearest neighbor samples of the selected each flower image sample data.
Based on the selected neighbor samples, this embodiment constructs the neighbor domain feature relation and the neighbor domain logits relation from the intermediate features and the logits respectively, and calculates the neighbor domain feature relation distillation loss function and the neighbor domain logits relation distillation loss function, which together form the neighbor domain relation knowledge distillation loss function. Step S5 is therefore divided into two parts: the neighbor domain feature relation distillation loss function and the neighbor domain logits relation distillation loss function.
The neighbor domain feature relation distillation loss function L_NFR is constructed in the following manner:
Step S5-1-1, calculating spatial similarity
This embodiment makes full use of the information in the feature maps: for sample x_i and its neighbor sample x_j, the spatial similarity is calculated for the teacher network model and for the student network model respectively;
the spatial similarity of the teacher network model is calculated from the first intermediate feature maps f_i^T and f_j^T output by the teacher network model;
the spatial similarity of the student network model is calculated from the second intermediate feature maps f_i^S and f_j^S output by the student network model;
where f_i^T denotes the first intermediate feature map output by the teacher network model for sample x_i, and f_i^S denotes the second intermediate feature map output by the student network model for sample x_i.
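The exact spatial similarity formula appears as an image in the original filing and is not reproduced above. The sketch below therefore assumes one common choice consistent with the description, the inner product between L2-normalized channel vectors at every pair of pixel positions of the two feature maps; it should be read as an assumption, not the filed formula.

```python
import torch
import torch.nn.functional as F

def spatial_similarity(feat_i, feat_j):
    # feat_i, feat_j: (C, H, W) intermediate feature maps of x_i and its neighbor x_j.
    # Assumption: similarity between every pixel position of x_i and every pixel position of x_j,
    # computed on channel vectors that are L2-normalized per position.
    a = F.normalize(feat_i.flatten(1).t(), dim=1)   # (H*W, C)
    b = F.normalize(feat_j.flatten(1).t(), dim=1)   # (H*W, C)
    return a @ b.t()                                # (H*W, H*W) pixel-to-pixel similarities
```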
S5-1-2, constructing the feature similarity of the neighboring domain
This embodiment also evaluates the similarity between each pixel point of a sample and the pixel points of its neighbor samples: for sample x_i and its neighbor samples, the neighbor domain feature similarity is constructed for the teacher network model and the student network model respectively;
the neighbor domain feature similarity of the teacher network model collects the spatial similarities, computed from the first intermediate feature maps output by the teacher network model, between sample x_i and each of its 1st to K-th neighbor samples, where K denotes the number of neighbor samples;
the neighbor domain feature similarity of the student network model collects the spatial similarities, computed from the second intermediate feature maps output by the student network model, between sample x_i and each of its 1st to K-th neighbor samples, where K denotes the number of neighbor samples;
If the spatial resolutions of the intermediate feature maps of the teacher network model and the student network model differ, adaptive pooling is used to align the resolutions.
Step S5-1-3, constructing the neighbor domain feature relation distillation loss function L_NFR
The neighbor domain feature relation distillation loss function L_NFR compares, over the N samples and their K neighbor samples, the spatial similarities of the teacher network model with those of the student network model, where N denotes the number of samples and K denotes the number of neighbor samples.
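The L_NFR formula likewise appears as an image in the original filing. The sketch below assumes it averages a squared difference between the teacher's and the student's spatial similarities over the N samples and their K neighbor samples, and it uses adaptive pooling to align resolutions as described; the exact loss form is an assumption, and spatial_similarity refers to the sketch above.

```python
import torch
import torch.nn.functional as F

def nfr_loss(t_feats, s_feats, neighbor_idx):
    # t_feats: (N, C_t, H_t, W_t) teacher feature maps; s_feats: (N, C_s, H_s, W_s) student feature maps.
    # neighbor_idx: (N, K) indices from select_neighbors(). The mean-squared form is an assumption.
    if s_feats.shape[-2:] != t_feats.shape[-2:]:
        s_feats = F.adaptive_avg_pool2d(s_feats, t_feats.shape[-2:])  # align spatial resolution
    loss, (n, k) = 0.0, neighbor_idx.shape
    for i in range(n):
        for j in neighbor_idx[i]:
            a_t = spatial_similarity(t_feats[i], t_feats[j])
            a_s = spatial_similarity(s_feats[i], s_feats[j])
            loss = loss + F.mse_loss(a_s, a_t)
    return loss / (n * k)
```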
The probability of each class is obtained by softmax-normalizing the logits, so the logits reflect the prediction of the network model in the image classification space. The logits of different samples can also reflect the network model's judgment of their similarity in the classification space. For this purpose, this embodiment constructs a neighbor domain logits relation distillation loss function L_NLR.
The neighbor domain logits relation distillation loss function L_NLR is constructed in the following manner:
Step S5-2-1, this embodiment represents the similarity of logits features in the form of a similarity distribution, namely: the similarity between sample x_i and its neighbor sample x_j is constructed and converted into a similarity distribution with the softmax function, where the similarity distribution ρ_ij is expressed as:
ρ_ij = softmax(z_i − z_j), ρ_ij ∈ R^(1×M)
where z_i denotes the logits feature output by the network model for sample x_i, and M denotes the number of categories.
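A short sketch of this similarity distribution; it reads the formula as ρ_ij = softmax(z_i − z_j), which is an interpretation of the garbled original text rather than a certainty.

```python
import torch

def logits_similarity_distribution(z_i, z_j):
    # z_i, z_j: (M,) logits of a sample and one of its neighbors.
    # Interpretation of the filing's formula: rho_ij = softmax(z_i - z_j), a distribution over M classes.
    return torch.softmax(z_i - z_j, dim=0)
```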
Step S5-2-2, for sample x_i and its K neighbor samples, the logits similarity distribution is constructed for the teacher network model and for the student network model respectively:
the logits similarity distribution of the teacher network model collects the similarity distributions ρ_i1^T, …, ρ_iK^T;
the logits similarity distribution of the student network model collects the similarity distributions ρ_i1^S, …, ρ_iK^S;
where ρ_ij^T denotes the similarity distribution, calculated from the first logits features output by the teacher network model, between sample x_i and its j-th neighbor sample, ρ_ij^S denotes the similarity distribution, calculated from the second logits features output by the student network model, between sample x_i and its j-th neighbor sample, and K denotes the number of neighbor samples.
the goal of this embodiment is not to let the network model pair sample x i And x j The same judgment is made, and the consistency of the category similarity between two samples is maintained in the teacher network model and the student network model, so that a neighbor domain logits relation distillation loss function is constructed.
Step S5-2-3, constructing the neighbor domain logits relation distillation loss function L_NLR
The neighbor domain logits relation distillation loss function L_NLR measures, over the N samples and their K neighbor samples, the JS divergence between the logits similarity distributions of the teacher network model and those of the student network model, where N denotes the number of samples, K denotes the number of neighbor samples, JS(·,·) denotes the JS divergence, ρ_ij^T denotes the logits feature similarity distribution, calculated from the first logits features output by the teacher network model, between sample x_i and its j-th neighbor sample, and ρ_ij^S denotes the logits feature similarity distribution, calculated from the second logits features output by the student network model, between sample x_i and its j-th neighbor sample.
This embodiment uses the Jensen-Shannon (JS) divergence to measure the difference between similarity distributions; unlike the KL divergence, the JS divergence is symmetric. This embodiment expects a consistent judgment of the structure of the two distributions, so that changing which distribution is treated as the target does not produce a different value.
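The L_NLR formula is also an image in the original filing. The sketch below assumes it averages the JS divergence between the teacher's and the student's logits similarity distributions over the N samples and their K neighbor samples; the averaging and the exact distribution construction are assumptions.

```python
import torch

def js_divergence(p, q, eps=1e-8):
    # Symmetric Jensen-Shannon divergence between two probability vectors p and q.
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * (a.add(eps).log() - b.add(eps).log())).sum()
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def nlr_loss(t_logits, s_logits, neighbor_idx):
    # t_logits: (N, M) teacher logits; s_logits: (N, M) student logits; neighbor_idx: (N, K).
    # Loss form is an assumption: mean JS divergence over samples and their neighbors.
    loss, (n, k) = 0.0, neighbor_idx.shape
    for i in range(n):
        for j in neighbor_idx[i]:
            rho_t = torch.softmax(t_logits[i] - t_logits[j], dim=0)
            rho_s = torch.softmax(s_logits[i] - s_logits[j], dim=0)
            loss = loss + js_divergence(rho_t, rho_s)
    return loss / (n * k)
```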
Step S6, constructing a total loss function
The total loss function includes a cross entropy loss function between the prediction result of the student network model and the label data, an alignment loss function of softening logits, and a neighbor domain relationship knowledge distillation loss function.
Neighbor relation knowledge distillation is applied at different layers to reduce the difference between the neighbor relations of the teacher and the student. Accordingly, the total loss function L_total used by this embodiment to train the student network model can be expressed as:
L_total = L_CE + L_KL + α·L_NFR + β·L_NLR
where α and β denote hyper-parameters that balance the loss terms, L_CE denotes the cross entropy loss function between the prediction result of the student network model and the label data, L_KL denotes the alignment loss function of the softened logits, L_NFR denotes the neighbor domain feature relation distillation loss function, and L_NLR denotes the neighbor domain logits relation distillation loss function.
The cross entropy loss function L_CE is:
L_CE = H(z^S, y_true)
where H(·,·) denotes the cross entropy loss, z^S denotes the second logits feature output by the student network model, and y_true denotes the label data of the sample.
Soft probability alignment loss function L between student network model and teacher network model KL The method comprises the following steps:
L KL =KL(softmax(z S ),softmax(z T ))
wherein KL (·, ·) is the KL function, softmax (·) represents the softmax function, z T Representing a first logits feature, z, of the teacher network model output s A second logits feature representing the student network model output;
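Combining the terms as L_total = L_CE + L_KL + α·L_NFR + β·L_NLR could look like the sketch below; nfr_loss and nlr_loss refer to the assumed sketches above, α and β are unspecified hyper-parameters, and the argument order of the KL term follows the usual distillation convention, which is an assumption.

```python
import torch.nn.functional as F

def total_loss(s_logits, t_logits, s_feats, t_feats, labels, neighbor_idx, alpha, beta):
    # Combines the four terms described above; alpha and beta are balancing hyper-parameters.
    l_ce = F.cross_entropy(s_logits, labels)                               # L_CE
    l_kl = F.kl_div(F.log_softmax(s_logits, dim=1),
                    F.softmax(t_logits, dim=1), reduction="batchmean")     # L_KL (softened-logits alignment)
    l_nfr = nfr_loss(t_feats, s_feats, neighbor_idx)                       # L_NFR (assumed sketch above)
    l_nlr = nlr_loss(t_logits, s_logits, neighbor_idx)                     # L_NLR (assumed sketch above)
    return l_ce + l_kl + alpha * l_nfr + beta * l_nlr
```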
step S7, training a student network model
And training the student network model by using the total loss function, and back-propagating, and updating parameters of the student network model to obtain a mature student network model.
The parameters of the student network model are updated in the following manner:
the student network model is trained with the total loss function, the gradient is calculated by back propagation, and the network parameters are updated; the parameter θ_S is updated as:
θ_S ← θ_S − γ·∇_{θ_S} L_total
where θ_S denotes the parameters of the student network model, γ denotes the learning rate, and ∇_{θ_S} L_total denotes the gradient of the total loss function L_total with respect to the student network parameters.
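A sketch of one training step as described in step S7, assuming plain stochastic gradient descent with learning rate γ and the helper sketches above (teacher and student models returning a feature map and logits, select_neighbors and total_loss); any optimizer choice beyond plain gradient descent on L_total is an assumption.

```python
import torch

def train_step(student, teacher, images, labels, optimizer, k, alpha, beta):
    # One training step of the student; the teacher is pre-trained and kept frozen.
    with torch.no_grad():
        t_feats, t_logits = teacher(images)          # first intermediate feature map, first logits
    s_feats, s_logits = student(images)              # second intermediate feature map, second logits
    neighbor_idx = select_neighbors(t_logits, k)     # K nearest neighbors from teacher logits
    loss = total_loss(s_logits, t_logits, s_feats, t_feats, labels, neighbor_idx, alpha, beta)
    optimizer.zero_grad()
    loss.backward()                                  # back-propagate gradients
    optimizer.step()                                 # theta_S <- theta_S - gamma * grad(L_total)
    return loss.item()
```

Here the optimizer would be created as, for example, torch.optim.SGD(student.parameters(), lr=gamma), matching the update θ_S ← θ_S − γ·∇_{θ_S} L_total.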
Step S8, real-time classification of flower images
And acquiring a real-time flower image, inputting the flower image into a student network model, and outputting a classification result by the student network model.
Example 2
This embodiment provides a flower classification system based on knowledge distillation, as shown in fig. 1, which includes:
the present embodiment provides a flower classification method based on knowledge distillation, as shown in fig. 1, which includes the following steps:
the flower image sample data acquisition module is used for acquiring flower image sample data and tag data.
In addition, the acquired flower image sample data are preprocessed to augment the sample data.
The preprocessing includes operations such as cropping, scaling and rotation.
The knowledge distillation network model construction module is used for constructing a knowledge distillation network model, wherein the knowledge distillation network model comprises a teacher network model and a student network model, the teacher network model and the student network model both adopt convolutional neural networks, and the teacher network model is pre-trained.
The convolutional neural network in the teacher network model and the student network model comprises a convolutional layer, a batch normalization layer, a ReLU layer, a pooling layer and a full connection layer.
The sample feature extraction module is used for respectively inputting the flower image sample data into a pre-trained teacher network model and an untrained student network model, wherein a convolution layer of the teacher network model outputs a first middle feature map, a full-connection layer of the teacher network model outputs a first logits feature, a convolution layer of the student network model outputs a second middle feature map, and a full-connection layer of the student network model outputs a second logits feature.
The neighbor sample selection module is used for constructing a similarity matrix between the flower image sample data based on the first logits characteristics, and selecting K most similar neighbor samples of each flower image sample data according to the similarity matrix.
Neighbor samples are selected by constructing a similarity matrix D between the flower image sample data, and the K nearest neighbors are selected according to the similarity matrix D within each mini-batch. The specific method is as follows:
for samples x_i and x_j, the similarity matrix is calculated as:
D_ij = ⟨z_i^T, z_j^T⟩ / (‖z_i^T‖_2 · ‖z_j^T‖_2)
where ⟨·,·⟩ denotes the dot product operation, ‖·‖_2 denotes the L2 norm, z_i^T denotes the logits output obtained by inputting sample x_i into the teacher network model, and z_j^T denotes the logits output obtained by inputting sample x_j into the teacher network model.
The cosine value of two samples is calculated; the larger the cosine value, the more similar the two samples are. For each sample, this embodiment selects the samples corresponding to the K largest values in the similarity matrix D as its K nearest neighbors.
The neighbor domain relation construction module is used for constructing a neighbor domain relation knowledge distillation loss function, wherein the neighbor domain relation knowledge distillation loss function comprises a neighbor domain characteristic relation distillation loss function and a neighbor domain logits relation distillation loss function;
constructing a neighbor domain feature relation distillation loss function based on the first intermediate feature map and the second intermediate feature map of the K nearest neighbor samples of the selected flower image sample data;
a near neighborhood logits relationship distillation loss function is constructed based on the first logits features and the second logits features of the K nearest neighbor samples of the selected each flower image sample data.
Based on the selected neighbor samples, this embodiment constructs the neighbor domain feature relation and the neighbor domain logits relation from the intermediate features and the logits respectively, and calculates the neighbor domain feature relation distillation loss function and the neighbor domain logits relation distillation loss function, which together form the neighbor domain relation knowledge distillation loss function. This module is therefore divided into two parts: the neighbor domain feature relation distillation loss function and the neighbor domain logits relation distillation loss function.
The neighbor domain feature relation distillation loss function L_NFR is constructed in the following manner:
Step S5-1-1, calculating spatial similarity
This embodiment makes full use of the information in the feature maps: for sample x_i and its neighbor sample x_j, the spatial similarity is calculated for the teacher network model and for the student network model respectively;
the spatial similarity of the teacher network model is calculated from the first intermediate feature maps f_i^T and f_j^T output by the teacher network model;
the spatial similarity of the student network model is calculated from the second intermediate feature maps f_i^S and f_j^S output by the student network model;
where f_i^T denotes the first intermediate feature map output by the teacher network model for sample x_i, and f_i^S denotes the second intermediate feature map output by the student network model for sample x_i.
S5-1-2, constructing the feature similarity of the neighboring domain
This embodiment also evaluates the similarity between each pixel point of a sample and the pixel points of its neighbor samples: for sample x_i and its neighbor samples, the neighbor domain feature similarity is constructed for the teacher network model and the student network model respectively;
the neighbor domain feature similarity of the teacher network model collects the spatial similarities, computed from the first intermediate feature maps output by the teacher network model, between sample x_i and each of its 1st to K-th neighbor samples, where K denotes the number of neighbor samples;
the neighbor domain feature similarity of the student network model collects the spatial similarities, computed from the second intermediate feature maps output by the student network model, between sample x_i and each of its 1st to K-th neighbor samples, where K denotes the number of neighbor samples;
If the spatial resolutions of the intermediate feature maps of the teacher network model and the student network model differ, adaptive pooling is used to align the resolutions.
Step S5-1-3, constructing the neighbor domain feature relation distillation loss function L_NFR
The neighbor domain feature relation distillation loss function L_NFR compares, over the N samples and their K neighbor samples, the spatial similarities of the teacher network model with those of the student network model, where N denotes the number of samples and K denotes the number of neighbor samples.
The probability of each class is obtained by softmax-normalizing the logits, so the logits reflect the prediction of the network model in the image classification space. The logits of different samples can also reflect the network model's judgment of their similarity in the classification space. For this purpose, this embodiment constructs a neighbor domain logits relation distillation loss function L_NLR.
The neighbor domain logits relation distillation loss function L_NLR is constructed in the following manner:
Step S5-2-1, this embodiment represents the similarity of logits features in the form of a similarity distribution, namely: the similarity between sample x_i and its neighbor sample x_j is constructed and converted into a similarity distribution with the softmax function, where the similarity distribution ρ_ij is expressed as:
ρ_ij = softmax(z_i − z_j), ρ_ij ∈ R^(1×M)
where z_i denotes the logits feature output by the network model for sample x_i, and M denotes the number of categories.
Step S5-2-2, for sample x_i and its K neighbor samples, the logits similarity distribution is constructed for the teacher network model and for the student network model respectively:
the logits similarity distribution of the teacher network model collects the similarity distributions ρ_i1^T, …, ρ_iK^T;
the logits similarity distribution of the student network model collects the similarity distributions ρ_i1^S, …, ρ_iK^S;
where ρ_ij^T denotes the similarity distribution, calculated from the first logits features output by the teacher network model, between sample x_i and its j-th neighbor sample, ρ_ij^S denotes the similarity distribution, calculated from the second logits features output by the student network model, between sample x_i and its j-th neighbor sample, and K denotes the number of neighbor samples.
The goal of this embodiment is not to make the network model produce the same judgment for samples x_i and x_j, but to keep the category similarity between the two samples consistent between the teacher network model and the student network model; the neighbor domain logits relation distillation loss function is constructed for this purpose.
Step S5-2-3, constructing the neighbor domain logits relation distillation loss function L_NLR
The neighbor domain logits relation distillation loss function L_NLR measures, over the N samples and their K neighbor samples, the JS divergence between the logits similarity distributions of the teacher network model and those of the student network model, where N denotes the number of samples, K denotes the number of neighbor samples, JS(·,·) denotes the JS divergence, ρ_ij^T denotes the logits feature similarity distribution, calculated from the first logits features output by the teacher network model, between sample x_i and its j-th neighbor sample, and ρ_ij^S denotes the logits feature similarity distribution, calculated from the second logits features output by the student network model, between sample x_i and its j-th neighbor sample.
This embodiment uses the Jensen-Shannon (JS) divergence to measure the difference between similarity distributions; unlike the KL divergence, the JS divergence is symmetric. This embodiment expects a consistent judgment of the structure of the two distributions, so that changing which distribution is treated as the target does not produce a different value.
And the total loss function construction module is used for constructing a total loss function, wherein the total loss function comprises a cross entropy loss function between a prediction result of the student network model and label data, an alignment loss function of softening logits and a neighbor domain relation knowledge distillation loss function.
Neighbor relation knowledge distillation is applied at different layers to reduce the difference between the neighbor relations of the teacher and the student. Accordingly, the total loss function L_total used by this embodiment to train the student network model can be expressed as:
L_total = L_CE + L_KL + α·L_NFR + β·L_NLR
where α and β denote hyper-parameters that balance the loss terms, L_CE denotes the cross entropy loss function between the prediction result of the student network model and the label data, L_KL denotes the alignment loss function of the softened logits, L_NFR denotes the neighbor domain feature relation distillation loss function, and L_NLR denotes the neighbor domain logits relation distillation loss function.
The cross entropy loss function L_CE is:
L_CE = H(z^S, y_true)
where H(·,·) denotes the cross entropy loss, z^S denotes the second logits feature output by the student network model, and y_true denotes the label data of the sample.
Soft probability alignment loss function L between student network model and teacher network model RL The method comprises the following steps:
L KL =KL(softmax(z s ),softmax(z τ ))
wherein KL (·, ·) is the KL function, softmax (·) representssoftmax function, z T Representing a first logits feature, z, of the teacher network model output s A second logits feature representing the student network model output;
and the student network model training module is used for training the student network model by using the total loss function, and back-propagating, and updating parameters of the student network model to obtain a mature student network model.
The parameters of the student network model are updated in the following manner:
the student network model is trained with the total loss function, the gradient is calculated by back propagation, and the network parameters are updated; the parameter θ_S is updated as:
θ_S ← θ_S − γ·∇_{θ_S} L_total
where θ_S denotes the parameters of the student network model, γ denotes the learning rate, and ∇_{θ_S} L_total denotes the gradient of the total loss function L_total with respect to the student network parameters.
The flower image real-time classification module is used for acquiring real-time flower images, inputting the flower images into the student network model, and outputting classification results by the student network model.
Example 3
A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of a knowledge based distillation flower classification method.
The computer equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer, a cloud server and the like. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card memory (e.g., an SD card), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory may be an internal storage unit of the computer device, such as a hard disk or memory of the computer device. In other embodiments, the memory may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card equipped on the computer device. Of course, the memory may also include both an internal storage unit of the computer device and an external storage device. In this embodiment, the memory is typically used to store the operating system and various application software installed on the computer device, such as the program code of the flower classification method based on knowledge distillation. In addition, the memory may be used to temporarily store various types of data that have been output or are to be output.
The processor may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor is typically used to control the overall operation of the computer device. In this embodiment, the processor is configured to execute the program code stored in the memory or process data, such as the program code of the knowledge based distillation flower classification method.
Example 4
A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of a knowledge distillation based flower classification method.
The computer-readable storage medium stores a program executable by at least one processor, so as to cause the at least one processor to perform the steps of the flower classification method based on knowledge distillation as described above.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server or a network device, etc.) to perform the method for classifying flowers based on knowledge distillation according to the embodiments of the present application.
The above is an embodiment of the present invention. The above embodiments and specific parameters in the embodiments are only for clearly describing the inventive verification process of the inventor, and are not intended to limit the scope of the invention, which is defined by the claims, and all equivalent structural changes made by applying the descriptions and the drawings of the invention are included in the scope of the invention.

Claims (10)

1. The flower classification method based on knowledge distillation is characterized by comprising the following steps:
step S1, obtaining flower image sample data
Obtaining flower image sample data and label data;
s2, constructing a knowledge distillation network model
The knowledge distillation network model comprises a teacher network model and a student network model, wherein the teacher network model and the student network model both adopt convolutional neural networks, and pretraining is carried out on the teacher network model;
step S3, extracting sample characteristics
Respectively inputting the flower image sample data into a pre-trained teacher network model and an untrained student network model, wherein a convolution layer of the teacher network model outputs a first middle feature map, a full-connection layer of the teacher network model outputs a first logits feature, a convolution layer of the student network model outputs a second middle feature map, and a full-connection layer of the student network model outputs a second logits feature;
S4, selecting a neighbor sample
Based on the first logits characteristics, constructing a similarity matrix between the flower image sample data, and selecting K nearest neighbor samples which are the most similar to each flower image sample data according to the similarity matrix;
s5, constructing a neighbor domain relation knowledge distillation loss function
The neighbor domain relation knowledge distillation loss function comprises a neighbor domain characteristic relation distillation loss function and a neighbor domain logits relation distillation loss function;
constructing a neighbor domain feature relation distillation loss function based on the first intermediate feature map and the second intermediate feature map of the K nearest neighbor samples of the selected flower image sample data;
constructing a neighbor domain logits relation distillation loss function based on the first logits features and the second logits features of the K nearest neighbor samples of each selected flower image sample data;
Step S6, constructing a total loss function
The total loss function comprises a cross entropy loss function between a prediction result of the student network model and the label data, an alignment loss function of the softened logits, and the neighbor domain relation knowledge distillation loss function;
Step S7, training the student network model
Training the student network model with the total loss function, back-propagating the loss, and updating the parameters of the student network model to obtain a mature student network model;
Step S8, real-time classification of flower images
Acquiring a real-time flower image, inputting the flower image into the mature student network model, and outputting a classification result by the student network model.
2. A method for classifying flowers based on knowledge distillation as set forth in claim 1, wherein: in step S1, the obtained flower image sample data is preprocessed, and the preprocessing comprises cropping, scaling and rotation.
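By way of illustration only, the preprocessing of claim 2 could be realized with a torchvision transform pipeline such as the sketch below; the crop size and rotation range are assumptions and are not values fixed by the claim.

```python
# Hypothetical preprocessing pipeline for claim 2 (cropping, scaling, rotation).
# The 224x224 target size and 15-degree rotation range are illustrative assumptions.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),      # random cropping and scaling to a fixed size
    transforms.RandomRotation(degrees=15),  # random rotation augmentation
    transforms.ToTensor(),                  # convert the PIL image to a tensor
])
```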
3. A method for classifying flowers based on knowledge distillation as set forth in claim 1, wherein: in step S4, when the similarity matrix between the flower image sample data is constructed, the specific manner is as follows:
for samples x_i and x_j, the similarity matrix is calculated as:
A_ij = <z_i^T, z_j^T> / (‖z_i^T‖_2 · ‖z_j^T‖_2)
wherein <·,·> denotes the dot-product operation, ‖·‖_2 denotes L2 normalization, z_i^T denotes the logits feature output by the teacher network model for sample x_i, and z_j^T denotes the logits feature output by the teacher network model for sample x_j.
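The similarity matrix and neighbor selection of claim 3 (step S4) can be illustrated with the following PyTorch sketch; the function name and the choice to exclude each sample from its own neighbor set are assumptions.

```python
# Sketch of step S4 / claim 3: cosine similarity over teacher logits and
# selection of the K most similar (nearest neighbor) samples per sample.
import torch
import torch.nn.functional as F

def select_neighbors(teacher_logits: torch.Tensor, k: int):
    """teacher_logits: (N, M) first logits features from the teacher network."""
    z = F.normalize(teacher_logits, p=2, dim=1)   # L2-normalize each logits vector
    sim = z @ z.t()                               # (N, N) dot products = cosine similarities
    sim.fill_diagonal_(float('-inf'))             # exclude each sample from its own neighbors
    neighbor_idx = sim.topk(k, dim=1).indices     # (N, K) indices of the K nearest neighbors
    return sim, neighbor_idx
```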
4. A method for classifying flowers based on knowledge distillation as set forth in claim 1, wherein: in step S5, the neighbor domain feature relation distillation loss function L_NFR is constructed in the following manner:
Step S5-1-1, calculating spatial similarity
For sample x_i and its neighbor sample x_j, spatial similarity is calculated for the teacher network model and for the student network model respectively;
the spatial similarity S_ij^T of the teacher network model is computed between the feature representations of sample x_i and its neighbor sample x_j obtained from the first intermediate feature maps;
the spatial similarity S_ij^S of the student network model is computed between the feature representations of sample x_i and its neighbor sample x_j obtained from the second intermediate feature maps;
wherein F_i^T denotes the first intermediate feature map output by the teacher network model for sample x_i, and F_i^S denotes the second intermediate feature map output by the student network model for sample x_i;
s5-1-2, constructing the feature similarity of the neighboring domain
wherein pool(·) denotes the pooling operation and ‖·‖_2 denotes L2 normalization; for sample x_i and its neighbor sample x_j, neighbor domain feature similarity is constructed for the teacher network model and for the student network model respectively;
the neighbor domain feature similarity of the teacher network model is formed from the spatial similarities between sample x_i and each of its K neighbor samples, computed from the first intermediate feature maps output by the teacher network model;
the neighbor domain feature similarity of the student network model is formed from the spatial similarities between sample x_i and each of its K neighbor samples, computed from the second intermediate feature maps output by the student network model;
wherein K denotes the number of neighbor samples;
Step S5-1-3, constructing the neighbor domain feature relation distillation loss function L_NFR
The neighbor domain feature relation distillation loss function L_NFR aligns, over all N samples and their K neighbor samples, the spatial similarities of the student network model with those of the teacher network model;
wherein N denotes the number of samples, K denotes the number of neighbor samples, S_ij^T denotes the spatial similarity of the teacher network model, and S_ij^S denotes the spatial similarity of the student network model.
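Since the claim does not reproduce the exact formulas, the following sketch is only one plausible reading of L_NFR: global average pooling, cosine spatial similarity between a sample and each neighbor, and a mean-squared-error match between the teacher and student similarities. All of these specific choices are assumptions.

```python
# Hedged sketch of the neighbor domain feature relation loss L_NFR (claim 4).
# Pooling type, similarity measure, and MSE matching are assumed, not claimed.
import torch
import torch.nn.functional as F

def nfr_loss(feat_t, feat_s, neighbor_idx):
    """feat_t, feat_s: (N, C, H, W) intermediate feature maps; neighbor_idx: (N, K)."""
    # Pool and L2-normalize the feature maps (assumed reading of steps S5-1-1/S5-1-2).
    vt = F.normalize(F.adaptive_avg_pool2d(feat_t, 1).flatten(1), dim=1)  # (N, C_t)
    vs = F.normalize(F.adaptive_avg_pool2d(feat_s, 1).flatten(1), dim=1)  # (N, C_s)
    # Spatial similarity between each sample and its K neighbors, per model.
    s_t = (vt.unsqueeze(1) * vt[neighbor_idx]).sum(-1)   # (N, K) teacher similarities
    s_s = (vs.unsqueeze(1) * vs[neighbor_idx]).sum(-1)   # (N, K) student similarities
    # Teacher similarities are treated as fixed targets.
    return F.mse_loss(s_s, s_t.detach())
```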
5. A method for classifying flowers based on knowledge distillation as set forth in claim 1, wherein: in step S5, the neighbor domain logits relation distillation loss function L_NLR is constructed in the following manner:
Step S5-2-1, constructing the similarity between sample x_i and its neighbor sample x_j, and converting it into a similarity distribution using a softmax function, the similarity distribution ρ_ij being expressed as:
ρ_ij = softmax(z_i - z_j), ρ_ij ∈ R^(1×M)
wherein z_i denotes the logits feature output by the network model for sample x_i, and M denotes the number of categories;
Step S5-2-2, for sample x_i and its K neighbor samples, constructing logits similarity distributions for the teacher network model and for the student network model respectively;
the logits similarity distribution of the teacher network model is denoted ρ_ij^T, and the logits similarity distribution of the student network model is denoted ρ_ij^S;
wherein ρ_ij^T denotes the logits feature similarity distribution between sample x_i and its j-th neighbor sample, calculated from the first logits features output by the teacher network model, ρ_ij^S denotes the logits feature similarity distribution between sample x_i and its j-th neighbor sample, calculated from the second logits features output by the student network model, and K denotes the number of neighbor samples;
Step S5-2-3, constructing the neighbor domain logits relation distillation loss function L_NLR
The neighbor domain logits relation distillation loss function L_NLR aligns, over all N samples and their K neighbor samples, the logits feature similarity distributions of the student network model with those of the teacher network model by means of the JS divergence;
wherein N denotes the number of samples, K denotes the number of neighbor samples, JS(·,·) denotes the JS divergence, ρ_ij^T denotes the logits feature similarity distribution of the teacher network model, and ρ_ij^S denotes the logits feature similarity distribution of the student network model.
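A hedged sketch of L_NLR from claim 5 follows: ρ_ij = softmax(z_i − z_j) is formed for each of the K neighbors, and the teacher and student distributions are compared with the JS divergence. The averaging over samples and neighbors, and the use of detached teacher logits, are assumptions.

```python
# Hedged sketch of the neighbor domain logits relation loss L_NLR (claim 5).
import torch
import torch.nn.functional as F

def js_divergence(p, q, eps=1e-8):
    """Jensen-Shannon divergence between distributions along the last dimension."""
    m = 0.5 * (p + q)
    return 0.5 * (p * ((p + eps) / (m + eps)).log()).sum(-1) + \
           0.5 * (q * ((q + eps) / (m + eps)).log()).sum(-1)

def nlr_loss(logits_t, logits_s, neighbor_idx):
    """logits_t, logits_s: (N, M) logits; neighbor_idx: (N, K) neighbor indices."""
    logits_t = logits_t.detach()                                               # teacher is fixed
    rho_t = F.softmax(logits_t.unsqueeze(1) - logits_t[neighbor_idx], dim=-1)  # (N, K, M)
    rho_s = F.softmax(logits_s.unsqueeze(1) - logits_s[neighbor_idx], dim=-1)  # (N, K, M)
    return js_divergence(rho_t, rho_s).mean()                                  # average over N*K
```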
6. A method for classifying flowers based on knowledge distillation as set forth in claim 1, wherein: in step S6, the total loss function L_total is constructed as:
L_total = L_CE + L_KL + αL_NFR + βL_NLR
wherein α and β denote hyper-parameters that balance the loss terms, L_CE denotes the cross entropy loss function between the prediction result of the student network model and the label data, L_KL denotes the alignment loss function of the softened logits, L_NFR denotes the neighbor domain feature relation distillation loss function, and L_NLR denotes the neighbor domain logits relation distillation loss function;
the cross entropy loss function L_CE is:
L_CE = H(z^S, y_true)
wherein H(·,·) denotes the cross entropy loss, z^S denotes the second logits feature output by the student network model, and y_true denotes the label data of the sample;
the alignment loss function L_KL is:
L_KL = KL(softmax(z^S), softmax(z^T))
wherein KL(·,·) denotes the KL divergence, softmax(·) denotes the softmax function, z^T denotes the first logits feature output by the teacher network model, and z^S denotes the second logits feature output by the student network model.
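The total loss of claim 6 could be assembled as in the following sketch; the distillation temperature τ and the default values of α and β are assumptions not specified by the claim, and the L_NFR and L_NLR terms are assumed to be computed as in the earlier sketches.

```python
# Hedged sketch of the total loss in claim 6: L_CE + L_KL + alpha*L_NFR + beta*L_NLR.
import torch.nn.functional as F

def total_loss(student_logits, teacher_logits, labels, l_nfr, l_nlr,
               alpha=1.0, beta=1.0, tau=1.0):
    l_ce = F.cross_entropy(student_logits, labels)                  # hard-label term
    l_kl = F.kl_div(F.log_softmax(student_logits / tau, dim=1),     # softened-logits alignment
                    F.softmax(teacher_logits / tau, dim=1),
                    reduction='batchmean')
    return l_ce + l_kl + alpha * l_nfr + beta * l_nlr
```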
7. A method for classifying flowers based on knowledge distillation as set forth in claim 1, wherein: in step S7, when updating parameters of the student network model, the specific manner is as follows:
training the student network model using the total loss function, back-propagating the calculated gradient, and updating the network parameters; the parameter θ_S is updated as:
θ_S = θ_S − γ·∇θ_S L_total
wherein θ_S denotes the parameters of the student network model, γ denotes the learning rate, and ∇θ_S L_total denotes the gradient of the total loss function L_total with respect to the student network parameters.
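The parameter update of claim 7 corresponds to an ordinary gradient step; the sketch below uses plain SGD as an assumed optimizer, although any gradient-based optimizer could be substituted.

```python
# Minimal sketch of one training step for claim 7: back-propagate the total
# loss and update the student parameters at learning rate gamma.
import torch

def training_step(optimizer, loss):
    optimizer.zero_grad()
    loss.backward()       # gradients of L_total w.r.t. the student parameters
    optimizer.step()      # theta_S <- theta_S - gamma * grad

# Example setup (gamma is an assumed value):
# optimizer = torch.optim.SGD(student.parameters(), lr=0.05)
```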
8. A knowledge distillation based flower classification system comprising:
the flower image sample data acquisition module is used for acquiring flower image sample data and tag data;
the knowledge distillation network model construction module is used for constructing a knowledge distillation network model, wherein the knowledge distillation network model comprises a teacher network model and a student network model, the teacher network model and the student network model both adopt convolutional neural networks, and the teacher network model is pre-trained;
The sample feature extraction module is used for respectively inputting the flower image sample data into a pre-trained teacher network model and an untrained student network model, wherein a convolution layer of the teacher network model outputs a first intermediate feature map, a fully connected layer of the teacher network model outputs first logits features, a convolution layer of the student network model outputs a second intermediate feature map, and a fully connected layer of the student network model outputs second logits features;
the neighbor sample selection module is used for constructing a similarity matrix between the flower image sample data based on the first logits features, and selecting the K nearest neighbor samples most similar to each flower image sample according to the similarity matrix;
the neighbor domain relation construction module is used for constructing a neighbor domain relation knowledge distillation loss function, wherein the neighbor domain relation knowledge distillation loss function comprises a neighbor domain feature relation distillation loss function and a neighbor domain logits relation distillation loss function;
constructing a neighbor domain feature relation distillation loss function based on the first intermediate feature map and the second intermediate feature map of the K nearest neighbor samples of the selected flower image sample data;
constructing a neighbor domain logits relation distillation loss function based on the first logits features and the second logits features of the K nearest neighbor samples of the selected flower image sample data;
The total loss function construction module is used for constructing a total loss function, wherein the total loss function comprises a cross entropy loss function between a prediction result of the student network model and the label data, an alignment loss function of the softened logits, and the neighbor domain relation knowledge distillation loss function;
the student network model training module is used for training the student network model with the total loss function, back-propagating the loss, and updating the parameters of the student network model to obtain a mature student network model;
the flower image real-time classification module is used for acquiring real-time flower images, inputting the flower images into the student network model, and outputting classification results by the student network model.
9. A computer device, characterized by: comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized by: storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
CN202310721513.0A 2023-06-16 2023-06-16 Flower classification method, system, equipment and medium based on knowledge distillation Active CN117058437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310721513.0A CN117058437B (en) 2023-06-16 2023-06-16 Flower classification method, system, equipment and medium based on knowledge distillation


Publications (2)

Publication Number Publication Date
CN117058437A true CN117058437A (en) 2023-11-14
CN117058437B CN117058437B (en) 2024-03-08

Family

ID=88666920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310721513.0A Active CN117058437B (en) 2023-06-16 2023-06-16 Flower classification method, system, equipment and medium based on knowledge distillation

Country Status (1)

Country Link
CN (1) CN117058437B (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
CA3076424A1 (en) * 2019-03-22 2020-09-22 Royal Bank Of Canada System and method for knowledge distillation between neural networks
US20210334543A1 (en) * 2020-04-28 2021-10-28 Ajou University Industry-Academic Cooperation Foundation Method for semantic segmentation based on knowledge distillation
CN114444558A (en) * 2020-11-05 2022-05-06 佳能株式会社 Training method and training device for neural network for object recognition
CN113887698A (en) * 2021-08-25 2022-01-04 浙江大学 Overall knowledge distillation method and system based on graph neural network
WO2023050738A1 (en) * 2021-09-29 2023-04-06 北京百度网讯科技有限公司 Knowledge distillation-based model training method and apparatus, and electronic device
US20220036194A1 (en) * 2021-10-18 2022-02-03 Intel Corporation Deep neural network optimization system for machine learning model scaling
CN114328952A (en) * 2021-12-17 2022-04-12 清华大学 Knowledge graph alignment method, device and equipment based on knowledge distillation
CN114330580A (en) * 2021-12-31 2022-04-12 之江实验室 Robust knowledge distillation method based on ambiguity-oriented mutual label updating
CN114626504A (en) * 2022-01-11 2022-06-14 南通大学 Model compression method based on group relation knowledge distillation
CN114549893A (en) * 2022-01-17 2022-05-27 华南理工大学 Flower recognition system based on small sample learning
CN114758180A (en) * 2022-04-19 2022-07-15 电子科技大学 Knowledge distillation-based light flower recognition method
CN115359353A (en) * 2022-08-19 2022-11-18 广东工业大学 Flower identification and classification method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHANGYONG SHU et al.: "Channel-Wise Knowledge Distillation for Dense Prediction", ICCV 2021 OPEN ACCESS, 31 December 2021 (2021-12-31), pages 5311 - 5320 *
ZHIXIAN YANG et al.: "Nearest Neighbor Knowledge Distillation for Neural Machine Translation", NAACL 2022 CONFERENCE, 6 May 2023 (2023-05-06), pages 1032 - 1043 *
SHEN Xiangjun et al.: "Image Object Localization Method Based on Multiple Image Segmentation Evaluation", Pattern Recognition and Artificial Intelligence, vol. 25, no. 8, 15 August 2015 (2015-08-15), pages 760 - 767 *

Also Published As

Publication number Publication date
CN117058437B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
US10803359B2 (en) Image recognition method, apparatus, server, and storage medium
Li et al. Truncation cross entropy loss for remote sensing image captioning
CN114067160B (en) Small sample remote sensing image scene classification method based on embedded smooth graph neural network
CN110516095B (en) Semantic migration-based weak supervision deep hash social image retrieval method and system
CN112115783A (en) Human face characteristic point detection method, device and equipment based on deep knowledge migration
US20210406468A1 (en) Method and device for visual question answering, computer apparatus and medium
CN111401156B (en) Image identification method based on Gabor convolution neural network
CN111738169B (en) Handwriting formula recognition method based on end-to-end network model
CN110414616B (en) Remote sensing image dictionary learning and classifying method utilizing spatial relationship
CN111259940A (en) Target detection method based on space attention map
CN110929080A (en) Optical remote sensing image retrieval method based on attention and generation countermeasure network
CN110188827A (en) A kind of scene recognition method based on convolutional neural networks and recurrence autocoder model
He et al. Context-aware mathematical expression recognition: An end-to-end framework and a benchmark
CN114283285A (en) Cross consistency self-training remote sensing image semantic segmentation network training method and device
CN116912708A (en) Remote sensing image building extraction method based on deep learning
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN114973136A (en) Scene image recognition method under extreme conditions
CN110135435B (en) Saliency detection method and device based on breadth learning system
CN113642602B (en) Multi-label image classification method based on global and local label relation
CN117079276B (en) Semantic segmentation method, system, equipment and medium based on knowledge distillation
CN116758419A (en) Multi-scale target detection method, device and equipment for remote sensing image
CN117058437B (en) Flower classification method, system, equipment and medium based on knowledge distillation
CN115512340A (en) Intention detection method and device based on picture
CN114140664A (en) Training method of image processing model, and image similarity determining method and device
CN114692715A (en) Sample labeling method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant