CN117058437A - Flower classification method, system, equipment and medium based on knowledge distillation - Google Patents

Flower classification method, system, equipment and medium based on knowledge distillation Download PDF

Info

Publication number
CN117058437A
CN117058437A
Authority
CN
China
Prior art keywords
network model
neighbor
loss function
logits
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310721513.0A
Other languages
Chinese (zh)
Other versions
CN117058437B (en)
Inventor
苟建平
辛晓梦
宋和平
马忠臣
陈雯柏
陈潇君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University filed Critical Jiangsu University
Priority to CN202310721513.0A priority Critical patent/CN117058437B/en
Publication of CN117058437A publication Critical patent/CN117058437A/en
Application granted granted Critical
Publication of CN117058437B publication Critical patent/CN117058437B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures

Abstract

The invention discloses a flower classification method, system, equipment and medium based on knowledge distillation, belongs to the field of artificial intelligence, and aims to solve the technical problem that knowledge distillation network models in the prior art perform poorly on flower identification and classification. The method adopts neighbor domain relation knowledge distillation, treating the neighbor domain structure as a new form of relation knowledge: a similarity matrix is calculated from the logits features output by the teacher network and used to select neighbor samples, where each element of the similarity matrix represents the similarity of the feature representations of two samples; the neighbor samples are then used to calculate a neighbor domain feature relation distillation loss and a neighbor domain logits relation distillation loss, which are introduced into the total loss function for training the student network model. By establishing neighbor domain structure knowledge for knowledge distillation, better knowledge transfer is completed, thereby improving the effect of flower identification and classification.

Description

Flower classification method, system, equipment and medium based on knowledge distillation
Technical Field
The invention belongs to the technical field of artificial intelligence, relates to flower classification, and in particular relates to a flower classification method, system, equipment and medium based on knowledge distillation.
Background
Deep neural networks have achieved great success in many areas, such as computer vision and natural language processing. As deep neural networks become deeper and wider, their computation and memory requirements keep increasing, and the trade-off between model performance and model size remains an important issue. There is therefore a general need for compact networks that can be easily deployed on edge computing devices with limited computing power and memory.
Knowledge distillation is a simple and efficient model compression method and has been widely applied to different tasks, including visual recognition tasks such as image classification, image retrieval, semantic segmentation, target detection and action prediction, as well as natural language processing tasks such as machine translation. In general, knowledge distillation works by transferring knowledge from a large teacher network to a small student network, with the teacher network providing additional supervision. To date, the knowledge used for distillation can be divided into three categories: response-based, feature-based and relationship-based knowledge. Response-based knowledge distillation mimics the prediction results of the teacher model; feature-based knowledge distillation encourages the student to learn the intermediate feature representations of the teacher model; relationship-based methods make the student network learn the structure between different samples.
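For orientation, the response-based variant mentioned above is commonly implemented as a KL divergence between temperature-softened teacher and student logits. The sketch below is a generic illustration of that baseline, not the method of this disclosure; the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def response_kd_loss(student_logits, teacher_logits, T=4.0):
    # Soften both distributions with a temperature T, then match them with KL divergence.
    # T=4.0 is an illustrative choice, not a value specified in this disclosure.
    p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    # "batchmean" averages the KL over the batch; T*T rescales gradients to the usual magnitude.
    return F.kl_div(p_s, p_t, reduction="batchmean") * (T * T)
```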
Recent relational distillation methods mainly build knowledge over all randomly selected samples, for example the similarity matrix of a mini-batch, and the local relations among samples are not well studied. SP transfers the pairwise similarities of activations, RKD penalizes structural differences through distance and angle metrics, CRD uses a contrastive loss to make positive pairs more similar and negative pairs more different, and ICKD transfers knowledge of channel relationships.
The invention patent with application number 202210998890.4 discloses a flower identification and classification method and device. An image is obtained by photographing flowers or selected directly from an album; the image is then preprocessed to obtain the flower object; finally, the preprocessed flower image is passed through a flower recognition model to obtain the final classification result. The preprocessed flower image is input into a preset flower recognition model for recognition and classification, where the flower recognition model is a machine learning model based on a Transformer structure and specifically comprises a linear mapping layer, several Conv-Trans modules, several ResMLP modules and a fully connected layer; the fully connected layer is built on a student network model obtained through knowledge distillation training. The flower identification model adopts a Transformer architecture, uses a self-attention mechanism to extract features globally from the image, and focuses attention on the flower parts while ignoring the complex background, thereby extracting flower features accurately and achieving accurate classification. It addresses the technical problem that existing classification methods extract local image features by convolution, have difficulty attending to local and global key features at the same time, and therefore have imperfect feature extraction capability and inaccurate classification.
The invention patent application with application number 202210412189.X also discloses a lightweight flower identification method based on knowledge distillation, which comprises the following steps: S1, constructing a flower data set and dividing it into a training set and a test set; S2, selecting a teacher network and a student network; S3, initializing and training the teacher network to obtain a mature teacher network; S4, initializing the student network; S5, training the initialized student network on the flower data set with the aid of the teacher network to obtain a mature student neural network; S6, setting the mature student neural network to eval mode without back propagation, inputting the flower picture to be identified into the mature student neural network, computing the forward pass and outputting the identification result, which completes the flower identification. The identification method uses a knowledge distillation algorithm and a heavyweight network to assist in training a lightweight network, so that the loss in accuracy is reduced as much as possible while the model is greatly compressed, yielding a lightweight flower recognition model that is strongly compressed yet retains high accuracy.
As in the above patents, most knowledge distillation methods in the prior art construct a similarity or correlation matrix over all samples in a mini-batch, and this complex relation structure increases the difficulty of student network learning; in addition, local structure knowledge is easily ignored, so rich feature representation information cannot be captured; as a result, the trained knowledge distillation network model performs poorly on flower identification and classification.
Disclosure of Invention
In order to solve the technical problem that knowledge distillation network models in the prior art perform poorly on flower identification and classification, the invention provides a flower classification method, system, equipment and medium based on knowledge distillation, which establish neighbor domain structure knowledge for knowledge distillation to complete better knowledge transfer and improve the effect of flower identification and classification.
In order to solve the technical problems, the invention adopts the following technical scheme:
a flower classification method based on knowledge distillation comprises the following steps:
step S1, obtaining flower image sample data
Obtaining flower image sample data and label data;
s2, constructing a knowledge distillation network model
The knowledge distillation network model comprises a teacher network model and a student network model, wherein the teacher network model and the student network model both adopt convolutional neural networks, and pretraining is carried out on the teacher network model;
step S3, extracting sample characteristics
Respectively inputting the flower image sample data into a pre-trained teacher network model and an untrained student network model, wherein a convolution layer of the teacher network model outputs a first middle feature map, a full-connection layer of the teacher network model outputs a first logits feature, a convolution layer of the student network model outputs a second middle feature map, and a full-connection layer of the student network model outputs a second logits feature;
S4, selecting a neighbor sample
Based on the first logits characteristics, constructing a similarity matrix between the flower image sample data, and selecting K nearest neighbor samples which are the most similar to each flower image sample data according to the similarity matrix;
s5, constructing a neighbor domain relation knowledge distillation loss function
The neighbor domain relation knowledge distillation loss function comprises a neighbor domain characteristic relation distillation loss function and a neighbor domain logits relation distillation loss function;
constructing a neighbor domain feature relation distillation loss function based on the first intermediate feature map and the second intermediate feature map of the K nearest neighbor samples of the selected flower image sample data;
constructing a neighbor domain logits relation distillation loss function based on the first logits features and the second logits features of the K nearest neighbor samples of each selected flower image sample data;
step S6, constructing a total loss function
The total loss function comprises a cross entropy loss function between a prediction result of the student network model and label data, an alignment loss function of softening logits and a neighbor domain relation knowledge distillation loss function;
step S7, training a student network model
Training the student network model with the total loss function, back-propagating, and updating the parameters of the student network model to obtain a mature student network model;
Step S8, real-time classification of flower images
And acquiring a real-time flower image, inputting the flower image into a student network model, and outputting a classification result by the student network model.
Further, in step S1, the obtained flower image sample data is preprocessed, where the preprocessing includes clipping, scaling, and rotation.
Further, in step S4, the similarity matrix between the flower image sample data is constructed in the following manner:
for samples x_i and x_j, the similarity matrix is calculated as:
D_ij = ⟨z_i^T, z_j^T⟩ / (‖z_i^T‖_2 · ‖z_j^T‖_2)
where ⟨·,·⟩ denotes the dot product operation, ‖·‖_2 denotes the L2 norm, z_i^T denotes the logits feature output by the teacher network model for sample x_i, and z_j^T denotes the logits feature output by the teacher network model for sample x_j.
Further, in step S5, the neighbor domain feature relation distillation loss function L_NFR is constructed in the following manner:
Step S5-1-1, calculating spatial similarity
For sample x_i and its neighbor sample x_j, the spatial similarity is calculated for the teacher network model and for the student network model respectively;
the spatial similarity of the teacher network model is calculated from the first intermediate feature maps f_i^T and f_j^T output by the teacher network model;
the spatial similarity of the student network model is calculated from the second intermediate feature maps f_i^S and f_j^S output by the student network model;
where f_i^T denotes the first intermediate feature map output by the teacher network model for sample x_i, and f_i^S denotes the second intermediate feature map output by the student network model for sample x_i;
s5-1-2, constructing the feature similarity of the neighboring domain
After pooling and L2 normalization (‖·‖_2 denotes L2 normalization), the neighbor domain feature similarity is constructed for sample x_i and its neighbor samples, for the teacher network model and the student network model respectively;
the neighbor domain feature similarity of the teacher network model collects the spatial similarities, computed from the first intermediate feature maps output by the teacher network model, between sample x_i and each of its 1st to K-th neighbor samples, where K denotes the number of neighbor samples;
the neighbor domain feature similarity of the student network model collects the spatial similarities, computed from the second intermediate feature maps output by the student network model, between sample x_i and each of its 1st to K-th neighbor samples, where K denotes the number of neighbor samples;
Step S5-1-3, constructing the neighbor domain feature relation distillation loss function L_NFR
The neighbor domain feature relation distillation loss function L_NFR compares, over the N samples and their K neighbor samples, the spatial similarities of the teacher network model with those of the student network model, where N denotes the number of samples and K denotes the number of neighbor samples.
Further, in step S5, the neighbor domain logits relation distillation loss function L_NLR is constructed in the following manner:
Step S5-2-1, constructing the similarity between sample x_i and its neighbor sample x_j, and converting it into a similarity distribution with the softmax function, where the similarity distribution ρ_ij is expressed as:
ρ_ij = softmax(z_i − z_j), ρ_ij ∈ R^(1×M)
where z_i denotes the logits feature output by the network model for sample x_i, and M denotes the number of categories;
Step S5-2-2, for sample x_i and its K neighbor samples, constructing the logits similarity distribution for the teacher network model and for the student network model respectively:
the logits similarity distribution of the teacher network model collects the similarity distributions ρ_i1^T, …, ρ_iK^T, where ρ_ij^T denotes the similarity distribution, calculated from the first logits features output by the teacher network model, between sample x_i and its j-th neighbor sample;
the logits similarity distribution of the student network model collects the similarity distributions ρ_i1^S, …, ρ_iK^S, where ρ_ij^S denotes the similarity distribution, calculated from the second logits features output by the student network model, between sample x_i and its j-th neighbor sample; K denotes the number of neighbor samples;
Step S5-2-3, constructing the neighbor domain logits relation distillation loss function L_NLR
The neighbor domain logits relation distillation loss function L_NLR measures, over the N samples and their K neighbor samples, the JS divergence between the logits similarity distributions of the teacher network model and those of the student network model, where N denotes the number of samples, K denotes the number of neighbor samples, and JS(·,·) denotes the JS divergence.
Further, in step S6, the total loss function L_total is constructed as:
L_total = L_CE + L_KL + α·L_NFR + β·L_NLR
where α and β denote hyper-parameters that balance the loss terms, L_CE denotes the cross entropy loss function between the prediction result of the student network model and the label data, L_KL denotes the alignment loss function of the softened logits, L_NFR denotes the neighbor domain feature relation distillation loss function, and L_NLR denotes the neighbor domain logits relation distillation loss function;
the cross entropy loss function L_CE is:
L_CE = H(z^S, y_true)
where H(·,·) denotes the cross entropy loss, z^S denotes the second logits feature output by the student network model, and y_true denotes the label data of the sample;
the alignment loss function L_KL is:
L_KL = KL(softmax(z^S), softmax(z^T))
where KL(·,·) denotes the KL divergence, softmax(·) denotes the softmax function, z^T denotes the first logits feature output by the teacher network model, and z^S denotes the second logits feature output by the student network model.
further, in step S7, when updating parameters of the student network model, the specific manner is as follows:
training a student network model using step total loss function, back propagation computationGradient and update network parameters, parameter θ s The method comprises the following steps:
wherein θ S Representing parameters of the student network model, gamma representing learning rate,representing the total loss function L tatcl Gradients of student network parameters are calculated.
A knowledge distillation based flower classification system comprising:
the flower image sample data acquisition module is used for acquiring flower image sample data and tag data;
the knowledge distillation network model construction module is used for constructing a knowledge distillation network model, wherein the knowledge distillation network model comprises a teacher network model and a student network model, the teacher network model and the student network model both adopt convolutional neural networks, and the teacher network model is pre-trained;
the sample feature extraction module is used for respectively inputting the flower image sample data into a pre-trained teacher network model and an untrained student network model, wherein a convolution layer of the teacher network model outputs a first middle feature map, a full-connection layer of the teacher network model outputs first logits features, a convolution layer of the student network model outputs a second middle feature map, and a full-connection layer of the student network model outputs second logits features;
the neighbor sample selection module is used for constructing a similarity matrix between the flower image sample data based on the first logits characteristics, and selecting K nearest neighbor samples which are the most similar to each flower image sample data according to the similarity matrix;
The neighbor domain relation construction module is used for constructing a neighbor domain relation knowledge distillation loss function, wherein the neighbor domain relation knowledge distillation loss function comprises a neighbor domain characteristic relation distillation loss function and a neighbor domain logits relation distillation loss function;
constructing a neighbor domain feature relation distillation loss function based on the first intermediate feature map and the second intermediate feature map of the K nearest neighbor samples of the selected flower image sample data;
constructing a near neighborhood logits relation distillation loss function based on the first logits characteristic and the second logits characteristic of the K nearest neighbor samples of the selected flower image sample data;
the total loss function construction module is used for constructing a total loss function, wherein the total loss function comprises a cross entropy loss function between a prediction result of the student network model and label data, an alignment loss function of softening logits and a neighbor domain relation knowledge distillation loss function;
the student network model training module is used for training the student network model with the total loss function, back-propagating, and updating the parameters of the student network model to obtain a mature student network model;
the flower image real-time classification module is used for acquiring real-time flower images, inputting the flower images into the student network model, and outputting classification results by the student network model.
A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method described above.
A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method described above.
Compared with the prior art, the invention has the beneficial effects that:
in the invention, neighbor domain relation knowledge distillation is adopted, treating the neighbor domain structure as new relation knowledge: neighbor samples are selected after a similarity matrix is calculated from the logits features output by the network, where each element of the similarity matrix represents the similarity of the feature representations of two samples; the neighbor samples are used to calculate a neighbor domain feature relation distillation loss and a neighbor domain logits relation distillation loss, which are introduced into the total loss function for training the student network model. By establishing neighbor domain structure knowledge for knowledge distillation, better knowledge transfer is completed, thereby improving the effect of flower identification and classification.
Drawings
Fig. 1 is a schematic flow chart of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. Embodiments of the present invention include, but are not limited to, the following examples.
Example 1
The present embodiment provides a flower classification method based on knowledge distillation, as shown in fig. 1, which includes the following steps:
step S1, obtaining flower image sample data
And obtaining flower image sample data and label data.
In addition, the acquired flower image sample data are preprocessed to augment the sample data.
The preprocessing includes operations such as cropping, scaling and rotation.
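As an illustration only, the cropping, scaling and rotation operations above could be assembled with torchvision transforms as follows; the image sizes and rotation angle are assumed values, not parameters specified in this disclosure.

```python
from torchvision import transforms

# Illustrative augmentation pipeline for flower image samples (sizes and angle are assumptions).
train_transform = transforms.Compose([
    transforms.Resize(256),                  # scaling
    transforms.RandomCrop(224),              # cropping
    transforms.RandomRotation(degrees=15),   # rotation
    transforms.ToTensor(),
])
```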
S2, constructing a knowledge distillation network model
The knowledge distillation network model comprises a teacher network model and a student network model, wherein the teacher network model and the student network model both adopt convolutional neural networks, and the teacher network model is pre-trained.
The convolutional neural network in the teacher network model and the student network model comprises a convolutional layer, a batch normalization layer, a ReLU layer, a pooling layer and a full connection layer.
Step S3, extracting sample characteristics
The flower image sample data are respectively input into a pre-trained teacher network model and an untrained student network model, a convolution layer of the teacher network model outputs a first middle feature map, a full-connection layer of the teacher network model outputs a first logits feature, a convolution layer of the student network model outputs a second middle feature map, and a full-connection layer of the student network model outputs a second logits feature.
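The following sketch illustrates how a convolutional network of the kind described in step S2 (convolution, batch normalization, ReLU, pooling and a fully connected layer) can return both the intermediate feature map and the logits feature in one forward pass. The layer sizes are assumptions for illustration; this is not the specific teacher or student architecture of the disclosure.

```python
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    """Illustrative backbone; channel counts and depth are assumptions."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x):
        feat = self.features(x)                        # intermediate feature map (conv layers)
        logits = self.fc(self.pool(feat).flatten(1))   # logits feature (fully connected layer)
        return feat, logits
```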
S4, selecting a neighbor sample
Based on the first logits characteristics, a similarity matrix between the flower image sample data is constructed, and K nearest neighbor samples of each flower image sample data are selected according to the similarity matrix.
Neighbor samples are selected by constructing a similarity matrix D between the flower image sample data, and the K nearest neighbors are selected according to the similarity matrix D within each mini-batch. The specific method is as follows:
for samples x_i and x_j, the similarity matrix is calculated as:
D_ij = ⟨z_i^T, z_j^T⟩ / (‖z_i^T‖_2 · ‖z_j^T‖_2)
where ⟨·,·⟩ denotes the dot product operation, ‖·‖_2 denotes the L2 norm, z_i^T denotes the logits output obtained by inputting sample x_i into the teacher network model, and z_j^T denotes the logits output obtained by inputting sample x_j into the teacher network model.
The cosine value of two samples is calculated; the larger the cosine value, the more similar the two samples are. For each sample, this embodiment selects the samples corresponding to the K largest values in the similarity matrix D as its K nearest neighbors.
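A minimal sketch of this neighbor selection step follows: cosine similarities are computed between the first logits features of a mini-batch, and the K largest entries per row are taken as neighbors. Excluding a sample from its own neighbor set is an assumption about the intended behavior.

```python
import torch
import torch.nn.functional as F

def select_neighbors(teacher_logits, k):
    # teacher_logits: (N, M) first logits features of a mini-batch.
    z = F.normalize(teacher_logits, dim=1)       # L2-normalize so dot products are cosine values
    d = z @ z.t()                                # similarity matrix D, D[i, j] = cos(z_i, z_j)
    d.fill_diagonal_(float("-inf"))              # assumption: a sample is not its own neighbor
    return d.topk(k, dim=1).indices              # (N, K) indices of the K nearest neighbors
```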
S5, constructing a neighbor domain relation knowledge distillation loss function
The neighbor domain relationship knowledge distillation loss function comprises a neighbor domain feature relationship distillation loss function and a neighbor domain logits relationship distillation loss function:
Constructing a neighbor domain feature relation distillation loss function based on the first intermediate feature map and the second intermediate feature map of the K nearest neighbor samples of the selected flower image sample data;
a near neighborhood logits relationship distillation loss function is constructed based on the first logits features and the second logits features of the K nearest neighbor samples of the selected each flower image sample data.
Based on the selected neighbor samples, this embodiment constructs the neighbor domain feature relation and the neighbor domain logits relation from the intermediate features and the logits respectively, and calculates the neighbor domain feature relation distillation loss function and the neighbor domain logits relation distillation loss function, which together form the neighbor domain relation knowledge distillation loss function. Step S5 is therefore divided into two parts: the neighbor domain feature relation distillation loss function and the neighbor domain logits relation distillation loss function.
The neighbor domain feature relation distillation loss function L_NFR is constructed in the following manner:
Step S5-1-1, calculating spatial similarity
This embodiment makes full use of the information in the feature maps: for sample x_i and its neighbor sample x_j, the spatial similarity is calculated for the teacher network model and for the student network model respectively;
the spatial similarity of the teacher network model is calculated from the first intermediate feature maps f_i^T and f_j^T output by the teacher network model;
the spatial similarity of the student network model is calculated from the second intermediate feature maps f_i^S and f_j^S output by the student network model;
where f_i^T denotes the first intermediate feature map output by the teacher network model for sample x_i, and f_i^S denotes the second intermediate feature map output by the student network model for sample x_i.
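The exact spatial similarity formula appears as an image in the original filing and is not reproduced above. The sketch below therefore assumes one common choice consistent with the description, the inner product between L2-normalized channel vectors at every pair of pixel positions of the two feature maps; it should be read as an assumption, not the filed formula.

```python
import torch
import torch.nn.functional as F

def spatial_similarity(feat_i, feat_j):
    # feat_i, feat_j: (C, H, W) intermediate feature maps of x_i and its neighbor x_j.
    # Assumption: similarity between every pixel position of x_i and every pixel position of x_j,
    # computed on channel vectors that are L2-normalized per position.
    a = F.normalize(feat_i.flatten(1).t(), dim=1)   # (H*W, C)
    b = F.normalize(feat_j.flatten(1).t(), dim=1)   # (H*W, C)
    return a @ b.t()                                # (H*W, H*W) pixel-to-pixel similarities
```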
S5-1-2, constructing the feature similarity of the neighboring domain
This embodiment also evaluates the similarity between each pixel point of a sample and the pixel points of its neighbor samples: for sample x_i and its neighbor samples, the neighbor domain feature similarity is constructed for the teacher network model and the student network model respectively;
the neighbor domain feature similarity of the teacher network model collects the spatial similarities, computed from the first intermediate feature maps output by the teacher network model, between sample x_i and each of its 1st to K-th neighbor samples, where K denotes the number of neighbor samples;
the neighbor domain feature similarity of the student network model collects the spatial similarities, computed from the second intermediate feature maps output by the student network model, between sample x_i and each of its 1st to K-th neighbor samples, where K denotes the number of neighbor samples;
If the spatial resolutions of the intermediate feature maps of the teacher network model and the student network model differ, adaptive pooling is used to align the resolutions.
Step S5-1-3, constructing the neighbor domain feature relation distillation loss function L_NFR
The neighbor domain feature relation distillation loss function L_NFR compares, over the N samples and their K neighbor samples, the spatial similarities of the teacher network model with those of the student network model, where N denotes the number of samples and K denotes the number of neighbor samples.
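The L_NFR formula likewise appears as an image in the original filing. The sketch below assumes it averages a squared difference between the teacher's and the student's spatial similarities over the N samples and their K neighbor samples, and it uses adaptive pooling to align resolutions as described; the exact loss form is an assumption, and spatial_similarity refers to the sketch above.

```python
import torch
import torch.nn.functional as F

def nfr_loss(t_feats, s_feats, neighbor_idx):
    # t_feats: (N, C_t, H_t, W_t) teacher feature maps; s_feats: (N, C_s, H_s, W_s) student feature maps.
    # neighbor_idx: (N, K) indices from select_neighbors(). The mean-squared form is an assumption.
    if s_feats.shape[-2:] != t_feats.shape[-2:]:
        s_feats = F.adaptive_avg_pool2d(s_feats, t_feats.shape[-2:])  # align spatial resolution
    loss, (n, k) = 0.0, neighbor_idx.shape
    for i in range(n):
        for j in neighbor_idx[i]:
            a_t = spatial_similarity(t_feats[i], t_feats[j])
            a_s = spatial_similarity(s_feats[i], s_feats[j])
            loss = loss + F.mse_loss(a_s, a_t)
    return loss / (n * k)
```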
The probability of each class is obtained by softmax-normalizing the logits, so the logits reflect the prediction of the network model in the image classification space. The logits of different samples can also reflect the network model's judgment of their similarity in the classification space. For this purpose, this embodiment constructs a neighbor domain logits relation distillation loss function L_NLR.
The neighbor domain logits relation distillation loss function L_NLR is constructed in the following manner:
Step S5-2-1, this embodiment represents the similarity of logits features in the form of a similarity distribution, namely: the similarity between sample x_i and its neighbor sample x_j is constructed and converted into a similarity distribution with the softmax function, where the similarity distribution ρ_ij is expressed as:
ρ_ij = softmax(z_i − z_j), ρ_ij ∈ R^(1×M)
where z_i denotes the logits feature output by the network model for sample x_i, and M denotes the number of categories.
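A short sketch of this similarity distribution; it reads the formula as ρ_ij = softmax(z_i − z_j), which is an interpretation of the garbled original text rather than a certainty.

```python
import torch

def logits_similarity_distribution(z_i, z_j):
    # z_i, z_j: (M,) logits of a sample and one of its neighbors.
    # Interpretation of the filing's formula: rho_ij = softmax(z_i - z_j), a distribution over M classes.
    return torch.softmax(z_i - z_j, dim=0)
```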
Step S5-2-2, for sample x_i and its K neighbor samples, the logits similarity distribution is constructed for the teacher network model and for the student network model respectively:
the logits similarity distribution of the teacher network model collects the similarity distributions ρ_i1^T, …, ρ_iK^T;
the logits similarity distribution of the student network model collects the similarity distributions ρ_i1^S, …, ρ_iK^S;
where ρ_ij^T denotes the similarity distribution, calculated from the first logits features output by the teacher network model, between sample x_i and its j-th neighbor sample, ρ_ij^S denotes the similarity distribution, calculated from the second logits features output by the student network model, between sample x_i and its j-th neighbor sample, and K denotes the number of neighbor samples.
the goal of this embodiment is not to let the network model pair sample x i And x j The same judgment is made, and the consistency of the category similarity between two samples is maintained in the teacher network model and the student network model, so that a neighbor domain logits relation distillation loss function is constructed.
Step S5-2-3, constructing the neighbor domain logits relation distillation loss function L_NLR
The neighbor domain logits relation distillation loss function L_NLR measures, over the N samples and their K neighbor samples, the JS divergence between the logits similarity distributions of the teacher network model and those of the student network model, where N denotes the number of samples, K denotes the number of neighbor samples, JS(·,·) denotes the JS divergence, ρ_ij^T denotes the logits feature similarity distribution, calculated from the first logits features output by the teacher network model, between sample x_i and its j-th neighbor sample, and ρ_ij^S denotes the logits feature similarity distribution, calculated from the second logits features output by the student network model, between sample x_i and its j-th neighbor sample.
This embodiment uses the Jensen-Shannon (JS) divergence to measure the difference between similarity distributions; unlike the KL divergence, the JS divergence is symmetric. This embodiment expects a consistent judgment of the structure of the two distributions, so that changing which distribution is treated as the target does not produce a different value.
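The L_NLR formula is also an image in the original filing. The sketch below assumes it averages the JS divergence between the teacher's and the student's logits similarity distributions over the N samples and their K neighbor samples; the averaging and the exact distribution construction are assumptions.

```python
import torch

def js_divergence(p, q, eps=1e-8):
    # Symmetric Jensen-Shannon divergence between two probability vectors p and q.
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * (a.add(eps).log() - b.add(eps).log())).sum()
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def nlr_loss(t_logits, s_logits, neighbor_idx):
    # t_logits: (N, M) teacher logits; s_logits: (N, M) student logits; neighbor_idx: (N, K).
    # Loss form is an assumption: mean JS divergence over samples and their neighbors.
    loss, (n, k) = 0.0, neighbor_idx.shape
    for i in range(n):
        for j in neighbor_idx[i]:
            rho_t = torch.softmax(t_logits[i] - t_logits[j], dim=0)
            rho_s = torch.softmax(s_logits[i] - s_logits[j], dim=0)
            loss = loss + js_divergence(rho_t, rho_s)
    return loss / (n * k)
```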
Step S6, constructing a total loss function
The total loss function includes a cross entropy loss function between the prediction result of the student network model and the label data, an alignment loss function of softening logits, and a neighbor domain relationship knowledge distillation loss function.
Neighbor relation knowledge distillation is applied at different layers to reduce the difference between the neighbor relations of the teacher and the student. Accordingly, the total loss function L_total used by this embodiment to train the student network model can be expressed as:
L_total = L_CE + L_KL + α·L_NFR + β·L_NLR
where α and β denote hyper-parameters that balance the loss terms, L_CE denotes the cross entropy loss function between the prediction result of the student network model and the label data, L_KL denotes the alignment loss function of the softened logits, L_NFR denotes the neighbor domain feature relation distillation loss function, and L_NLR denotes the neighbor domain logits relation distillation loss function.
The cross entropy loss function L_CE is:
L_CE = H(z^S, y_true)
where H(·,·) denotes the cross entropy loss, z^S denotes the second logits feature output by the student network model, and y_true denotes the label data of the sample.
Soft probability alignment loss function L between student network model and teacher network model KL The method comprises the following steps:
L KL =KL(softmax(z S ),softmax(z T ))
wherein KL (·, ·) is the KL function, softmax (·) represents the softmax function, z T Representing a first logits feature, z, of the teacher network model output s A second logits feature representing the student network model output;
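Combining the terms as L_total = L_CE + L_KL + α·L_NFR + β·L_NLR could look like the sketch below; nfr_loss and nlr_loss refer to the assumed sketches above, α and β are unspecified hyper-parameters, and the argument order of the KL term follows the usual distillation convention, which is an assumption.

```python
import torch.nn.functional as F

def total_loss(s_logits, t_logits, s_feats, t_feats, labels, neighbor_idx, alpha, beta):
    # Combines the four terms described above; alpha and beta are balancing hyper-parameters.
    l_ce = F.cross_entropy(s_logits, labels)                               # L_CE
    l_kl = F.kl_div(F.log_softmax(s_logits, dim=1),
                    F.softmax(t_logits, dim=1), reduction="batchmean")     # L_KL (softened-logits alignment)
    l_nfr = nfr_loss(t_feats, s_feats, neighbor_idx)                       # L_NFR (assumed sketch above)
    l_nlr = nlr_loss(t_logits, s_logits, neighbor_idx)                     # L_NLR (assumed sketch above)
    return l_ce + l_kl + alpha * l_nfr + beta * l_nlr
```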
step S7, training a student network model
And training the student network model by using the total loss function, and back-propagating, and updating parameters of the student network model to obtain a mature student network model.
The parameters of the student network model are updated in the following manner:
the student network model is trained with the total loss function, the gradient is calculated by back propagation, and the network parameters are updated; the parameter θ_S is updated as:
θ_S ← θ_S − γ·∇_{θ_S} L_total
where θ_S denotes the parameters of the student network model, γ denotes the learning rate, and ∇_{θ_S} L_total denotes the gradient of the total loss function L_total with respect to the student network parameters.
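A sketch of one training step as described in step S7, assuming plain stochastic gradient descent with learning rate γ and the helper sketches above (teacher and student models returning a feature map and logits, select_neighbors and total_loss); any optimizer choice beyond plain gradient descent on L_total is an assumption.

```python
import torch

def train_step(student, teacher, images, labels, optimizer, k, alpha, beta):
    # One training step of the student; the teacher is pre-trained and kept frozen.
    with torch.no_grad():
        t_feats, t_logits = teacher(images)          # first intermediate feature map, first logits
    s_feats, s_logits = student(images)              # second intermediate feature map, second logits
    neighbor_idx = select_neighbors(t_logits, k)     # K nearest neighbors from teacher logits
    loss = total_loss(s_logits, t_logits, s_feats, t_feats, labels, neighbor_idx, alpha, beta)
    optimizer.zero_grad()
    loss.backward()                                  # back-propagate gradients
    optimizer.step()                                 # theta_S <- theta_S - gamma * grad(L_total)
    return loss.item()
```

Here the optimizer would be created as, for example, torch.optim.SGD(student.parameters(), lr=gamma), matching the update θ_S ← θ_S − γ·∇_{θ_S} L_total.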
Step S8, real-time classification of flower images
And acquiring a real-time flower image, inputting the flower image into a student network model, and outputting a classification result by the student network model.
Example 2
This embodiment provides a flower classification system based on knowledge distillation, as shown in fig. 1, which includes:
the present embodiment provides a flower classification method based on knowledge distillation, as shown in fig. 1, which includes the following steps:
the flower image sample data acquisition module is used for acquiring flower image sample data and tag data.
In addition, the acquired flower image sample data are preprocessed to augment the sample data.
The preprocessing includes operations such as cropping, scaling and rotation.
The knowledge distillation network model construction module is used for constructing a knowledge distillation network model, wherein the knowledge distillation network model comprises a teacher network model and a student network model, the teacher network model and the student network model both adopt convolutional neural networks, and the teacher network model is pre-trained.
The convolutional neural network in the teacher network model and the student network model comprises a convolutional layer, a batch normalization layer, a ReLU layer, a pooling layer and a full connection layer.
The sample feature extraction module is used for respectively inputting the flower image sample data into a pre-trained teacher network model and an untrained student network model, wherein a convolution layer of the teacher network model outputs a first middle feature map, a full-connection layer of the teacher network model outputs a first logits feature, a convolution layer of the student network model outputs a second middle feature map, and a full-connection layer of the student network model outputs a second logits feature.
The neighbor sample selection module is used for constructing a similarity matrix between the flower image sample data based on the first logits characteristics, and selecting K most similar neighbor samples of each flower image sample data according to the similarity matrix.
Neighbor samples are selected by constructing a similarity matrix D between the flower image sample data, and the K nearest neighbors are selected according to the similarity matrix D within each mini-batch. The specific method is as follows:
for samples x_i and x_j, the similarity matrix is calculated as:
D_ij = ⟨z_i^T, z_j^T⟩ / (‖z_i^T‖_2 · ‖z_j^T‖_2)
where ⟨·,·⟩ denotes the dot product operation, ‖·‖_2 denotes the L2 norm, z_i^T denotes the logits output obtained by inputting sample x_i into the teacher network model, and z_j^T denotes the logits output obtained by inputting sample x_j into the teacher network model.
The cosine value of two samples is calculated; the larger the cosine value, the more similar the two samples are. For each sample, this embodiment selects the samples corresponding to the K largest values in the similarity matrix D as its K nearest neighbors.
The neighbor domain relation construction module is used for constructing a neighbor domain relation knowledge distillation loss function, wherein the neighbor domain relation knowledge distillation loss function comprises a neighbor domain characteristic relation distillation loss function and a neighbor domain logits relation distillation loss function;
constructing a neighbor domain feature relation distillation loss function based on the first intermediate feature map and the second intermediate feature map of the K nearest neighbor samples of the selected flower image sample data;
a near neighborhood logits relationship distillation loss function is constructed based on the first logits features and the second logits features of the K nearest neighbor samples of the selected each flower image sample data.
Based on the selected neighbor samples, this embodiment constructs the neighbor domain feature relation and the neighbor domain logits relation from the intermediate features and the logits respectively, and calculates the neighbor domain feature relation distillation loss function and the neighbor domain logits relation distillation loss function, which together form the neighbor domain relation knowledge distillation loss function. This module is therefore divided into two parts: the neighbor domain feature relation distillation loss function and the neighbor domain logits relation distillation loss function.
The neighbor domain feature relation distillation loss function L_NFR is constructed in the following manner:
Step S5-1-1, calculating spatial similarity
This embodiment makes full use of the information in the feature maps: for sample x_i and its neighbor sample x_j, the spatial similarity is calculated for the teacher network model and for the student network model respectively;
the spatial similarity of the teacher network model is calculated from the first intermediate feature maps f_i^T and f_j^T output by the teacher network model;
the spatial similarity of the student network model is calculated from the second intermediate feature maps f_i^S and f_j^S output by the student network model;
where f_i^T denotes the first intermediate feature map output by the teacher network model for sample x_i, and f_i^S denotes the second intermediate feature map output by the student network model for sample x_i.
S5-1-2, constructing the feature similarity of the neighboring domain
This embodiment also evaluates the similarity between each pixel point of a sample and the pixel points of its neighbor samples: for sample x_i and its neighbor samples, the neighbor domain feature similarity is constructed for the teacher network model and the student network model respectively;
the neighbor domain feature similarity of the teacher network model collects the spatial similarities, computed from the first intermediate feature maps output by the teacher network model, between sample x_i and each of its 1st to K-th neighbor samples, where K denotes the number of neighbor samples;
the neighbor domain feature similarity of the student network model collects the spatial similarities, computed from the second intermediate feature maps output by the student network model, between sample x_i and each of its 1st to K-th neighbor samples, where K denotes the number of neighbor samples;
If the spatial resolutions of the intermediate feature maps of the teacher network model and the student network model differ, adaptive pooling is used to align the resolutions.
Step S5-1-3, constructing the neighbor domain feature relation distillation loss function L_NFR
The neighbor domain feature relation distillation loss function L_NFR compares, over the N samples and their K neighbor samples, the spatial similarities of the teacher network model with those of the student network model, where N denotes the number of samples and K denotes the number of neighbor samples.
The probability of each class is obtained by softmax-normalizing the logits, so the logits reflect the prediction of the network model in the image classification space. The logits of different samples can also reflect the network model's judgment of their similarity in the classification space. For this purpose, this embodiment constructs a neighbor domain logits relation distillation loss function L_NLR.
The neighbor domain logits relation distillation loss function L_NLR is constructed in the following manner:
Step S5-2-1, this embodiment represents the similarity of logits features in the form of a similarity distribution, namely: the similarity between sample x_i and its neighbor sample x_j is constructed and converted into a similarity distribution with the softmax function, where the similarity distribution ρ_ij is expressed as:
ρ_ij = softmax(z_i − z_j), ρ_ij ∈ R^(1×M)
where z_i denotes the logits feature output by the network model for sample x_i, and M denotes the number of categories.
Step S5-2-2, for sample x_i and its K neighbor samples, the logits similarity distribution is constructed for the teacher network model and for the student network model respectively:
the logits similarity distribution of the teacher network model collects the similarity distributions ρ_i1^T, …, ρ_iK^T;
the logits similarity distribution of the student network model collects the similarity distributions ρ_i1^S, …, ρ_iK^S;
where ρ_ij^T denotes the similarity distribution, calculated from the first logits features output by the teacher network model, between sample x_i and its j-th neighbor sample, ρ_ij^S denotes the similarity distribution, calculated from the second logits features output by the student network model, between sample x_i and its j-th neighbor sample, and K denotes the number of neighbor samples.
The goal of this embodiment is not to make the network model produce the same judgment for samples x_i and x_j, but to keep the category similarity between the two samples consistent between the teacher network model and the student network model; the neighbor domain logits relation distillation loss function is constructed for this purpose.
Step S5-2-3, constructing the neighbor domain logits relation distillation loss function L_NLR
The neighbor domain logits relation distillation loss function L_NLR measures, over the N samples and their K neighbor samples, the JS divergence between the logits similarity distributions of the teacher network model and those of the student network model, where N denotes the number of samples, K denotes the number of neighbor samples, JS(·,·) denotes the JS divergence, ρ_ij^T denotes the logits feature similarity distribution, calculated from the first logits features output by the teacher network model, between sample x_i and its j-th neighbor sample, and ρ_ij^S denotes the logits feature similarity distribution, calculated from the second logits features output by the student network model, between sample x_i and its j-th neighbor sample.
This embodiment uses the Jensen-Shannon (JS) divergence to measure the difference between similarity distributions; unlike the KL divergence, the JS divergence is symmetric. This embodiment expects a consistent judgment of the structure of the two distributions, so that changing which distribution is treated as the target does not produce a different value.
And the total loss function construction module is used for constructing a total loss function, wherein the total loss function comprises a cross entropy loss function between a prediction result of the student network model and label data, an alignment loss function of softening logits and a neighbor domain relation knowledge distillation loss function.
Neighbor relation knowledge distillation is applied at different layers to reduce the difference between the neighbor relations of the teacher and the student. Accordingly, the total loss function L_total used by this embodiment to train the student network model can be expressed as:
L_total = L_CE + L_KL + α·L_NFR + β·L_NLR
where α and β denote hyper-parameters that balance the loss terms, L_CE denotes the cross entropy loss function between the prediction result of the student network model and the label data, L_KL denotes the alignment loss function of the softened logits, L_NFR denotes the neighbor domain feature relation distillation loss function, and L_NLR denotes the neighbor domain logits relation distillation loss function.
The cross entropy loss function L_CE is:
L_CE = H(z^S, y_true)
where H(·,·) denotes the cross entropy loss, z^S denotes the second logits feature output by the student network model, and y_true denotes the label data of the sample.
Soft probability alignment loss function L between student network model and teacher network model RL The method comprises the following steps:
L KL =KL(softmax(z s ),softmax(z τ ))
wherein KL (·, ·) is the KL function, softmax (·) representssoftmax function, z T Representing a first logits feature, z, of the teacher network model output s A second logits feature representing the student network model output;
and the student network model training module is used for training the student network model by using the total loss function, and back-propagating, and updating parameters of the student network model to obtain a mature student network model.
The parameters of the student network model are updated in the following manner:
the student network model is trained with the total loss function, the gradient is calculated by back propagation, and the network parameters are updated; the parameter θ_S is updated as:
θ_S ← θ_S − γ·∇_{θ_S} L_total
where θ_S denotes the parameters of the student network model, γ denotes the learning rate, and ∇_{θ_S} L_total denotes the gradient of the total loss function L_total with respect to the student network parameters.
The flower image real-time classification module is used for acquiring real-time flower images, inputting the flower images into the student network model, and outputting classification results by the student network model.
Example 3
A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of a knowledge based distillation flower classification method.
The computer equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer, a cloud server and the like. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card memory (e.g., an SD card), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory may be an internal storage unit of the computer device, such as a hard disk or memory of the computer device. In other embodiments, the memory may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card equipped on the computer device. Of course, the memory may also include both an internal storage unit of the computer device and an external storage device. In this embodiment, the memory is typically used to store the operating system and various application software installed on the computer device, such as the program code of the flower classification method based on knowledge distillation. In addition, the memory may be used to temporarily store various types of data that have been output or are to be output.
The processor may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor is typically used to control the overall operation of the computer device. In this embodiment, the processor is configured to execute the program code stored in the memory or process data, such as the program code of the knowledge based distillation flower classification method.
Example 4
A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of a knowledge distillation based flower classification method.
The computer-readable storage medium stores a program executable by at least one processor, so as to cause the at least one processor to perform the steps of the flower classification method based on knowledge distillation as described above.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server or a network device, etc.) to perform the method for classifying flowers based on knowledge distillation according to the embodiments of the present application.
The above is an embodiment of the present invention. The above embodiments and specific parameters in the embodiments are only for clearly describing the inventive verification process of the inventor, and are not intended to limit the scope of the invention, which is defined by the claims, and all equivalent structural changes made by applying the descriptions and the drawings of the invention are included in the scope of the invention.

Claims (10)

1. The flower classification method based on knowledge distillation is characterized by comprising the following steps:
step S1, obtaining flower image sample data
Obtaining flower image sample data and label data;
s2, constructing a knowledge distillation network model
The knowledge distillation network model comprises a teacher network model and a student network model, wherein the teacher network model and the student network model both adopt convolutional neural networks, and pretraining is carried out on the teacher network model;
step S3, extracting sample characteristics
Respectively inputting the flower image sample data into a pre-trained teacher network model and an untrained student network model, wherein a convolution layer of the teacher network model outputs a first middle feature map, a full-connection layer of the teacher network model outputs a first logits feature, a convolution layer of the student network model outputs a second middle feature map, and a full-connection layer of the student network model outputs a second logits feature;
S4, selecting a neighbor sample
Based on the first logits characteristics, constructing a similarity matrix between the flower image sample data, and selecting K nearest neighbor samples which are the most similar to each flower image sample data according to the similarity matrix;
s5, constructing a neighbor domain relation knowledge distillation loss function
The neighbor domain relation knowledge distillation loss function comprises a neighbor domain characteristic relation distillation loss function and a neighbor domain logits relation distillation loss function;
constructing a neighbor domain feature relation distillation loss function based on the first intermediate feature map and the second intermediate feature map of the K nearest neighbor samples of the selected flower image sample data;
constructing a neighbor domain logits relation distillation loss function based on the first logits features and the second logits features of the K nearest neighbor samples of each selected flower image sample data;
Step S6, constructing a total loss function
The total loss function comprises a cross entropy loss function between a prediction result of the student network model and the label data, an alignment loss function of the softened logits, and the neighbor domain relation knowledge distillation loss function;
Step S7, training the student network model
Training the student network model with the total loss function, back-propagating the loss, and updating the parameters of the student network model to obtain a mature student network model;
Step S8, real-time classification of flower images
Acquiring a real-time flower image, inputting the flower image into the mature student network model, and outputting a classification result by the student network model.
2. A method for classifying flowers based on knowledge distillation as set forth in claim 1, wherein: in step S1, the obtained flower image sample data is preprocessed, and the preprocessing comprises cropping, scaling and rotation.
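By way of illustration only, the preprocessing of claim 2 could be realized with a torchvision transform pipeline such as the sketch below; the crop size and rotation range are assumptions and are not values fixed by the claim.

```python
# Hypothetical preprocessing pipeline for claim 2 (cropping, scaling, rotation).
# The 224x224 target size and 15-degree rotation range are illustrative assumptions.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),      # random cropping and scaling to a fixed size
    transforms.RandomRotation(degrees=15),  # random rotation augmentation
    transforms.ToTensor(),                  # convert the PIL image to a tensor
])
```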
3. A method for classifying flowers based on knowledge distillation as set forth in claim 1, wherein: in step S4, when the similarity matrix between the flower image sample data is constructed, the specific manner is as follows:
for samples x_i and x_j, the similarity matrix is calculated as:
A_ij = <z_i^T, z_j^T> / (‖z_i^T‖_2 · ‖z_j^T‖_2)
wherein <·,·> denotes the dot-product operation, ‖·‖_2 denotes L2 normalization, z_i^T denotes the logits feature output by the teacher network model for sample x_i, and z_j^T denotes the logits feature output by the teacher network model for sample x_j.
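The similarity matrix and neighbor selection of claim 3 (step S4) can be illustrated with the following PyTorch sketch; the function name and the choice to exclude each sample from its own neighbor set are assumptions.

```python
# Sketch of step S4 / claim 3: cosine similarity over teacher logits and
# selection of the K most similar (nearest neighbor) samples per sample.
import torch
import torch.nn.functional as F

def select_neighbors(teacher_logits: torch.Tensor, k: int):
    """teacher_logits: (N, M) first logits features from the teacher network."""
    z = F.normalize(teacher_logits, p=2, dim=1)   # L2-normalize each logits vector
    sim = z @ z.t()                               # (N, N) dot products = cosine similarities
    sim.fill_diagonal_(float('-inf'))             # exclude each sample from its own neighbors
    neighbor_idx = sim.topk(k, dim=1).indices     # (N, K) indices of the K nearest neighbors
    return sim, neighbor_idx
```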
4. A method for classifying flowers based on knowledge distillation as set forth in claim 1, wherein: in step S5, the neighbor domain feature relation distillation loss function L_NFR is constructed in the following manner:
Step S5-1-1, calculating spatial similarity
For sample x_i and its neighbor sample x_j, spatial similarity is calculated for the teacher network model and for the student network model respectively;
the spatial similarity S_ij^T of the teacher network model is computed between the feature representations of sample x_i and its neighbor sample x_j obtained from the first intermediate feature maps;
the spatial similarity S_ij^S of the student network model is computed between the feature representations of sample x_i and its neighbor sample x_j obtained from the second intermediate feature maps;
wherein F_i^T denotes the first intermediate feature map output by the teacher network model for sample x_i, and F_i^S denotes the second intermediate feature map output by the student network model for sample x_i;
s5-1-2, constructing the feature similarity of the neighboring domain
wherein pool(·) denotes the pooling operation and ‖·‖_2 denotes L2 normalization; for sample x_i and its neighbor sample x_j, neighbor domain feature similarity is constructed for the teacher network model and for the student network model respectively;
the neighbor domain feature similarity of the teacher network model is formed from the spatial similarities between sample x_i and each of its K neighbor samples, computed from the first intermediate feature maps output by the teacher network model;
the neighbor domain feature similarity of the student network model is formed from the spatial similarities between sample x_i and each of its K neighbor samples, computed from the second intermediate feature maps output by the student network model;
wherein K denotes the number of neighbor samples;
Step S5-1-3, constructing the neighbor domain feature relation distillation loss function L_NFR
The neighbor domain feature relation distillation loss function L_NFR aligns, over all N samples and their K neighbor samples, the spatial similarities of the student network model with those of the teacher network model;
wherein N denotes the number of samples, K denotes the number of neighbor samples, S_ij^T denotes the spatial similarity of the teacher network model, and S_ij^S denotes the spatial similarity of the student network model.
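Since the claim does not reproduce the exact formulas, the following sketch is only one plausible reading of L_NFR: global average pooling, cosine spatial similarity between a sample and each neighbor, and a mean-squared-error match between the teacher and student similarities. All of these specific choices are assumptions.

```python
# Hedged sketch of the neighbor domain feature relation loss L_NFR (claim 4).
# Pooling type, similarity measure, and MSE matching are assumed, not claimed.
import torch
import torch.nn.functional as F

def nfr_loss(feat_t, feat_s, neighbor_idx):
    """feat_t, feat_s: (N, C, H, W) intermediate feature maps; neighbor_idx: (N, K)."""
    # Pool and L2-normalize the feature maps (assumed reading of steps S5-1-1/S5-1-2).
    vt = F.normalize(F.adaptive_avg_pool2d(feat_t, 1).flatten(1), dim=1)  # (N, C_t)
    vs = F.normalize(F.adaptive_avg_pool2d(feat_s, 1).flatten(1), dim=1)  # (N, C_s)
    # Spatial similarity between each sample and its K neighbors, per model.
    s_t = (vt.unsqueeze(1) * vt[neighbor_idx]).sum(-1)   # (N, K) teacher similarities
    s_s = (vs.unsqueeze(1) * vs[neighbor_idx]).sum(-1)   # (N, K) student similarities
    # Teacher similarities are treated as fixed targets.
    return F.mse_loss(s_s, s_t.detach())
```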
5. A method for classifying flowers based on knowledge distillation as set forth in claim 1, wherein: in step S5, the neighbor domain logits relation distillation loss function L_NLR is constructed in the following manner:
Step S5-2-1, constructing the similarity between sample x_i and its neighbor sample x_j, and converting it into a similarity distribution using a softmax function, the similarity distribution ρ_ij being expressed as:
ρ_ij = softmax(z_i - z_j), ρ_ij ∈ R^(1×M)
wherein z_i denotes the logits feature output by the network model for sample x_i, and M denotes the number of categories;
Step S5-2-2, for sample x_i and its K neighbor samples, constructing logits similarity distributions for the teacher network model and for the student network model respectively;
the logits similarity distribution of the teacher network model is denoted ρ_ij^T, and the logits similarity distribution of the student network model is denoted ρ_ij^S;
wherein ρ_ij^T denotes the logits feature similarity distribution between sample x_i and its j-th neighbor sample, calculated from the first logits features output by the teacher network model, ρ_ij^S denotes the logits feature similarity distribution between sample x_i and its j-th neighbor sample, calculated from the second logits features output by the student network model, and K denotes the number of neighbor samples;
Step S5-2-3, constructing the neighbor domain logits relation distillation loss function L_NLR
The neighbor domain logits relation distillation loss function L_NLR aligns, over all N samples and their K neighbor samples, the logits feature similarity distributions of the student network model with those of the teacher network model by means of the JS divergence;
wherein N denotes the number of samples, K denotes the number of neighbor samples, JS(·,·) denotes the JS divergence, ρ_ij^T denotes the logits feature similarity distribution of the teacher network model, and ρ_ij^S denotes the logits feature similarity distribution of the student network model.
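A hedged sketch of L_NLR from claim 5 follows: ρ_ij = softmax(z_i − z_j) is formed for each of the K neighbors, and the teacher and student distributions are compared with the JS divergence. The averaging over samples and neighbors, and the use of detached teacher logits, are assumptions.

```python
# Hedged sketch of the neighbor domain logits relation loss L_NLR (claim 5).
import torch
import torch.nn.functional as F

def js_divergence(p, q, eps=1e-8):
    """Jensen-Shannon divergence between distributions along the last dimension."""
    m = 0.5 * (p + q)
    return 0.5 * (p * ((p + eps) / (m + eps)).log()).sum(-1) + \
           0.5 * (q * ((q + eps) / (m + eps)).log()).sum(-1)

def nlr_loss(logits_t, logits_s, neighbor_idx):
    """logits_t, logits_s: (N, M) logits; neighbor_idx: (N, K) neighbor indices."""
    logits_t = logits_t.detach()                                               # teacher is fixed
    rho_t = F.softmax(logits_t.unsqueeze(1) - logits_t[neighbor_idx], dim=-1)  # (N, K, M)
    rho_s = F.softmax(logits_s.unsqueeze(1) - logits_s[neighbor_idx], dim=-1)  # (N, K, M)
    return js_divergence(rho_t, rho_s).mean()                                  # average over N*K
```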
6. A method for classifying flowers based on knowledge distillation as set forth in claim 1, wherein: in step S6, the total loss function L_total is constructed as:
L_total = L_CE + L_KL + αL_NFR + βL_NLR
wherein α and β denote hyper-parameters that balance the loss terms, L_CE denotes the cross entropy loss function between the prediction result of the student network model and the label data, L_KL denotes the alignment loss function of the softened logits, L_NFR denotes the neighbor domain feature relation distillation loss function, and L_NLR denotes the neighbor domain logits relation distillation loss function;
the cross entropy loss function L_CE is:
L_CE = H(z^S, y_true)
wherein H(·,·) denotes the cross entropy loss, z^S denotes the second logits feature output by the student network model, and y_true denotes the label data of the sample;
the alignment loss function L_KL is:
L_KL = KL(softmax(z^S), softmax(z^T))
wherein KL(·,·) denotes the KL divergence, softmax(·) denotes the softmax function, z^T denotes the first logits feature output by the teacher network model, and z^S denotes the second logits feature output by the student network model.
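The total loss of claim 6 could be assembled as in the following sketch; the distillation temperature τ and the default values of α and β are assumptions not specified by the claim, and the L_NFR and L_NLR terms are assumed to be computed as in the earlier sketches.

```python
# Hedged sketch of the total loss in claim 6: L_CE + L_KL + alpha*L_NFR + beta*L_NLR.
import torch.nn.functional as F

def total_loss(student_logits, teacher_logits, labels, l_nfr, l_nlr,
               alpha=1.0, beta=1.0, tau=1.0):
    l_ce = F.cross_entropy(student_logits, labels)                  # hard-label term
    l_kl = F.kl_div(F.log_softmax(student_logits / tau, dim=1),     # softened-logits alignment
                    F.softmax(teacher_logits / tau, dim=1),
                    reduction='batchmean')
    return l_ce + l_kl + alpha * l_nfr + beta * l_nlr
```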
7. A method for classifying flowers based on knowledge distillation as set forth in claim 1, wherein: in step S7, when updating parameters of the student network model, the specific manner is as follows:
training the student network model using the total loss function, back-propagating the calculated gradient, and updating the network parameters; the parameter θ_S is updated as:
θ_S = θ_S − γ·∇θ_S L_total
wherein θ_S denotes the parameters of the student network model, γ denotes the learning rate, and ∇θ_S L_total denotes the gradient of the total loss function L_total with respect to the student network parameters.
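The parameter update of claim 7 corresponds to an ordinary gradient step; the sketch below uses plain SGD as an assumed optimizer, although any gradient-based optimizer could be substituted.

```python
# Minimal sketch of one training step for claim 7: back-propagate the total
# loss and update the student parameters at learning rate gamma.
import torch

def training_step(optimizer, loss):
    optimizer.zero_grad()
    loss.backward()       # gradients of L_total w.r.t. the student parameters
    optimizer.step()      # theta_S <- theta_S - gamma * grad

# Example setup (gamma is an assumed value):
# optimizer = torch.optim.SGD(student.parameters(), lr=0.05)
```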
8. A knowledge distillation based flower classification system comprising:
the flower image sample data acquisition module is used for acquiring flower image sample data and tag data;
the knowledge distillation network model construction module is used for constructing a knowledge distillation network model, wherein the knowledge distillation network model comprises a teacher network model and a student network model, the teacher network model and the student network model both adopt convolutional neural networks, and the teacher network model is pre-trained;
The sample feature extraction module is used for respectively inputting the flower image sample data into a pre-trained teacher network model and an untrained student network model, wherein a convolution layer of the teacher network model outputs a first intermediate feature map, a fully connected layer of the teacher network model outputs first logits features, a convolution layer of the student network model outputs a second intermediate feature map, and a fully connected layer of the student network model outputs second logits features;
the neighbor sample selection module is used for constructing a similarity matrix between the flower image sample data based on the first logits features, and selecting the K nearest neighbor samples most similar to each flower image sample according to the similarity matrix;
the neighbor domain relation construction module is used for constructing a neighbor domain relation knowledge distillation loss function, wherein the neighbor domain relation knowledge distillation loss function comprises a neighbor domain feature relation distillation loss function and a neighbor domain logits relation distillation loss function;
constructing a neighbor domain feature relation distillation loss function based on the first intermediate feature map and the second intermediate feature map of the K nearest neighbor samples of the selected flower image sample data;
constructing a neighbor domain logits relation distillation loss function based on the first logits features and the second logits features of the K nearest neighbor samples of the selected flower image sample data;
The total loss function construction module is used for constructing a total loss function, wherein the total loss function comprises a cross entropy loss function between a prediction result of the student network model and the label data, an alignment loss function of the softened logits, and the neighbor domain relation knowledge distillation loss function;
the student network model training module is used for training the student network model with the total loss function, back-propagating the loss, and updating the parameters of the student network model to obtain a mature student network model;
the flower image real-time classification module is used for acquiring real-time flower images, inputting the flower images into the student network model, and outputting classification results by the student network model.
9. A computer device, characterized by: comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized by: storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
CN202310721513.0A 2023-06-16 2023-06-16 Flower classification method, system, equipment and medium based on knowledge distillation Active CN117058437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310721513.0A CN117058437B (en) 2023-06-16 2023-06-16 Flower classification method, system, equipment and medium based on knowledge distillation


Publications (2)

Publication Number Publication Date
CN117058437A true CN117058437A (en) 2023-11-14
CN117058437B CN117058437B (en) 2024-03-08

Family

ID=88666920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310721513.0A Active CN117058437B (en) 2023-06-16 2023-06-16 Flower classification method, system, equipment and medium based on knowledge distillation

Country Status (1)

Country Link
CN (1) CN117058437B (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
CA3076424A1 (en) * 2019-03-22 2020-09-22 Royal Bank Of Canada System and method for knowledge distillation between neural networks
US20210334543A1 (en) * 2020-04-28 2021-10-28 Ajou University Industry-Academic Cooperation Foundation Method for semantic segmentation based on knowledge distillation
CN114444558A (en) * 2020-11-05 2022-05-06 佳能株式会社 Training method and training device for neural network for object recognition
CN113887698A (en) * 2021-08-25 2022-01-04 浙江大学 Overall knowledge distillation method and system based on graph neural network
WO2023050738A1 (en) * 2021-09-29 2023-04-06 北京百度网讯科技有限公司 Knowledge distillation-based model training method and apparatus, and electronic device
US20220036194A1 (en) * 2021-10-18 2022-02-03 Intel Corporation Deep neural network optimization system for machine learning model scaling
CN114328952A (en) * 2021-12-17 2022-04-12 清华大学 Knowledge graph alignment method, device and equipment based on knowledge distillation
CN114330580A (en) * 2021-12-31 2022-04-12 之江实验室 Robust knowledge distillation method based on ambiguity-oriented mutual label updating
CN114626504A (en) * 2022-01-11 2022-06-14 南通大学 Model compression method based on group relation knowledge distillation
CN114549893A (en) * 2022-01-17 2022-05-27 华南理工大学 Flower recognition system based on small sample learning
CN114758180A (en) * 2022-04-19 2022-07-15 电子科技大学 Knowledge distillation-based light flower recognition method
CN115359353A (en) * 2022-08-19 2022-11-18 广东工业大学 Flower identification and classification method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHANGYONG SHU et al.: "Channel-Wise Knowledge Distillation for Dense Prediction", ICCV 2021 OPEN ACCESS, 31 December 2021 (2021-12-31), pages 5311 - 5320 *
ZHIXIAN YANG et al.: "Nearest Neighbor Knowledge Distillation for Neural Machine Translation", NAACL 2022 CONFERENCE, 6 May 2023 (2023-05-06), pages 1032 - 1043 *
SHEN Xiangjun et al.: "Image Object Localization Method Based on Multiple Image Segmentation Evaluation", Pattern Recognition and Artificial Intelligence, vol. 25, no. 8, 15 August 2015 (2015-08-15), pages 760 - 767 *

Also Published As

Publication number Publication date
CN117058437B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
US10803359B2 (en) Image recognition method, apparatus, server, and storage medium
Li et al. Truncation cross entropy loss for remote sensing image captioning
CN114067160B (en) Small sample remote sensing image scene classification method based on embedded smooth graph neural network
CN110516095B (en) Semantic migration-based weak supervision deep hash social image retrieval method and system
CN112115783A (en) Human face characteristic point detection method, device and equipment based on deep knowledge migration
US20210406468A1 (en) Method and device for visual question answering, computer apparatus and medium
CN111401156B (en) Image identification method based on Gabor convolution neural network
CN111738169B (en) Handwriting formula recognition method based on end-to-end network model
CN110414616B (en) Remote sensing image dictionary learning and classifying method utilizing spatial relationship
CN111259940A (en) Target detection method based on space attention map
CN110929080A (en) Optical remote sensing image retrieval method based on attention and generation countermeasure network
CN110188827A (en) A kind of scene recognition method based on convolutional neural networks and recurrence autocoder model
He et al. Context-aware mathematical expression recognition: An end-to-end framework and a benchmark
CN114283285A (en) Cross consistency self-training remote sensing image semantic segmentation network training method and device
CN116912708A (en) Remote sensing image building extraction method based on deep learning
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN114973136A (en) Scene image recognition method under extreme conditions
CN110135435B (en) Saliency detection method and device based on breadth learning system
CN113642602B (en) Multi-label image classification method based on global and local label relation
CN117079276B (en) Semantic segmentation method, system, equipment and medium based on knowledge distillation
CN116758419A (en) Multi-scale target detection method, device and equipment for remote sensing image
CN117058437B (en) Flower classification method, system, equipment and medium based on knowledge distillation
CN115512340A (en) Intention detection method and device based on picture
CN114140664A (en) Training method of image processing model, and image similarity determining method and device
CN114692715A (en) Sample labeling method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant