CN113408570A - Image category identification method and device based on model distillation, storage medium and terminal - Google Patents

Info

Publication number: CN113408570A
Authority: CN (China)
Prior art keywords: model, teacher, trained, image, class
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number
CN202110499204.4A
Other languages
Chinese (zh)
Inventor
廖丹萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Smart Video Security Innovation Center Co Ltd
Original Assignee
Zhejiang Smart Video Security Innovation Center Co Ltd
Application filed by Zhejiang Smart Video Security Innovation Center Co Ltd
Priority application: CN202110499204.4A
Publication: CN113408570A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image category identification method, device, storage medium and terminal based on model distillation. The method comprises the following steps: acquiring a target image to be classified, inputting the target image into a pre-trained student model, and outputting a plurality of class probability values, wherein the pre-trained student model is generated by model distillation training guided by the similarity between the classification mapping vectors of different classes in the fully connected layer of a pre-trained teacher model, the similarity between two classification mapping vectors being the cosine of the angle between them; and identifying the target category of the target image to be classified based on the plurality of class probability values. Because the similarity between the classification mapping vectors of different classes in the fully connected layer of the pre-trained teacher model is used to guide the training of the student model, the student model learns the inter-class similarity characteristics of the teacher model while keeping a simple structure and few parameters, which increases the running speed on hardware platforms and thereby improves image classification efficiency.

Description

Image category identification method and device based on model distillation, storage medium and terminal
Technical Field
The invention relates to the technical field of computer vision, in particular to an image category identification method and device based on model distillation, a storage medium and a terminal.
Background
In recent years, deep neural networks have raised the performance of many computer vision tasks to unprecedented levels. The more complex a network's structure and the more parameters it has, the richer the knowledge it can learn and the better its learning effect. However, the large amounts of storage space and computing resources such models require make it difficult to deploy large network models on mobile platforms. Designing lighter network models with good performance has therefore become a key research direction for bringing computer vision algorithms into practical applications.
In the prior art, model lightweighting is generally achieved through model compression, which reduces a large model's consumption of computation time and space by parameter pruning, weight decomposition or model distillation. However, prior-art model training cannot effectively exploit the prior relations between the classification mapping vectors of different classes, so classification results obtained when the trained model is used for image classification are not accurate enough.
Disclosure of Invention
The embodiments of the present application provide an image category identification method and device based on model distillation, a storage medium and a terminal. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended neither to identify key or critical elements nor to delineate the scope of these embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description presented later.
In a first aspect, an embodiment of the present application provides an image class identification method based on model distillation, including:
acquiring a target image to be classified;
inputting a target image to be classified into a pre-trained student model, and outputting a plurality of class probability values corresponding to the target image;
wherein the pre-trained student model is generated by model distillation training, the model distillation training is guided by the similarity between the classification mapping vectors of different classes in the fully connected layer of a pre-trained teacher model, and the similarity between two classification mapping vectors is the cosine of the angle between them;
and identifying a target category of the target image to be classified based on the plurality of category probability values.
Optionally, identifying the target category of the target image to be classified based on the plurality of category probability values includes:
selecting the maximum of the plurality of category probability values;
identifying the target category corresponding to the selected maximum category probability value;
and determining the target category to be the category of the target image to be classified.
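The selection steps above amount to an argmax over the probability values; a minimal sketch, with hypothetical category names, might look like:

```python
def identify_target_category(class_names, probabilities):
    """Return the class name whose probability value is largest."""
    max_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[max_index]

# Hypothetical label set and probability values from the student model.
categories = ["animal", "human", "other"]
probs = [0.23, 0.67, 0.10]
result = identify_target_category(categories, probs)  # "human"
```

The category names and probabilities here are illustrative only, not values from the patent's training data.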
Optionally, the pre-trained student model comprises at least a feature extraction layer, a fully connected layer and a normalization layer;
inputting the target image to be classified into the pre-trained student model and outputting a plurality of category probability values corresponding to the target image comprises:
inputting the target image to be classified into the feature extraction layer for feature extraction to generate target features;
inputting the target features into the fully connected layer, and outputting a plurality of category confidence degrees corresponding to the target image;
and inputting the plurality of category confidence degrees into the normalization layer, and outputting a plurality of category probability values corresponding to the target image.
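A minimal sketch of the fully connected layer and normalization layer described above, with the feature extraction layer stubbed out as a given feature vector and all shapes chosen for illustration:

```python
import numpy as np

def softmax(z, T=1.0):
    # Normalization layer: temperature-scaled softmax over class confidences.
    scaled = z / T
    e = np.exp(scaled - np.max(scaled))  # subtract max for numerical stability
    return e / e.sum()

def student_forward(image_features, W, b, T=1.0):
    """Fully connected layer followed by the normalization layer.
    W (k x d) is assumed to hold one classification mapping vector per class."""
    confidences = W @ image_features + b  # class confidence degrees
    return softmax(confidences, T)        # class probability values

# Hypothetical shapes: 4-dim target feature, 3 classes.
rng = np.random.default_rng(0)
probs = student_forward(rng.normal(size=4), rng.normal(size=(3, 4)), np.zeros(3))
```

The returned vector always sums to 1, as expected of probability values.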
Optionally, the pre-trained teacher model is generated according to the following steps:
collecting image sets of a plurality of classes to generate model training samples;
creating a teacher model;
inputting the model training samples into the teacher model for training to generate a trained teacher model;
and determining the trained teacher model to be the pre-trained teacher model.
Optionally, the pre-trained student model is generated according to the following steps:
constructing a first similarity matrix S_teacher ∈ R^(k×k) for the classification mapping vectors of different classes in the fully connected layer of the pre-trained teacher model, where k is the number of classes and R denotes the real numbers;
creating a student model, wherein the student model has fewer parameters than the teacher model;
constructing a second similarity matrix S_student ∈ R^(k×k) for the classification mapping vectors of different classes in the fully connected layer of the student model;
constructing a target loss function for the student model from the first similarity matrix S_teacher and the second similarity matrix S_student;
associating the target loss function with the student model to generate the student model with the associated loss function;
acquiring the n-th image from the model training samples and inputting it into the student model with the associated loss function for training;
and when the number of model training iterations is smaller than a preset value, continuing with the step of acquiring the (n+1)-th image from the model training samples; when n+1 exceeds the number of images in the model training samples, randomly rearranging the order of the images in the model training samples and resetting n to 1.
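The iteration bookkeeping in the last step can be sketched as follows; 0-based indices stand in for the patent's 1-based n, and the point is the reshuffle-and-reset behavior when a pass runs off the end of the training set:

```python
import random

def training_indices(num_samples, max_iterations, seed=0):
    """Yield one sample index per training iteration. When the index would
    exceed the number of images in the training set, randomly rearrange the
    order of the images and restart from the first one, until the preset
    iteration count is reached."""
    rng = random.Random(seed)
    order = list(range(num_samples))
    n = 0
    for _ in range(max_iterations):
        if n >= num_samples:   # n+1 exceeds the sample count
            rng.shuffle(order) # randomly rearrange the image order
            n = 0              # reset to the first image
        yield order[n]
        n += 1
```

With 3 samples and 7 iterations this yields 7 indices, reshuffling after every full pass.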
Optionally, constructing the first similarity matrix for the classification mapping vectors of different classes in the fully connected layer of the pre-trained teacher model comprises:
calculating the cosine of the angle between the classification mapping vectors of different classes in the fully connected layer of the pre-trained teacher model to generate the first similarity matrix;
and calculating the cosine of the angle between the classification mapping vectors of different classes in the fully connected layer of the student model to generate the second similarity matrix;
wherein the similarity matrix is calculated as:
S_teacher(i, j) = cosine(s_i, s_j), where S_teacher(i, j) denotes the cosine of the angle between the classification mapping vector s_i of class i and the classification mapping vector s_j of class j.
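The similarity matrix formula can be sketched in a few lines; the weight matrix W of the fully connected layer is assumed to hold one classification mapping vector per row:

```python
import numpy as np

def similarity_matrix(W):
    """Pairwise cosine of the angle between classification mapping vectors.
    W is the k x d weight matrix of a fully connected classification layer,
    one mapping vector per class. Returns the k x k matrix S with
    S[i, j] = cosine(s_i, s_j)."""
    unit = W / np.linalg.norm(W, axis=1, keepdims=True)  # normalize each row
    return unit @ unit.T  # dot products of unit vectors = cosines of angles

# Three hypothetical mapping vectors in 2-D.
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
S = similarity_matrix(W)
```

The diagonal is always 1, since every vector forms a zero angle with itself, and orthogonal mapping vectors give an entry of 0.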
Optionally, the target loss function is calculated as:
L_similarity = Σ_(i,j) (S_teacher(i, j) - S_student(i, j))^2
L = L_distill + λ · L_similarity
where λ is the weight of the similarity-matrix loss function and L_distill is the distillation loss of the student model, calculated as L_distill = Σ_i -p_i × log q_i, where p and q are the vectors output by the normalization layers of the teacher model and the student model respectively;
the normalization layer may be defined as:
q_i = exp(Z_i / T) / Σ_j exp(Z_j / T)
where Z is the vector output by the fully connected layer, q_i denotes the probability value of the i-th category, Z_i denotes the i-th dimension of the output vector of the fully connected layer, Z_j denotes the j-th dimension of the output vector of the fully connected layer, and T is a parameter controlling the smoothness of the output probabilities.
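The distillation loss L_distill = Σ_i -p_i × log q_i and the temperature-controlled normalization layer described above can be sketched as:

```python
import numpy as np

def softened_probs(z, T):
    """Normalization layer: q_i = exp(Z_i / T) / sum_j exp(Z_j / T),
    with T controlling the smoothness of the output probabilities."""
    scaled = z / T
    e = np.exp(scaled - np.max(scaled))  # max-shift for numerical stability
    return e / e.sum()

def distillation_loss(p_teacher, q_student, eps=1e-12):
    """L_distill = sum_i -p_i * log(q_i): cross-entropy of the student's
    normalized outputs against the teacher's normalized outputs."""
    return float(-np.sum(p_teacher * np.log(q_student + eps)))

# Hypothetical fully connected layer outputs for teacher and student.
z_teacher = np.array([4.0, 1.0, 0.5])
z_student = np.array([3.5, 1.2, 0.4])
T = 2.0
loss = distillation_loss(softened_probs(z_teacher, T), softened_probs(z_student, T))
```

A larger T flattens both distributions, which is the usual reason for the temperature parameter in distillation.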
In a second aspect, an embodiment of the present application provides an image class identification apparatus based on model distillation, including:
the image acquisition module is used for acquiring a target image to be classified;
the probability value output module is used for inputting the target images to be classified into a pre-trained student model and outputting a plurality of class probability values corresponding to the target images;
wherein the pre-trained student model is generated by model distillation training, the model distillation training is guided by the similarity between the classification mapping vectors of different classes in the fully connected layer of a pre-trained teacher model, and the similarity between two classification mapping vectors is the cosine of the angle between them;
and the category identification module is used for identifying the target category of the target image to be classified based on the plurality of category probability values.
In a third aspect, embodiments of the present application provide a computer storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and to perform the above-mentioned method steps.
In a fourth aspect, an embodiment of the present application provides a terminal, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
in the embodiment of the application, the image class recognition device based on model distillation firstly acquires target images to be classified and inputs the target images into a pre-trained student model, and then outputs a plurality of class probability values, wherein the pre-trained student model is generated based on model distillation training, the model distillation training generation is generated based on similarity training between different class classification mapping vectors of a full connection layer in a pre-trained teacher model, the similarity between the different class classification mapping vectors is a cosine value of an included angle between the different class classification mapping vectors, and finally the target classes of the target images to be classified are recognized based on the plurality of class probability values. Therefore, the similarity between different classification mapping vectors of the full connection layer in the teacher model based on pre-training is adopted to guide the student model to train, so that the student model can learn the characteristics of the similarity with the teacher model, and the student model has a simple structure and few parameters, so that the running speed of a hardware platform is increased, and the image classification efficiency is further improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic flowchart of an image class identification method based on model distillation according to an embodiment of the present application;
FIG. 2 is a schematic block diagram of a process of image class identification based on model distillation according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an apparatus for identifying image types based on model distillation according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The image class identification method based on model distillation provided by the embodiments of the present application will be described in detail below with reference to fig. 1 to 2. The method may be implemented by a computer program executable on an image class identification device based on model distillation built on the von Neumann architecture. The computer program may be integrated into an application or may run as a separate tool-like application. The image category identification device based on model distillation in the embodiments of the present application may be a user terminal, including but not limited to: personal computers, tablet computers, handheld devices, in-vehicle devices, wearable devices, computing devices or other processing devices connected to a wireless modem, and the like. The user terminal may be called by different names in different networks, for example: user equipment, access terminal, subscriber unit, subscriber station, mobile station, remote terminal, mobile device, user terminal, wireless communication device, user agent or user equipment, cellular telephone, cordless telephone, Personal Digital Assistant (PDA), terminal equipment in a 5G network or future evolved network, and the like.
Referring to fig. 1, a schematic flow chart of an image class identification method based on model distillation is provided for an embodiment of the present application. As shown in fig. 1, the method of the embodiment of the present application may include the following steps:
s101, acquiring a target image to be classified;
the target image to be classified is an image used for testing the performance of a student model or an image acquired when the student model is applied to a classification application scene.
Generally, when the target image to be classified is used to test the performance of the student model, it may be obtained from a test sample, obtained from a user terminal, or downloaded from the cloud. When the target image to be classified is acquired while the student model is applied in a classification application scenario, it may be an image captured in real time by an image acquisition device.
In one possible implementation, after training of the student model based on the teacher model is finished and the trained student model is deployed in an actual application scenario, an object sensor or an object-monitoring algorithm detects that an object has entered the camera's monitoring area and triggers the photographing function of the image acquisition device to capture a target image of the object entering the monitoring area; this image is then determined to be the target image to be classified.
In another possible implementation, after training of the student model based on the teacher model is finished, the image classification performance of the trained student model needs to be verified; the user downloads any image containing an object from a sample test set, a local gallery or the cloud through the user terminal, and this image is determined to be the target image to be classified.
S102, inputting the target image to be classified into a pre-trained student model, and outputting a plurality of class probability values corresponding to the target image; the pre-trained student model is generated by model distillation training, the model distillation training is guided by the similarity between the classification mapping vectors of different classes in the fully connected layer of a pre-trained teacher model, and the similarity between two classification mapping vectors is the cosine of the angle between them;
in the model distillation method, an algorithm firstly trains a large model with higher precision, which is called a teacher model. And then, the knowledge learned in the teacher model is used for guiding and training a student model with a small number of parameters. The student model improves the performance of the student model by learning information which is beneficial to classification in the teacher model. The student model has less parameters and high running speed, so that the student model can be conveniently deployed on various hardware platforms.
Typically, both teacher models and student models are created through a neural network, preferably a convolutional neural network.
In the fully connected classification layer, the more similar two classes are, the more similar their mapping vectors are, i.e. the smaller the angle between the vectors; the more dissimilar two classes are, the more dissimilar their mapping vectors are. For example, the ImageNet dataset includes categories such as dog, wolf and airplane. Analysis of existing large models trained on ImageNet shows that the angle between the classification mapping vectors of dog and wolf is small, while the angle between the mapping vectors of dog and airplane is large. The similarity of the classification mapping vectors can therefore be used as an index of class similarity.
It is desirable that the student model learn the inter-class similarities of the teacher network during training, so as to better model the data distribution. That is, if class A and class B have a high degree of similarity in the teacher model, then that degree of similarity between class A and class B should be maintained in the student model; if class A and class B have a low degree of similarity in the teacher model, then class A and class B should remain dissimilar in the student model. In other words, the similarities between the class mapping vectors in the student model's classification layer should be consistent with those in the teacher model.
In the embodiment of the application, when a pre-trained teacher model is generated, firstly, a plurality of types of image sets are collected to generate a model training sample, then, the teacher model is created, then, the model training sample is input into the teacher model for training, a trained teacher model is generated, and finally, the trained teacher model is determined as the pre-trained teacher model.
In the embodiments of the present application, when the pre-trained student model is generated, a first similarity matrix S_teacher ∈ R^(k×k) is first constructed for the classification mapping vectors of different classes in the fully connected layer of the pre-trained teacher model, where k is the number of classes and R denotes the real numbers. A student model is then created, the student model having fewer parameters than the teacher model, and a second similarity matrix S_student ∈ R^(k×k) is constructed for the classification mapping vectors of different classes in the fully connected layer of the student model. A target loss function for the student model is constructed from the first similarity matrix S_teacher and the second similarity matrix S_student, and the target loss function is associated with the student model to generate the student model with the associated loss function. Finally, the n-th image is acquired from the model training samples and input into the student model with the associated loss function for training; while the number of model training iterations is smaller than a preset value, the step of acquiring the (n+1)-th image from the model training samples is repeated, and when n+1 exceeds the number of images in the model training samples, the order of the images in the model training samples is randomly rearranged and n is reset to 1.
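A sketch of one evaluation of the target loss described above; since the patent's formula images are not reproduced in the text, the similarity term is written here as the elementwise squared difference between the teacher and student similarity matrices, which is an assumption:

```python
import numpy as np

def similarity_matrix(W):
    # k x k matrix of pairwise cosines between the rows (mapping vectors) of W.
    unit = W / np.linalg.norm(W, axis=1, keepdims=True)
    return unit @ unit.T

def target_loss(W_teacher, W_student, p, q, lam, eps=1e-12):
    """Distillation loss plus lam times a similarity-matrix term.
    p and q are the normalized output vectors of teacher and student;
    W_teacher and W_student are the fully connected layer weights.
    The squared-difference form of the similarity term is an assumed
    reconstruction, not quoted from the patent."""
    l_distill = float(-np.sum(p * np.log(q + eps)))
    l_sim = float(np.sum((similarity_matrix(W_teacher)
                          - similarity_matrix(W_student)) ** 2))
    return l_distill + lam * l_sim

# Hypothetical values: identical layers give a zero similarity term.
W = np.eye(3)
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
loss = target_loss(W, W, p, q, lam=1.0)
```

When teacher and student share the same mapping-vector geometry, only the distillation term contributes, so λ has no effect in that case.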
Specifically, the target loss function is calculated as:
L_similarity = Σ_(i,j) (S_teacher(i, j) - S_student(i, j))^2
L = L_distill + λ · L_similarity
where λ is the weight of the similarity-matrix loss function and L_distill is the distillation loss of the student model, calculated as L_distill = Σ_i -p_i × log q_i, where p and q are the vectors output by the normalization layers of the teacher model and the student model respectively; the normalization layer may be defined as:
q_i = exp(Z_i / T) / Σ_j exp(Z_j / T)
where Z is the vector output by the fully connected layer, q_i denotes the probability value of the i-th category, Z_i denotes the i-th dimension of the output vector of the fully connected layer, Z_j denotes the j-th dimension of the output vector of the fully connected layer, and T is a parameter controlling the smoothness of the output probabilities.
Further, when constructing the first similarity matrix S_teacher for the classification mapping vectors of different classes in the fully connected layer of the pre-trained teacher model, the cosine of the angle between the classification mapping vectors of different classes in the fully connected layer of the pre-trained teacher model is first calculated to generate the first similarity matrix; the cosine of the angle between the classification mapping vectors of different classes in the fully connected layer of the student model is calculated to generate the second similarity matrix; the cosine values are calculated as S_teacher(i, j) = cosine(s_i, s_j), where S_teacher(i, j) denotes the cosine of the angle between the classification mapping vector s_i of class i and the classification mapping vector s_j of class j.
In one possible implementation, after the target image to be classified is acquired in step S101, it is input into the pre-trained student model for processing. The pre-trained student model comprises at least a feature extraction layer, a fully connected layer and a normalization layer: the target image to be classified is input into the feature extraction layer for feature extraction to generate target features, the target features are input into the fully connected layer to output a plurality of class confidence degrees corresponding to the target image, and finally the plurality of class confidence degrees are input into the normalization layer to output a plurality of class probability values corresponding to the target image.
For example, suppose the probability values output after the image is processed by the pre-trained student model are: animal 23%, human 67%, other 10%. The maximum output probability is the 67% of the human category, so the object in the image belongs to the human category.
S103, identifying the target category of the target image to be classified based on the plurality of category probability values.
In one possible implementation, after the plurality of category probability values of the image to be classified are obtained, the maximum of the plurality of category probability values is first selected, then the target category corresponding to the selected maximum probability value is identified, and finally the target category is determined to be the category to which the target image to be classified belongs.
For example, as shown in fig. 2, which is a schematic diagram of the image category identification process based on model distillation provided by the present application: a target image is first obtained and input into the pre-trained student model, which outputs probability value 1, probability value 2, probability value 3, ..., probability value n after model processing; the maximum of the output probability values is then selected, and the category corresponding to that maximum probability value is determined to be the category to which the image finally belongs.
In the embodiments of the present application, the image class recognition device based on model distillation first acquires a target image to be classified and inputs it into a pre-trained student model, which outputs a plurality of class probability values; the pre-trained student model is generated by model distillation training guided by the similarity between the classification mapping vectors of different classes in the fully connected layer of a pre-trained teacher model, where the similarity between two classification mapping vectors is the cosine of the angle between them; finally, the target class of the target image to be classified is identified based on the plurality of class probability values. Because the similarity between the classification mapping vectors of different classes in the fully connected layer of the pre-trained teacher model is used to guide the training of the student model, the student model learns the inter-class similarity characteristics of the teacher model while keeping a simple structure and few parameters, which increases the running speed on hardware platforms and thereby improves image classification efficiency.
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details which are not disclosed in the embodiments of the apparatus of the present invention, reference is made to the embodiments of the method of the present invention.
Referring to fig. 3, a schematic structural diagram of an image class identification apparatus based on model distillation according to an exemplary embodiment of the present invention is shown. The image category identification device based on model distillation can be realized by software, hardware or a combination of the two to form all or part of a terminal. The device 1 comprises an image acquisition module 10, a probability value output module 20 and a category identification module 30.
The image acquisition module 10 is used for acquiring a target image to be classified;
a probability value output module 20, configured to input a target image to be classified into a pre-trained student model, and output a plurality of class probability values corresponding to the target image;
the pre-trained student model is generated through model distillation training, in which the training is guided by the similarities between different class classification mapping vectors of the fully connected layer in the pre-trained teacher model, the similarity between two classification mapping vectors being the cosine of the angle between them;
and the category identification module 30 is used for identifying the target category of the target image to be classified based on a plurality of category probability values.
It should be noted that the division into functional modules described above is merely an example of how the model-distillation-based image category identification apparatus may execute the identification method; in practical applications, the functions may be distributed among different functional modules as needed, that is, the internal structure of the apparatus may be divided into different modules to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided above belong to the same concept; details of the implementation process are given in the method embodiments and are not repeated here.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the embodiment of the application, the model-distillation-based image category identification apparatus first acquires a target image to be classified, inputs it into a pre-trained student model, and obtains a plurality of category probability values. The pre-trained student model is generated through model distillation training, which is guided by the similarities between the different class classification mapping vectors of the fully connected layer in a pre-trained teacher model; the similarity between two classification mapping vectors is the cosine of the angle between them. Finally, the target category of the target image to be classified is identified based on the plurality of category probability values. Because the similarities between the classification mapping vectors of the pre-trained teacher model's fully connected layer guide the student model's training, the student model learns features similar to the teacher's while keeping a simple structure with few parameters; this increases the running speed on the hardware platform and thereby improves image classification efficiency.
The present invention also provides a computer readable medium having stored thereon program instructions that, when executed by a processor, implement the method for model distillation based image class identification provided by the above-described method embodiments.
The present invention also provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method for model distillation based image class identification of the various method embodiments described above.
Please refer to fig. 4, which provides a schematic structural diagram of a terminal according to an embodiment of the present application. As shown in fig. 4, terminal 1000 can include: at least one processor 1001, at least one network interface 1004, a user interface 1003, memory 1005, at least one communication bus 1002.
Wherein a communication bus 1002 is used to enable connective communication between these components.
The user interface 1003 may include a Display screen (Display) and a Camera (Camera), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
Processor 1001 may include one or more processing cores. The processor 1001 connects the various components throughout the electronic device 1000 using various interfaces and lines, and performs the various functions of the electronic device 1000 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 1005 and by invoking the data stored in the memory 1005. Optionally, the processor 1001 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 1001 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU renders and draws the content to be displayed on the display screen; the modem handles wireless communication. It is understood that the modem may also not be integrated into the processor 1001 and instead be implemented by a separate chip.
The Memory 1005 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 1005 includes a non-transitory computer-readable medium. The memory 1005 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1005 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the various method embodiments described above, and the like; the data storage area may store the data referred to in the above method embodiments. The memory 1005 may optionally be at least one storage device located remotely from the processor 1001. As shown in fig. 4, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a model-distillation-based image category identification application.
In the terminal 1000 shown in fig. 4, the user interface 1003 is mainly used as an interface for providing input for a user, and acquiring data input by the user; and the processor 1001 may be configured to invoke the model distillation based image class identification application stored in the memory 1005 and specifically perform the following operations:
acquiring a target image to be classified;
inputting a target image to be classified into a pre-trained student model, and outputting a plurality of class probability values corresponding to the target image;
the pre-trained student model is generated through model distillation training, in which the training is guided by the similarities between different class classification mapping vectors of the fully connected layer in the pre-trained teacher model, the similarity between two classification mapping vectors being the cosine of the angle between them;
and identifying a target category of the target image to be classified based on the plurality of category probability values.
In one embodiment, the processor 1001 specifically performs the following operations when performing the identification of the target class of the target image to be classified based on the plurality of class probability values:
selecting a maximum category probability value of a plurality of category probability values;
identifying a target category corresponding to the selected maximum category probability value;
and determining the target class as the class of the target image to be classified.
In one embodiment, when the processor 1001 inputs the target image to be classified into a pre-trained student model and outputs a plurality of class probability values corresponding to the target image, the following operations are specifically performed:
inputting a target image to be classified into a feature extraction layer for feature extraction to generate target features;
inputting the target features into the fully connected layer, and outputting a plurality of category confidence values corresponding to the target image;
and inputting the confidence degrees of the multiple categories into a normalization layer, and outputting multiple category probability values corresponding to the target image.
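The normalization step above is a softmax over the fully connected layer's confidence values; with a temperature parameter T it matches the smoothed form used later for distillation. A minimal sketch, with hypothetical confidence values:

```python
import math

def normalize(confidences, T=1.0):
    # Normalization layer: softmax with temperature T.
    # q_i = exp(z_i / T) / sum_j exp(z_j / T)
    exps = [math.exp(c / T) for c in confidences]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical confidence values for three classes.
probs = normalize([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # -> [0.659, 0.242, 0.099]

# A larger temperature T smooths the output distribution:
smoothed = normalize([2.0, 1.0, 0.1], T=4.0)
```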
In one embodiment, the processor 1001, when executing generating the pre-trained teacher model, specifically performs the following operations:
collecting various types of image sets to generate model training samples;
creating a teacher model;
inputting the model training sample into a teacher model for training, and generating a trained teacher model;
and determining the trained teacher model as a pre-trained teacher model.
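The four teacher-preparation steps above can be sketched as plain functions. All names and the toy sample data are illustrative stand-ins; in practice each step wraps a real dataset pipeline and a deep classification network:

```python
def collect_samples():
    # Step 1: collect image sets of various classes as (image, label) pairs.
    return [("img_cat_0.jpg", 0), ("img_dog_0.jpg", 1), ("img_car_0.jpg", 2)]

def create_teacher_model():
    # Step 2: stand-in for constructing a large classification network.
    return {"name": "teacher", "trained": False}

def train_teacher(model, samples):
    # Step 3: stand-in for fitting the teacher on the training samples.
    return dict(model, trained=True, num_samples=len(samples))

samples = collect_samples()
teacher = train_teacher(create_teacher_model(), samples)
# Step 4: the trained teacher is then used as the pre-trained teacher model.
print(teacher["trained"], teacher["num_samples"])  # -> True 3
```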
In one embodiment, the processor 1001, when performing generating the pre-trained student model, specifically performs the following operations:
constructing a first similarity matrix S_teacher ∈ R^(k×k) for the different class classification mapping vectors of the fully connected layer in the pre-trained teacher model, where k is the number of classes and R denotes the real numbers;
creating a student model; wherein the parameter quantity of the student model is less than that of the teacher model;
constructing a second similarity matrix S_student ∈ R^(k×k) for the different class classification mapping vectors of the fully connected layer in the student model;
constructing a target loss function of the student model according to the first similarity matrix S_teacher ∈ R^(k×k) and the second similarity matrix S_student ∈ R^(k×k);
associating the target loss function with the student model, to generate a student model with the associated loss function;
acquiring an n-th image from the model training samples, and inputting the n-th image into the student model with the associated loss function for training;
and when the number of model training iterations is smaller than a preset value, continuing to execute the step of acquiring the (n+1)-th image from the model training samples; when n+1 exceeds the number of images in the model training samples, randomly shuffling the order of the images in the model training samples and resetting n to 1.
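The iteration scheme in the last two steps — take the n-th image, advance to n+1, and reshuffle with n reset to 1 once the samples are exhausted — can be sketched as follows (`train_step` is a hypothetical stand-in for one forward/backward pass with the associated target loss):

```python
import random

def train_student(samples, num_iterations, train_step=None):
    # Cycle through the training samples one image at a time; when the
    # index runs past the last image, shuffle the samples and restart
    # from the first one, until the preset iteration count is reached.
    n = 0  # zero-based index of the n-th image
    seen = []
    for _ in range(num_iterations):
        image = samples[n]
        if train_step is not None:
            train_step(image)  # one training pass in practice
        seen.append(image)
        n += 1
        if n >= len(samples):   # n + 1 would exceed the sample count
            random.shuffle(samples)
            n = 0               # reset to the first image
    return seen

seen = train_student(list(range(5)), 12)
print(len(seen))  # -> 12
```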
In one embodiment, when constructing the first similarity matrix for the different class classification mapping vectors of the fully connected layer within the pre-trained teacher model, the processor 1001 specifically performs the following operations:
calculating the cosine values of the angles between the different class classification mapping vectors of the fully connected layer in the pre-trained teacher model, to generate the first similarity matrix;
and,
calculating the cosine values of the angles between the different class classification mapping vectors of the fully connected layer in the student model, to generate the second similarity matrix;
wherein the entries of the first similarity matrix are calculated as:
S_teacher(i, j) = cosine(S_i, S_j), where S_teacher(i, j) is the cosine of the angle between the classification mapping vector S_i of class i and the classification mapping vector S_j of class j.
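The cosine formula above can be applied over all class pairs of the fully connected layer's weight matrix, whose i-th row serves as the classification mapping vector of class i. A self-contained sketch with toy vectors (illustrative only):

```python
import math

def cosine(u, v):
    # Cosine of the angle between two classification mapping vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(class_vectors):
    # S(i, j) = cosine(S_i, S_j) over all class pairs; S is k x k.
    k = len(class_vectors)
    return [[cosine(class_vectors[i], class_vectors[j]) for j in range(k)]
            for i in range(k)]

# Toy classification mapping vectors for k = 3 classes.
S = similarity_matrix([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print([round(x, 3) for x in S[0]])  # -> [1.0, 0.0, 0.707]
```

The same function applied to the teacher's and the student's vectors yields the two matrices whose difference the target loss penalizes.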
In the embodiment of the application, the model-distillation-based image category identification apparatus first acquires a target image to be classified, inputs it into a pre-trained student model, and obtains a plurality of category probability values. The pre-trained student model is generated through model distillation training, which is guided by the similarities between the different class classification mapping vectors of the fully connected layer in a pre-trained teacher model; the similarity between two classification mapping vectors is the cosine of the angle between them. Finally, the target category of the target image to be classified is identified based on the plurality of category probability values. Because the similarities between the classification mapping vectors of the pre-trained teacher model's fully connected layer guide the student model's training, the student model learns features similar to the teacher's while keeping a simple structure with few parameters; this increases the running speed on the hardware platform and thereby improves image classification efficiency.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program to instruct associated hardware, and the program for image class identification based on model distillation can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory or a random access memory.
The above disclosure is only of the preferred embodiments of the present application and is not intended to limit its scope of protection; the present application is therefore not limited thereto, and all equivalent variations and modifications made in accordance with it shall remain within its scope.

Claims (10)

1. An image class identification method based on model distillation, which is characterized by comprising the following steps:
acquiring a target image to be classified;
inputting the target image to be classified into a pre-trained student model, and outputting a plurality of category probability values corresponding to the target image;
the pre-trained student model is generated through model distillation training, in which the training is guided by the similarities between different class classification mapping vectors of the fully connected layer in a pre-trained teacher model, the similarity between two classification mapping vectors being the cosine of the angle between them;
identifying a target class of the target image to be classified based on the plurality of class probability values.
2. The method of claim 1, wherein identifying the target class of the target image to be classified based on the plurality of class probability values comprises:
selecting a maximum category probability value of the plurality of category probability values;
identifying a target category corresponding to the selected maximum category probability value;
and determining the target class as the class of the target image to be classified.
3. The method of claim 1, wherein the pre-trained student model comprises at least a feature extraction layer, a full connection layer, and a normalization layer;
the inputting the target image to be classified into a pre-trained student model, and outputting a plurality of category probability values corresponding to the target image, includes:
inputting the target image to be classified into the feature extraction layer for feature extraction to generate target features;
inputting the target features into the full-connection layer, and outputting a plurality of category confidence degrees corresponding to the target image;
and inputting the plurality of category confidence degrees into the normalization layer, and outputting a plurality of category probability values corresponding to the target image.
4. The method of claim 1, wherein generating a pre-trained teacher model comprises:
collecting various types of image sets to generate model training samples;
creating a teacher model;
inputting the model training sample into the teacher model for training, and then generating a trained teacher model;
and determining the trained teacher model as a pre-trained teacher model.
5. The method of claim 4, wherein generating a pre-trained student model comprises:
constructing a first similarity matrix S_teacher ∈ R^(k×k) for the different class classification mapping vectors of the fully connected layer in the pre-trained teacher model, where k is the number of classes and R denotes the real numbers;
creating a student model; wherein the parameter quantity of the student model is smaller than the parameter quantity of the teacher model;
constructing a second similarity matrix S_student ∈ R^(k×k) for the different class classification mapping vectors of the fully connected layer in the student model;
constructing an objective loss function of the student model according to the first similarity matrix S_teacher ∈ R^(k×k) and the second similarity matrix S_student ∈ R^(k×k);
associating the objective loss function with the student model, to generate a student model with the associated loss function;
acquiring an n-th image from the model training samples, and inputting the n-th image into the student model with the associated loss function for training;
and when the number of model training iterations is smaller than a preset value, continuing to execute the step of acquiring the (n+1)-th image from the model training samples; when n+1 exceeds the number of images in the model training samples, randomly shuffling the order of the images in the model training samples and resetting n to 1.
6. The method of claim 5, wherein the constructing a first similarity matrix for the different class classification mapping vectors of the fully connected layer within the pre-trained teacher model comprises:
calculating cosine values of included angles among different classes of classification mapping vectors of all connection layers in the pre-trained teacher model to generate a first similarity matrix;
and the number of the first and second groups,
calculating cosine values of included angles among different classification mapping vectors of all connection layers in the student model to generate a second similarity matrix;
wherein the entries of the similarity matrix are calculated as:
S_teacher(i, j) = cosine(S_i, S_j), where S_teacher(i, j) is the cosine of the angle between the classification mapping vector S_i of class i and the classification mapping vector S_j of class j.
7. The method of claim 5, wherein the objective loss function is calculated by:
[Formula image FDA0003055708190000031: the objective loss function, combining the distillation loss L_distill with the similarity-matrix loss term weighted by λ.]
where λ is the weight of the similarity matrix loss function, and L_distill is the distillation loss of the student model, calculated as: L_distill = −Σ_i p_i × log q_i, where p and q are the vectors output by the normalization layers of the teacher model and the student model, respectively;
the normalization layer may be defined as:
q_i = exp(Z_i / T) / Σ_j exp(Z_j / T)
where the vector Z is the output of the fully connected layer, q_i represents the probability value of the i-th class, Z_i and Z_j denote the i-th and j-th dimensions of Z, and T is a parameter controlling the smoothness of the output probabilities.
8. An image class identification device based on model distillation, characterized in that the device comprises:
the image acquisition module is used for acquiring a target image to be classified;
the probability value output module is used for inputting the target image to be classified into a pre-trained student model and outputting a plurality of category probability values corresponding to the target image;
the pre-trained student model is generated through model distillation training, in which the training is guided by the similarities between different class classification mapping vectors of the fully connected layer in a pre-trained teacher model, the similarity between two classification mapping vectors being the cosine of the angle between them;
and the class identification module is used for identifying the target class of the target image to be classified based on the plurality of class probability values.
9. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to perform the method steps according to any of claims 1-7.
10. A terminal, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1-7.
CN202110499204.4A 2021-05-08 2021-05-08 Image category identification method and device based on model distillation, storage medium and terminal Pending CN113408570A (en)


Publications (1)

Publication Number | Publication Date
CN113408570A (en) | 2021-09-17

Family ID: 77678295



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210917