CN113408570A - Image category identification method and device based on model distillation, storage medium and terminal - Google Patents

Info

Publication number: CN113408570A
Authority: CN (China)
Prior art keywords: model, teacher, trained, image, class
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number
CN202110499204.4A
Other languages
Chinese (zh)
Inventor
廖丹萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Smart Video Security Innovation Center Co Ltd
Original Assignee
Zhejiang Smart Video Security Innovation Center Co Ltd
Application filed by Zhejiang Smart Video Security Innovation Center Co Ltd
Priority application: CN202110499204.4A
Publication: CN113408570A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image category identification method, device, storage medium and terminal based on model distillation. The method comprises the following steps: acquiring a target image to be classified, inputting the target image into a pre-trained student model, and outputting a plurality of class probability values, wherein the pre-trained student model is generated by model distillation training guided by the similarity between the classification mapping vectors of different classes in the fully connected layer of a pre-trained teacher model, the similarity between two classification mapping vectors being the cosine of the angle between them; and identifying the target category of the target image to be classified based on the plurality of class probability values. Because the similarity between the classification mapping vectors of different classes in the fully connected layer of the pre-trained teacher model is used to guide the training of the student model, the student model learns the inter-class similarity characteristics of the teacher model while keeping a simple structure and few parameters, which increases the running speed on hardware platforms and thereby improves image classification efficiency.

Description

Image category identification method and device based on model distillation, storage medium and terminal
Technical Field
The invention relates to the technical field of computer vision, in particular to an image category identification method and device based on model distillation, a storage medium and a terminal.
Background
In recent years, deep neural networks have raised the performance of many computer vision tasks to unprecedented levels. The more complex a network's structure and the more parameters it has, the richer the knowledge it can learn and the better its learning effect. However, the large amounts of storage space and computing resources such models require make it difficult to deploy large network models on mobile platforms. Designing lighter network models with good performance has therefore become a key research direction for bringing computer vision algorithms into practical applications.
In the prior art, model lightweighting is generally achieved through model compression, which reduces a large model's consumption of computation time and space by parameter pruning, weight decomposition or model distillation. However, prior-art model training cannot effectively exploit the prior relations between the classification mapping vectors of different classes, so classification results obtained when the trained model is used for image classification are not accurate enough.
Disclosure of Invention
The embodiments of the present application provide an image category identification method and device based on model distillation, a storage medium and a terminal. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended neither to identify key or critical elements nor to delineate the scope of these embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description presented later.
In a first aspect, an embodiment of the present application provides an image class identification method based on model distillation, including:
acquiring a target image to be classified;
inputting a target image to be classified into a pre-trained student model, and outputting a plurality of class probability values corresponding to the target image;
wherein the pre-trained student model is generated by model distillation training, the model distillation training is guided by the similarity between the classification mapping vectors of different classes in the fully connected layer of a pre-trained teacher model, and the similarity between two classification mapping vectors is the cosine of the angle between them;
and identifying a target category of the target image to be classified based on the plurality of category probability values.
Optionally, identifying the target category of the target image to be classified based on the plurality of category probability values includes:
selecting the maximum of the plurality of category probability values;
identifying the target category corresponding to the selected maximum category probability value;
and determining the target category to be the category of the target image to be classified.
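The selection steps above amount to an argmax over the probability values; a minimal sketch, with hypothetical category names, might look like:

```python
def identify_target_category(class_names, probabilities):
    """Return the class name whose probability value is largest."""
    max_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[max_index]

# Hypothetical label set and probability values from the student model.
categories = ["animal", "human", "other"]
probs = [0.23, 0.67, 0.10]
result = identify_target_category(categories, probs)  # "human"
```

The category names and probabilities here are illustrative only, not values from the patent's training data.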
Optionally, the pre-trained student model comprises at least a feature extraction layer, a fully connected layer and a normalization layer;
inputting the target image to be classified into the pre-trained student model and outputting a plurality of category probability values corresponding to the target image comprises:
inputting the target image to be classified into the feature extraction layer for feature extraction to generate target features;
inputting the target features into the fully connected layer, and outputting a plurality of category confidence degrees corresponding to the target image;
and inputting the plurality of category confidence degrees into the normalization layer, and outputting a plurality of category probability values corresponding to the target image.
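A minimal sketch of the fully connected layer and normalization layer described above, with the feature extraction layer stubbed out as a given feature vector and all shapes chosen for illustration:

```python
import numpy as np

def softmax(z, T=1.0):
    # Normalization layer: temperature-scaled softmax over class confidences.
    scaled = z / T
    e = np.exp(scaled - np.max(scaled))  # subtract max for numerical stability
    return e / e.sum()

def student_forward(image_features, W, b, T=1.0):
    """Fully connected layer followed by the normalization layer.
    W (k x d) is assumed to hold one classification mapping vector per class."""
    confidences = W @ image_features + b  # class confidence degrees
    return softmax(confidences, T)        # class probability values

# Hypothetical shapes: 4-dim target feature, 3 classes.
rng = np.random.default_rng(0)
probs = student_forward(rng.normal(size=4), rng.normal(size=(3, 4)), np.zeros(3))
```

The returned vector always sums to 1, as expected of probability values.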
Optionally, the pre-trained teacher model is generated according to the following steps:
collecting image sets of a plurality of classes to generate model training samples;
creating a teacher model;
inputting the model training samples into the teacher model for training to generate a trained teacher model;
and determining the trained teacher model to be the pre-trained teacher model.
Optionally, the pre-trained student model is generated according to the following steps:
constructing a first similarity matrix S_teacher ∈ R^(k×k) for the classification mapping vectors of different classes in the fully connected layer of the pre-trained teacher model, where k is the number of classes and R denotes the real numbers;
creating a student model, wherein the student model has fewer parameters than the teacher model;
constructing a second similarity matrix S_student ∈ R^(k×k) for the classification mapping vectors of different classes in the fully connected layer of the student model;
constructing a target loss function for the student model from the first similarity matrix S_teacher and the second similarity matrix S_student;
associating the target loss function with the student model to generate the student model with the associated loss function;
acquiring the n-th image from the model training samples and inputting it into the student model with the associated loss function for training;
and when the number of model training iterations is smaller than a preset value, continuing with the step of acquiring the (n+1)-th image from the model training samples; when n+1 exceeds the number of images in the model training samples, randomly rearranging the order of the images in the model training samples and resetting n to 1.
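The iteration bookkeeping in the last step can be sketched as follows; 0-based indices stand in for the patent's 1-based n, and the point is the reshuffle-and-reset behavior when a pass runs off the end of the training set:

```python
import random

def training_indices(num_samples, max_iterations, seed=0):
    """Yield one sample index per training iteration. When the index would
    exceed the number of images in the training set, randomly rearrange the
    order of the images and restart from the first one, until the preset
    iteration count is reached."""
    rng = random.Random(seed)
    order = list(range(num_samples))
    n = 0
    for _ in range(max_iterations):
        if n >= num_samples:   # n+1 exceeds the sample count
            rng.shuffle(order) # randomly rearrange the image order
            n = 0              # reset to the first image
        yield order[n]
        n += 1
```

With 3 samples and 7 iterations this yields 7 indices, reshuffling after every full pass.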
Optionally, constructing the first similarity matrix for the classification mapping vectors of different classes in the fully connected layer of the pre-trained teacher model comprises:
calculating the cosine of the angle between the classification mapping vectors of different classes in the fully connected layer of the pre-trained teacher model to generate the first similarity matrix;
and calculating the cosine of the angle between the classification mapping vectors of different classes in the fully connected layer of the student model to generate the second similarity matrix;
wherein the similarity matrix is calculated as:
S_teacher(i, j) = cosine(s_i, s_j), where S_teacher(i, j) denotes the cosine of the angle between the classification mapping vector s_i of class i and the classification mapping vector s_j of class j.
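The similarity matrix formula can be sketched in a few lines; the weight matrix W of the fully connected layer is assumed to hold one classification mapping vector per row:

```python
import numpy as np

def similarity_matrix(W):
    """Pairwise cosine of the angle between classification mapping vectors.
    W is the k x d weight matrix of a fully connected classification layer,
    one mapping vector per class. Returns the k x k matrix S with
    S[i, j] = cosine(s_i, s_j)."""
    unit = W / np.linalg.norm(W, axis=1, keepdims=True)  # normalize each row
    return unit @ unit.T  # dot products of unit vectors = cosines of angles

# Three hypothetical mapping vectors in 2-D.
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
S = similarity_matrix(W)
```

The diagonal is always 1, since every vector forms a zero angle with itself, and orthogonal mapping vectors give an entry of 0.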
Optionally, the target loss function is calculated as:
L_similarity = Σ_(i,j) (S_teacher(i, j) - S_student(i, j))^2
L = L_distill + λ · L_similarity
where λ is the weight of the similarity-matrix loss function and L_distill is the distillation loss of the student model, calculated as L_distill = Σ_i -p_i × log q_i, where p and q are the vectors output by the normalization layers of the teacher model and the student model respectively;
the normalization layer may be defined as:
q_i = exp(Z_i / T) / Σ_j exp(Z_j / T)
where Z is the vector output by the fully connected layer, q_i denotes the probability value of the i-th category, Z_i denotes the i-th dimension of the output vector of the fully connected layer, Z_j denotes the j-th dimension of the output vector of the fully connected layer, and T is a parameter controlling the smoothness of the output probabilities.
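The distillation loss L_distill = Σ_i -p_i × log q_i and the temperature-controlled normalization layer described above can be sketched as:

```python
import numpy as np

def softened_probs(z, T):
    """Normalization layer: q_i = exp(Z_i / T) / sum_j exp(Z_j / T),
    with T controlling the smoothness of the output probabilities."""
    scaled = z / T
    e = np.exp(scaled - np.max(scaled))  # max-shift for numerical stability
    return e / e.sum()

def distillation_loss(p_teacher, q_student, eps=1e-12):
    """L_distill = sum_i -p_i * log(q_i): cross-entropy of the student's
    normalized outputs against the teacher's normalized outputs."""
    return float(-np.sum(p_teacher * np.log(q_student + eps)))

# Hypothetical fully connected layer outputs for teacher and student.
z_teacher = np.array([4.0, 1.0, 0.5])
z_student = np.array([3.5, 1.2, 0.4])
T = 2.0
loss = distillation_loss(softened_probs(z_teacher, T), softened_probs(z_student, T))
```

A larger T flattens both distributions, which is the usual reason for the temperature parameter in distillation.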
In a second aspect, an embodiment of the present application provides an image class identification apparatus based on model distillation, including:
the image acquisition module is used for acquiring a target image to be classified;
the probability value output module is used for inputting the target images to be classified into a pre-trained student model and outputting a plurality of class probability values corresponding to the target images;
wherein the pre-trained student model is generated by model distillation training, the model distillation training is guided by the similarity between the classification mapping vectors of different classes in the fully connected layer of a pre-trained teacher model, and the similarity between two classification mapping vectors is the cosine of the angle between them;
and the category identification module is used for identifying the target category of the target image to be classified based on the plurality of category probability values.
In a third aspect, embodiments of the present application provide a computer storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and to perform the above-mentioned method steps.
In a fourth aspect, an embodiment of the present application provides a terminal, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
in the embodiment of the application, the image class recognition device based on model distillation firstly acquires target images to be classified and inputs the target images into a pre-trained student model, and then outputs a plurality of class probability values, wherein the pre-trained student model is generated based on model distillation training, the model distillation training generation is generated based on similarity training between different class classification mapping vectors of a full connection layer in a pre-trained teacher model, the similarity between the different class classification mapping vectors is a cosine value of an included angle between the different class classification mapping vectors, and finally the target classes of the target images to be classified are recognized based on the plurality of class probability values. Therefore, the similarity between different classification mapping vectors of the full connection layer in the teacher model based on pre-training is adopted to guide the student model to train, so that the student model can learn the characteristics of the similarity with the teacher model, and the student model has a simple structure and few parameters, so that the running speed of a hardware platform is increased, and the image classification efficiency is further improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic flowchart of an image class identification method based on model distillation according to an embodiment of the present application;
FIG. 2 is a schematic block diagram of a process of image class identification based on model distillation according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an apparatus for identifying image types based on model distillation according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The image class identification method based on model distillation provided by the embodiments of the present application will be described in detail below with reference to fig. 1 to 2. The method may be implemented by a computer program executable on an image class identification device based on model distillation built on the von Neumann architecture. The computer program may be integrated into an application or may run as a separate tool-like application. The image category identification device based on model distillation in the embodiments of the present application may be a user terminal, including but not limited to: personal computers, tablet computers, handheld devices, in-vehicle devices, wearable devices, computing devices or other processing devices connected to a wireless modem, and the like. The user terminal may be called by different names in different networks, for example: user equipment, access terminal, subscriber unit, subscriber station, mobile station, remote terminal, mobile device, user terminal, wireless communication device, user agent or user equipment, cellular telephone, cordless telephone, Personal Digital Assistant (PDA), terminal equipment in a 5G network or future evolved network, and the like.
Referring to fig. 1, a schematic flow chart of an image class identification method based on model distillation is provided for an embodiment of the present application. As shown in fig. 1, the method of the embodiment of the present application may include the following steps:
s101, acquiring a target image to be classified;
the target image to be classified is an image used for testing the performance of a student model or an image acquired when the student model is applied to a classification application scene.
Generally, when the target image to be classified is used to test the performance of the student model, it may be obtained from a test sample, obtained from a user terminal, or downloaded from the cloud. When the target image to be classified is acquired while the student model is applied in a classification application scenario, it may be an image captured in real time by an image acquisition device.
In one possible implementation, after training of the student model based on the teacher model is finished and the trained student model is deployed in an actual application scenario, an object sensor or an object-monitoring algorithm detects that an object has entered the camera's monitoring area and triggers the photographing function of the image acquisition device to capture a target image of the object entering the monitoring area; this image is then determined to be the target image to be classified.
In another possible implementation, after training of the student model based on the teacher model is finished, the image classification performance of the trained student model needs to be verified; the user downloads any image containing an object from a sample test set, a local gallery or the cloud through the user terminal, and this image is determined to be the target image to be classified.
S102, inputting the target image to be classified into a pre-trained student model, and outputting a plurality of class probability values corresponding to the target image; the pre-trained student model is generated by model distillation training, the model distillation training is guided by the similarity between the classification mapping vectors of different classes in the fully connected layer of a pre-trained teacher model, and the similarity between two classification mapping vectors is the cosine of the angle between them;
in the model distillation method, an algorithm firstly trains a large model with higher precision, which is called a teacher model. And then, the knowledge learned in the teacher model is used for guiding and training a student model with a small number of parameters. The student model improves the performance of the student model by learning information which is beneficial to classification in the teacher model. The student model has less parameters and high running speed, so that the student model can be conveniently deployed on various hardware platforms.
Typically, both teacher models and student models are created through a neural network, preferably a convolutional neural network.
In the fully connected classification layer, the more similar two classes are, the more similar their mapping vectors are, i.e. the smaller the angle between the vectors; the more dissimilar two classes are, the more dissimilar their mapping vectors are. For example, the ImageNet dataset includes categories such as dog, wolf and airplane. Analysis of existing large models trained on ImageNet shows that the angle between the classification mapping vectors of dog and wolf is small, while the angle between the mapping vectors of dog and airplane is large. The similarity of the classification mapping vectors can therefore be used as an index of class similarity.
It is desirable that the student model learn the inter-class similarities of the teacher network during training, so as to better model the data distribution. That is, if class A and class B have a high degree of similarity in the teacher model, then that degree of similarity between class A and class B should be maintained in the student model; if class A and class B have a low degree of similarity in the teacher model, then class A and class B should remain dissimilar in the student model. In other words, the similarities between the class mapping vectors in the student model's classification layer should be consistent with those in the teacher model.
In the embodiment of the application, when a pre-trained teacher model is generated, firstly, a plurality of types of image sets are collected to generate a model training sample, then, the teacher model is created, then, the model training sample is input into the teacher model for training, a trained teacher model is generated, and finally, the trained teacher model is determined as the pre-trained teacher model.
In the embodiments of the present application, when the pre-trained student model is generated, a first similarity matrix S_teacher ∈ R^(k×k) is first constructed for the classification mapping vectors of different classes in the fully connected layer of the pre-trained teacher model, where k is the number of classes and R denotes the real numbers. A student model is then created, the student model having fewer parameters than the teacher model, and a second similarity matrix S_student ∈ R^(k×k) is constructed for the classification mapping vectors of different classes in the fully connected layer of the student model. A target loss function for the student model is constructed from the first similarity matrix S_teacher and the second similarity matrix S_student, and the target loss function is associated with the student model to generate the student model with the associated loss function. Finally, the n-th image is acquired from the model training samples and input into the student model with the associated loss function for training; while the number of model training iterations is smaller than a preset value, the step of acquiring the (n+1)-th image from the model training samples is repeated, and when n+1 exceeds the number of images in the model training samples, the order of the images in the model training samples is randomly rearranged and n is reset to 1.
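A sketch of one evaluation of the target loss described above; since the patent's formula images are not reproduced in the text, the similarity term is written here as the elementwise squared difference between the teacher and student similarity matrices, which is an assumption:

```python
import numpy as np

def similarity_matrix(W):
    # k x k matrix of pairwise cosines between the rows (mapping vectors) of W.
    unit = W / np.linalg.norm(W, axis=1, keepdims=True)
    return unit @ unit.T

def target_loss(W_teacher, W_student, p, q, lam, eps=1e-12):
    """Distillation loss plus lam times a similarity-matrix term.
    p and q are the normalized output vectors of teacher and student;
    W_teacher and W_student are the fully connected layer weights.
    The squared-difference form of the similarity term is an assumed
    reconstruction, not quoted from the patent."""
    l_distill = float(-np.sum(p * np.log(q + eps)))
    l_sim = float(np.sum((similarity_matrix(W_teacher)
                          - similarity_matrix(W_student)) ** 2))
    return l_distill + lam * l_sim

# Hypothetical values: identical layers give a zero similarity term.
W = np.eye(3)
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
loss = target_loss(W, W, p, q, lam=1.0)
```

When teacher and student share the same mapping-vector geometry, only the distillation term contributes, so λ has no effect in that case.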
Specifically, the target loss function is calculated as:
L_similarity = Σ_(i,j) (S_teacher(i, j) - S_student(i, j))^2
L = L_distill + λ · L_similarity
where λ is the weight of the similarity-matrix loss function and L_distill is the distillation loss of the student model, calculated as L_distill = Σ_i -p_i × log q_i, where p and q are the vectors output by the normalization layers of the teacher model and the student model respectively; the normalization layer may be defined as:
q_i = exp(Z_i / T) / Σ_j exp(Z_j / T)
where Z is the vector output by the fully connected layer, q_i denotes the probability value of the i-th category, Z_i denotes the i-th dimension of the output vector of the fully connected layer, Z_j denotes the j-th dimension of the output vector of the fully connected layer, and T is a parameter controlling the smoothness of the output probabilities.
Further, when constructing the first similarity matrix S_teacher for the classification mapping vectors of different classes in the fully connected layer of the pre-trained teacher model, the cosine of the angle between the classification mapping vectors of different classes in the fully connected layer of the pre-trained teacher model is first calculated to generate the first similarity matrix; the cosine of the angle between the classification mapping vectors of different classes in the fully connected layer of the student model is calculated to generate the second similarity matrix; the cosine values are calculated as S_teacher(i, j) = cosine(s_i, s_j), where S_teacher(i, j) denotes the cosine of the angle between the classification mapping vector s_i of class i and the classification mapping vector s_j of class j.
In one possible implementation, after the target image to be classified is acquired in step S101, it is input into the pre-trained student model for processing. The pre-trained student model comprises at least a feature extraction layer, a fully connected layer and a normalization layer: the target image to be classified is input into the feature extraction layer for feature extraction to generate target features, the target features are input into the fully connected layer to output a plurality of class confidence degrees corresponding to the target image, and finally the plurality of class confidence degrees are input into the normalization layer to output a plurality of class probability values corresponding to the target image.
For example, suppose the probability values output after the image is processed by the pre-trained student model are: animal 23%, human 67%, other 10%. The maximum output probability is the 67% of the human category, so the object in the image belongs to the human category.
S103, identifying the target category of the target image to be classified based on the plurality of category probability values.
In one possible implementation, after the plurality of category probability values of the image to be classified are obtained, the maximum of the plurality of category probability values is first selected, then the target category corresponding to the selected maximum probability value is identified, and finally the target category is determined to be the category to which the target image to be classified belongs.
For example, as shown in fig. 2, which is a schematic diagram of the image category identification process based on model distillation provided by the present application: a target image is first obtained and input into the pre-trained student model, which outputs probability value 1, probability value 2, probability value 3, ..., probability value n after model processing; the maximum of the output probability values is then selected, and the category corresponding to that maximum probability value is determined to be the category to which the image finally belongs.
In the embodiments of the present application, the image class recognition device based on model distillation first acquires a target image to be classified and inputs it into a pre-trained student model, which outputs a plurality of class probability values; the pre-trained student model is generated by model distillation training guided by the similarity between the classification mapping vectors of different classes in the fully connected layer of a pre-trained teacher model, where the similarity between two classification mapping vectors is the cosine of the angle between them; finally, the target class of the target image to be classified is identified based on the plurality of class probability values. Because the similarity between the classification mapping vectors of different classes in the fully connected layer of the pre-trained teacher model is used to guide the training of the student model, the student model learns the inter-class similarity characteristics of the teacher model while keeping a simple structure and few parameters, which increases the running speed on hardware platforms and thereby improves image classification efficiency.
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details which are not disclosed in the embodiments of the apparatus of the present invention, reference is made to the embodiments of the method of the present invention.
Referring to fig. 3, a schematic structural diagram of an image class identification apparatus based on model distillation according to an exemplary embodiment of the present invention is shown. The image category identification device based on model distillation can be realized by software, hardware or a combination of the two to form all or part of a terminal. The device 1 comprises an image acquisition module 10, a probability value output module 20 and a category identification module 30.
The image acquisition module 10 is used for acquiring a target image to be classified;
a probability value output module 20, configured to input a target image to be classified into a pre-trained student model, and output a plurality of class probability values corresponding to the target image;
the pre-trained student model is generated through model distillation training, in which the training is guided by the similarities between different class classification mapping vectors of the fully connected layer in the pre-trained teacher model, the similarity between two classification mapping vectors being the cosine of the angle between them;
and the category identification module 30 is used for identifying the target category of the target image to be classified based on a plurality of category probability values.
It should be noted that the division into functional modules described above is merely an example of how the model-distillation-based image category identification apparatus may execute the identification method; in practical applications, the functions may be distributed among different functional modules as needed, that is, the internal structure of the apparatus may be divided into different modules to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided above belong to the same concept; details of the implementation process are given in the method embodiments and are not repeated here.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the embodiment of the application, the model-distillation-based image category identification apparatus first acquires a target image to be classified, inputs it into a pre-trained student model, and obtains a plurality of category probability values. The pre-trained student model is generated through model distillation training, which is guided by the similarities between the different class classification mapping vectors of the fully connected layer in a pre-trained teacher model; the similarity between two classification mapping vectors is the cosine of the angle between them. Finally, the target category of the target image to be classified is identified based on the plurality of category probability values. Because the similarities between the classification mapping vectors of the pre-trained teacher model's fully connected layer guide the student model's training, the student model learns features similar to the teacher's while keeping a simple structure with few parameters; this increases the running speed on the hardware platform and thereby improves image classification efficiency.
The present invention also provides a computer readable medium having stored thereon program instructions that, when executed by a processor, implement the method for model distillation based image class identification provided by the above-described method embodiments.
The present invention also provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method for model distillation based image class identification of the various method embodiments described above.
Please refer to fig. 4, which provides a schematic structural diagram of a terminal according to an embodiment of the present application. As shown in fig. 4, terminal 1000 can include: at least one processor 1001, at least one network interface 1004, a user interface 1003, memory 1005, at least one communication bus 1002.
Wherein a communication bus 1002 is used to enable connective communication between these components.
The user interface 1003 may include a Display screen (Display) and a Camera (Camera), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
Processor 1001 may include one or more processing cores. The processor 1001 connects the various components throughout the electronic device 1000 using various interfaces and lines, and performs the various functions of the electronic device 1000 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 1005 and by invoking the data stored in the memory 1005. Optionally, the processor 1001 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 1001 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU renders and draws the content to be displayed on the display screen; the modem handles wireless communication. It is understood that the modem may also not be integrated into the processor 1001 and instead be implemented by a separate chip.
The Memory 1005 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 1005 includes a non-transitory computer-readable medium. The memory 1005 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1005 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the various method embodiments described above, and the like; the data storage area may store the data referred to in the above method embodiments. The memory 1005 may optionally be at least one storage device located remotely from the processor 1001. As shown in fig. 4, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a model-distillation-based image category identification application.
In the terminal 1000 shown in fig. 4, the user interface 1003 is mainly used as an interface for providing input for a user, and acquiring data input by the user; and the processor 1001 may be configured to invoke the model distillation based image class identification application stored in the memory 1005 and specifically perform the following operations:
acquiring a target image to be classified;
inputting a target image to be classified into a pre-trained student model, and outputting a plurality of class probability values corresponding to the target image;
the pre-trained student model is generated through model distillation training, in which the training is guided by the similarities between different class classification mapping vectors of the fully connected layer in the pre-trained teacher model, the similarity between two classification mapping vectors being the cosine of the angle between them;
and identifying a target category of the target image to be classified based on the plurality of category probability values.
In one embodiment, the processor 1001 specifically performs the following operations when performing the identification of the target class of the target image to be classified based on the plurality of class probability values:
selecting a maximum category probability value of a plurality of category probability values;
identifying a target category corresponding to the selected maximum category probability value;
and determining the target class as the class of the target image to be classified.
In one embodiment, when the processor 1001 inputs the target image to be classified into a pre-trained student model and outputs a plurality of class probability values corresponding to the target image, the following operations are specifically performed:
inputting a target image to be classified into a feature extraction layer for feature extraction to generate target features;
inputting the target features into the fully connected layer, and outputting a plurality of category confidence values corresponding to the target image;
and inputting the confidence degrees of the multiple categories into a normalization layer, and outputting multiple category probability values corresponding to the target image.
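The normalization step above is a softmax over the fully connected layer's confidence values; with a temperature parameter T it matches the smoothed form used later for distillation. A minimal sketch, with hypothetical confidence values:

```python
import math

def normalize(confidences, T=1.0):
    # Normalization layer: softmax with temperature T.
    # q_i = exp(z_i / T) / sum_j exp(z_j / T)
    exps = [math.exp(c / T) for c in confidences]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical confidence values for three classes.
probs = normalize([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # -> [0.659, 0.242, 0.099]

# A larger temperature T smooths the output distribution:
smoothed = normalize([2.0, 1.0, 0.1], T=4.0)
```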
In one embodiment, the processor 1001, when executing generating the pre-trained teacher model, specifically performs the following operations:
collecting various types of image sets to generate model training samples;
creating a teacher model;
inputting the model training sample into a teacher model for training, and generating a trained teacher model;
and determining the trained teacher model as a pre-trained teacher model.
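The four teacher-preparation steps above can be sketched as plain functions. All names and the toy sample data are illustrative stand-ins; in practice each step wraps a real dataset pipeline and a deep classification network:

```python
def collect_samples():
    # Step 1: collect image sets of various classes as (image, label) pairs.
    return [("img_cat_0.jpg", 0), ("img_dog_0.jpg", 1), ("img_car_0.jpg", 2)]

def create_teacher_model():
    # Step 2: stand-in for constructing a large classification network.
    return {"name": "teacher", "trained": False}

def train_teacher(model, samples):
    # Step 3: stand-in for fitting the teacher on the training samples.
    return dict(model, trained=True, num_samples=len(samples))

samples = collect_samples()
teacher = train_teacher(create_teacher_model(), samples)
# Step 4: the trained teacher is then used as the pre-trained teacher model.
print(teacher["trained"], teacher["num_samples"])  # -> True 3
```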
In one embodiment, the processor 1001, when performing generating the pre-trained student model, specifically performs the following operations:
constructing a first similarity matrix S_teacher ∈ R^(k×k) for the different class classification mapping vectors of the fully connected layer in the pre-trained teacher model, where k is the number of classes and R denotes the real numbers;
creating a student model; wherein the parameter quantity of the student model is less than that of the teacher model;
constructing a second similarity matrix S_student ∈ R^(k×k) for the different class classification mapping vectors of the fully connected layer in the student model;
constructing a target loss function of the student model according to the first similarity matrix S_teacher ∈ R^(k×k) and the second similarity matrix S_student ∈ R^(k×k);
associating the target loss function with the student model, to generate a student model with the associated loss function;
acquiring an n-th image from the model training samples, and inputting the n-th image into the student model with the associated loss function for training;
and when the number of model training iterations is smaller than a preset value, continuing to execute the step of acquiring the (n+1)-th image from the model training samples; when n+1 exceeds the number of images in the model training samples, randomly shuffling the order of the images in the model training samples and resetting n to 1.
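The iteration scheme in the last two steps — take the n-th image, advance to n+1, and reshuffle with n reset to 1 once the samples are exhausted — can be sketched as follows (`train_step` is a hypothetical stand-in for one forward/backward pass with the associated target loss):

```python
import random

def train_student(samples, num_iterations, train_step=None):
    # Cycle through the training samples one image at a time; when the
    # index runs past the last image, shuffle the samples and restart
    # from the first one, until the preset iteration count is reached.
    n = 0  # zero-based index of the n-th image
    seen = []
    for _ in range(num_iterations):
        image = samples[n]
        if train_step is not None:
            train_step(image)  # one training pass in practice
        seen.append(image)
        n += 1
        if n >= len(samples):   # n + 1 would exceed the sample count
            random.shuffle(samples)
            n = 0               # reset to the first image
    return seen

seen = train_student(list(range(5)), 12)
print(len(seen))  # -> 12
```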
In one embodiment, when constructing the first similarity matrix for the different class classification mapping vectors of the fully connected layer within the pre-trained teacher model, the processor 1001 specifically performs the following operations:
calculating the cosine values of the angles between the different class classification mapping vectors of the fully connected layer in the pre-trained teacher model, to generate the first similarity matrix;
and,
calculating the cosine values of the angles between the different class classification mapping vectors of the fully connected layer in the student model, to generate the second similarity matrix;
wherein the entries of the first similarity matrix are calculated as:
S_teacher(i, j) = cosine(S_i, S_j), where S_teacher(i, j) is the cosine of the angle between the classification mapping vector S_i of class i and the classification mapping vector S_j of class j.
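The cosine formula above can be applied over all class pairs of the fully connected layer's weight matrix, whose i-th row serves as the classification mapping vector of class i. A self-contained sketch with toy vectors (illustrative only):

```python
import math

def cosine(u, v):
    # Cosine of the angle between two classification mapping vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(class_vectors):
    # S(i, j) = cosine(S_i, S_j) over all class pairs; S is k x k.
    k = len(class_vectors)
    return [[cosine(class_vectors[i], class_vectors[j]) for j in range(k)]
            for i in range(k)]

# Toy classification mapping vectors for k = 3 classes.
S = similarity_matrix([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print([round(x, 3) for x in S[0]])  # -> [1.0, 0.0, 0.707]
```

The same function applied to the teacher's and the student's vectors yields the two matrices whose difference the target loss penalizes.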
In the embodiment of the application, the model-distillation-based image category identification apparatus first acquires a target image to be classified, inputs it into a pre-trained student model, and obtains a plurality of category probability values. The pre-trained student model is generated through model distillation training, which is guided by the similarities between the different class classification mapping vectors of the fully connected layer in a pre-trained teacher model; the similarity between two classification mapping vectors is the cosine of the angle between them. Finally, the target category of the target image to be classified is identified based on the plurality of category probability values. Because the similarities between the classification mapping vectors of the pre-trained teacher model's fully connected layer guide the student model's training, the student model learns features similar to the teacher's while keeping a simple structure with few parameters; this increases the running speed on the hardware platform and thereby improves image classification efficiency.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program to instruct associated hardware, and the program for image class identification based on model distillation can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory or a random access memory.
The above disclosure is only of the preferred embodiments of the present application and is not intended to limit its scope of protection; the present application is therefore not limited thereto, and all equivalent variations and modifications made in accordance with it shall remain within its scope.

Claims (10)

1. An image class identification method based on model distillation, which is characterized by comprising the following steps:
acquiring a target image to be classified;
inputting the target image to be classified into a pre-trained student model, and outputting a plurality of category probability values corresponding to the target image;
the pre-trained student model is generated through model distillation training, in which the training is guided by the similarities between different class classification mapping vectors of the fully connected layer in a pre-trained teacher model, the similarity between two classification mapping vectors being the cosine of the angle between them;
identifying a target class of the target image to be classified based on the plurality of class probability values.
2. The method of claim 1, wherein identifying the target class of the target image to be classified based on the plurality of class probability values comprises:
selecting a maximum category probability value of the plurality of category probability values;
identifying a target category corresponding to the selected maximum category probability value;
and determining the target class as the class of the target image to be classified.
3. The method of claim 1, wherein the pre-trained student model comprises at least a feature extraction layer, a full connection layer, and a normalization layer;
the inputting the target image to be classified into a pre-trained student model, and outputting a plurality of category probability values corresponding to the target image, includes:
inputting the target image to be classified into the feature extraction layer for feature extraction to generate target features;
inputting the target features into the full-connection layer, and outputting a plurality of category confidence degrees corresponding to the target image;
and inputting the plurality of category confidence degrees into the normalization layer, and outputting a plurality of category probability values corresponding to the target image.
4. The method of claim 1, wherein generating a pre-trained teacher model comprises:
collecting various types of image sets to generate model training samples;
creating a teacher model;
inputting the model training sample into the teacher model for training, and then generating a trained teacher model;
and determining the trained teacher model as a pre-trained teacher model.
5. The method of claim 4, wherein generating a pre-trained student model comprises:
constructing a first similarity matrix S_teacher ∈ R^(k×k) for the different class classification mapping vectors of the fully connected layer in the pre-trained teacher model, where k is the number of classes and R denotes the real numbers;
creating a student model; wherein the parameter quantity of the student model is smaller than the parameter quantity of the teacher model;
constructing a second similarity matrix S_student ∈ R^(k×k) for the different class classification mapping vectors of the fully connected layer in the student model;
constructing an objective loss function of the student model according to the first similarity matrix S_teacher ∈ R^(k×k) and the second similarity matrix S_student ∈ R^(k×k);
associating the objective loss function with the student model, to generate a student model with the associated loss function;
acquiring an n-th image from the model training samples, and inputting the n-th image into the student model with the associated loss function for training;
and when the number of model training iterations is smaller than a preset value, continuing to execute the step of acquiring the (n+1)-th image from the model training samples; when n+1 exceeds the number of images in the model training samples, randomly shuffling the order of the images in the model training samples and resetting n to 1.
6. The method of claim 5, wherein the constructing a first similarity matrix for the different class classification mapping vectors of the fully connected layer within the pre-trained teacher model comprises:
calculating cosine values of included angles among different classes of classification mapping vectors of all connection layers in the pre-trained teacher model to generate a first similarity matrix;
and the number of the first and second groups,
calculating cosine values of included angles among different classification mapping vectors of all connection layers in the student model to generate a second similarity matrix;
wherein the entries of the similarity matrix are calculated as:
S_teacher(i, j) = cosine(S_i, S_j), where S_teacher(i, j) is the cosine of the angle between the classification mapping vector S_i of class i and the classification mapping vector S_j of class j.
7. The method of claim 5, wherein the objective loss function is calculated by:
[Formula image FDA0003055708190000031: the objective loss function, combining the distillation loss L_distill with the similarity-matrix loss term weighted by λ.]
where λ is the weight of the similarity matrix loss function, and L_distill is the distillation loss of the student model, calculated as: L_distill = −Σ_i p_i × log q_i, where p and q are the vectors output by the normalization layers of the teacher model and the student model, respectively;
the normalization layer may be defined as:
q_i = exp(Z_i / T) / Σ_j exp(Z_j / T)
where the vector Z is the output of the fully connected layer, q_i represents the probability value of the i-th class, Z_i and Z_j denote the i-th and j-th dimensions of Z, and T is a parameter controlling the smoothness of the output probabilities.
8. An image class identification device based on model distillation, characterized in that the device comprises:
the image acquisition module is used for acquiring a target image to be classified;
the probability value output module is used for inputting the target image to be classified into a pre-trained student model and outputting a plurality of category probability values corresponding to the target image;
the pre-trained student model is generated through model distillation training, in which the training is guided by the similarities between different class classification mapping vectors of the fully connected layer in a pre-trained teacher model, the similarity between two classification mapping vectors being the cosine of the angle between them;
and the class identification module is used for identifying the target class of the target image to be classified based on the plurality of class probability values.
9. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to perform the method steps according to any of claims 1-7.
10. A terminal, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1-7.
CN202110499204.4A 2021-05-08 2021-05-08 Image category identification method and device based on model distillation, storage medium and terminal Pending CN113408570A (en)


Publications (1)

Publication Number | Publication Date
CN113408570A (en) | 2021-09-17

Family ID: 77678295



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210917