CN114663941A - Feature detection method, model merging method, device, and medium - Google Patents

Feature detection method, model merging method, device, and medium

Info

Publication number
CN114663941A
Authority
CN
China
Prior art keywords
feature
characteristic
information
model
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210264286.9A
Other languages
Chinese (zh)
Inventor
曾梦萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Original Assignee
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Shuliantianxia Intelligent Technology Co Ltd filed Critical Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority to CN202210264286.9A priority Critical patent/CN114663941A/en
Publication of CN114663941A publication Critical patent/CN114663941A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a model merging method, which comprises the following steps: first, a plurality of unlabeled face training images are obtained, a first face image is input into a plurality of teacher models respectively, the first feature information output by each teacher model is obtained, and the first feature information output by all the teacher models is used as the label of the first face image; the first face image carrying the label is then input into a student model, and second feature information corresponding to a plurality of features output by the student model is obtained; finally, the distillation loss is calculated from the first feature information and the second feature information through a preset distillation loss function, the parameters of the student model are adjusted according to the distillation loss, and the preceding training steps are repeated until the student model converges. In this way, the student model acquires the feature detection capability of all the teacher models, achieving the purposes of model merging and reducing the model footprint, so that edge deployment can be completed in a single step. In addition, a feature detection method, a device, and a medium are also provided.

Description

Feature detection method, model merging method, device, and medium
Technical Field
The present invention relates to the field of computer vision technology, and in particular, to a feature detection method, a model merging method, a device, and a medium.
Background
As practical business scenarios change, more and more services tend toward edge deployment. Edge deployment mainly targets embedded devices: models are packaged into Software Development Kits (SDKs) and integrated into the embedded devices, and data processing and model inference are performed on the terminal devices. However, the existing models are numerous, and deploying them one by one is very unfavorable for edge deployment.
Disclosure of Invention
Based on this, it is necessary to provide a feature detection method, a model merging method, a device and a medium to solve the problem that the existing model is not favorable for edge deployment.
A model merging method applied to a model set, the model set including a plurality of teacher models and a student model with the same network structure, the plurality of teacher models having converged, different teacher models being used for detecting different features, and the student model having not converged, the method comprising:
acquiring a plurality of face training images without carrying labels, respectively inputting first face images into the plurality of teacher models, acquiring first characteristic information output by each teacher model, and taking the first characteristic information output by all the teacher models as the labels of the first face images; the first face image is any one of the plurality of face training images without carrying labels, and first characteristic information output by different teacher models is different;
inputting a first face image carrying a label into the student model, and acquiring second feature information corresponding to a plurality of features output by the student model;
and calculating distillation loss according to the first characteristic information and the second characteristic information through a preset distillation loss function, adjusting parameters of the student models according to the distillation loss, and returning to execute the step of inputting the first face images into the plurality of teacher models respectively until the student models converge.
In one embodiment, the first characteristic information and the second characteristic information each include a characteristic severity and a confidence level, the distillation loss function includes a plurality of characteristic distillation loss functions, one characteristic distillation loss function corresponds to one characteristic, and one characteristic distillation loss function includes one soft tag loss function and one hard tag loss function;
calculating the distillation loss according to the first characteristic information and the second characteristic information through a preset distillation loss function, wherein the calculation comprises the following steps:
calculating the soft label loss of the target feature according to the feature severity corresponding to the target feature in the first feature information and the feature severity corresponding to the target feature in the second feature information through the soft label loss function; wherein the target feature is any one of the features detected by the plurality of teacher models;
calculating the hard tag loss of the target feature according to the confidence degree corresponding to the target feature in the first feature information and the feature severity degree corresponding to the target feature in the second feature information through the hard tag loss function;
calculating a first product of a first weight and the soft tag loss and a second product of a second weight and the hard tag loss, and taking the sum of the first product and the second product as the characteristic distillation loss corresponding to the target characteristic; wherein a sum of the first weight and the second weight is 1;
and calculating the sum of the characteristic distillation losses corresponding to all the characteristics to obtain the distillation loss.
In one embodiment, the soft tag loss function is formulated as:

L_soft = -Σ_{j=1}^{N} p_j^T · log(q_j^T)

wherein N is the total number of degrees of the feature severity corresponding to the target feature; p_j^T indicates the value of degree j of the feature severity corresponding to the target feature in the first feature information at temperature T; and q_j^T indicates the value of degree j of the feature severity corresponding to the target feature in the second feature information at temperature T.
In one embodiment,

p_j^T = exp(v_j / T) / Σ_k exp(v_k / T)

wherein v_j indicates the output at degree j for the target feature on the fully connected layer of the teacher model, and Σ_k exp(v_k / T) indicates the sum over all outputs of the fully connected layer of the teacher model; and

q_j^T = exp(z_j / T) / Σ_k exp(z_k / T)

wherein z_j indicates the output at degree j for the target feature on the fully connected layer of the student model, and Σ_k exp(z_k / T) indicates the sum over all outputs of the fully connected layer of the student model.
In one embodiment, the hard tag loss function is formulated as:

L_out_label = -Σ_{j=1}^{N} M_j · log(q_j)

wherein M_j is the confidence corresponding to the target feature in the first feature information, and q_j is the value of degree j of the feature severity corresponding to the target feature in the second feature information computed with T = 1.
In one embodiment, before calculating the sum of the distillation losses of the features corresponding to all the features, the method further includes:
assigning a corresponding characteristic weight to the characteristic distillation loss corresponding to each characteristic; wherein the characteristic weight is determined based on the rate of decrease of the distillation loss for different characteristics.
In one embodiment, the feature weight of feature a is expressed as:

β_a = L_a(t-1) / L_a(t-2)

wherein L_a(t-1) indicates the feature distillation loss of feature a at the (t-1)-th parameter-adjustment step, and L_a(t-2) indicates the feature distillation loss of feature a at the (t-2)-th parameter-adjustment step.
In one specific embodiment, the first feature information and the second feature information each include: feature location information for a plurality of features;
the method further comprises the following steps:
calculating the position loss of the target feature according to feature position information corresponding to the target feature in the first feature information and feature position information corresponding to the target feature in the second feature information through a preset position loss function; wherein the target feature is any one of the features detected by the plurality of teacher models;
adjusting parameters of the student model according to the location loss of all features.
In one embodiment, the position loss function is:

L_loc = Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h}} x_{ij}^k · smooth_L1(g_j^m - l_i^m)

wherein i ∈ Pos indicates a box containing an object, and m ∈ {cx, cy, w, h} indicates the center position (cx, cy), width w, and height h of the box; x_{ij}^k indicates whether the feature position information in the first feature information matches the feature position information corresponding to the target feature in the second feature information with respect to degree k, taking the value 1 for a match and 0 for a mismatch; g_j^m indicates the feature position information in the first feature information; and l_i^m indicates the feature position information in the second feature information.
A method of feature detection, the method comprising:
acquiring a face image to be detected, and inputting the face image to be detected into a student model to obtain second feature information of a plurality of features corresponding to the face image to be detected; the student model is obtained through the model combination method training.
A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to perform the steps of the model merging method and the feature detection method described above.
A terminal device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the model merging method and the feature detection method described above.
The invention provides a feature detection method, a model merging method, a device, and a medium. A plurality of unlabeled face training images are obtained first, a first face image is input into a plurality of teacher models respectively, the first feature information output by each teacher model is obtained, and the first feature information output by all the teacher models is used as the label of the first face image; the first face image carrying the label is then input into a student model, and second feature information corresponding to a plurality of features output by the student model is obtained; finally, the distillation loss is calculated from the first feature information and the second feature information through a preset distillation loss function, the parameters of the student model are adjusted according to the distillation loss, and the step of inputting the first face image into the plurality of teacher models is executed again until the student model converges. In this way, the invention uses a plurality of converged teacher models to guide one student model until the student model converges, so that the student model acquires the feature detection capability of all the teacher models, achieving the purposes of model merging and reducing the model footprint and allowing edge deployment to be completed in a single step.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Wherein:
FIG. 1 is a schematic flow chart diagram of a model merging method in one embodiment;
FIG. 2 is a schematic diagram of a face image with feature information in one embodiment;
fig. 3 is a block diagram of a terminal device in one embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Specifically, the "features" in the present invention refer to various details that need to be detected in the face image, and include, but are not limited to, details such as "blackheads", "wrinkles", and "black eyes". For the convenience of illustration, the following description will mainly use two features of "black head" and "wrinkle" as examples. It will of course be appreciated that the "blackheads" and "wrinkles" in the following schemes may equally be replaced by any other feature, or the number of features may be additionally increased.
As shown in fig. 1, fig. 1 is a schematic flow chart of a model merging method in an embodiment. The model merging method is applied to a model set, the model set comprises a plurality of teacher models and a student model with the same network structure, the teacher models are converged, different teacher models are used for detecting different characteristics, and the student model is not converged. The unconverged student model is trained by a plurality of converged teacher models, so that the student model has the feature detection capability of all the teacher models, and model combination is completed.
The model merging method in this embodiment includes the following steps:
Step 102: obtain a plurality of unlabeled face training images, input a first face image into the plurality of teacher models respectively, obtain the first feature information output by each teacher model, and use the first feature information output by all the teacher models as the label of the first face image.
First, a plurality of unlabeled face training images may be captured with a camera, retrieved from a database, or obtained in other ways; this is not specifically limited here.
It is understood that the obtained face training images differ from one another, for example in face angle and/or image size. Therefore, certain preprocessing operations need to be performed on the obtained face training images. The preprocessing includes: using a face key point algorithm, for example the 68-point Landmark model of the Dlib library, to obtain the center positions of the two eyeballs and the center position of the nose; connecting these center positions and comparing the connecting line with the vertical direction to calculate the left-right rotation angle θ of the face; and finally using a rotation transformation matrix, with the nose coordinate as the center, to adjust the face image. The specific calculation formula is as follows:
x' = (x - x₀) · cosθ - (y - y₀) · sinθ + x₀
y' = (x - x₀) · sinθ + (y - y₀) · cosθ + y₀

where x and y are the two-dimensional coordinates of a pixel in the original face training image, x' and y' are the two-dimensional coordinates of that pixel in the adjusted face training image, and (x₀, y₀) is the nose center coordinate about which the rotation is performed. In this way, the angle of a skewed face training image can be corrected.
Then, according to the face key point coordinates, an effective face region is cropped from the corrected image, taking the nose center coordinate as the center and the maximum distance between face key points as the side length, so that only the effective face region is predicted subsequently, which appropriately improves processing efficiency. A scale normalization operation is then performed so that the face training images have a consistent size, unified to 1024 × 1024. In this way, the initially obtained face training images all reach a consistent processing standard.
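Illustratively, the preprocessing described above may be sketched as follows in Python with OpenCV and Dlib; the landmark indices (36-41 and 42-47 for the eyes, 30 for the nose tip), the predictor file path, and the crop-size heuristic are assumptions for illustration rather than details specified herein.

```python
import cv2
import dlib
import numpy as np

# Assumed: standard Dlib 68-point predictor file and the usual landmark layout.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def preprocess_face(image: np.ndarray, out_size: int = 1024) -> np.ndarray:
    """Rotate about the nose center, crop the effective face region, resize to a unified size."""
    faces = detector(image, 1)
    if not faces:
        raise ValueError("no face detected")
    pts = predictor(image, faces[0])
    landmarks = np.array([(p.x, p.y) for p in pts.parts()], dtype=np.float32)

    left_eye = landmarks[36:42].mean(axis=0)
    right_eye = landmarks[42:48].mean(axis=0)
    nose = landmarks[30]

    # Angle between the eye-midpoint-to-nose line and the vertical axis (left-right rotation).
    eye_mid = (left_eye + right_eye) / 2.0
    dx, dy = nose[0] - eye_mid[0], nose[1] - eye_mid[1]
    theta = np.degrees(np.arctan2(dx, dy))

    # Rotate the image about the nose center; the nose coordinate itself is unchanged.
    rot = cv2.getRotationMatrix2D((float(nose[0]), float(nose[1])), theta, 1.0)
    aligned = cv2.warpAffine(image, rot, (image.shape[1], image.shape[0]))

    # Rough crop: square centered on the nose, side length ~ the landmark spread.
    ones = np.hstack([landmarks, np.ones((len(landmarks), 1), dtype=np.float32)])
    rotated_pts = ones @ rot.T
    half = int(np.linalg.norm(rotated_pts.max(axis=0) - rotated_pts.min(axis=0)) / 2)
    cx, cy = int(nose[0]), int(nose[1])
    crop = aligned[max(cy - half, 0):cy + half, max(cx - half, 0):cx + half]
    return cv2.resize(crop, (out_size, out_size))
```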
Further, any one of the plurality of unlabeled face training images is selected as the first face image and input into each of the converged teacher models. In this embodiment, because the method is intended for embedded devices, the teacher models may use a lightweight convolutional neural network to identify feature information, such as SSD-MobileNetV2. Of course, other lightweight models such as YOLO or ResNet may also be selected as teacher models, provided that each teacher model has been trained to convergence in advance. Because the parameters for different features differ, the first feature information output by different teacher models is different.
In this embodiment, taking blackheads and wrinkles as an example, the first feature information output by each teacher model includes feature position information, feature severity, and confidence. The feature position information is displayed as a bounding box marking the position of the feature; for example, the feature position information output by the blackhead teacher model indicates the positions of blackheads, and that output by the wrinkle teacher model indicates the positions of wrinkles. The feature severity indicates how pronounced or numerous a feature is; for example, the higher the feature severity output by the blackhead teacher model, the more numerous and pronounced the blackheads, and the higher the feature severity output by the wrinkle teacher model, the longer and more pronounced the wrinkles. The confidence indicates how trustworthy the feature severity is; the higher the confidence, the more credible the obtained feature severity.
The first feature information output by all the teacher models is used as the label of the first face image, replacing manual annotation of the first face image and reducing the manual labeling workload. In addition, unlike the one-hot labels of conventional detection outputs, introducing the confidence as a label softens the targets that the student model learns from.
Step 104: input the first face image carrying the label into the student model, and obtain the second feature information corresponding to the plurality of features output by the student model.
In this embodiment, since the network structure of the constructed student model is consistent with that of the teacher models, after the first face image carrying the label is input into the student model, the obtained second feature information likewise includes feature position information, feature severity, and confidence. However, since the student model has not converged, at the beginning of training there is a significant difference between the first feature information and the second feature information, and the parameters of the student model need to be adjusted through step 106 so that this difference is reduced.
Step 106: calculate the distillation loss from the first feature information and the second feature information through a preset distillation loss function, and adjust the parameters of the student model according to the distillation loss.
In this embodiment, the distillation loss function includes a plurality of feature distillation loss functions, and the overall distillation loss KD_loss is composed of the corresponding feature distillation losses. For example, when detecting blackheads and wrinkles, the preset feature distillation loss functions include a blackhead distillation loss function and a wrinkle distillation loss function, and the distillation loss is correspondingly composed of the blackhead distillation loss and the wrinkle distillation loss. Each feature distillation loss function corresponds to one feature and includes a soft label loss function and a hard label loss function. The calculation is described below taking the blackhead distillation loss KD_loss_1 as an example.
The soft label loss L_soft of the target feature is calculated from the feature severity corresponding to the target feature (blackheads) in the first feature information and the feature severity corresponding to the target feature (blackheads) in the second feature information using the soft label loss function.
Specifically, the formula of the soft label loss function is:

L_soft = -Σ_{j=1}^{N} p_j^T · log(q_j^T)

where N is the total number of severity degrees for the target feature (blackheads); for example, if blackhead severity is divided into 5 degrees, then N = 5.

p_j^T is the value of degree j of the feature severity corresponding to the target feature in the first feature information at temperature T:

p_j^T = exp(v_j / T) / Σ_k exp(v_k / T)

where v_j is the output at degree j for the target feature (blackheads) on the fully connected layer of the teacher model, and the denominator is the sum over all outputs of the fully connected layer of the teacher model.

q_j^T is the value of degree j of the feature severity corresponding to the target feature in the second feature information at temperature T:

q_j^T = exp(z_j / T) / Σ_k exp(z_k / T)

where z_j is the output at degree j for the target feature (blackheads) on the fully connected layer of the student model, and the denominator is the sum over all outputs of the fully connected layer of the student model.
Meanwhile, the hard label loss L_out_label of the target feature is calculated from the confidence corresponding to the target feature (blackheads) in the first feature information and the feature severity corresponding to the target feature (blackheads) in the second feature information using the hard label loss function.

Specifically, the formula of the hard label loss function is:

L_out_label = -Σ_{j=1}^{N} M_j · log(q_j)

where M_j is the confidence corresponding to the target feature in the first feature information, and q_j is the student output for degree j computed as above with T = 1.
Finally, the blackhead distillation loss KD_loss_1 is obtained from:

KD_loss_1 = L_soft · α + L_out_label · (1 - α)

where α is the first weight and 1 - α is the second weight.
Based on empirical parameter settings, T = 20 and α = 0.8 in the above formulas. The distillation loss functions of the other features are calculated in the same way and are not described again.
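Illustratively, this per-feature distillation loss (a temperature-T soft label term plus a hard label term weighted by the teacher confidence) may be sketched in PyTorch as follows; function and tensor names are illustrative, the teacher confidence is assumed to be given per severity degree, and T = 20, α = 0.8 follow the empirical settings above.

```python
import torch
import torch.nn.functional as F

def feature_distillation_loss(teacher_logits: torch.Tensor,
                              student_logits: torch.Tensor,
                              teacher_confidence: torch.Tensor,
                              T: float = 20.0,
                              alpha: float = 0.8) -> torch.Tensor:
    """Distillation loss for one feature (e.g. blackheads).

    teacher_logits, student_logits: [batch, N] fully connected outputs over the
        N severity degrees of this feature.
    teacher_confidence: [batch, N] confidence from the teacher, used as M_j.
    """
    # Soft label term: L_soft = -sum_j p_j^T * log(q_j^T)
    p_T = F.softmax(teacher_logits / T, dim=-1)
    log_q_T = F.log_softmax(student_logits / T, dim=-1)
    soft_loss = -(p_T * log_q_T).sum(dim=-1).mean()

    # Hard label term: L_out_label = -sum_j M_j * log(q_j), with T = 1
    log_q = F.log_softmax(student_logits, dim=-1)
    hard_loss = -(teacher_confidence * log_q).sum(dim=-1).mean()

    # KD_loss_k = alpha * L_soft + (1 - alpha) * L_out_label
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```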
Furthermore, considering that different features differ in their distribution and appearance on the face (for example, wrinkles and blackheads are distributed and shaped differently), the difficulty of learning different features may also differ. Therefore, when constructing KD_loss, the losses of different features are constructed separately and given different weights, so that the distillation loss function balances the learning of different features and the training progress of the individual feature distillation loss functions does not diverge significantly.
Specifically, a corresponding characteristic weight is given to the characteristic distillation loss corresponding to each characteristic; wherein the characteristic weight is determined based on the reduction rate of the distillation loss of different characteristics.
Illustratively, the feature weight of feature a is expressed as:

β_a = L_a(t-1) / L_a(t-2)

where L_a(t-1) indicates the feature distillation loss of feature a at the (t-1)-th parameter-adjustment step, and L_a(t-2) indicates the feature distillation loss of feature a at the (t-2)-th parameter-adjustment step. According to this formula, the faster the feature distillation loss of feature a decreases, the easier the feature is to learn, and therefore the smaller β_a is set; conversely, the slower the feature distillation loss of feature a decreases, the harder the feature is to learn, and therefore the larger β_a is set. In this way, the distillation loss function leans toward the harder features during learning, and the training progress of the individual feature distillation loss functions remains essentially consistent.
Accordingly, the distillation loss function is expressed as:

KD_loss = β_1 · KD_loss_1 + β_2 · KD_loss_2 + … + β_M · KD_loss_M

where M indicates the number of features; for example, if the detected features include blackheads, wrinkles, and dark circles, then M = 3.
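A minimal sketch of this dynamic weighting follows, assuming β_a is taken directly as the ratio of the two most recent losses per feature (consistent with the explanation above); the class and method names are illustrative, and it can be combined with feature_distillation_loss from the previous sketch, e.g. weights.total_loss({"blackhead": kd1, "wrinkle": kd2}).

```python
from collections import deque
from typing import Dict

import torch

class DynamicFeatureWeights:
    """Weight each feature's distillation loss by how slowly it has been decreasing:
    beta_a = L_a(t-1) / L_a(t-2), so slowly decreasing (harder) features get larger weights."""

    def __init__(self, feature_names):
        self.history: Dict[str, deque] = {name: deque(maxlen=2) for name in feature_names}

    def weight(self, name: str) -> float:
        hist = self.history[name]
        if len(hist) < 2:
            return 1.0  # not enough history yet; fall back to equal weighting
        return hist[-1] / hist[-2]

    def total_loss(self, per_feature_losses: Dict[str, torch.Tensor]) -> torch.Tensor:
        # KD_loss = beta_1 * KD_loss_1 + ... + beta_M * KD_loss_M
        total = sum(self.weight(name) * loss for name, loss in per_feature_losses.items())
        # Record the current losses so the next step can compute L(t-1)/L(t-2).
        for name, loss in per_feature_losses.items():
            self.history[name].append(float(loss.detach()))
        return total
```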
In addition, for the feature position information, the position loss of the target feature (for example, blackheads) is calculated from the feature position information corresponding to the target feature in the first feature information and the feature position information corresponding to the target feature in the second feature information through a preset position loss function. Of course, the target feature may be any other feature detected by the plurality of teacher models, such as wrinkles.
Specifically, the position loss function is:

L_loc = Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h}} x_{ij}^k · smooth_L1(g_j^m - l_i^m)

where i ∈ Pos indicates a box containing an object, and m ∈ {cx, cy, w, h} indicates the center position (cx, cy), width w, and height h of the box; x_{ij}^k indicates whether the feature position information in the first feature information matches the feature position information corresponding to the target feature in the second feature information with respect to degree k, taking the value 1 for a match and 0 for a mismatch; g_j^m indicates the feature position information in the first feature information; and l_i^m indicates the feature position information in the second feature information.
The parameters of the student model are adjusted according to the position losses of all features, so that the position information of each feature output by the student model continuously approaches the labeled position information of the same feature at the same degree k, and the student model learns the teacher models' detection capability with respect to the position information of all features.
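A hedged PyTorch sketch of this position loss, assuming matched pairs of teacher and student boxes encoded as (cx, cy, w, h) and the standard smooth L1 form used by SSD-style detectors:

```python
import torch
import torch.nn.functional as F

def location_loss(student_boxes: torch.Tensor,
                  teacher_boxes: torch.Tensor,
                  match_mask: torch.Tensor) -> torch.Tensor:
    """Smooth-L1 position loss between matched boxes.

    student_boxes, teacher_boxes: [num_priors, 4] as (cx, cy, w, h),
        corresponding to l_i^m and g_j^m in the formula above.
    match_mask: [num_priors] of 0/1, the x_{ij}^k match indicator.
    """
    mask = match_mask.bool()
    if mask.sum() == 0:
        return student_boxes.new_zeros(())  # no matched boxes, no position loss
    return F.smooth_l1_loss(student_boxes[mask], teacher_boxes[mask], reduction="sum")
```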
If the student model has converged after the training of steps 102 to 106, training stops. If the student model has not converged after the training of steps 102 to 106, step 102 is executed again until the student model converges.
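Putting steps 102 to 106 together, a minimal training loop might look like the following sketch; the model classes, the per-feature output interface (a dict with "severity_logits", "confidence", "boxes", and "match_mask"), the optimizer settings, and the stopping rule are assumptions for illustration, and the helpers feature_distillation_loss, DynamicFeatureWeights, and location_loss come from the earlier sketches.

```python
import torch

def train_student(teachers: dict, student: torch.nn.Module, dataloader,
                  num_epochs: int = 50, lr: float = 1e-3, loc_weight: float = 1.0):
    """teachers: {"blackhead": model, "wrinkle": model, ...}, all already converged;
    student: an unconverged model with the same network structure."""
    for t in teachers.values():
        t.eval()
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    weights = DynamicFeatureWeights(list(teachers.keys()))

    for _ in range(num_epochs):
        for images in dataloader:  # unlabeled, preprocessed face training images
            # Step 102: the teachers' outputs become the label of the first face image.
            with torch.no_grad():
                labels = {name: t(images) for name, t in teachers.items()}

            # Step 104: the student predicts second feature information for all features.
            preds = student(images)

            # Step 106: distillation loss plus position loss, then adjust the parameters.
            per_feature = {
                name: feature_distillation_loss(labels[name]["severity_logits"],
                                                preds[name]["severity_logits"],
                                                labels[name]["confidence"])
                for name in teachers
            }
            kd_loss = weights.total_loss(per_feature)
            loc_loss = sum(location_loss(preds[n]["boxes"], labels[n]["boxes"],
                                         labels[n]["match_mask"]) for n in teachers)
            loss = kd_loss + loc_weight * loc_loss

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Return to step 102 and repeat until the student model converges
        # (the concrete convergence test is left to the implementation).
```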
In summary, the model merging method first obtains a plurality of unlabeled face training images, inputs a first face image into a plurality of teacher models respectively, obtains the first feature information output by each teacher model, and uses the first feature information output by all the teacher models as the label of the first face image; the first face image carrying the label is then input into a student model, and second feature information corresponding to a plurality of features output by the student model is obtained; finally, the distillation loss is calculated from the first feature information and the second feature information through a preset distillation loss function, the parameters of the student model are adjusted according to the distillation loss, and the step of inputting the first face image into the plurality of teacher models is executed again until the student model converges. In this way, a plurality of converged teacher models guide one student model until it converges, so that the student model acquires the feature detection capability of all the teacher models, achieving the purposes of model merging and reducing the model footprint and allowing edge deployment to be completed in a single step.
Furthermore, the trained student model is deployed in the embedded device. When feature detection needs to be performed on a face image to be detected, the face image to be detected can be acquired by the embedded device and input into the converged student model, feature detection is performed by the converged student model, and an effect diagram containing the second feature information of multiple features, similar to fig. 2, is finally obtained on the embedded device. In this way, the models are merged, the model footprint is reduced, and edge deployment is facilitated.
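For the feature detection method itself, deployment then reduces to a single forward pass through the converged student model; a sketch under the same assumed output interface and helpers as above:

```python
import cv2
import torch

def detect_features(image_path: str, student: torch.nn.Module, device: str = "cpu") -> dict:
    """Run the converged student model on a face image to be detected and return
    the second feature information (positions, severities, confidences) for all
    features in one pass. Reuses preprocess_face from the earlier sketch; the
    output dictionary layout is an assumed interface, not specified by the patent."""
    image = cv2.imread(image_path)
    face = preprocess_face(image)  # align, crop, and resize to 1024 x 1024
    tensor = torch.from_numpy(face).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    student.eval()
    with torch.no_grad():
        return student(tensor.to(device))
```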
Fig. 3 shows an internal configuration diagram of a terminal device in one embodiment. As shown in fig. 3, the terminal device includes a processor, a memory, and a network interface connected by a system bus. The memory comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium of the terminal device stores an operating system and also stores a computer program, and when the computer program is executed by a processor, the computer program can enable the processor to realize a model merging method and a feature detection method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform a model merging method and a feature detection method. It will be understood by those skilled in the art that the structure shown in fig. 3 is a block diagram of only a portion of the structure associated with the inventive arrangements, and does not constitute a limitation on the terminal device to which the inventive arrangements are applied, and that a particular terminal device may include more or less components than those shown, or combine certain components, or have a different arrangement of components.
A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer program: the method comprises the steps of obtaining a plurality of face training images without labels, inputting first face images into a plurality of teacher models respectively, obtaining first characteristic information output by each teacher model, and taking the first characteristic information output by all the teacher models as labels of the first face images; the first face image is any one of a plurality of face training images without labels, and first characteristic information output by different teacher models is different; inputting the first face image carrying the label into a student model, and acquiring second characteristic information which is output by the student model and corresponds to a plurality of characteristics; and calculating the distillation loss according to the first characteristic information and the second characteristic information through a preset distillation loss function, adjusting parameters of the student model according to the distillation loss, and returning to execute the step of inputting the first face image into the plurality of teacher models respectively until the student model converges.
And performing the steps of: and acquiring a face image to be detected, and inputting the face image to be detected into the student model to obtain second feature information of a plurality of features corresponding to the face image to be detected.
A computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of: the method comprises the steps of obtaining a plurality of face training images without labels, inputting first face images into a plurality of teacher models respectively, obtaining first characteristic information output by each teacher model, and taking the first characteristic information output by all the teacher models as labels of the first face images; the first face image is any one of a plurality of face training images without labels, and first characteristic information output by different teacher models is different; inputting the first face image carrying the label into a student model, and acquiring second characteristic information which is output by the student model and corresponds to a plurality of characteristics; and calculating the distillation loss according to the first characteristic information and the second characteristic information through a preset distillation loss function, adjusting parameters of the student model according to the distillation loss, and returning to execute the step of inputting the first face image into the plurality of teacher models respectively until the student model converges.
And performing the steps of: and acquiring a face image to be detected, and inputting the face image to be detected into the student model to obtain second characteristic information of a plurality of characteristics corresponding to the face image to be detected.
It should be noted that the above-mentioned feature detection method, model merging method, device and computer-readable storage medium belong to a general inventive concept, and the contents in the embodiments of the feature detection method, model merging method, device and computer-readable storage medium are mutually applicable.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the program is executed. Any reference to memory, storage, databases, or other media used in embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above examples only show some embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (12)

1. A model merging method applied to a model set including a plurality of teacher models and a student model having the same network structure, the plurality of teacher models each having converged, different teacher models being used to detect different features, and the student model having not converged, the method comprising:
acquiring a plurality of face training images without carrying labels, respectively inputting first face images into the plurality of teacher models, acquiring first characteristic information output by each teacher model, and taking the first characteristic information output by all the teacher models as the labels of the first face images; the first face image is any one of the plurality of face training images without carrying labels, and first characteristic information output by different teacher models is different;
inputting a first face image carrying a label into the student model, and acquiring second feature information corresponding to a plurality of features output by the student model;
and calculating distillation loss according to the first characteristic information and the second characteristic information through a preset distillation loss function, adjusting parameters of the student models according to the distillation loss, and returning to execute the step of inputting the first face images into the plurality of teacher models respectively until the student models converge.
2. The method of claim 1, wherein the first characteristic information and the second characteristic information each comprise a characteristic severity and a confidence level, the distillation loss function comprises a plurality of characteristic distillation loss functions, one characteristic distillation loss function for each characteristic, one characteristic distillation loss function comprising one soft tag loss function and one hard tag loss function;
calculating the distillation loss according to the first characteristic information and the second characteristic information through a preset distillation loss function, wherein the calculation comprises the following steps:
calculating the soft label loss of the target feature according to the feature severity corresponding to the target feature in the first feature information and the feature severity corresponding to the target feature in the second feature information through the soft label loss function; wherein the target feature is any one of the features detected by the plurality of teacher models;
calculating the hard tag loss of the target feature according to the confidence degree corresponding to the target feature in the first feature information and the feature severity degree corresponding to the target feature in the second feature information through the hard tag loss function;
calculating a first product of a first weight and the soft tag loss and a second product of a second weight and the hard tag loss, and taking the sum of the first product and the second product as the characteristic distillation loss corresponding to the target characteristic; wherein the sum of the first weight and the second weight is 1;
and calculating the sum of the characteristic distillation losses corresponding to all the characteristics to obtain the distillation loss.
3. The method of claim 2, wherein the soft tag loss function is formulated as:

L_soft = -Σ_{j=1}^{N} p_j^T · log(q_j^T)

wherein N is the total number of degrees of the feature severity corresponding to the target feature; p_j^T indicates the value of degree j of the feature severity corresponding to the target feature in the first feature information at temperature T; and q_j^T indicates the value of degree j of the feature severity corresponding to the target feature in the second feature information at temperature T.
4. The method of claim 3, wherein

p_j^T = exp(v_j / T) / Σ_k exp(v_k / T)

wherein v_j indicates the output at degree j for the target feature on the fully connected layer of the teacher model, and Σ_k exp(v_k / T) indicates the sum over all outputs of the fully connected layer of the teacher model; and

q_j^T = exp(z_j / T) / Σ_k exp(z_k / T)

wherein z_j indicates the output at degree j for the target feature on the fully connected layer of the student model, and Σ_k exp(z_k / T) indicates the sum over all outputs of the fully connected layer of the student model.
5. The method of claim 3, wherein the hard tag loss function is formulated as:

L_out_label = -Σ_{j=1}^{N} M_j · log(q_j)

wherein M_j is the confidence corresponding to the target feature in the first feature information, and q_j is the value of degree j of the feature severity corresponding to the target feature in the second feature information computed with T = 1.
6. The method of claim 2, wherein before calculating the sum of the feature distillation losses corresponding to all of the features, the method further comprises:
assigning a corresponding characteristic weight to the characteristic distillation loss corresponding to each characteristic; wherein the characteristic weight is determined based on the rate of decrease of the distillation loss for different characteristics.
7. The method of claim 6, wherein the feature weight of feature a is expressed as:

β_a = L_a(t-1) / L_a(t-2)

wherein L_a(t-1) indicates the feature distillation loss of feature a at the (t-1)-th parameter-adjustment step, and L_a(t-2) indicates the feature distillation loss of feature a at the (t-2)-th parameter-adjustment step.
8. The method of claim 1, wherein the first feature information and the second feature information each comprise: feature location information for a plurality of features;
the method further comprises the following steps:
calculating the position loss of the target feature according to feature position information corresponding to the target feature in the first feature information and feature position information corresponding to the target feature in the second feature information through a preset position loss function; wherein the target feature is any one of the features detected by the plurality of teacher models;
and adjusting parameters of the student model according to the position loss of all the characteristics.
9. The method of claim 8, wherein the position loss function is:

L_loc = Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h}} x_{ij}^k · smooth_L1(g_j^m - l_i^m)

wherein i ∈ Pos indicates a box containing an object, and m ∈ {cx, cy, w, h} indicates the center position (cx, cy), width w, and height h of the box; x_{ij}^k indicates whether the feature position information in the first feature information matches the feature position information corresponding to the target feature in the second feature information with respect to degree k, taking the value 1 for a match and 0 for a mismatch; g_j^m indicates the feature position information in the first feature information; and l_i^m indicates the feature position information in the second feature information.
10. A method of feature detection, the method comprising:
acquiring a face image to be detected, and inputting the face image to be detected into a student model to obtain second feature information of a plurality of features corresponding to the face image to be detected; wherein the student model is trained by the method of any one of claims 1 to 9.
11. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 10.
12. A terminal device comprising a memory and a processor, characterized in that the memory stores a computer program which, when executed by the processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 10.
CN202210264286.9A 2022-03-17 2022-03-17 Feature detection method, model merging method, device, and medium Pending CN114663941A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210264286.9A CN114663941A (en) 2022-03-17 2022-03-17 Feature detection method, model merging method, device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210264286.9A CN114663941A (en) 2022-03-17 2022-03-17 Feature detection method, model merging method, device, and medium

Publications (1)

Publication Number Publication Date
CN114663941A true CN114663941A (en) 2022-06-24

Family

ID=82030313

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210264286.9A Pending CN114663941A (en) 2022-03-17 2022-03-17 Feature detection method, model merging method, device, and medium

Country Status (1)

Country Link
CN (1) CN114663941A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220076136A1 (en) * 2020-09-09 2022-03-10 Peyman PASSBAN Method and system for training a neural network model using knowledge distillation
CN113052144A (en) * 2021-04-30 2021-06-29 平安科技(深圳)有限公司 Training method, device and equipment of living human face detection model and storage medium
CN113610146A (en) * 2021-08-03 2021-11-05 江西鑫铂瑞科技有限公司 Method for realizing image classification based on knowledge distillation enhanced by interlayer feature extraction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
葛仕明; 赵胜伟; 刘文瑜; 李晨钰: "Face Recognition Based on Deep Feature Distillation" (基于深度特征蒸馏的人脸识别), Journal of Beijing Jiaotong University (北京交通大学学报), no. 06, 15 December 2017 (2017-12-15) *

Similar Documents

Publication Publication Date Title
US11238311B2 (en) Method for image classification, computer device, and storage medium
CN108968991B (en) Hand bone X-ray film bone age assessment method, device, computer equipment and storage medium
CN110245662B (en) Detection model training method and device, computer equipment and storage medium
CN110135406B (en) Image recognition method and device, computer equipment and storage medium
CN111310624B (en) Occlusion recognition method, occlusion recognition device, computer equipment and storage medium
US11915514B2 (en) Method and apparatus for detecting facial key points, computer device, and storage medium
CN109086711B (en) Face feature analysis method and device, computer equipment and storage medium
US20210216878A1 (en) Deep learning-based coregistration
CN108960062A (en) Correct method, apparatus, computer equipment and the storage medium of invoice image
CN110516541B (en) Text positioning method and device, computer readable storage medium and computer equipment
CN107886082B (en) Method and device for detecting mathematical formulas in images, computer equipment and storage medium
CN110889446A (en) Face image recognition model training and face image recognition method and device
CN110287836B (en) Image classification method and device, computer equipment and storage medium
CN111144398A (en) Target detection method, target detection device, computer equipment and storage medium
CN112241952B (en) Brain midline identification method, device, computer equipment and storage medium
CN111860582B (en) Image classification model construction method and device, computer equipment and storage medium
CN112464945A (en) Text recognition method, device and equipment based on deep learning algorithm and storage medium
CN109829484B (en) Clothing classification method and equipment and computer-readable storage medium
CN111832561A (en) Character sequence recognition method, device, equipment and medium based on computer vision
CN112464860A (en) Gesture recognition method and device, computer equipment and storage medium
CN114663941A (en) Feature detection method, model merging method, device, and medium
CN109063601B (en) Lip print detection method and device, computer equipment and storage medium
CN116091596A (en) Multi-person 2D human body posture estimation method and device from bottom to top
CN114663942A (en) Feature detection method, model training method, device, and medium
WO2021073150A1 (en) Data detection method and apparatus, and computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination