CN114663941A - Feature detection method, model merging method, device, and medium - Google Patents

Feature detection method, model merging method, device, and medium

Info

Publication number
CN114663941A
Authority
CN
China
Prior art keywords
feature
characteristic
information
model
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210264286.9A
Other languages
Chinese (zh)
Inventor
曾梦萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Original Assignee
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Shuliantianxia Intelligent Technology Co Ltd filed Critical Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority to CN202210264286.9A priority Critical patent/CN114663941A/en
Publication of CN114663941A publication Critical patent/CN114663941A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a model merging method, which comprises the following steps: first, a plurality of unlabeled face training images are obtained, a first face image is input into a plurality of teacher models respectively, the first feature information output by each teacher model is obtained, and the first feature information output by all the teacher models is used as the label of the first face image; the first face image carrying the label is then input into a student model, and second feature information corresponding to a plurality of features output by the student model is obtained; finally, the distillation loss is calculated from the first feature information and the second feature information through a preset distillation loss function, the parameters of the student model are adjusted according to the distillation loss, and the preceding training steps are repeated until the student model converges. In this way, the student model acquires the feature detection capability of all the teacher models, achieving the purposes of model merging and reducing the model footprint, so that edge deployment can be completed in a single step. In addition, a feature detection method, a device, and a medium are also provided.

Description

Feature detection method, model merging method, device, and medium
Technical Field
The present invention relates to the field of computer vision technology, and in particular, to a feature detection method, a model merging method, a device, and a medium.
Background
As practical business scenarios change, more and more services tend toward edge deployment. Edge deployment mainly targets embedded devices: models are packaged into Software Development Kits (SDKs) and integrated into the embedded devices, and data processing and model inference are performed on the terminal devices. However, the existing models are numerous, and deploying them one by one is very unfavorable for edge deployment.
Disclosure of Invention
Based on this, it is necessary to provide a feature detection method, a model merging method, a device and a medium to solve the problem that the existing model is not favorable for edge deployment.
A model merging method applied to a model set, the model set including a plurality of teacher models and a student model with the same network structure, the plurality of teacher models having converged, different teacher models being used for detecting different features, and the student model having not converged, the method comprising:
acquiring a plurality of face training images without carrying labels, respectively inputting first face images into the plurality of teacher models, acquiring first characteristic information output by each teacher model, and taking the first characteristic information output by all the teacher models as the labels of the first face images; the first face image is any one of the plurality of face training images without carrying labels, and first characteristic information output by different teacher models is different;
inputting a first face image carrying a label into the student model, and acquiring second feature information corresponding to a plurality of features output by the student model;
and calculating distillation loss according to the first characteristic information and the second characteristic information through a preset distillation loss function, adjusting parameters of the student models according to the distillation loss, and returning to execute the step of inputting the first face images into the plurality of teacher models respectively until the student models converge.
In one embodiment, the first characteristic information and the second characteristic information each include a characteristic severity and a confidence level, the distillation loss function includes a plurality of characteristic distillation loss functions, one characteristic distillation loss function corresponds to one characteristic, and one characteristic distillation loss function includes one soft tag loss function and one hard tag loss function;
calculating the distillation loss according to the first characteristic information and the second characteristic information through a preset distillation loss function, wherein the calculation comprises the following steps:
calculating the soft label loss of the target feature according to the feature severity corresponding to the target feature in the first feature information and the feature severity corresponding to the target feature in the second feature information through the soft label loss function; wherein the target feature is any one of the features detected by the plurality of teacher models;
calculating the hard tag loss of the target feature according to the confidence degree corresponding to the target feature in the first feature information and the feature severity degree corresponding to the target feature in the second feature information through the hard tag loss function;
calculating a first product of a first weight and the soft tag loss and a second product of a second weight and the hard tag loss, and taking the sum of the first product and the second product as the characteristic distillation loss corresponding to the target characteristic; wherein a sum of the first weight and the second weight is 1;
and calculating the sum of the characteristic distillation losses corresponding to all the characteristics to obtain the distillation loss.
In one embodiment, the soft tag loss function is formulated as:

L_soft = -Σ_{j=1}^{N} p_j^T · log(q_j^T)

wherein N is the total number of degrees of the feature severity corresponding to the target feature; p_j^T indicates the value of degree j of the feature severity corresponding to the target feature in the first feature information at temperature T; and q_j^T indicates the value of degree j of the feature severity corresponding to the target feature in the second feature information at temperature T.
In one embodiment,

p_j^T = exp(v_j / T) / Σ_k exp(v_k / T)

wherein v_j indicates the output at degree j for the target feature on the fully connected layer of the teacher model, and Σ_k exp(v_k / T) indicates the sum over all outputs of the fully connected layer of the teacher model; and

q_j^T = exp(z_j / T) / Σ_k exp(z_k / T)

wherein z_j indicates the output at degree j for the target feature on the fully connected layer of the student model, and Σ_k exp(z_k / T) indicates the sum over all outputs of the fully connected layer of the student model.
In one embodiment, the hard tag loss function is formulated as:

L_out_label = -Σ_{j=1}^{N} M_j · log(q_j)

wherein M_j is the confidence corresponding to the target feature in the first feature information, and q_j is the value of degree j of the feature severity corresponding to the target feature in the second feature information computed with T = 1.
In one embodiment, before calculating the sum of the distillation losses of the features corresponding to all the features, the method further includes:
assigning a corresponding characteristic weight to the characteristic distillation loss corresponding to each characteristic; wherein the characteristic weight is determined based on the rate of decrease of the distillation loss for different characteristics.
In one embodiment, the feature weight of feature a is expressed as:

β_a = L_a(t-1) / L_a(t-2)

wherein L_a(t-1) indicates the feature distillation loss of feature a at the (t-1)-th parameter-adjustment step, and L_a(t-2) indicates the feature distillation loss of feature a at the (t-2)-th parameter-adjustment step.
In one specific embodiment, the first feature information and the second feature information each include: feature location information for a plurality of features;
the method further comprises the following steps:
calculating the position loss of the target feature according to feature position information corresponding to the target feature in the first feature information and feature position information corresponding to the target feature in the second feature information through a preset position loss function; wherein the target feature is any one of the features detected by the plurality of teacher models;
adjusting parameters of the student model according to the location loss of all features.
In one embodiment, the position loss function is:

L_loc = Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h}} x_{ij}^k · smooth_L1(g_j^m - l_i^m)

wherein i ∈ Pos indicates a box containing an object, and m ∈ {cx, cy, w, h} indicates the center position (cx, cy), width w, and height h of the box; x_{ij}^k indicates whether the feature position information in the first feature information matches the feature position information corresponding to the target feature in the second feature information with respect to degree k, taking the value 1 for a match and 0 for a mismatch; g_j^m indicates the feature position information in the first feature information; and l_i^m indicates the feature position information in the second feature information.
A method of feature detection, the method comprising:
acquiring a face image to be detected, and inputting the face image to be detected into a student model to obtain second feature information of a plurality of features corresponding to the face image to be detected; the student model is obtained through the model combination method training.
A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to perform the steps of the model merging method and the feature detection method described above.
A terminal device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the model merging method and the feature detection method described above.
The invention provides a feature detection method, a model merging method, a device, and a medium. A plurality of unlabeled face training images are obtained first, a first face image is input into a plurality of teacher models respectively, the first feature information output by each teacher model is obtained, and the first feature information output by all the teacher models is used as the label of the first face image; the first face image carrying the label is then input into a student model, and second feature information corresponding to a plurality of features output by the student model is obtained; finally, the distillation loss is calculated from the first feature information and the second feature information through a preset distillation loss function, the parameters of the student model are adjusted according to the distillation loss, and the step of inputting the first face image into the plurality of teacher models is executed again until the student model converges. In this way, the invention uses a plurality of converged teacher models to guide one student model until the student model converges, so that the student model acquires the feature detection capability of all the teacher models, achieving the purposes of model merging and reducing the model footprint and allowing edge deployment to be completed in a single step.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Wherein:
FIG. 1 is a schematic flow chart diagram of a model merging method in one embodiment;
FIG. 2 is a schematic diagram of a face image with feature information in one embodiment;
fig. 3 is a block diagram of a terminal device in one embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Specifically, the "features" in the present invention refer to various details that need to be detected in the face image, and include, but are not limited to, details such as "blackheads", "wrinkles", and "black eyes". For the convenience of illustration, the following description will mainly use two features of "black head" and "wrinkle" as examples. It will of course be appreciated that the "blackheads" and "wrinkles" in the following schemes may equally be replaced by any other feature, or the number of features may be additionally increased.
As shown in fig. 1, fig. 1 is a schematic flow chart of a model merging method in an embodiment. The model merging method is applied to a model set, the model set comprises a plurality of teacher models and a student model with the same network structure, the teacher models are converged, different teacher models are used for detecting different characteristics, and the student model is not converged. The unconverged student model is trained by a plurality of converged teacher models, so that the student model has the feature detection capability of all the teacher models, and model combination is completed.
The model merging method in this embodiment includes the following steps:
Step 102: obtain a plurality of unlabeled face training images, input a first face image into the plurality of teacher models respectively, obtain the first feature information output by each teacher model, and use the first feature information output by all the teacher models as the label of the first face image.
First, a plurality of unlabeled face training images may be captured with a camera, retrieved from a database, or obtained in other ways; this is not specifically limited here.
It is understood that the obtained face training images differ from one another, for example in face angle and/or image size. Therefore, certain preprocessing operations need to be performed on the obtained face training images. The preprocessing includes: using a face key point algorithm, for example the 68-point Landmark model of the Dlib library, to obtain the center positions of the two eyeballs and the center position of the nose; connecting these center positions and comparing the connecting line with the vertical direction to calculate the left-right rotation angle θ of the face; and finally using a rotation transformation matrix, with the nose coordinate as the center, to adjust the face image. The specific calculation formula is as follows:
x' = (x - x₀) · cosθ - (y - y₀) · sinθ + x₀
y' = (x - x₀) · sinθ + (y - y₀) · cosθ + y₀

where x and y are the two-dimensional coordinates of a pixel in the original face training image, x' and y' are the two-dimensional coordinates of that pixel in the adjusted face training image, and (x₀, y₀) is the nose center coordinate about which the rotation is performed. In this way, the angle of a skewed face training image can be corrected.
Then, according to the face key point coordinates, an effective face region is cropped from the corrected image, taking the nose center coordinate as the center and the maximum distance between face key points as the side length, so that only the effective face region is predicted subsequently, which appropriately improves processing efficiency. A scale normalization operation is then performed so that the face training images have a consistent size, unified to 1024 × 1024. In this way, the initially obtained face training images all reach a consistent processing standard.
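Illustratively, the preprocessing described above may be sketched as follows in Python with OpenCV and Dlib; the landmark indices (36-41 and 42-47 for the eyes, 30 for the nose tip), the predictor file path, and the crop-size heuristic are assumptions for illustration rather than details specified herein.

```python
import cv2
import dlib
import numpy as np

# Assumed: standard Dlib 68-point predictor file and the usual landmark layout.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def preprocess_face(image: np.ndarray, out_size: int = 1024) -> np.ndarray:
    """Rotate about the nose center, crop the effective face region, resize to a unified size."""
    faces = detector(image, 1)
    if not faces:
        raise ValueError("no face detected")
    pts = predictor(image, faces[0])
    landmarks = np.array([(p.x, p.y) for p in pts.parts()], dtype=np.float32)

    left_eye = landmarks[36:42].mean(axis=0)
    right_eye = landmarks[42:48].mean(axis=0)
    nose = landmarks[30]

    # Angle between the eye-midpoint-to-nose line and the vertical axis (left-right rotation).
    eye_mid = (left_eye + right_eye) / 2.0
    dx, dy = nose[0] - eye_mid[0], nose[1] - eye_mid[1]
    theta = np.degrees(np.arctan2(dx, dy))

    # Rotate the image about the nose center; the nose coordinate itself is unchanged.
    rot = cv2.getRotationMatrix2D((float(nose[0]), float(nose[1])), theta, 1.0)
    aligned = cv2.warpAffine(image, rot, (image.shape[1], image.shape[0]))

    # Rough crop: square centered on the nose, side length ~ the landmark spread.
    ones = np.hstack([landmarks, np.ones((len(landmarks), 1), dtype=np.float32)])
    rotated_pts = ones @ rot.T
    half = int(np.linalg.norm(rotated_pts.max(axis=0) - rotated_pts.min(axis=0)) / 2)
    cx, cy = int(nose[0]), int(nose[1])
    crop = aligned[max(cy - half, 0):cy + half, max(cx - half, 0):cx + half]
    return cv2.resize(crop, (out_size, out_size))
```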
Further, any one of the plurality of unlabeled face training images is selected as the first face image and input into each of the converged teacher models. In this embodiment, because the method is intended for embedded devices, the teacher models may use a lightweight convolutional neural network to identify feature information, such as SSD-MobileNetV2. Of course, other lightweight models such as YOLO or ResNet may also be selected as teacher models, provided that each teacher model has been trained to convergence in advance. Because the parameters for different features differ, the first feature information output by different teacher models is different.
In this embodiment, taking blackheads and wrinkles as an example, the first feature information output by each teacher model includes feature position information, feature severity, and confidence. The feature position information is displayed as a bounding box marking the position of the feature; for example, the feature position information output by the blackhead teacher model indicates the positions of blackheads, and that output by the wrinkle teacher model indicates the positions of wrinkles. The feature severity indicates how pronounced or numerous a feature is; for example, the higher the feature severity output by the blackhead teacher model, the more numerous and pronounced the blackheads, and the higher the feature severity output by the wrinkle teacher model, the longer and more pronounced the wrinkles. The confidence indicates how trustworthy the feature severity is; the higher the confidence, the more credible the obtained feature severity.
The first feature information output by all the teacher models is used as the label of the first face image, replacing manual annotation of the first face image and reducing the manual labeling workload. In addition, unlike the one-hot labels of conventional detection outputs, introducing the confidence as a label softens the targets that the student model learns from.
Step 104: input the first face image carrying the label into the student model, and obtain the second feature information corresponding to the plurality of features output by the student model.
In this embodiment, since the network structure of the constructed student model is consistent with that of the teacher models, after the first face image carrying the label is input into the student model, the obtained second feature information likewise includes feature position information, feature severity, and confidence. However, since the student model has not converged, at the beginning of training there is a significant difference between the first feature information and the second feature information, and the parameters of the student model need to be adjusted through step 106 so that this difference is reduced.
Step 106: calculate the distillation loss from the first feature information and the second feature information through a preset distillation loss function, and adjust the parameters of the student model according to the distillation loss.
In this embodiment, the distillation loss function includes a plurality of feature distillation loss functions, and the overall distillation loss KD_loss is composed of the corresponding feature distillation losses. For example, when detecting blackheads and wrinkles, the preset feature distillation loss functions include a blackhead distillation loss function and a wrinkle distillation loss function, and the distillation loss is correspondingly composed of the blackhead distillation loss and the wrinkle distillation loss. Each feature distillation loss function corresponds to one feature and includes a soft label loss function and a hard label loss function. The calculation is described below taking the blackhead distillation loss KD_loss_1 as an example.
The soft label loss L_soft of the target feature is calculated from the feature severity corresponding to the target feature (blackheads) in the first feature information and the feature severity corresponding to the target feature (blackheads) in the second feature information using the soft label loss function.
Specifically, the formula of the soft label loss function is:

L_soft = -Σ_{j=1}^{N} p_j^T · log(q_j^T)

where N is the total number of severity degrees for the target feature (blackheads); for example, if blackhead severity is divided into 5 degrees, then N = 5.

p_j^T is the value of degree j of the feature severity corresponding to the target feature in the first feature information at temperature T:

p_j^T = exp(v_j / T) / Σ_k exp(v_k / T)

where v_j is the output at degree j for the target feature (blackheads) on the fully connected layer of the teacher model, and the denominator is the sum over all outputs of the fully connected layer of the teacher model.

q_j^T is the value of degree j of the feature severity corresponding to the target feature in the second feature information at temperature T:

q_j^T = exp(z_j / T) / Σ_k exp(z_k / T)

where z_j is the output at degree j for the target feature (blackheads) on the fully connected layer of the student model, and the denominator is the sum over all outputs of the fully connected layer of the student model.
Meanwhile, the hard label loss L_out_label of the target feature is calculated from the confidence corresponding to the target feature (blackheads) in the first feature information and the feature severity corresponding to the target feature (blackheads) in the second feature information using the hard label loss function.

Specifically, the formula of the hard label loss function is:

L_out_label = -Σ_{j=1}^{N} M_j · log(q_j)

where M_j is the confidence corresponding to the target feature in the first feature information, and q_j is the student output for degree j computed as above with T = 1.
Finally, the blackhead distillation loss KD_loss_1 is obtained from:

KD_loss_1 = L_soft · α + L_out_label · (1 - α)

where α is the first weight and 1 - α is the second weight.
Based on empirical parameter settings, T = 20 and α = 0.8 in the above formulas. The distillation loss functions of the other features are calculated in the same way and are not described again.
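Illustratively, this per-feature distillation loss (a temperature-T soft label term plus a hard label term weighted by the teacher confidence) may be sketched in PyTorch as follows; function and tensor names are illustrative, the teacher confidence is assumed to be given per severity degree, and T = 20, α = 0.8 follow the empirical settings above.

```python
import torch
import torch.nn.functional as F

def feature_distillation_loss(teacher_logits: torch.Tensor,
                              student_logits: torch.Tensor,
                              teacher_confidence: torch.Tensor,
                              T: float = 20.0,
                              alpha: float = 0.8) -> torch.Tensor:
    """Distillation loss for one feature (e.g. blackheads).

    teacher_logits, student_logits: [batch, N] fully connected outputs over the
        N severity degrees of this feature.
    teacher_confidence: [batch, N] confidence from the teacher, used as M_j.
    """
    # Soft label term: L_soft = -sum_j p_j^T * log(q_j^T)
    p_T = F.softmax(teacher_logits / T, dim=-1)
    log_q_T = F.log_softmax(student_logits / T, dim=-1)
    soft_loss = -(p_T * log_q_T).sum(dim=-1).mean()

    # Hard label term: L_out_label = -sum_j M_j * log(q_j), with T = 1
    log_q = F.log_softmax(student_logits, dim=-1)
    hard_loss = -(teacher_confidence * log_q).sum(dim=-1).mean()

    # KD_loss_k = alpha * L_soft + (1 - alpha) * L_out_label
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```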
Furthermore, considering that different features differ in their distribution and appearance on the face (for example, wrinkles and blackheads are distributed and shaped differently), the difficulty of learning different features may also differ. Therefore, when constructing KD_loss, the losses of different features are constructed separately and given different weights, so that the distillation loss function balances the learning of different features and the training progress of the individual feature distillation loss functions does not diverge significantly.
Specifically, a corresponding characteristic weight is given to the characteristic distillation loss corresponding to each characteristic; wherein the characteristic weight is determined based on the reduction rate of the distillation loss of different characteristics.
Illustratively, the feature weight of feature a is expressed as:

β_a = L_a(t-1) / L_a(t-2)

where L_a(t-1) indicates the feature distillation loss of feature a at the (t-1)-th parameter-adjustment step, and L_a(t-2) indicates the feature distillation loss of feature a at the (t-2)-th parameter-adjustment step. According to this formula, the faster the feature distillation loss of feature a decreases, the easier the feature is to learn, and therefore the smaller β_a is set; conversely, the slower the feature distillation loss of feature a decreases, the harder the feature is to learn, and therefore the larger β_a is set. In this way, the distillation loss function leans toward the harder features during learning, and the training progress of the individual feature distillation loss functions remains essentially consistent.
Accordingly, the distillation loss function is expressed as:

KD_loss = β_1 · KD_loss_1 + β_2 · KD_loss_2 + … + β_M · KD_loss_M

where M indicates the number of features; for example, if the detected features include blackheads, wrinkles, and dark circles, then M = 3.
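A minimal sketch of this dynamic weighting follows, assuming β_a is taken directly as the ratio of the two most recent losses per feature (consistent with the explanation above); the class and method names are illustrative, and it can be combined with feature_distillation_loss from the previous sketch, e.g. weights.total_loss({"blackhead": kd1, "wrinkle": kd2}).

```python
from collections import deque
from typing import Dict

import torch

class DynamicFeatureWeights:
    """Weight each feature's distillation loss by how slowly it has been decreasing:
    beta_a = L_a(t-1) / L_a(t-2), so slowly decreasing (harder) features get larger weights."""

    def __init__(self, feature_names):
        self.history: Dict[str, deque] = {name: deque(maxlen=2) for name in feature_names}

    def weight(self, name: str) -> float:
        hist = self.history[name]
        if len(hist) < 2:
            return 1.0  # not enough history yet; fall back to equal weighting
        return hist[-1] / hist[-2]

    def total_loss(self, per_feature_losses: Dict[str, torch.Tensor]) -> torch.Tensor:
        # KD_loss = beta_1 * KD_loss_1 + ... + beta_M * KD_loss_M
        total = sum(self.weight(name) * loss for name, loss in per_feature_losses.items())
        # Record the current losses so the next step can compute L(t-1)/L(t-2).
        for name, loss in per_feature_losses.items():
            self.history[name].append(float(loss.detach()))
        return total
```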
In addition, for the feature position information, the position loss of the target feature (for example, blackheads) is calculated from the feature position information corresponding to the target feature in the first feature information and the feature position information corresponding to the target feature in the second feature information through a preset position loss function. Of course, the target feature may be any other feature detected by the plurality of teacher models, such as wrinkles.
Specifically, the position loss function is:

L_loc = Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h}} x_{ij}^k · smooth_L1(g_j^m - l_i^m)

where i ∈ Pos indicates a box containing an object, and m ∈ {cx, cy, w, h} indicates the center position (cx, cy), width w, and height h of the box; x_{ij}^k indicates whether the feature position information in the first feature information matches the feature position information corresponding to the target feature in the second feature information with respect to degree k, taking the value 1 for a match and 0 for a mismatch; g_j^m indicates the feature position information in the first feature information; and l_i^m indicates the feature position information in the second feature information.
The parameters of the student model are adjusted according to the position losses of all features, so that the position information of each feature output by the student model continuously approaches the labeled position information of the same feature at the same degree k, and the student model learns the teacher models' detection capability with respect to the position information of all features.
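A hedged PyTorch sketch of this position loss, assuming matched pairs of teacher and student boxes encoded as (cx, cy, w, h) and the standard smooth L1 form used by SSD-style detectors:

```python
import torch
import torch.nn.functional as F

def location_loss(student_boxes: torch.Tensor,
                  teacher_boxes: torch.Tensor,
                  match_mask: torch.Tensor) -> torch.Tensor:
    """Smooth-L1 position loss between matched boxes.

    student_boxes, teacher_boxes: [num_priors, 4] as (cx, cy, w, h),
        corresponding to l_i^m and g_j^m in the formula above.
    match_mask: [num_priors] of 0/1, the x_{ij}^k match indicator.
    """
    mask = match_mask.bool()
    if mask.sum() == 0:
        return student_boxes.new_zeros(())  # no matched boxes, no position loss
    return F.smooth_l1_loss(student_boxes[mask], teacher_boxes[mask], reduction="sum")
```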
If the student model has converged after the training of steps 102 to 106, training stops. If the student model has not converged after the training of steps 102 to 106, step 102 is executed again until the student model converges.
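Putting steps 102 to 106 together, a minimal training loop might look like the following sketch; the model classes, the per-feature output interface (a dict with "severity_logits", "confidence", "boxes", and "match_mask"), the optimizer settings, and the stopping rule are assumptions for illustration, and the helpers feature_distillation_loss, DynamicFeatureWeights, and location_loss come from the earlier sketches.

```python
import torch

def train_student(teachers: dict, student: torch.nn.Module, dataloader,
                  num_epochs: int = 50, lr: float = 1e-3, loc_weight: float = 1.0):
    """teachers: {"blackhead": model, "wrinkle": model, ...}, all already converged;
    student: an unconverged model with the same network structure."""
    for t in teachers.values():
        t.eval()
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    weights = DynamicFeatureWeights(list(teachers.keys()))

    for _ in range(num_epochs):
        for images in dataloader:  # unlabeled, preprocessed face training images
            # Step 102: the teachers' outputs become the label of the first face image.
            with torch.no_grad():
                labels = {name: t(images) for name, t in teachers.items()}

            # Step 104: the student predicts second feature information for all features.
            preds = student(images)

            # Step 106: distillation loss plus position loss, then adjust the parameters.
            per_feature = {
                name: feature_distillation_loss(labels[name]["severity_logits"],
                                                preds[name]["severity_logits"],
                                                labels[name]["confidence"])
                for name in teachers
            }
            kd_loss = weights.total_loss(per_feature)
            loc_loss = sum(location_loss(preds[n]["boxes"], labels[n]["boxes"],
                                         labels[n]["match_mask"]) for n in teachers)
            loss = kd_loss + loc_weight * loc_loss

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Return to step 102 and repeat until the student model converges
        # (the concrete convergence test is left to the implementation).
```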
In summary, the model merging method first obtains a plurality of unlabeled face training images, inputs a first face image into a plurality of teacher models respectively, obtains the first feature information output by each teacher model, and uses the first feature information output by all the teacher models as the label of the first face image; the first face image carrying the label is then input into a student model, and second feature information corresponding to a plurality of features output by the student model is obtained; finally, the distillation loss is calculated from the first feature information and the second feature information through a preset distillation loss function, the parameters of the student model are adjusted according to the distillation loss, and the step of inputting the first face image into the plurality of teacher models is executed again until the student model converges. In this way, a plurality of converged teacher models guide one student model until it converges, so that the student model acquires the feature detection capability of all the teacher models, achieving the purposes of model merging and reducing the model footprint and allowing edge deployment to be completed in a single step.
Furthermore, the trained student model is deployed in the embedded device. When feature detection needs to be performed on a face image to be detected, the face image to be detected can be acquired by the embedded device and input into the converged student model, feature detection is performed by the converged student model, and an effect diagram containing the second feature information of multiple features, similar to fig. 2, is finally obtained on the embedded device. In this way, the models are merged, the model footprint is reduced, and edge deployment is facilitated.
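For the feature detection method itself, deployment then reduces to a single forward pass through the converged student model; a sketch under the same assumed output interface and helpers as above:

```python
import cv2
import torch

def detect_features(image_path: str, student: torch.nn.Module, device: str = "cpu") -> dict:
    """Run the converged student model on a face image to be detected and return
    the second feature information (positions, severities, confidences) for all
    features in one pass. Reuses preprocess_face from the earlier sketch; the
    output dictionary layout is an assumed interface, not specified by the patent."""
    image = cv2.imread(image_path)
    face = preprocess_face(image)  # align, crop, and resize to 1024 x 1024
    tensor = torch.from_numpy(face).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    student.eval()
    with torch.no_grad():
        return student(tensor.to(device))
```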
Fig. 3 shows an internal configuration diagram of a terminal device in one embodiment. As shown in fig. 3, the terminal device includes a processor, a memory, and a network interface connected by a system bus. The memory comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium of the terminal device stores an operating system and also stores a computer program, and when the computer program is executed by a processor, the computer program can enable the processor to realize a model merging method and a feature detection method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform a model merging method and a feature detection method. It will be understood by those skilled in the art that the structure shown in fig. 3 is a block diagram of only a portion of the structure associated with the inventive arrangements, and does not constitute a limitation on the terminal device to which the inventive arrangements are applied, and that a particular terminal device may include more or less components than those shown, or combine certain components, or have a different arrangement of components.
A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer program: the method comprises the steps of obtaining a plurality of face training images without labels, inputting first face images into a plurality of teacher models respectively, obtaining first characteristic information output by each teacher model, and taking the first characteristic information output by all the teacher models as labels of the first face images; the first face image is any one of a plurality of face training images without labels, and first characteristic information output by different teacher models is different; inputting the first face image carrying the label into a student model, and acquiring second characteristic information which is output by the student model and corresponds to a plurality of characteristics; and calculating the distillation loss according to the first characteristic information and the second characteristic information through a preset distillation loss function, adjusting parameters of the student model according to the distillation loss, and returning to execute the step of inputting the first face image into the plurality of teacher models respectively until the student model converges.
And performing the steps of: and acquiring a face image to be detected, and inputting the face image to be detected into the student model to obtain second feature information of a plurality of features corresponding to the face image to be detected.
A computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of: the method comprises the steps of obtaining a plurality of face training images without labels, inputting first face images into a plurality of teacher models respectively, obtaining first characteristic information output by each teacher model, and taking the first characteristic information output by all the teacher models as labels of the first face images; the first face image is any one of a plurality of face training images without labels, and first characteristic information output by different teacher models is different; inputting the first face image carrying the label into a student model, and acquiring second characteristic information which is output by the student model and corresponds to a plurality of characteristics; and calculating the distillation loss according to the first characteristic information and the second characteristic information through a preset distillation loss function, adjusting parameters of the student model according to the distillation loss, and returning to execute the step of inputting the first face image into the plurality of teacher models respectively until the student model converges.
And performing the steps of: and acquiring a face image to be detected, and inputting the face image to be detected into the student model to obtain second characteristic information of a plurality of characteristics corresponding to the face image to be detected.
It should be noted that the above-mentioned feature detection method, model merging method, device and computer-readable storage medium belong to a general inventive concept, and the contents in the embodiments of the feature detection method, model merging method, device and computer-readable storage medium are mutually applicable.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the program is executed. Any reference to memory, storage, databases, or other media used in embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above examples only show some embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (12)

1. A model merging method applied to a model set including a plurality of teacher models and a student model having the same network structure, the plurality of teacher models each having converged, different teacher models being used to detect different features, and the student model having not converged, the method comprising:
acquiring a plurality of face training images without carrying labels, respectively inputting first face images into the plurality of teacher models, acquiring first characteristic information output by each teacher model, and taking the first characteristic information output by all the teacher models as the labels of the first face images; the first face image is any one of the plurality of face training images without carrying labels, and first characteristic information output by different teacher models is different;
inputting a first face image carrying a label into the student model, and acquiring second feature information corresponding to a plurality of features output by the student model;
and calculating distillation loss according to the first characteristic information and the second characteristic information through a preset distillation loss function, adjusting parameters of the student models according to the distillation loss, and returning to execute the step of inputting the first face images into the plurality of teacher models respectively until the student models converge.
2. The method of claim 1, wherein the first characteristic information and the second characteristic information each comprise a characteristic severity and a confidence level, the distillation loss function comprises a plurality of characteristic distillation loss functions, one characteristic distillation loss function for each characteristic, one characteristic distillation loss function comprising one soft tag loss function and one hard tag loss function;
calculating the distillation loss according to the first characteristic information and the second characteristic information through a preset distillation loss function, wherein the calculation comprises the following steps:
calculating the soft label loss of the target feature according to the feature severity corresponding to the target feature in the first feature information and the feature severity corresponding to the target feature in the second feature information through the soft label loss function; wherein the target feature is any one of the features detected by the plurality of teacher models;
calculating the hard tag loss of the target feature according to the confidence degree corresponding to the target feature in the first feature information and the feature severity degree corresponding to the target feature in the second feature information through the hard tag loss function;
calculating a first product of a first weight and the soft tag loss and a second product of a second weight and the hard tag loss, and taking the sum of the first product and the second product as the characteristic distillation loss corresponding to the target characteristic; wherein the sum of the first weight and the second weight is 1;
and calculating the sum of the characteristic distillation losses corresponding to all the characteristics to obtain the distillation loss.
3. The method of claim 2, wherein the soft tag loss function is formulated as:

L_soft = -Σ_{j=1}^{N} p_j^T · log(q_j^T)

wherein N is the total number of degrees of the feature severity corresponding to the target feature; p_j^T indicates the value of degree j of the feature severity corresponding to the target feature in the first feature information at temperature T; and q_j^T indicates the value of degree j of the feature severity corresponding to the target feature in the second feature information at temperature T.
4. The method of claim 3, wherein

p_j^T = exp(v_j / T) / Σ_k exp(v_k / T)

wherein v_j indicates the output at degree j for the target feature on the fully connected layer of the teacher model, and Σ_k exp(v_k / T) indicates the sum over all outputs of the fully connected layer of the teacher model; and

q_j^T = exp(z_j / T) / Σ_k exp(z_k / T)

wherein z_j indicates the output at degree j for the target feature on the fully connected layer of the student model, and Σ_k exp(z_k / T) indicates the sum over all outputs of the fully connected layer of the student model.
5. The method of claim 3, wherein the hard tag loss function is formulated as:

L_out_label = -Σ_{j=1}^{N} M_j · log(q_j)

wherein M_j is the confidence corresponding to the target feature in the first feature information, and q_j is the value of degree j of the feature severity corresponding to the target feature in the second feature information computed with T = 1.
6. The method of claim 2, wherein before calculating the sum of the feature distillation losses corresponding to all of the features, the method further comprises:
assigning a corresponding characteristic weight to the characteristic distillation loss corresponding to each characteristic; wherein the characteristic weight is determined based on the rate of decrease of the distillation loss for different characteristics.
7. The method of claim 6, wherein the feature weight of feature a is expressed as:

β_a = L_a(t-1) / L_a(t-2)

wherein L_a(t-1) indicates the feature distillation loss of feature a at the (t-1)-th parameter-adjustment step, and L_a(t-2) indicates the feature distillation loss of feature a at the (t-2)-th parameter-adjustment step.
8. The method of claim 1, wherein the first feature information and the second feature information each comprise: feature location information for a plurality of features;
the method further comprises the following steps:
calculating the position loss of the target feature according to feature position information corresponding to the target feature in the first feature information and feature position information corresponding to the target feature in the second feature information through a preset position loss function; wherein the target feature is any one of the features detected by the plurality of teacher models;
and adjusting parameters of the student model according to the position loss of all the characteristics.
9. The method of claim 8, wherein the position loss function is:

L_loc = Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h}} x_{ij}^k · smooth_L1(g_j^m - l_i^m)

wherein i ∈ Pos indicates a box containing an object, and m ∈ {cx, cy, w, h} indicates the center position (cx, cy), width w, and height h of the box; x_{ij}^k indicates whether the feature position information in the first feature information matches the feature position information corresponding to the target feature in the second feature information with respect to degree k, taking the value 1 for a match and 0 for a mismatch; g_j^m indicates the feature position information in the first feature information; and l_i^m indicates the feature position information in the second feature information.
10. A method of feature detection, the method comprising:
acquiring a face image to be detected, and inputting the face image to be detected into a student model to obtain second feature information of a plurality of features corresponding to the face image to be detected; wherein the student model is trained by the method of any one of claims 1 to 9.
11. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 10.
12. A terminal device comprising a memory and a processor, characterized in that the memory stores a computer program which, when executed by the processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 10.
CN202210264286.9A 2022-03-17 2022-03-17 Feature detection method, model merging method, device, and medium Pending CN114663941A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210264286.9A CN114663941A (en) 2022-03-17 2022-03-17 Feature detection method, model merging method, device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210264286.9A CN114663941A (en) 2022-03-17 2022-03-17 Feature detection method, model merging method, device, and medium

Publications (1)

Publication Number Publication Date
CN114663941A true CN114663941A (en) 2022-06-24

Family

ID=82030313

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210264286.9A Pending CN114663941A (en) 2022-03-17 2022-03-17 Feature detection method, model merging method, device, and medium

Country Status (1)

Country Link
CN (1) CN114663941A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220076136A1 (en) * 2020-09-09 2022-03-10 Peyman PASSBAN Method and system for training a neural network model using knowledge distillation
CN113052144A (en) * 2021-04-30 2021-06-29 平安科技(深圳)有限公司 Training method, device and equipment of living human face detection model and storage medium
CN113610146A (en) * 2021-08-03 2021-11-05 江西鑫铂瑞科技有限公司 Method for realizing image classification based on knowledge distillation enhanced by interlayer feature extraction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
葛仕明; 赵胜伟; 刘文瑜; 李晨钰: "Face Recognition Based on Deep Feature Distillation" (基于深度特征蒸馏的人脸识别), Journal of Beijing Jiaotong University (北京交通大学学报), no. 06, 15 December 2017 (2017-12-15) *

Similar Documents

Publication Publication Date Title
US11238311B2 (en) Method for image classification, computer device, and storage medium
CN108968991B (en) Hand bone X-ray film bone age assessment method, device, computer equipment and storage medium
CN110245662B (en) Detection model training method and device, computer equipment and storage medium
CN110135406B (en) Image recognition method and device, computer equipment and storage medium
CN111310624B (en) Occlusion recognition method, occlusion recognition device, computer equipment and storage medium
US11915514B2 (en) Method and apparatus for detecting facial key points, computer device, and storage medium
CN109086711B (en) Face feature analysis method and device, computer equipment and storage medium
US20210216878A1 (en) Deep learning-based coregistration
CN108960062A (en) Correct method, apparatus, computer equipment and the storage medium of invoice image
CN110516541B (en) Text positioning method and device, computer readable storage medium and computer equipment
CN107886082B (en) Method and device for detecting mathematical formulas in images, computer equipment and storage medium
CN110889446A (en) Face image recognition model training and face image recognition method and device
CN110287836B (en) Image classification method and device, computer equipment and storage medium
CN111144398A (en) Target detection method, target detection device, computer equipment and storage medium
CN112241952B (en) Brain midline identification method, device, computer equipment and storage medium
CN111860582B (en) Image classification model construction method and device, computer equipment and storage medium
CN112464945A (en) Text recognition method, device and equipment based on deep learning algorithm and storage medium
CN109829484B (en) Clothing classification method and equipment and computer-readable storage medium
CN111832561A (en) Character sequence recognition method, device, equipment and medium based on computer vision
CN112464860A (en) Gesture recognition method and device, computer equipment and storage medium
CN114663941A (en) Feature detection method, model merging method, device, and medium
CN109063601B (en) Lip print detection method and device, computer equipment and storage medium
CN116091596A (en) Multi-person 2D human body posture estimation method and device from bottom to top
CN114663942A (en) Feature detection method, model training method, device, and medium
WO2021073150A1 (en) Data detection method and apparatus, and computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination