CN118014048A - Low-illumination face detection model construction method, device and terminal

Low-illumination face detection model construction method, device and terminal

Info

Publication number
CN118014048A
CN118014048A
Authority
CN
China
Prior art keywords
model
domain
target
loss
supervision
Prior art date
Legal status
Pending
Application number
CN202410167554.4A
Other languages
Chinese (zh)
Inventor
陈芳林
高明君
裴文杰
卢光明
Current Assignee
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202410167554.4A
Publication of CN118014048A

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method, a device and a terminal for constructing a low-illumination face detection model. The method comprises the following steps: acquiring a source domain and a target domain; acquiring an initial model, wherein the initial model comprises a student model and a teacher model; performing supervision training on the student model based on the source domain to obtain initial model parameters and first supervision loss; and performing interactive supervision learning on the student model and the teacher model based on the initial model parameters and the first supervision loss to obtain a target detection model. A face detection model constructed with this method can better identify faces in low-illumination environments, improving its detection performance.

Description

Low-illumination face detection model construction method, device and terminal
Technical Field
The present invention relates to the field of image detection technologies, and in particular, to a method, an apparatus, and a terminal for constructing a low-illuminance face detection model.
Background
The face detection task aims at accurately locating all faces in a given image, and in recent years, through the continuous development of deep learning, the technology is also widely applied to downstream tasks such as face alignment, face recognition, face attribute analysis and the like. Although face detection in a conventional scene has excellent detection performance, low-illumination face detection is still very challenging due to factors such as uneven brightness, high noise ratio, low contrast, and the like.
In the prior art, research on low-illumination face detection mostly relies on supervised training with massive labels; however, acquiring labeled data in low-illumination scenes is costly and difficult, the resulting models generalize poorly, and the prediction performance of existing low-illumination face detection models is not accurate enough.
Accordingly, there is a need for improvement and advancement in the art.
Disclosure of Invention
Aiming at the above defects in the prior art, the present invention provides a method, a device and a terminal for constructing a low-illumination face detection model, so as to solve the problem that the prediction performance of low-illumination face detection models in the prior art is not accurate enough.
In a first aspect of the present invention, a method for constructing a low-illuminance face detection model is provided, including:
Acquiring a source domain and a target domain, wherein the source domain is a conventional illuminance data set marked with face information, and the target domain is a low illuminance data set not marked with face information;
acquiring an initial model, wherein the initial model comprises a student model and a teacher model, and the student model and the teacher model have the same structure;
performing supervision training on the student model based on the source domain to obtain initial model parameters and first supervision loss;
and performing interactive supervision learning on the student model and the teacher model based on the initial model parameters and the first supervision loss to obtain a target detection model.
The method for constructing the low-illumination face detection model, wherein the interactive supervised learning is performed on the student model and the teacher model based on the initial model parameters and the first supervision loss to obtain a target detection model, comprises the following steps:
copying the initial model parameters to the teacher model;
preprocessing the source domain and the target domain to obtain a low-illumination source domain, a first target domain and a second target domain;
performing supervision training on the student model based on the low-illumination source domain to obtain a second supervision loss;
inputting the first target domain into the teacher model for prediction to obtain an initial pseudo tag, wherein the initial pseudo tag is a face tag of an image in the first target domain predicted by the teacher model;
Performing unsupervised training on the student model based on the initial pseudo tag and the second target domain to obtain unsupervised loss;
Optimizing the student model based on the first supervised loss, the second supervised loss, and the unsupervised loss;
Based on an EMA updating strategy, acquiring a sliding average value of the sequence of the student model, and updating the teacher model;
and repeating the step of preprocessing the source domain and the target domain until the times of updating the teacher model and the student model reach preset times, so as to obtain the target detection model.
The method for constructing the low-illumination face detection model, wherein the preprocessing is performed on the source domain and the target domain to obtain a low-illumination source domain, a first target domain and a second target domain, includes:
Performing night migration processing on the image in the source domain to obtain the low-illumination source domain;
performing weak enhancement processing on the image in the target domain to obtain the first target domain;
and performing strong enhancement processing on the image in the target domain to obtain the second target domain.
The method for constructing the low-illumination face detection model, wherein the performing unsupervised training on the student model based on the initial pseudo tag and the second target domain comprises the following steps:
Obtaining a target confidence value threshold, filtering the initial pseudo tag based on the target confidence value threshold, and removing redundant frames in the initial pseudo tag through NMS operation to obtain a target pseudo tag;
And the target pseudo tag is used as the labeling data of the second target domain and is input to the student model together with the second target domain for training, so that the unsupervised loss is obtained.
The method for constructing the low-illumination face detection model, wherein the optimizing the student model based on the first supervision loss, the second supervision loss and the unsupervised loss, further comprises:
When the student model performs supervision training and unsupervised training, predicting a prediction domain label of input data, wherein the prediction domain label comprises a first label and a second label, the first label represents that an input source is source domain data, and the second label represents that the input source is target domain data;
and acquiring an input source real domain label, and acquiring target countermeasures based on the real domain label and the predicted domain label.
The low-illumination face detection model construction method, wherein optimizing the student model based on the first supervision loss, the second supervision loss and the unsupervised loss, further comprises:
Acquiring positive sample data and negative sample data, wherein the positive sample data comprises a first image group in the first target domain and a second image group in the second target domain, the first image group and the second image group are identical to corresponding images in the target domain, and the negative sample data comprises all images in the first target domain except for the first image group;
And performing self-supervision training on the student model based on the positive sample data and the negative sample data to obtain target self-supervision loss.
The low-illumination face detection model construction method, wherein the optimizing the student model based on the first supervision loss, the second supervision loss and the unsupervised loss comprises:
optimizing the student model based on a target loss function;
the objective loss function is:
L_total = L_sup_s + L_sup_n + λ1·L_unsup + λ2·L_adv + λ3·L_con,
wherein λ1, λ2 and λ3 are weight balance factors, L_sup_s and L_sup_n are respectively said first supervision loss and said second supervision loss, L_unsup is the unsupervised loss, L_adv is the target countermeasure (adversarial) loss, and L_con is the target self-supervision loss.
In a second aspect of the present invention, there is provided a low-illuminance face detection model construction apparatus including:
The data acquisition module is used for acquiring a source domain and a target domain, wherein the source domain is a conventional illumination data set marked with face information, and the target domain is a low illumination data set not marked with face information;
The building module is used for obtaining an initial model, wherein the initial model comprises a student model and a teacher model, and the student model and the teacher model have the same structure;
The initial module is used for performing supervision training on the student model based on the source domain to obtain initial model parameters and first supervision loss;
And the optimization module is used for performing interactive supervision learning on the student model and the teacher model based on the initial model parameters and the first supervision loss to obtain a target detection model.
In a third aspect of the present invention, there is provided a terminal comprising: a processor and a storage medium communicatively connected with the processor, the storage medium being adapted to store a plurality of instructions, and the processor being adapted to call the instructions in the storage medium to execute the steps of the low-illumination face detection model construction method according to any one of the above.
In a fourth aspect of the present invention, there is provided a storage medium storing one or more programs executable by one or more processors to implement the steps of the low-illuminance face detection model construction method according to any one of the above.
The beneficial effects are that: compared with the prior art, the invention provides a method, a device and a terminal for constructing a low-illumination face detection model, wherein in the method for constructing the low-illumination face detection model, a source domain and a target domain are obtained, wherein the source domain is a conventional illumination data set marked with face information, the target domain is a low-illumination data set not marked with face information, then an initial model is obtained, the initial model comprises a student model and a teacher model, the structures of the student model and the teacher model are the same, then supervision training is carried out on the student model based on the source domain to obtain initial model parameters and first supervision loss, and finally interactive supervision learning is carried out on the student model and the teacher model based on the initial model parameters and the first supervision loss to obtain a target detection model. The face detection model constructed by the low-illumination face detection model construction method provided by the invention can better identify the face in the low-illumination environment and improve the detection performance of the face detection model.
Drawings
Fig. 1 is a schematic diagram of a face detection method based on image enhancement in the prior art in an embodiment of a low-illuminance face detection model construction method provided by the present invention;
fig. 2 is a schematic diagram of a face detection method based on image darkening in the prior art in an embodiment of the low-illuminance face detection model construction method provided by the present invention;
Fig. 3 is a schematic diagram of a face detection method based on countermeasure learning in the prior art in an embodiment of the low-illuminance face detection model construction method provided by the present invention;
FIG. 4 is a flowchart of an embodiment of a method for constructing a low-illuminance face detection model provided by the present invention;
Fig. 5 is a schematic diagram of a face detection model in an embodiment of a method for constructing a low-illuminance face detection model according to the present invention;
Fig. 6 is a schematic diagram of an image level domain adaptive structure in an embodiment of the method for constructing a low-illuminance face detection model according to the present invention;
Fig. 7 is a schematic diagram of a night migration module result in an embodiment of the low-illuminance face detection model construction method provided by the present invention;
FIG. 8 is a schematic diagram of a feature level domain adaptive structure in an embodiment of a method for constructing a low-illuminance face detection model according to the present invention;
fig. 9 is a schematic diagram of the adversarial-learning domain classifier in an embodiment of the method for constructing a low-illuminance face detection model provided by the present invention;
fig. 10 is a schematic diagram of a multi-layer perceptron in an embodiment of a method for constructing a low-illumination face detection model according to the present invention;
FIG. 11 is a schematic diagram of a progressive scale module in an embodiment of a method for constructing a low-illumination face detection model according to the present invention;
Fig. 12 is a visual comparison chart of experimental results in an embodiment of the low-illumination face detection model construction method provided by the invention;
Fig. 13 is a schematic structural diagram of an embodiment of a low-illuminance face detection model building apparatus provided by the present invention;
Fig. 14 is a schematic structural diagram of an embodiment of a terminal provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and more specific, the present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The low-illumination face detection model construction method provided by the invention can be applied to a terminal with computing capability, and the terminal can execute the low-illumination face detection model construction method provided by the invention to detect and position the target position in the image to be processed.
Example 1
In the prior art, image translation methods are mainly classified into two types: image enhancement and image darkening. Image enhancement methods, based on traditional digital image processing, Retinex theory or generative adversarial approaches, perform end-to-end training in a paired or unpaired manner, finally enhance the low-illumination target-domain images to the conventional illumination level, and use a model pre-trained on conventional scenes for detection, as shown in fig. 1. Image darkening methods realize image conversion on asymmetric datasets based on style migration: source-domain data of conventional scenes is converted into the low-illumination target domain, a pre-trained model is obtained by training on the converted source-domain data, and face detection in the low-illumination target domain is realized, as shown in fig. 2.
Domain adaptive methods are mainly divided into two types: adversarial learning and self-learning. Adversarial learning introduces a domain classifier to judge the input, features or output of the model; through adversarial training between the feature extractor and the classifier, the model is guided to gradually generate domain-invariant features, thereby improving detection performance on the target domain, as shown in fig. 3. The self-learning method generates pseudo labels of the target domain based on a preset confidence threshold or prior knowledge during training to assist prediction.
In the prior art, such image translation methods are designed and evaluated in terms of human visual quality rather than machine vision; when applied to downstream tasks such as detection and segmentation, their optimization objective often deviates from the task objective and potential target features may be removed or obscured, so the improvement is limited. Domain adaptive methods based on adversarial learning supervise the feature extractor to generate domain-invariant features to handle the domain gap; they generally work well for classification tasks, but when facing the complex face detection task and a larger domain gap, the auxiliary adversarial loss is insufficient to provide the model with enough information to learn the correlation. Self-learning methods provide supervision information for the target-domain data by generating pseudo labels, but they are strongly limited by the accuracy of the pseudo labels and require the model to generate high-quality pseudo labels in the target domain.
Therefore, in this embodiment, based on a teacher-student network, the teacher model generates pseudo labels to optimize the student model, while the student model gradually updates the teacher model through EMA; the training process is guided by the supervised loss of the source domain and the unsupervised loss of the target domain to achieve better detection performance. Meanwhile, this embodiment also introduces night migration and adversarial learning strategies to perform image-level and feature-level domain adaptation respectively, further reducing the domain gap and promoting knowledge migration. Further, since the target domain has large face-scale variation and many small-scale faces, this embodiment also proposes a progressive scale method, so that the model can learn face information of different scales from large to small, thereby generating higher-quality target-domain pseudo labels, and the model's perception of the target-domain data is improved through contrastive learning.
Specifically, in this embodiment, a method for constructing a low-illuminance face detection model is provided. As shown in fig. 4, the method for constructing the low-illumination face detection model provided by the invention comprises the following steps:
S100, acquiring a source domain and a target domain, wherein the source domain is a conventional illumination data set marked with face information, and the target domain is a low-illumination data set not marked with face information.
In the present embodiment, a conventional-illumination dataset with annotation data is regarded as the source domain, and a low-illumination dataset without annotation data is regarded as the target domain, where the annotations are the coordinates of the real face frames. This embodiment uses the source domain and the target domain to train a cross-domain face detector so that it obtains good performance in the target-domain scene; the overall structure of the model is shown in figure 5.
S200, acquiring an initial model, wherein the initial model comprises a student model and a teacher model, and the student model and the teacher model have the same structure.
Referring to fig. 5, which is a structural diagram of the initial model, it can be seen that the initial model includes two models with identical structures: the student model and the teacher model. The student model is updated by back propagation and gradient descent, while the teacher model is updated by an exponential moving average (EMA) strategy over the student model.
And S300, performing supervision training on the student model based on the source domain to obtain initial model parameters and first supervision loss.
Specifically, in the method for constructing a low-illumination face detection model according to this embodiment, the student model is first initialized with the source-domain data: supervised training is performed using the source-domain images and the corresponding real face marker boxes to obtain the initial model parameters and the first supervision loss. The first supervision loss is the sum of a classification term and a regression term, L_sup_s = L_cls + L_reg, where L_cls denotes the class cross-entropy loss of the prediction boxes and L_reg denotes the L1 regression loss between the predicted and real boxes.
In this embodiment, a VGG-based DSFD face detector is adopted as the base architecture of the teacher-student model, and the student model is supervised-trained for ten thousand iterations with learning rate lr = 1e-3 and batch_size = 8.
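A minimal PyTorch sketch of this supervised initialization step is given below; the detector interface (a model returning matched per-box class logits and box predictions) and the loss reduction are simplifying assumptions and are not taken from the patent, which uses the DSFD detection head.

```python
import torch
import torch.nn.functional as F

def supervised_step(student, images, gt_labels, gt_boxes, optimizer):
    """One supervised iteration on labeled (source-domain) data.

    Assumes the detector returns matched per-box class logits and box
    regressions; the first supervision loss is modeled as class
    cross-entropy plus L1 box regression, as described above.
    """
    cls_logits, box_preds = student(images)            # assumed detector output format
    loss_cls = F.cross_entropy(cls_logits, gt_labels)  # class cross-entropy of prediction boxes
    loss_reg = F.l1_loss(box_preds, gt_boxes)          # L1 regression between predicted and real boxes
    loss = loss_cls + loss_reg                         # L_sup = L_cls + L_reg

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```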
And S400, performing interactive supervision learning on the student model and the teacher model based on the initial model parameters and the first supervision loss to obtain a target detection model.
Specifically, based on a teacher-student network, this embodiment optimizes the student model with pseudo labels generated by the teacher model, while the student model gradually updates the teacher model through EMA; the training process is guided by the supervision loss of the source domain and the unsupervised loss of the target domain to obtain better detection performance. Meanwhile, image-level and feature-level domain adaptation are respectively carried out through night migration and adversarial learning strategies, further reducing the domain gap and promoting knowledge migration. In addition, since the target domain exhibits large face-scale variation and many small-scale faces, the method further comprises a progressive scale strategy, so that the model can learn face information of different scales from large to small, thereby generating higher-quality target-domain pseudo labels, and the model's perception of the target-domain data is improved through contrastive learning.
Specifically, the training process is guided by the supervised loss of the source domain and the unsupervised loss of the target domain to achieve better detection performance.
Specifically, the performing interactive supervised learning on the student model and the teacher model based on the initial model parameters and the first supervision loss to obtain a target detection model includes:
s410, copying the initial model parameters to the teacher model.
Specifically, after the initialization phase is finished, the initial model parameters, namely weights, obtained through training of the student model are copied to the teacher model, so that the teacher-student model can jointly complete detection of the target domain data through mutual learning.
S420, preprocessing the source domain and the target domain to obtain a low-illumination source domain, a first target domain and a second target domain.
After the initialization stage is finished, the source domain and the target domain are preprocessed to obtain a low-illumination source domain, a first target domain and a second target domain.
The preprocessing the source domain and the target domain to obtain a low-illumination source domain, a first target domain and a second target domain, including:
s421, performing night migration processing on the image in the source domain to obtain the low-illumination source domain;
S422, performing weak enhancement processing on the image in the target domain to obtain the first target domain;
s423, performing strong enhancement processing on the image in the target domain to obtain the second target domain.
Unlike the source-domain images, target-domain images under low illumination tend to suffer from a series of complex problems such as high noise, low contrast and uneven brightness, and the huge gap between the domains hinders cross-domain knowledge migration, biasing the teacher model's predictions toward the source domain. Therefore, in this embodiment, in order to solve this problem and improve the mutual learning efficiency, a night migration module and an adversarial learning module are proposed to respectively complete image-level and feature-level domain adaptation, reduce the inter-domain gap and improve detection performance; the structure is shown in fig. 6.
In this embodiment, night migration is performed on the source-domain images to obtain the low-illumination source domain: the low-illumination environment is simulated through a series of random image processing operations that migrate the source-domain images into a space close to the target-domain images, so that the model further learns the feature distribution of the target domain.
Referring to fig. 6, in this embodiment the night migration module randomly applies strategies such as Gaussian blur, gamma correction, contrast adjustment, Gaussian noise and random occlusion to the source-domain images, so that they become closer to the target-domain images at the image level; partial results are shown in fig. 7.
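A hedged sketch of such a night migration transform is shown below; the operation set follows the list above, but the probabilities and parameter ranges are illustrative assumptions, not values specified in the patent.

```python
import random
import numpy as np
import cv2

def night_migration(img: np.ndarray) -> np.ndarray:
    """Randomly darken a source-domain image so it resembles the
    low-illumination target domain: Gaussian blur, gamma correction,
    contrast reduction, Gaussian noise and random occlusion.
    All parameter ranges here are illustrative."""
    img = img.astype(np.float32)
    if random.random() < 0.5:
        img = cv2.GaussianBlur(img, (5, 5), sigmaX=random.uniform(0.5, 1.5))
    gamma = random.uniform(1.5, 3.0)                  # gamma > 1 darkens the image
    img = 255.0 * (img / 255.0) ** gamma
    img = img * random.uniform(0.4, 0.8)              # contrast/brightness reduction
    img = img + np.random.normal(0, 8, img.shape)     # additive Gaussian noise
    h, w = img.shape[:2]
    y, x = random.randint(0, h // 2), random.randint(0, w // 2)
    img[y:y + h // 8, x:x + w // 8] = 0               # random rectangular occlusion
    return np.clip(img, 0, 255).astype(np.uint8)
```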
In order to improve the quality of model detection, in this embodiment weak enhancement is applied to the target-domain images to obtain the first target domain, and strong enhancement is applied to the target-domain images to obtain the second target domain. Specifically, the strongly enhanced images of the second target domain are fed to the student model to increase the detection difficulty, while the weakly enhanced images of the first target domain are fed to the teacher model to ensure the accuracy of the predicted pseudo labels. In this embodiment, weak enhancement includes random horizontal flipping and scaling, and strong enhancement includes random horizontal flipping, scaling, random color jitter, Gaussian blurring, gray-scale transformation, random erasure, and the like.
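The weak/strong enhancement split could be sketched with torchvision transforms as below; this covers only the image side (a detector would also need to transform the boxes for the geometric operations), and the sizes and probabilities are assumptions.

```python
import torchvision.transforms as T

# Weak enhancement (fed to the teacher): random horizontal flip and scaling.
weak_aug = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomResizedCrop(size=640, scale=(0.8, 1.0)),  # mild rescaling; size is illustrative
    T.ToTensor(),
])

# Strong enhancement (fed to the student): flip, scaling, color jitter,
# Gaussian blur, gray-scale transform and random erasure.
strong_aug = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomResizedCrop(size=640, scale=(0.5, 1.0)),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
    T.RandomGrayscale(p=0.2),
    T.GaussianBlur(kernel_size=5),
    T.ToTensor(),
    T.RandomErasing(p=0.5),
])
```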
And S430, performing supervision training on the student model based on the low-illumination source domain to obtain a second supervision loss.
Specifically, supervised training is performed on the student model using the low-illumination source-domain images and the corresponding real face marker boxes, obtaining the second supervision loss. Like the first supervision loss, it is the sum of a classification term and a regression term, L_sup_n = L_cls + L_reg, where L_cls denotes the class cross-entropy loss of the prediction boxes and L_reg denotes the L1 regression loss between the predicted and real boxes.
S440, inputting the first target domain into the teacher model for prediction to obtain an initial pseudo tag, wherein the initial pseudo tag is a face tag of an image in the first target domain predicted by the teacher model;
S450, performing unsupervised training on the student model based on the initial pseudo tag and the second target domain to obtain unsupervised loss.
Specifically, the first target domain is input into the teacher model for prediction, and an initial pseudo tag is obtained, wherein the initial pseudo tag is a face tag of an image in the first target domain predicted by the teacher model. And then performing unsupervised training on the student model based on the initial pseudo tag and the second target domain, wherein the unsupervised training comprises the following steps: and obtaining a target confidence value threshold, filtering the initial pseudo tag based on the target confidence value threshold, removing redundant frames in the initial pseudo tag through NMS operation to obtain a target pseudo tag, and then inputting the target pseudo tag serving as labeling data of the second target domain and the second target domain to the student model together for training to obtain the unsupervised loss.
Because the target domain lacks true labels, in this embodiment the student model is trained by treating the teacher model's prediction results as pseudo labels. Specifically, a confidence threshold is set to filter the initial pseudo labels output by the teacher model, and redundant boxes are removed by an NMS operation to obtain the target pseudo labels, which are then used to supervise the training of the student model and obtain the unsupervised loss. The unsupervised loss takes the same form as the supervised losses, L_unsup = L_cls + L_reg, computed between the student model's predictions on the second target domain and the target pseudo labels generated after the first target-domain data passes through the teacher model.
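A sketch of this pseudo-label generation step is given below, assuming torchvision's `nms` and a generic `(boxes, scores)` teacher output; the threshold values are placeholders rather than values fixed by the patent.

```python
import torch
from torchvision.ops import nms

@torch.no_grad()
def make_pseudo_labels(teacher, weak_images, conf_thresh=0.8, iou_thresh=0.5):
    """Predict on the weakly enhanced first target domain with the teacher,
    keep boxes above the confidence threshold, then remove redundant
    boxes with NMS to obtain the target pseudo labels."""
    boxes, scores = teacher(weak_images)      # assumed detector output format
    keep = scores > conf_thresh               # confidence-value filtering
    boxes, scores = boxes[keep], scores[keep]
    keep = nms(boxes, scores, iou_thresh)     # drop redundant frames
    return boxes[keep]
```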
S460, optimizing the student model based on the first supervised loss, the second supervised loss, and the unsupervised loss.
In this embodiment, the optimizing the student model based on the first supervised loss, the second supervised loss, and the unsupervised loss further comprises:
s461, predicting a prediction domain label of input data when the student model performs supervision training and unsupervised training, wherein the prediction domain label comprises a first label and a second label, the first label represents that an input source is source domain data, and the second label represents that the input source is target domain data;
and acquiring an input source real domain label, and acquiring target countermeasures based on the real domain label and the predicted domain label.
Specifically, the present embodiment further includes an adversarial learning module to complete feature-level domain adaptation, reduce the inter-domain gap, and improve detection performance; its structure is shown in fig. 8.
The optimizing the student model based on the first supervised loss, the second supervised loss, and the unsupervised loss, further comprises:
When the student model performs supervision training and unsupervised training, predicting a prediction domain label of input data, wherein the prediction domain label comprises a first label and a second label, the first label represents that an input source is source domain data, and the second label represents that the input source is target domain data;
and acquiring an input source real domain label, and acquiring target countermeasures based on the real domain label and the predicted domain label.
In the present embodiment, feature-level adaptation is completed by aligning the feature distributions of the two domains through adversarial learning. Adversarial learning introduces an additional domain discriminator D after the feature extractor E of the detection model, as shown in fig. 9.
The objective of the domain discriminator is to judge whether the input features come from the source domain or the target domain, while the feature extractor tries to confuse the domain discriminator's judgment; the two are trained together adversarially, prompting the feature extractor to gradually generate domain-invariant features.
In this embodiment, apart from the gradient reversal layer (GRL), the domain discriminator is essentially a classifier: its input is the features extracted and refined by the student detector, that is, the features fed to the final classification and regression branches, and through a series of convolution and LeakyReLU layers it outputs a prediction p of the feature's domain (source domain or target domain).
The target adversarial loss is a standard binary cross-entropy loss: with the source-domain label defined as 0 (source_label = 0) and the target-domain label defined as 1 (target_label = 1), the binary classification loss is computed between the prediction p and the true domain label.
During adversarial training, the discriminator D is expected to clearly distinguish the domain of the features, i.e. the smaller its loss the better, while the feature extractor E wants to confuse the discriminator's judgment, i.e. the larger the loss the better; the two optimization objectives are opposite, resembling a max-min game. Through this adversarial training, the feature extractor gradually generates domain-invariant features, realizing feature alignment, i.e. feature-level domain adaptation.
In this embodiment, the initial adversarial loss is the binary cross-entropy classification loss between the domain discriminator's prediction on the input features and the real feature (domain) label:
L_adv_init = -[d·log D(E(x)) + (1 - d)·log(1 - D(E(x)))],
where d is the real domain label of the input features (0 for the source domain, 1 for the target domain), E is the feature extractor and D is the domain discriminator. To further enable end-to-end training of the whole model, in this embodiment a gradient reversal layer (GRL) is added between the feature extractor and the domain discriminator. The GRL acts as an identity operator in forward propagation but multiplies the gradient by a negative constant in backward propagation, realizing the gradient reversal and completing the update of the whole max-min loss. The final target countermeasure (adversarial) loss L_adv is this binary cross-entropy computed through the GRL, summed over the source-domain and target-domain features with their respective feature labels.
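One common PyTorch realization of a gradient reversal layer and a convolutional domain discriminator is sketched below; the channel sizes, the LeakyReLU slope and the lambda value are assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lam in the
    backward pass, so one backward pass realizes the max-min objective."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class DomainDiscriminator(nn.Module):
    """Small convolutional classifier predicting source (0) vs target (1)."""
    def __init__(self, in_channels=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(256, 128, 3, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, 1),
        )

    def forward(self, features, lam=1.0):
        return self.net(GradReverse.apply(features, lam))

# Adversarial loss: binary cross-entropy between the domain prediction and the
# real domain label (source_label = 0, target_label = 1).
domain_bce = nn.BCEWithLogitsLoss()
```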
S462, said optimizing said student model based on said first supervised loss, said second supervised loss, and said unsupervised loss, further comprising:
Acquiring positive sample data and negative sample data, wherein the positive sample data comprises a first image group in the first target domain and a second image group in the second target domain, the first image group and the second image group are identical to corresponding images in the target domain, and the negative sample data comprises all images in the first target domain except for the first image group;
And performing self-supervision training on the student model based on the positive sample data and the negative sample data to obtain target self-supervision loss.
Specifically, in order to further improve the detection model's ability to analyze target-domain features, a self-supervised contrastive learning module is also added to the model's training process in this embodiment. The strongly and weakly enhanced versions of the same target-domain image are regarded as a positive pair, i.e. the positive sample data comprise a first image group in the first target domain and a second image group in the second target domain that originate from the same images of the target domain, while the other weakly enhanced target-domain images are regarded as negative samples, i.e. the negative sample data comprise all images in the first target domain except the first image group; the latent information of the target-domain images is further analyzed and mined through contrastive learning to obtain the target self-supervision loss.
In the loss, MLP denotes the contrastive-learning multi-layer perceptron, i and j index the images of the second target domain and the first target domain respectively, and a temperature hyper-parameter τ scales the similarities. Specifically, for each feature in the second target domain, a contrastive loss against the positive and negative samples in the first target domain is calculated. The model structure is shown in fig. 10.
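An InfoNCE-style sketch consistent with this description is given below; the projection-head dimensions and the temperature value are assumptions, and the pairing follows the positive/negative definition above (same index = same target-domain image).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

proj_head = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))  # the "MLP"

def contrastive_loss(strong_feats, weak_feats, tau=0.07):
    """strong_feats[i] (second target domain) and weak_feats[i] (first target
    domain) come from the same image and form the positive pair; weak_feats[j]
    with j != i act as negatives."""
    z_s = F.normalize(proj_head(strong_feats), dim=1)
    z_w = F.normalize(proj_head(weak_feats), dim=1)
    logits = z_s @ z_w.t() / tau                         # cosine similarities / temperature
    targets = torch.arange(z_s.size(0), device=z_s.device)
    return F.cross_entropy(logits, targets)              # positives sit on the diagonal
```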
The optimizing the student model based on the first supervised loss, the second supervised loss, and the unsupervised loss, comprising:
optimizing the student model based on a target loss function;
the objective loss function is:
L_total = L_sup_s + L_sup_n + λ1·L_unsup + λ2·L_adv + λ3·L_con,
where λ1, λ2 and λ3 are weight balance factors, L_sup_s and L_sup_n are respectively the first supervision loss and the second supervision loss, L_unsup is the unsupervised loss, L_adv is the target countermeasure (adversarial) loss, and L_con is the target self-supervision loss. In this embodiment, fixed empirical values are used for the three weight balance factors.
Further, face detection differs from general object detection in that the number of small faces is large and the face scale varies greatly. In order to better adapt to this scale variation and make the model more sensitive to small-face features, this embodiment further adopts a progressive scale strategy, urging the model to learn face features of various scales from large to small and from easy to hard. Moreover, dense noise and uneven brightness in low-illumination scenes further increase the difficulty of detecting small faces in the target domain. To improve the model's perception of multi-scale faces in the target domain, the progressive scale strategy randomly crops the target-domain images during training, and as the number of training iterations increases, the area of the cropped region is gradually enlarged until it equals the original image, as shown in fig. 11. This presents the model with face samples that go gradually from large to small and gradually increases the number of small faces in the training images, so that training proceeds from easy to hard. By iteratively increasing the cropping proportion, the model gradually adapts to a wider range of face scales and can predict small faces more accurately in the later stage of training.
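The progressive scale strategy can be sketched as a random crop whose side ratio grows with the training iteration, as below; the linear schedule and the minimum ratio are assumptions, since the patent does not specify the exact growth curve.

```python
import random
import numpy as np

def progressive_crop(img: np.ndarray, step: int, total_steps: int,
                     min_ratio: float = 0.3) -> np.ndarray:
    """Crop a target-domain region whose side ratio grows linearly from
    min_ratio to 1.0 over training; after resizing to the fixed network
    input, faces appear large (easy) early and smaller (harder) later."""
    h, w = img.shape[:2]
    ratio = min_ratio + (1.0 - min_ratio) * min(step / total_steps, 1.0)
    ch, cw = int(h * ratio), int(w * ratio)
    y = random.randint(0, h - ch)
    x = random.randint(0, w - cw)
    return img[y:y + ch, x:x + cw]
```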
S470, based on an EMA updating strategy, acquiring a sliding average value of the sequence of the student model, and updating the teacher model.
Specifically, in order to obtain more stable and robust pseudo labels, the teacher model uses an EMA update strategy: at each optimization step the teacher takes a sliding average over the sequence of student-model weights, with the update formula
θ_t ← α·θ_t + (1 - α)·θ_s,
where θ_t and θ_s are the teacher and student parameters respectively and α is the EMA smoothing coefficient.
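A minimal sketch of this EMA update is shown below, assuming two PyTorch modules with identical parameter ordering; the smoothing coefficient value is an assumption.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, alpha=0.999):
    """teacher <- alpha * teacher + (1 - alpha) * student, i.e. a sliding
    average over the sequence of student weights."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)
```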
And S480, repeatedly executing the step of preprocessing the source domain and the target domain until the times of updating the teacher model and the student model reach preset times, and obtaining the target detection model.
In this embodiment, the preset number of times is fifty thousand. During these fifty thousand iterations of training the student model, the learning rate lr = 1e-3 and batch_size = 8.
The face detection model constructed based on the low-illumination face detection model construction method disclosed by the embodiment obviously improves the face detection performance of the low-illumination image without using the low-illumination image annotation and has better generalization.
In order to verify the validity of the face detection model constructed by the low-illumination face detection model construction method provided by the embodiment, the performance of the face detection model is verified on a self-built image dataset.
In the experiment, the large-scale face dataset WIDER FACE, containing 12880 images and 159420 corresponding real face frames, was used as the source domain; the low-illumination face dataset DARK FACE, containing 6000 images of real night scenes, was used as the target domain; the DARK FACE official test set was used as the experimental test set; and performance was evaluated with mean average precision (mAP).
By quantitative analysis: Table 1 shows the performance comparison on the DARK FACE test set between the low-illuminance face detection model construction method of this embodiment and other types of techniques (enhancement, darkening, and unsupervised domain adaptation); it can be seen that the technique proposed in this embodiment leads all the other methods.
Table 1:
Qualitative analysis also shows that the face prediction boxes generated by the face detection model constructed with the low-illumination face detection model construction method are more accurate than those of other models; see fig. 12 for details.
In summary, the present embodiment provides a method for constructing a low-illuminance face detection model, by acquiring a source domain and a target domain, where the source domain is a conventional illuminance dataset with face information marked, the target domain is a low-illuminance dataset without face information marked, then acquiring an initial model, the initial model includes a student model and a teacher model, the student model and the teacher model have the same structure, then performing supervision training on the student model based on the source domain to obtain initial model parameters and first supervision loss, and finally performing interactive supervision learning on the student model and the teacher model based on the initial model parameters and the first supervision loss to obtain a target detection model. The face detection model constructed by the low-illumination face detection model construction method provided by the invention can better identify the face in the low-illumination environment and improve the detection performance of the face detection model.
It should be understood that, although the steps in the flowcharts shown in the drawings of this specification are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps of the present invention are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least a portion of the steps of the present invention may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order in which the sub-steps or stages are performed is not necessarily sequential, and may be performed in turn or alternately with at least a portion of the sub-steps or stages of other steps or other steps.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous link (SYNCHLINK) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
Example two
Based on the above embodiment, the present invention further provides a low-illuminance face detection model building apparatus, whose functional block diagram is shown in fig. 13, where the low-illuminance face detection model building apparatus includes:
The data acquisition module is used for acquiring a source domain and a target domain, wherein the source domain is a conventional illumination data set marked with face information, and the target domain is a low illumination data set not marked with face information;
The building module is used for obtaining an initial model, wherein the initial model comprises a student model and a teacher model, and the student model and the teacher model have the same structure;
The initial module is used for performing supervision training on the student model based on the source domain to obtain initial model parameters and first supervision loss;
And the optimization module is used for performing interactive supervision learning on the student model and the teacher model based on the initial model parameters and the first supervision loss to obtain a target detection model.
Example III
Based on the method for constructing the low-illumination face detection model in the first embodiment, the invention also provides a terminal, a schematic block diagram of which can be shown in fig. 14. The terminal comprises a memory 10 and a processor 20, wherein the memory 10 stores a low-illumination face detection model construction program, and the processor 20 can at least realize the following steps when executing the computer program:
Acquiring a source domain and a target domain, wherein the source domain is a conventional illuminance data set marked with face information, and the target domain is a low illuminance data set not marked with face information;
acquiring an initial model, wherein the initial model comprises a student model and a teacher model, and the student model and the teacher model have the same structure;
performing supervision training on the student model based on the source domain to obtain initial model parameters and first supervision loss;
and performing interactive supervision learning on the student model and the teacher model based on the initial model parameters and the first supervision loss to obtain a target detection model.
The method for constructing the low-illumination face detection model, wherein the interactive supervised learning is performed on the student model and the teacher model based on the initial model parameters and the first supervision loss to obtain a target detection model, comprises the following steps:
copying the initial model parameters to the teacher model;
preprocessing the source domain and the target domain to obtain a low-illumination source domain, a first target domain and a second target domain;
performing supervision training on the student model based on the low-illumination source domain to obtain a second supervision loss;
inputting the first target domain into the teacher model for prediction to obtain an initial pseudo tag, wherein the initial pseudo tag is a face tag of an image in the first target domain predicted by the teacher model;
Performing unsupervised training on the student model based on the initial pseudo tag and the second target domain to obtain unsupervised loss;
Optimizing the student model based on the first supervised loss, the second supervised loss, and the unsupervised loss;
Based on an EMA updating strategy, acquiring a sliding average value of the sequence of the student model, and updating the teacher model;
and repeating the step of preprocessing the source domain and the target domain until the times of updating the teacher model and the student model reach preset times, so as to obtain the target detection model.
The method for constructing the low-illumination face detection model, wherein the preprocessing is performed on the source domain and the target domain to obtain a low-illumination source domain, a first target domain and a second target domain, includes:
Performing night migration processing on the image in the source domain to obtain the low-illumination source domain;
performing weak enhancement processing on the image in the target domain to obtain the first target domain;
and performing strong enhancement processing on the image in the target domain to obtain the second target domain.
Wherein said unsupervised training of said student model based on said initial pseudo tag and said second target domain comprises:
Obtaining a target confidence value threshold, filtering the initial pseudo tag based on the target confidence value threshold, and removing redundant frames in the initial pseudo tag through NMS operation to obtain a target pseudo tag;
And the target pseudo tag is used as the labeling data of the second target domain and is input to the student model together with the second target domain for training, so that the unsupervised loss is obtained.
Wherein said optimizing said student model based on said first supervised loss, said second supervised loss, and said unsupervised loss further comprises:
When the student model performs supervision training and unsupervised training, predicting a prediction domain label of input data, wherein the prediction domain label comprises a first label and a second label, the first label represents that an input source is source domain data, and the second label represents that the input source is target domain data;
and acquiring an input source real domain label, and acquiring target countermeasures based on the real domain label and the predicted domain label.
Wherein optimizing the student model based on the first supervised loss, the second supervised loss, and the unsupervised loss, further comprises:
Acquiring positive sample data and negative sample data, wherein the positive sample data comprises a first image group in the first target domain and a second image group in the second target domain, the first image group and the second image group are identical to corresponding images in the target domain, and the negative sample data comprises all images in the first target domain except for the first image group;
And performing self-supervision training on the student model based on the positive sample data and the negative sample data to obtain target self-supervision loss.
Wherein the optimizing the student model based on the first supervised loss, the second supervised loss, and the unsupervised loss, comprises:
optimizing the student model based on a target loss function;
the objective loss function is:
L_total = L_sup_s + L_sup_n + λ1·L_unsup + λ2·L_adv + λ3·L_con,
wherein λ1, λ2 and λ3 are weight balance factors, L_sup_s and L_sup_n are respectively the first supervision loss and the second supervision loss, L_unsup is the unsupervised loss, L_adv is the target countermeasure (adversarial) loss, and L_con is the target self-supervision loss.
Example IV
The present invention also provides a storage medium storing one or more programs executable by one or more processors to implement the steps of the low-illuminance face detection model construction method described in the above embodiment.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. The method for constructing the low-illumination face detection model is characterized by comprising the following steps of:
Acquiring a source domain and a target domain, wherein the source domain is a conventional illuminance data set marked with face information, and the target domain is a low illuminance data set not marked with face information;
acquiring an initial model, wherein the initial model comprises a student model and a teacher model, and the student model and the teacher model have the same structure;
performing supervision training on the student model based on the source domain to obtain initial model parameters and first supervision loss;
and performing interactive supervision learning on the student model and the teacher model based on the initial model parameters and the first supervision loss to obtain a target detection model.
2. The method of claim 1, wherein the performing interactive supervised learning on the student model and the teacher model based on the initial model parameters and the first supervision loss to obtain a target detection model comprises:
copying the initial model parameters to the teacher model;
preprocessing the source domain and the target domain to obtain a low-illumination source domain, a first target domain and a second target domain;
performing supervision training on the student model based on the low-illumination source domain to obtain a second supervision loss;
inputting the first target domain into the teacher model for prediction to obtain an initial pseudo tag, wherein the initial pseudo tag is a face tag of an image in the first target domain predicted by the teacher model;
Performing unsupervised training on the student model based on the initial pseudo tag and the second target domain to obtain unsupervised loss;
Optimizing the student model based on the first supervised loss, the second supervised loss, and the unsupervised loss;
Based on an EMA updating strategy, acquiring a sliding average value of the sequence of the student model, and updating the teacher model;
and repeating the step of preprocessing the source domain and the target domain until the times of updating the teacher model and the student model reach preset times, so as to obtain the target detection model.
3. The method of claim 2, wherein the preprocessing the source domain and the target domain to obtain a low-illuminance source domain, a first target domain, and a second target domain includes:
Performing night migration processing on the image in the source domain to obtain the low-illumination source domain;
performing weak enhancement processing on the image in the target domain to obtain the first target domain;
and performing strong enhancement processing on the image in the target domain to obtain the second target domain.
4. The method of claim 2, wherein performing unsupervised training on the student model based on the initial pseudo labels and the second target domain comprises:
obtaining a target confidence threshold, filtering the initial pseudo labels based on the target confidence threshold, and removing redundant boxes from the initial pseudo labels through an NMS operation to obtain target pseudo labels;
and using the target pseudo labels as the annotation data of the second target domain, and inputting them together with the second target domain into the student model for training to obtain the unsupervised loss.
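A minimal sketch of the pseudo-label filtering in claim 4: keep only teacher detections above a confidence threshold, then suppress redundant boxes with non-maximum suppression. The threshold and IoU values are illustrative assumptions, and `torchvision.ops.nms` is used as one readily available NMS implementation.

```python
import torch
from torchvision.ops import nms

def filter_pseudo_labels(boxes, scores, conf_thresh=0.8, iou_thresh=0.5):
    """boxes: (N, 4) teacher detections in xyxy format; scores: (N,) confidences."""
    keep = scores >= conf_thresh                 # confidence-threshold filtering
    boxes, scores = boxes[keep], scores[keep]
    kept = nms(boxes, scores, iou_thresh)        # drop overlapping redundant boxes
    return boxes[kept], scores[kept]             # target pseudo labels

# Example with dummy detections: the two overlapping boxes collapse to one.
boxes = torch.tensor([[10., 10., 50., 50.], [12., 12., 52., 52.], [80., 80., 120., 120.]])
scores = torch.tensor([0.95, 0.90, 0.40])
print(filter_pseudo_labels(boxes, scores))
```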
5. The low-illumination face detection model construction method of claim 2, wherein optimizing the student model based on the first supervision loss, the second supervision loss and the unsupervised loss further comprises:
when the student model performs supervised training and unsupervised training, predicting a domain label for the input data, wherein the predicted domain label comprises a first label and a second label, the first label indicating that the input comes from the source domain and the second label indicating that the input comes from the target domain;
and acquiring the true domain label of the input, and obtaining a target adversarial loss based on the true domain label and the predicted domain label.
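Claim 5 describes a domain classifier whose prediction is compared with the true domain label to form the target adversarial loss. A gradient reversal layer is one common way to make such a loss adversarial; it appears below only as an assumed implementation detail, and the feature dimension and network shape are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, gradient negation in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class DomainClassifier(nn.Module):
    """Predicts whether a feature comes from the source or the target domain."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))
    def forward(self, feats, lam=1.0):
        return self.net(GradReverse.apply(feats, lam)).squeeze(-1)

def target_adversarial_loss(domain_logits, is_source):
    # is_source: tensor of 1.0 for source-domain inputs and 0.0 for target-domain inputs
    return F.binary_cross_entropy_with_logits(domain_logits, is_source)
```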
6. The low-illumination face detection model construction method of claim 5, wherein optimizing the student model based on the first supervision loss, the second supervision loss and the unsupervised loss further comprises:
acquiring positive sample data and negative sample data, wherein the positive sample data comprises a first image group from the first target domain and a second image group from the second target domain, the first image group and the second image group corresponding to the same original images in the target domain, and the negative sample data comprises all images in the first target domain except the first image group;
and performing self-supervised training on the student model based on the positive sample data and the negative sample data to obtain a target self-supervision loss.
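In claim 6, features of the weak and strong views of the same target image form a positive pair, while the remaining images in the first target domain act as negatives. The claim does not fix the exact loss form; an InfoNCE-style contrastive loss is shown below as one plausible instantiation, with the temperature value as an assumption.

```python
import torch
import torch.nn.functional as F

def target_self_supervision_loss(weak_feats, strong_feats, temperature=0.1):
    """weak_feats, strong_feats: (B, D) pooled features of the two views of a batch."""
    w = F.normalize(weak_feats, dim=1)
    s = F.normalize(strong_feats, dim=1)
    logits = w @ s.t() / temperature           # (B, B) similarity matrix
    labels = torch.arange(w.size(0), device=w.device)
    return F.cross_entropy(logits, labels)     # diagonal entries are the positive pairs
```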
7. The method of claim 6, wherein optimizing the student model based on the first supervision loss, the second supervision loss and the unsupervised loss comprises:
optimizing the student model based on a target loss function;
the target loss function is:
L_total = L_s^1 + L_s^2 + λ1·L_u + λ2·L_adv + λ3·L_self
wherein λ1, λ2 and λ3 are weight balance factors, L_s^1 and L_s^2 are respectively the first supervision loss and the second supervision loss, L_u is the unsupervised loss, L_adv is the target adversarial loss, and L_self is the target self-supervision loss.
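Note that the formula above is reconstructed from the listed terms; attaching the three balance factors to the unsupervised, adversarial and self-supervision terms is one plausible reading. Under that assumption the combination reduces to a one-line helper, with illustrative weight values:

```python
def total_loss(l_sup1, l_sup2, l_unsup, l_adv, l_self,
               lam1=1.0, lam2=0.1, lam3=0.1):
    """Combine the five terms of the target loss function; weights are illustrative."""
    return l_sup1 + l_sup2 + lam1 * l_unsup + lam2 * l_adv + lam3 * l_self
```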
8. A low-illumination face detection model construction apparatus, characterized in that the apparatus comprises:
a data acquisition module, configured to acquire a source domain and a target domain, wherein the source domain is a normal-illumination data set annotated with face information, and the target domain is a low-illumination data set not annotated with face information;
a building module, configured to acquire an initial model, wherein the initial model comprises a student model and a teacher model, and the student model and the teacher model have the same structure;
an initial module, configured to perform supervised training on the student model based on the source domain to obtain initial model parameters and a first supervision loss;
and an optimization module, configured to perform interactive supervised learning on the student model and the teacher model based on the initial model parameters and the first supervision loss to obtain a target detection model.
9. A terminal, comprising: a processor and a storage medium communicatively coupled to the processor, the storage medium being adapted to store a plurality of instructions, and the processor being adapted to invoke the instructions in the storage medium to perform the steps of the low-illumination face detection model construction method of any one of claims 1-7.
10. A storage medium storing one or more programs executable by one or more processors to implement the steps of the low-illumination face detection model construction method of any one of claims 1-7.
CN202410167554.4A 2024-02-06 2024-02-06 Low-illumination face detection model construction method, device and terminal Pending CN118014048A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410167554.4A CN118014048A (en) 2024-02-06 2024-02-06 Low-illumination face detection model construction method, device and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410167554.4A CN118014048A (en) 2024-02-06 2024-02-06 Low-illumination face detection model construction method, device and terminal

Publications (1)

Publication Number Publication Date
CN118014048A true CN118014048A (en) 2024-05-10

Family

ID=90942516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410167554.4A Pending CN118014048A (en) 2024-02-06 2024-02-06 Low-illumination face detection model construction method, device and terminal

Country Status (1)

Country Link
CN (1) CN118014048A (en)

Similar Documents

Publication Publication Date Title
JP7236545B2 (en) Video target tracking method and apparatus, computer apparatus, program
Cheng et al. Fast and accurate online video object segmentation via tracking parts
CN112052787B (en) Target detection method and device based on artificial intelligence and electronic equipment
WO2019233297A1 (en) Data set construction method, mobile terminal and readable storage medium
Paul et al. Robust visual tracking by segmentation
CN108416266B (en) Method for rapidly identifying video behaviors by extracting moving object through optical flow
CN109614921B (en) Cell segmentation method based on semi-supervised learning of confrontation generation network
CN110910391B (en) Video object segmentation method for dual-module neural network structure
CN109271958A (en) The recognition methods of face age and device
CN106204658A (en) Moving image tracking and device
CN112001399B (en) Image scene classification method and device based on local feature saliency
CN113763424B (en) Real-time intelligent target detection method and system based on embedded platform
CN113012169A (en) Full-automatic cutout method based on non-local attention mechanism
US11367206B2 (en) Edge-guided ranking loss for monocular depth prediction
CN111126155B (en) Pedestrian re-identification method for generating countermeasure network based on semantic constraint
CN113361329B (en) Robust single-target tracking method based on example feature perception
CN114332166A (en) Visible light infrared target tracking method and device based on modal competition cooperative network
CN114399661A (en) Instance awareness backbone network training method
CN117253071B (en) Semi-supervised target detection method and system based on multistage pseudo tag enhancement
Ren et al. A robust and accurate end-to-end template matching method based on the Siamese network
CN111242114A (en) Character recognition method and device
CN116129417A (en) Digital instrument reading detection method based on low-quality image
CN115861223A (en) Solar cell panel defect detection method and system
CN118014048A (en) Low-illumination face detection model construction method, device and terminal
CN116228623A (en) Metal surface defect detection method, equipment and storage medium based on isomorphism regularization self-supervision attention network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination