CN111539248B

CN111539248B - Infrared face detection method and device and electronic equipment thereof

Info

Publication number: CN111539248B
Application number: CN202010163507.4A
Authority: CN
Inventors: 曹志诚; 庞辽军; 陈五星; 赵恒�
Original assignee: Xi'an Xd Xin'an Intelligent Technology Co ltd; Xidian University
Current assignee: Xi'an Xiyue Xin'an Intelligent Technology Co ltd; Xidian University
Priority date: 2020-03-10
Filing date: 2020-03-10
Publication date: 2023-05-05
Anticipated expiration: 2040-03-10
Also published as: CN111539248A

Abstract

The invention discloses an infrared face detection method, an infrared face detection device and electronic equipment. The method comprises: acquiring an infrared face image set; preprocessing the infrared face image set to obtain an infrared face preprocessed image set; constructing an infrared face detection model; training the infrared face detection model according to the infrared face preprocessed image set to obtain a trained infrared face detection model; and performing a generalization capability test on the trained infrared face detection model according to a plurality of infrared face image sets to obtain a face detection result, thereby realizing face detection. Through the constructed infrared face detection model, the method achieves fast and efficient face detection with strong generalization capability under adverse conditions such as night.

Description

Infrared face detection method and device and electronic equipment thereof

Technical Field

The invention belongs to the technical fields of pattern recognition and digital image processing, and particularly relates to an infrared face detection method, an infrared face detection device and electronic equipment.

Background

Face detection is a classical, deeply studied problem in machine vision and has important applications in security monitoring, person-to-ID verification, human-computer interaction, social networking and other fields. Terminal devices such as digital cameras and smartphones already make extensive use of face detection for functions such as focusing on faces during imaging and organizing and classifying photo albums. Beautification camera applications likewise need face detection to locate the face first; they then determine the extent of the skin and facial features using face alignment and beautify the face accordingly. In a face recognition pipeline, face detection is the first step.

To date, researchers have focused mainly on face detection based on visible light, which often performs poorly in harsh conditions such as unconstrained acquisition, uneven illumination, night, and bad weather such as rain and snow. As applications move into complex real-world environments, visible-light face detection increasingly fails to meet requirements. For example, real-world monitoring tasks often take place at night or in rain and snow, where obtaining high-definition face images with visible light is very difficult. Meanwhile, face detection methods have shifted from traditional hand-crafted designs (such as the well-known Viola-Jones algorithm) to deep learning. Because they learn features automatically and are highly robust, deep-learning-based methods have become the mainstream approach to face detection. Face detection based on deep learning is a special case of deep-learning object detection, and current designs fall roughly into two directions: dual-stage object detection algorithms, typified by Faster R-CNN, and single-stage object detection algorithms, such as the well-known SSD.

However, because of its coarse screening stage, the dual-stage object detector has some advantage in accuracy over the single-stage object detector but is slower, whereas the single-stage object detector obtains its result directly from a single pass over the input data and is faster, but less accurate than the dual-stage object detector.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides an infrared face detection method and device and electronic equipment thereof.

An embodiment of the present invention provides an infrared face detection method, including:

acquiring an infrared face image set;

preprocessing the infrared face image set to obtain an infrared face preprocessed image set;

constructing an infrared face detection model;

training the infrared face detection model according to the infrared face preprocessed image set to obtain a trained infrared face detection model;

and performing generalization capability test on the trained infrared face detection model according to a plurality of infrared face image sets to obtain a face detection result so as to realize face detection.

In one embodiment of the present invention, before training the infrared face detection model according to the infrared face preprocessed image set to obtain a trained infrared face detection model, the method further includes:

acquiring a visible light face image set;

preprocessing the visible light face image set to obtain a visible light face preprocessed image set;

pre-training the infrared face detection model according to the visible light face preprocessed image set to obtain an infrared face detection pre-training model;

and retraining the infrared face detection pre-training model according to the infrared face preprocessed image set to obtain the trained infrared face detection model.

In one embodiment of the present invention, preprocessing the visible light face image set to obtain a visible light face preprocessed image set includes:

performing gray-scale conversion and normalization on the visible light face image set to obtain the visible light face preprocessed image set.

In one embodiment of the present invention, retraining the infrared face detection pre-training model according to the infrared face preprocessed image set to obtain the trained infrared face detection model includes:

performing transfer learning on the infrared face preprocessed image set to obtain a transferred infrared face preprocessed image set;

and retraining the infrared face detection pre-training model according to the transferred infrared face preprocessed image set to obtain the trained infrared face detection model.

In one embodiment of the present invention, preprocessing the infrared face image set to obtain an infrared face preprocessed image set includes:

performing image enhancement and normalization on the infrared face image set to obtain the infrared face preprocessed image set.

In one embodiment of the invention, the constructed infrared face detection model comprises a main body part, a feature pyramid module and a multi-branch hole convolution module which are sequentially connected, wherein,

the main body part of the infrared face detection model comprises first to fifth convolution blocks and first to sixth convolution layers which are sequentially connected, wherein each convolution block comprises a plurality of sub-convolution layers and a pooling layer which are sequentially connected;

the feature pyramid module comprises first to eighth feature fusion layers, wherein the first feature fusion layer, the second feature fusion layer, the third feature fusion layer, the fifth feature fusion layer, the sixth feature fusion layer, the seventh feature fusion layer, the eighth feature fusion layer, the fifth convolution layer and the sixth convolution layer are sequentially connected;

The multi-branch hole convolution module comprises first to sixth hole convolution layers which are sequentially connected, wherein the first hole convolution layer is connected with the first feature fusion layer, the second hole convolution layer is connected with the second feature fusion layer, the third hole convolution layer is connected with the third feature fusion layer, the fourth hole convolution layer is connected with the fifth feature fusion layer, the fifth hole convolution layer is connected with the seventh feature fusion layer, and the sixth hole convolution layer is connected with the eighth feature fusion layer and the sixth convolution layer.

In one embodiment of the present invention, training the infrared face detection model according to the infrared face preprocessed image set to obtain a trained infrared face detection model includes:

constructing a multi-task loss function;

and training the infrared face detection model with the multi-task loss function according to the infrared face preprocessed image set to obtain a trained infrared face detection model.

Another embodiment of the present invention provides an infrared face detection apparatus, including:

the first data acquisition module is used for acquiring an infrared face image set;

The first data preprocessing module is used for preprocessing the infrared face image set to obtain an infrared face preprocessed image set;

the data model construction module is used for constructing an infrared face detection model;

the first model training module is used for training the infrared face detection model according to the infrared face preprocessed image set to obtain a trained infrared face detection model;

and the data detection module is used for carrying out generalization capability test on the trained infrared face detection model according to a plurality of infrared face image sets to obtain a face detection result so as to realize face detection.

In one embodiment of the invention, the apparatus further comprises:

the second data acquisition module is used for acquiring a visible light face image set;

the second data preprocessing module is used for preprocessing the visible light face image set to obtain a visible light face preprocessed image set;

the model pre-training module is used for pre-training the infrared face detection model according to the visible light face preprocessed image set to obtain an infrared face detection pre-training model;

and the second model training module is used for retraining the infrared face detection pre-training model according to the infrared face preprocessed image set to obtain the trained infrared face detection model.

Still another embodiment of the present invention provides an infrared face detection electronic device, including an image collector, a display, a processor, a communication interface, a memory, and a communication bus, where the image collector, the display, the processor, the communication interface, and the memory complete communication with each other through the communication bus;

the image collector is used for acquiring image data;

the display is used for displaying the image identification data;

the memory is used for storing a computer program;

the processor is configured to implement any one of the above-mentioned infrared face detection methods when executing the computer program stored on the memory.

Compared with the prior art, the invention has the beneficial effects that:

through the constructed infrared face detection model, the infrared face detection method provided by the invention achieves fast and efficient detection with strong generalization capability, and is suitable for infrared face detection under adverse conditions such as night.

The present invention will be described in further detail with reference to the accompanying drawings and examples.

Drawings

Fig. 1 is a schematic flow chart of an infrared face detection method according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of an infrared face detection model in an infrared face detection method according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of each hole convolution layer in an infrared face detection method according to an embodiment of the present invention;

Fig. 4 is a schematic flow chart of another infrared face detection method according to an embodiment of the present invention;

FIGS. 5a to 5b are schematic diagrams illustrating examples of an infrared face image in an infrared face detection method according to an embodiment of the present invention;

FIGS. 6a to 6b are schematic diagrams illustrating an example of preprocessing an infrared face image in an infrared face detection method according to an embodiment of the present invention;

FIGS. 7a to 7c are schematic diagrams illustrating detection results of an infrared face detection method according to an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of an infrared face detection device according to an embodiment of the present invention;

Fig. 9 is a schematic structural diagram of another infrared face detection apparatus according to an embodiment of the present invention;

Fig. 10 is a schematic structural diagram of an infrared face detection electronic device according to an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to specific examples, but embodiments of the present invention are not limited thereto.

Example 1

Face detection based on deep learning is currently divided roughly into two directions: dual-stage object detection algorithms, typified by Faster R-CNN, and single-stage object detection algorithms, such as the well-known SSD. Because of its coarse screening stage, the dual-stage object detector has some advantage in accuracy over the single-stage object detector but is slower. The single-stage object detector obtains its result directly from a single pass over the input data; it is less accurate than the dual-stage object detector but is fast and easy to deploy. Because the single-stage approach is more practical, it is receiving increasing attention in the field of face detection. Referring to fig. 1, which is a schematic flow chart of an infrared face detection method according to an embodiment of the present invention, this embodiment provides an infrared face detection method comprising the following steps:

Step 1, acquiring an infrared face image set.

Specifically, in this embodiment an infrared camera is used to collect face images. The electromagnetic band used is the infrared band ranging from 760 nm to 1 mm (unless otherwise stated, "infrared" in the present application is limited to this range), which can be subdivided into two sub-bands, namely the near-infrared band (NIR) and the short-wave infrared band (SWIR). An infrared face image set is thereby obtained for subsequent face recognition.

Step 2, preprocessing the infrared face image set to obtain an infrared face preprocessed image set.

Specifically, to achieve better face recognition, this embodiment adjusts the contrast of and normalizes the face images in the infrared face image set before detection; that is, image enhancement and normalization are performed on the infrared face image set to obtain the infrared face preprocessed image set, wherein,

to enhance the contrast of the image, this embodiment applies a logarithmic function and a power function as the transform, and the specific transform formula is as follows:

where $I_{IR}$ and $I'_{IR}$ are the images before and after enhancement, respectively.

The enhanced image $I'_{IR}$ is then normalized to $[0, 255]$, and the specific processing formula is as follows:

$$I_n = \frac{I'_{IR} - I_{min}}{I_{max} - I_{min}} \times 255 \tag{2}$$

where $I_n$ is the normalized gray-scale image, $I'_{IR}$ is the enhanced image, and $I_{max}$ and $I_{min}$ are the maximum and minimum gray values of the infrared face image, respectively.

In this embodiment, each infrared face image in the infrared face image set is processed by the above formula (1) and formula (2), so as to obtain the infrared face preprocessed image set.
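The two preprocessing stages described in step 2 can be sketched as follows. This is a minimal illustration rather than the patented formula: the text does not reproduce formula (1), so the particular combination of `log1p` and a power with exponent `gamma` is an assumption, as are the function names and the default `gamma=0.5`. The normalization stage is ordinary min-max scaling to [0, 255], using the maximum and minimum of the image being normalized.

```python
import numpy as np

def enhance_contrast(img, gamma=0.5):
    """Logarithmic transform followed by a power (gamma) transform.

    ASSUMPTION: the exact form of formula (1) is not given in the text;
    this combination and the gamma value are illustrative only.
    """
    img = np.asarray(img, dtype=np.float64)
    return np.power(np.log1p(img), gamma)

def normalize_to_uint8(img):
    """Min-max normalize an image to the range [0, 255]."""
    img = np.asarray(img, dtype=np.float64)
    i_min, i_max = img.min(), img.max()
    if i_max == i_min:  # flat image: avoid division by zero
        return np.zeros(img.shape, dtype=np.uint8)
    scaled = (img - i_min) / (i_max - i_min) * 255.0
    return np.round(scaled).astype(np.uint8)

def preprocess_infrared(img, gamma=0.5):
    """Contrast enhancement followed by normalization."""
    return normalize_to_uint8(enhance_contrast(img, gamma))
```

Because both transforms are monotonic, the darkest input pixel always maps to 0 and the brightest to 255, so images captured with different sensor bit depths end up on a common scale.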

Step 3, constructing an infrared face detection model.

Specifically, since single-stage object detection is more practical than dual-stage object detection, this embodiment proposes an infrared face detection model based on deep learning and a single-stage object detector. Referring to fig. 2, which is a schematic structural diagram of the infrared face detection model in the infrared face detection method provided by the embodiment of the present invention, the constructed infrared face detection model comprises a main body part, a feature pyramid module and a multi-branch hole convolution module which are sequentially connected, wherein,

The main body part of the infrared face detection model comprises first to fifth convolution blocks and first to sixth convolution layers which are sequentially connected, wherein each convolution block comprises a plurality of sub-convolution layers and a pooling layer which are sequentially connected. For example, the first convolution block Conv1 comprises a sub-convolution layer conv1_1, a sub-convolution layer conv1_2 and a pooling layer maxpool1 which are sequentially connected; the second convolution block Conv2 comprises a sub-convolution layer conv2_1, a sub-convolution layer conv2_2 and a pooling layer maxpool2 which are sequentially connected; the third convolution block Conv3 comprises a sub-convolution layer conv3_1, a sub-convolution layer conv3_2, a sub-convolution layer conv3_3 and a pooling layer maxpool3 which are sequentially connected; the fourth convolution block Conv4 comprises a sub-convolution layer conv4_1, a sub-convolution layer conv4_2, a sub-convolution layer conv4_3 and a pooling layer maxpool4 which are sequentially connected; and the fifth convolution block Conv5 comprises a sub-convolution layer conv5_1, a sub-convolution layer conv5_2, a sub-convolution layer conv5_3 and a pooling layer maxpool5 which are sequentially connected. The infrared face detection model of this embodiment is based on the single-stage object detector SSD. Compared with a model based on a dual-stage object detector, a model based on a single-stage object detector outputs its final prediction directly from a single input pass, has no redundant intermediate steps, uses simple modules with fewer parameters, is fast, and shows no obvious disadvantage in accuracy.
Therefore, this embodiment uses the well-known VGG16 network from the single-stage object detector SSD as the main body part, removes the fully connected layers, and adds extra convolution layers to deepen the network. The VGG16 network uses small convolution kernels and has a simple structure: all convolution layers share the same convolution kernel parameters and all pooling layers share the same pooling kernel parameters, and the network has good feature extraction and fitting capability. In addition, removing the fully connected layers greatly reduces the number of parameters, which significantly reduces the training difficulty.
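As a rough aid to reading the backbone description, the block layout can be written down as plain data. The sub-layer names follow the standard VGG16 arrangement (2, 2, 3, 3, 3 convolution layers per block). The helper `feature_map_sizes` and its assumption that every pooling layer halves the resolution are illustrative; the text only says that the pooling layers share the same parameters and does not give their values.

```python
# Sub-convolution layers in each convolution block, following the standard
# VGG16 layout; each block is followed by one pooling layer (maxpool1..5).
BACKBONE_BLOCKS = {
    "Conv1": ["conv1_1", "conv1_2"],
    "Conv2": ["conv2_1", "conv2_2"],
    "Conv3": ["conv3_1", "conv3_2", "conv3_3"],
    "Conv4": ["conv4_1", "conv4_2", "conv4_3"],
    "Conv5": ["conv5_1", "conv5_2", "conv5_3"],
}

def feature_map_sizes(input_size, pool_stride=2):
    """Spatial size of the feature map after each block's pooling layer.

    ASSUMPTION: every pooling layer downsamples by `pool_stride` with
    floor rounding; the text does not state the pooling parameters.
    """
    sizes = {}
    size = input_size
    for name in BACKBONE_BLOCKS:
        size = size // pool_stride
        sizes[name] = size
    return sizes
```

Under these assumptions a 640-pixel input shrinks to 320, 160, 80, 40 and 20 pixels across the five blocks, which is why the later blocks see large receptive fields at low resolution.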

The feature pyramid module comprises a first feature fusion layer, a second feature fusion layer, a third feature fusion layer, a fourth feature fusion layer, a fifth feature fusion layer, a sixth feature fusion layer, a seventh feature fusion layer, a fourth convolution layer, a fifth convolution layer Conv 7_fc7 and a sixth convolution layer Conv 7_fc7, wherein the first feature fusion layer, the second feature fusion layer, the third convolution block Conv3 and the second feature fusion layer are sequentially connected, the second feature fusion layer, the fourth convolution block Conv4 and the third feature fusion layer are sequentially connected, the third feature fusion layer, the fifth convolution block Conv5 and the fourth feature fusion layer are sequentially connected, the fifth feature fusion layer, the second convolution layer, the Conv_fc7 and the sixth feature fusion layer are sequentially connected, and the sixth feature fusion layer, the fourth convolution layer, the eighth feature fusion layer and the fifth convolution layer Conv7_1 and the sixth convolution layer Conv7_2 are sequentially connected. The embodiment adds a feature map pyramid structure which fully utilizes feature map information of each feature level: the low-level feature map has high resolution, less semantic information and smaller receptive field, is suitable for small targets and regression tasks, and the high-level feature map has low resolution, rich semantic information and large receptive field and is suitable for detecting large targets and classification tasks. The feature pyramid structure fuses the high-level feature images with the low-level feature images, namely pixel values at the same positions of the high-level feature images and the low-level feature images are added, the extracted features are enhanced, prediction output is carried out on each level respectively, and the final result is that the prediction output of each level is fused.
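The fusion rule described here, adding the values at the same positions of a high-level and a low-level feature map, can be sketched in a few lines. Since a high-level map has lower resolution, it has to be brought to the low-level map's size first; nearest-neighbour upsampling is used below as an assumption, because the text does not say how the maps are aligned. The function names are hypothetical.

```python
import numpy as np

def upsample_nearest(feat, factor=2):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse(low, high):
    """Add a high-level feature map to a low-level one, position by position.

    ASSUMPTION: both maps have the same channel count, and the low-level
    map's height and width are the same integer multiple of the
    high-level map's; the upsampling method is illustrative.
    """
    factor = low.shape[1] // high.shape[1]
    up = upsample_nearest(high, factor)
    if up.shape != low.shape:
        raise ValueError("feature maps cannot be aligned")
    return low + up
```

The fused map keeps the low-level map's resolution while carrying the semantic information of the high-level map, which is the property the text relies on for detecting small faces.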

The multi-branch hole (dilated) convolution module comprises first to sixth hole convolution layers, wherein the first hole convolution layer is connected with the first feature fusion layer, the second hole convolution layer is connected with the second feature fusion layer, the third hole convolution layer is connected with the third feature fusion layer, the fourth hole convolution layer is connected with the fifth feature fusion layer, the fifth hole convolution layer is connected with the seventh feature fusion layer, and the sixth hole convolution layer is connected with the eighth feature fusion layer and the sixth convolution layer. This embodiment adds the multi-branch hole convolution module because the human visual system is composed of multiple parts with different receptive fields, whereas the receptive field of each layer of an ordinary CNN is fixed, so some information is lost and the ability to resolve different parts of the visual field is lost; for example, the resolution near the center is more important and needs to be strengthened. The multi-branch hole convolution module of this embodiment therefore combines several parallel branches with convolution kernels of different sizes and hole convolutions with different hole rates, which integrates receptive fields of different sizes, reduces the loss of useful information, fuses the features and improves their robustness. The specific operation is as follows. Referring to fig. 3, which is a schematic diagram of an example structure of each hole convolution layer in the infrared face detection method provided by the embodiment of the present invention, each hole convolution layer processes a feature map fused by the feature pyramid. With N input channels, the feature map is processed by three convolution branches, each with N/3 convolution kernels but with different kernel sizes and hole rates. Finally, the outputs of the three branches are concatenated along the channel dimension, so the fused feature map has the same number of channels as before processing. N is set according to actual needs.
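The branch-and-concatenate structure can be illustrated with a small NumPy sketch. A k x k kernel with hole (dilation) rate d covers an effective window of k + (k - 1)(d - 1) pixels, so branches with different k and d see receptive fields of different sizes. The specific (kernel size, rate) pairs, the random stand-in weights and the function names below are assumptions for illustration; the text does not list the values used.

```python
import numpy as np

def dilated_conv2d(x, w, dilation=1):
    """'Same'-padded 2-D cross-correlation with a dilation (hole) rate.

    x: (C_in, H, W) input; w: (C_out, C_in, k, k) weights with odd k.
    """
    c_out, _, k, _ = w.shape
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Shifted window of the padded input seen by kernel tap (i, j).
            patch = xp[:, i * dilation:i * dilation + h,
                       j * dilation:j * dilation + wd]
            # Contract over input channels: (C_out, C_in) x (C_in, H, W).
            out += np.tensordot(w[:, :, i, j], patch, axes=([1], [0]))
    return out

def multi_branch_block(x, branches, rng):
    """Run parallel branches and concatenate them on the channel axis.

    branches: list of (kernel_size, dilation) pairs (hypothetical values).
    Each branch outputs N / len(branches) channels, so the output keeps
    the N channels of the input.
    """
    n = x.shape[0]
    if n % len(branches) != 0:
        raise ValueError("channel count must be divisible by branch count")
    per_branch = n // len(branches)
    outs = []
    for k, d in branches:
        # Random stand-in weights; a trained model would learn these.
        w = rng.standard_normal((per_branch, n, k, k)) * 0.01
        outs.append(dilated_conv2d(x, w, d))
    return np.concatenate(outs, axis=0)
```

For example, calling `multi_branch_block(x, [(1, 1), (3, 1), (3, 2)], np.random.default_rng(0))` on a feature map with 6 channels produces three 2-channel outputs that are stacked back into 6 channels.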

Further, the output of the multi-branch hole convolution module sequentially passes through a normalization layer, a prediction convolution layer and a multi-task loss layer as shown in fig. 3, so as to realize the whole infrared face detection model. The normalization layer, the prediction convolution layer and the multi-task loss layer are common processes of the existing model, and are not specifically described herein.

Step 4, training the infrared face detection model according to the infrared face preprocessed image set to obtain a trained infrared face detection model.

Specifically, the infrared face preprocessed image set obtained in step 2 is input to the infrared face detection model constructed in step 3 for training. During training, the infrared face preprocessed image set is divided into a training set, a verification set and a test set, where the training set does not overlap with the verification set or the test set, and training is performed with the training set and the verification set. Meanwhile, a multi-task loss function comprising a classification task loss function and a regression task loss function is constructed as follows:

$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$$

where $L_{cls}(p_i, p_i^*)$ is the classification loss function, $L_{reg}(t_i, t_i^*)$ is the regression loss function, $i$ is the index of an anchor box, $p_i$ is the predicted probability that anchor box $i$ is a face, the ground-truth label $p_i^*$ is 1 if the anchor box is a positive sample and 0 otherwise, $t_i$ is the position of the predicted box, $t_i^*$ is the ground-truth box associated with anchor box $i$, $N_{cls}$ and $N_{reg}$ are normalization parameters determined by the numbers of positive and negative samples, and $\lambda$ is the balance parameter.

For the classification task loss function $L_{cls}(p_i, p_i^*)$, this embodiment adopts the cross-entropy loss function; specifically, a softmax function is used to realize binary classification between face and background.

For the regression task loss function $L_{reg}(t_i, t_i^*)$, this embodiment adopts the smooth first-order norm (smooth $L_1$) loss function, which is designed as follows:

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

where $x$ is the difference between the predicted and ground-truth position coordinates of a sample, and the regression loss is computed only for anchor boxes labeled as positive samples.

Next, this embodiment normalizes the two parts of the loss function, namely the classification task loss function and the regression task loss function, with $N_{cls}$ and $N_{reg}$ respectively, and then uses the balance parameter $\lambda$ to balance the weights of the two parts. The classification loss is normalized by the number of positive and negative sample anchor boxes, and the regression loss is normalized by the number of positive sample anchor boxes.
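A compact NumPy sketch of such a multi-task loss is given here, under stated assumptions. The classification term uses binary cross-entropy on the face probability, which is equivalent to two-class softmax cross-entropy; the regression term applies smooth L1 to the difference between predicted and ground-truth box coordinates and is counted only for positive anchors. The function names, the default `lam=1.0` and the clipping constant are illustrative and not taken from the text.

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: quadratic for |x| < 1, linear otherwise."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def multitask_loss(p, p_star, t, t_star, lam=1.0):
    """Classification plus regression loss over a set of anchors.

    p:      (A,) predicted probability that each anchor is a face
    p_star: (A,) ground-truth labels, 1 for positive and 0 for negative
    t, t_star: (A, 4) predicted and ground-truth box coordinates
    lam:    balance parameter (hypothetical default)
    """
    p = np.clip(p, 1e-7, 1.0 - 1e-7)  # avoid log(0)
    cls = -(p_star * np.log(p) + (1.0 - p_star) * np.log(1.0 - p))
    reg = smooth_l1(t - t_star).sum(axis=1)
    n_cls = p_star.size                # positive and negative anchors
    n_reg = max(p_star.sum(), 1.0)     # positive anchors only
    return cls.sum() / n_cls + lam * (p_star * reg).sum() / n_reg
```

Multiplying the regression term by `p_star` is what restricts it to positive anchors, so background anchors contribute only to the classification term.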

Further, the infrared face detection model is trained with the multi-task loss function on the infrared face preprocessed image set to obtain a trained infrared face detection model.

Specifically, this embodiment trains the infrared face detection model with the multi-task loss function built from the classification task loss function and the regression task loss function, using the training set and the verification set of the infrared face preprocessed image set. The preferred optimizer is SGD, the initial learning rate is 0.001 and the batch size is 8. The model that minimizes the loss on the verification set is saved continually, and this determines the finally trained infrared face detection model of this embodiment, which is used for subsequent infrared face detection. The test set is used for the subsequent generalization capability test.
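The data split and the "keep the model with the lowest verification loss" rule can be sketched in plain Python. The optimizer, learning rate and batch size come from the text; the split fractions, the seed and the function names are hypothetical, and the actual training loop and weight saving are omitted.

```python
import random

# Hyperparameters stated in the text.
TRAIN_CONFIG = {"optimizer": "SGD", "initial_lr": 0.001, "batch_size": 8}

def split_dataset(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle and split into non-overlapping training, verification and test sets.

    ASSUMPTION: the fractions and seed are illustrative; the text does not
    give the split ratio.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = int(len(items) * val_frac)
    n_test = int(len(items) * test_frac)
    val = items[:n_val]
    test = items[n_val:n_val + n_test]
    train = items[n_val + n_test:]
    return train, val, test

def select_best_epoch(val_losses):
    """Return (epoch index, loss) for the lowest verification loss."""
    best_epoch, best_loss = None, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # A real training loop would save the model weights here.
            best_epoch, best_loss = epoch, loss
    return best_epoch, best_loss
```

Keeping the test set out of both training and model selection is what makes it usable for the later generalization capability test.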

Step 5, performing a generalization capability test on the trained infrared face detection model according to the plurality of infrared face image sets to obtain a face detection result so as to realize face detection.

Specifically, in order to illustrate the generalization capability of the detection method provided by the embodiment, the embodiment uses a plurality of infrared face image data sets to perform a cross-library test to ensure the generalization capability, specifically selects different infrared face data sets, performs the cross-library test on the trained infrared face detection model obtained in the step 4, and verifies the generalization capability of the infrared face detection model.

In summary, the infrared face detection method provided in this embodiment realizes fast and efficient face detection through technologies such as infrared imaging, a feature pyramid and multi-branch hole convolution. Specifically: first, an infrared camera is used to collect infrared face images, which are then preprocessed, giving the ability to work at night; second, a hierarchical feature pyramid structure is added on the basis of the single-stage object detector SSD to make full use of the respective advantages of the high-level and low-level feature maps of the network, and a module combining multiple branches with different convolution kernel sizes and hole convolutions with different hole rates is added to integrate receptive fields of different sizes, giving speed and high performance; finally, visible-light pre-training and infrared retraining based on transfer learning theory achieve convergence of the infrared face detection model and enable cross-library face detection with that model.

To address the shortcomings of traditional visible-light face detection technology, this embodiment provides a face detection technology based on imaging in multiple infrared bands, which can work in bad weather and suits a variety of environments such as night, sunny days, rain and snow. The deep-learning-based neural network model for high-performance single-stage object detection provided by this embodiment effectively fuses high-level and low-level feature maps by constructing a hierarchical feature pyramid structure, makes full use of the advantages of the feature maps at each level, and enhances the extracted features as a first step. To address the receptive field problem in face detection, this embodiment adds a module that combines multiple branches with different convolution kernel sizes and hole convolutions with different hole rates, which effectively fuses receptive fields of different sizes, enhances the features and increases the use of effective information.

The complete face detection technique provided by this embodiment overcomes shortcomings of traditional face detection technology such as narrow application range, low recognition performance and poor robustness of feature extraction. It provides new algorithmic support for face detection, making the technology more practical, reliable and widely usable, and it can be widely applied to attendance checking, civil monitoring, public security law enforcement, access control, community entrances and similar settings in complex environments such as outdoors, at night, and in rain and snow.

Example two

On the basis of the first embodiment, please refer to fig. 4, which is a flow chart of another infrared face detection method according to an embodiment of the present invention. Because infrared face data sets are small, training the infrared face detection model on them alone may leave the model over-fitted or under-fitted. This embodiment therefore pre-trains the infrared face detection model on a large amount of available data from a field with a similar task, and then retrains the resulting pre-training model on the data set of the current task. Because the tasks are similar, their solution spaces are similar, so retraining is relatively easy and the trained model generalizes better. Specifically, because far more visible-light face data is available, visible-light face images are introduced as data close to the infrared face images and used for pre-training before the infrared face detection model is trained on the infrared face preprocessing image set. The infrared face detection method provided by this embodiment specifically includes the following steps:

Step 1, obtaining a visible light face image set.

Specifically, this embodiment selects the WIDER FACE data set, currently the most popular visible-light face data set in the face detection field, as the visible light face image set. WIDER FACE is a large-scale face detection benchmark data set released by The Chinese University of Hong Kong.

Step 2, preprocessing the visible light face image set to obtain a visible light face preprocessing image set.

Specifically, as in step 2 of the first embodiment, the face images in the visible light face image set are normalized and contrast-adjusted: gray-scale conversion and normalization are applied to the visible light face image set to obtain the visible light face preprocessing image set.

To convert a visible light face image into a gray image, the gray conversion formula is:

$$I_{gray} = 0.2989 \times R + 0.5870 \times G + 0.1140 \times B \tag{5}$$

where $I_{gray}$ is the gray image output by the gray-scale conversion, and R, G, B are the values of the red, green and blue channels of the visible light face image before conversion.

The gray image $I_{gray}$ is then normalized to [0, 255]; the normalization formula is:

$$I_{n1} = 255 \times \frac{I_{gray} - I_{min1}}{I_{max1} - I_{min1}} \tag{6}$$

where $I_{n1}$ is the normalized gray image, $I_{gray}$ is the image after gray-scale conversion, and $I_{max1}$ and $I_{min1}$ are the maximum gray value and the minimum gray value of the visible light face image, respectively.

In this embodiment, the processing of the above formula (5) and formula (6) is performed on each visible light face image in the visible light face image set, so as to obtain a visible light face preprocessing image set.
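The two preprocessing steps can be sketched in a few lines of plain Python. This is only an illustration: the function names are hypothetical, images are represented as nested lists of (R, G, B) tuples for simplicity, the normalization of formula (6) is assumed to be ordinary min-max scaling to [0, 255], and the handling of a constant image is a choice made for the sketch rather than something stated in the text.

```python
def rgb_to_gray(image):
    """Formula (5): weighted sum of R, G, B for each pixel.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    """
    return [[0.2989 * r + 0.5870 * g + 0.1140 * b for (r, g, b) in row]
            for row in image]


def normalize_to_255(gray):
    """Min-max scaling to [0, 255] (assumed form of formula (6))."""
    flat = [v for row in gray for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Constant image: return zeros to avoid division by zero
        # (an assumption of this sketch, not taken from the text).
        return [[0.0 for _ in row] for row in gray]
    scale = 255.0 / (hi - lo)
    return [[(v - lo) * scale for v in row] for row in gray]


def preprocess_visible(image):
    """Gray conversion followed by normalization, applied to one image."""
    return normalize_to_255(rgb_to_gray(image))
```

Applying `preprocess_visible` to every image in the visible light face image set would yield the preprocessing image set described above.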

Step 3, constructing an infrared face detection model.

Specifically, the infrared face detection model in this embodiment is the infrared face detection model constructed in the step 3 in the first embodiment, and will not be described herein.

Step 4, pre-training the infrared face detection model according to the visible light face preprocessing image set to obtain an infrared face detection pre-training model.

Specifically, in this embodiment the visible light face preprocessing image set is divided into a training set, a verification set and a test set, and the infrared face detection model is pre-trained with the multi-task loss function constructed by formula (3) in the first embodiment, which combines a classification task loss function and a regression task loss function. Pre-training uses the training set and the verification set of the visible light face preprocessing image set; the preferred optimizer is SGD, with an initial learning rate of 0.001 and a batch size of 8. During pre-training, the model that minimizes the verification-set loss is saved continuously, and this model is taken as the final infrared face detection pre-training model of this embodiment. The test set is used for the subsequent generalization capability test.
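The training schedule described here (mini-batch SGD with learning rate 0.001 and batch size 8, keeping whichever model gives the lowest verification-set loss) can be illustrated with a deliberately tiny stand-in. In the sketch below, a two-parameter linear model and a mean-squared-error loss replace the detection network and the multi-task loss; those substitutions, the function names and the epoch count are all assumptions made for illustration only.

```python
import copy
import random

LEARNING_RATE = 0.001  # initial learning rate stated in the text
BATCH_SIZE = 8         # batch size stated in the text


def mse(params, data):
    """Mean squared error of the toy model y = w * x + b on `data`."""
    w, b = params["w"], params["b"]
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def sgd_step(params, batch, lr):
    """One plain SGD update computed on a single mini-batch."""
    w, b = params["w"], params["b"]
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    params["w"] = w - lr * grad_w
    params["b"] = b - lr * grad_b


def train_keep_best(train_set, val_set, epochs=50, seed=0, init=None):
    """Train with mini-batch SGD; return the parameters with the lowest validation loss."""
    rng = random.Random(seed)
    params = dict(init) if init else {"w": 0.0, "b": 0.0}
    best_params, best_loss = copy.deepcopy(params), mse(params, val_set)
    data = list(train_set)
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), BATCH_SIZE):
            sgd_step(params, data[i:i + BATCH_SIZE], LEARNING_RATE)
        val_loss = mse(params, val_set)
        if val_loss < best_loss:  # checkpoint only when validation loss improves
            best_params, best_loss = copy.deepcopy(params), val_loss
    return best_params, best_loss
```

The optional `init` argument lets a later run start from previously saved parameters instead of from scratch, which mirrors the pre-train-then-retrain idea in miniature.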

Step 5, retraining the infrared face detection pre-training model according to the infrared face preprocessing image set to obtain a trained infrared face detection model.

Specifically, after the infrared face detection pre-training model has been obtained in step 4 by pre-training on the visible light face preprocessing image set, this embodiment acquires the infrared face image set as in step 1 of the first embodiment, preprocesses it as in step 2 of the first embodiment to obtain the infrared face preprocessing image set, and then inputs the infrared face preprocessing image set into the infrared face detection pre-training model for retraining.

Training a deep neural network depends on a large amount of high-quality labeled data, but in practice data is often insufficient. Transfer learning exploits the similarity among data, tasks or models so that a model learned or trained in an old field can be applied to a new field. Specifically, this embodiment performs transfer learning on the infrared face preprocessing image set to obtain a migrated infrared face preprocessing image set, which is used for retraining the infrared face detection pre-training model. Transfer learning is an existing, commonly used learning method and is not described in detail here.

Further, the infrared face detection pre-training model is retrained according to the migrated infrared face preprocessing image set to obtain a trained infrared face detection model.

Specifically, this embodiment divides the migrated infrared face preprocessing image set into a training set, a verification set and a test set, and retrains starting from the infrared face detection pre-training model. Retraining uses the training set and the verification set of the migrated infrared face preprocessing image set; the preferred optimizer is SGD, with an initial learning rate of 0.001 and a batch size of 8. During retraining, the model that minimizes the verification-set loss is saved continuously, and this model is taken as the final trained infrared face detection model of this embodiment. The test set is used for the subsequent generalization capability test.

Step 6, performing a generalization capability test on the trained infrared face detection model according to the plurality of infrared face image sets to obtain a face detection result so as to realize face detection.

Specifically, to demonstrate the generalization capability of the detection method provided by this embodiment, several different infrared face image data sets are selected and the trained infrared face detection model obtained in step 5 is tested across libraries, which verifies the generalization capability of the infrared face detection model.

To verify the superiority of the infrared face detection method provided by this embodiment, please refer to figs. 5a to 5b, which are schematic diagrams of example infrared face images used in the infrared face detection method provided by the embodiment of the invention.

Please also refer to figs. 6a to 6b, which are schematic diagrams of an example of infrared face image preprocessing in the infrared face detection method according to an embodiment of the present invention. Specifically, applying the processing of step 2 to fig. 6a (a partial view of fig. 5b) yields the infrared face preprocessing image shown in fig. 6b.

In the verification process of this embodiment, the parameters of each layer of the main body part of the infrared face detection model are as shown in table 1, and zero padding is used in the convolutions. The feature pyramid module works from top to bottom: for example, a convolution kernel of size 1×1 adjusts the number of channels of an upper-layer feature map to that of the next lower-layer feature map, and the two feature maps are then added to obtain a new feature map. The multi-branch hole convolution module combines a convolution kernel of size 3×3 with a hole convolution whose hole rate is 3, so that the network observes the image with different fields of view as it passes through the module, and the module fuses the results obtained by the different branches. The parameters of each layer of the infrared face detection model can be designed according to the actual situation; in this embodiment, face detection is performed with the specific parameters in table 1.
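The two mechanisms in this paragraph can be made concrete with a small plain-Python sketch. The first function gives the standard spatial extent of a hole (dilated) kernel, so a 3×3 kernel with hole rate 3 spans 7×7 pixels. The other two functions show a 1×1 convolution used to match channel counts, followed by element-wise addition. The function names are hypothetical, feature maps are nested lists indexed as [channel][row][column], and the sketch assumes the two feature maps already have the same height and width, since the text does not say how any size difference is handled.

```python
def effective_kernel_size(kernel_size, hole_rate):
    """Spatial extent covered by a hole (dilated) convolution kernel."""
    return kernel_size + (kernel_size - 1) * (hole_rate - 1)


def conv1x1(feature_map, weights):
    """1x1 convolution.

    feature_map: [c_in][h][w]; weights: [c_out][c_in]; returns [c_out][h][w].
    """
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [[[sum(w_row[c] * feature_map[c][i][j] for c in range(channels))
              for j in range(width)]
             for i in range(height)]
            for w_row in weights]


def fuse_top_down(upper, lower, weights):
    """Match the channel count of `upper` to `lower` with a 1x1 convolution, then add.

    Assumes both maps already share the same height and width.
    """
    adjusted = conv1x1(upper, weights)
    return [[[a + b for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(ch_a, ch_b)]
            for ch_a, ch_b in zip(adjusted, lower)]
```

For instance, `effective_kernel_size(3, 3)` evaluates to 7, while an ordinary 3×3 convolution (hole rate 1) stays at 3, which is why combining the two branches mixes receptive fields of different sizes.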

TABLE 1 parameter design for each layer of the body portion in the infrared face detection model

Layer number	Layer name	Function	Convolution kernel size	Number of input channels	Number of output channels
1	Conv1_1	Convolutional layer	3x3	3	64
2	Conv1_2	Convolutional layer	3x3	64	64
3	Maxpool1	Pooling layer	2x2	64	64
4	Conv2_1	Convolutional layer	3x3	64	128
5	Conv2_2	Convolutional layer	3x3	128	128
6	Maxpool2	Pooling layer	3x3	128	128
7	Conv3_1	Convolutional layer	3x3	128	256
8	Conv3_2	Convolutional layer	3x3	256	256
9	Conv3_3	Convolutional layer	3x3	256	256
10	Maxpool3	Pooling layer	3x3	256	256
11	Conv4_1	Convolutional layer	3x3	256	512
12	Conv4_2	Convolutional layer	3x3	512	512
13	Conv4_3	Convolutional layer	3x3	512	512
14	Maxpool4	Pooling layer	2x2	512	512
15	Conv5_1	Convolutional layer	3x3	512	512
16	Conv5_2	Convolutional layer	3x3	512	512
17	Conv5_3	Convolutional layer	3x3	512	512
18	Maxpool5	Pooling layer	2x2	512	512
19	Conv_fc6	Convolutional layer	3x3	512	1024
20	Conv_fc7	Convolutional layer	1x1	1024	1024
21	Conv6_1	Convolutional layer	1x1	1024	256
22	Conv6_2	Convolutional layer	3x3	256	512
23	Conv7_1	Convolutional layer	1x1	512	128
24	Conv7_2	Convolutional layer	3x3	128	256
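As a cross-check, the rows of table 1 can be written down as a configuration list and verified programmatically. The sketch below copies the layer names, kernel sizes and channel counts from the table; the tuple layout and the helper name are illustrative assumptions, and strides, padding amounts and activation functions are omitted because the table does not give them.

```python
# (layer name, kind, kernel size, input channels, output channels), copied from table 1
BACKBONE_LAYERS = [
    ("Conv1_1", "conv", 3, 3, 64),
    ("Conv1_2", "conv", 3, 64, 64),
    ("Maxpool1", "pool", 2, 64, 64),
    ("Conv2_1", "conv", 3, 64, 128),
    ("Conv2_2", "conv", 3, 128, 128),
    ("Maxpool2", "pool", 3, 128, 128),
    ("Conv3_1", "conv", 3, 128, 256),
    ("Conv3_2", "conv", 3, 256, 256),
    ("Conv3_3", "conv", 3, 256, 256),
    ("Maxpool3", "pool", 3, 256, 256),
    ("Conv4_1", "conv", 3, 256, 512),
    ("Conv4_2", "conv", 3, 512, 512),
    ("Conv4_3", "conv", 3, 512, 512),
    ("Maxpool4", "pool", 2, 512, 512),
    ("Conv5_1", "conv", 3, 512, 512),
    ("Conv5_2", "conv", 3, 512, 512),
    ("Conv5_3", "conv", 3, 512, 512),
    ("Maxpool5", "pool", 2, 512, 512),
    ("Conv_fc6", "conv", 3, 512, 1024),
    ("Conv_fc7", "conv", 1, 1024, 1024),
    ("Conv6_1", "conv", 1, 1024, 256),
    ("Conv6_2", "conv", 3, 256, 512),
    ("Conv7_1", "conv", 1, 512, 128),
    ("Conv7_2", "conv", 3, 128, 256),
]


def channel_mismatches(layers):
    """Names of layers whose input channels differ from the previous layer's output channels."""
    return [cur[0] for prev, cur in zip(layers, layers[1:]) if cur[3] != prev[4]]
```

An empty result from `channel_mismatches(BACKBONE_LAYERS)` indicates that each layer's input channel count matches the output of the layer before it, as expected for a sequentially connected main body.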

In the training process of this embodiment, the algorithms, parameters and training data sets involved in pre-training and training are shown in table 2. The visible light face data set used for pre-training is the WIDER FACE data set, which contains 393,703 annotated faces with large variations in scale, pose, occlusion, expression, makeup and illumination; the data set is organized into 61 event classes, and for each event class the ratio of training set to verification set to test set is 4:1:5. For training, the infrared face data set is divided into a training set, a verification set and a test set in the ratio 10:1:1.
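A 10:1:1 division of the infrared data can be sketched as follows. The text gives only the ratio, so the shuffling, the fixed random seed and the function name are assumptions made for this illustration.

```python
import random


def split_dataset(items, ratios=(10, 1, 1), seed=0):
    """Shuffle `items` and split them into training, verification and test subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to the test subset
    return train, val, test
```

With 120 samples this produces subsets of 100, 10 and 10 items; passing `ratios=(4, 1, 5)` would reproduce the proportions quoted for the visible light data set.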

TABLE 2 design of relevant parameters for training of infrared face detection models

In the test process, the present application judges a detection to be successful if the face in the test picture is classified correctly and the overlap between the predicted bounding box and the face exceeds 50%; otherwise, the detection is judged unsuccessful.
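The success criterion can be sketched as below. The text does not define the overlap measure precisely; this sketch assumes it is the usual intersection-over-union (IoU) between the predicted box and the ground-truth face box, with boxes given as (x1, y1, x2, y2). The function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_successful_detection(pred_box, pred_is_face, gt_box, threshold=0.5):
    """Success requires a correct face classification and an overlap above the threshold."""
    return bool(pred_is_face) and iou(pred_box, gt_box) > threshold


def detection_rate(outcomes):
    """Fraction of successful detections in a sequence of booleans."""
    outcomes = list(outcomes)
    return sum(outcomes) / len(outcomes) if outcomes else 0.0
```

Under this reading, a detection rate for a data set is simply the share of test faces for which `is_successful_detection` returns true.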

The experiments designed in this embodiment make comparisons in the following two respects:

(1) To demonstrate the generalization capability of the infrared face detection model trained by the infrared face detection method, this embodiment performs cross-library generalization capability tests on the PolyU infrared face data set and the TINDERS infrared face data set, and compares the results with MTCNN, a currently mainstream face detection algorithm based on deep learning:

Please refer to figs. 7a to 7c, which are schematic diagrams of detection results of the infrared face detection method according to an embodiment of the present invention; fig. 7a shows face detection results on the PolyU infrared face data set, and figs. 7b and 7c show face detection results on the TINDERS infrared face data set.

Referring to table 3, the second and third columns of table 3 list the detection rates of the MTCNN method and of the present application, respectively. As can be seen from table 3, the detection rate of the face detection method based on the infrared face detection model exceeds that of the MTCNN method on multiple libraries, and the experiments show that the face detection method based on infrared imaging and deep learning performs well and reliably.

TABLE 3 comparison of the detection rates of MTCNN and the present application


(2) To further demonstrate the speed of the method based on the infrared face detection model, the detection times of the MTCNN method and of the present method were measured experimentally. Please refer to table 4.

TABLE 4 comparison of the detection speeds of MTCNN and the present application

As can be seen from table 4, the detection speed of the infrared face detection model of the present application also exceeds that of MTCNN, the current mainstream face detection method based on deep learning. The experiments show that the face detection method based on infrared imaging and deep learning is fast.
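The text does not describe how detection time was measured. One plausible way to obtain a per-image figure, shown here purely as an assumption, is to average wall-clock time over a set of test images; `detector` stands for any callable that takes one image and is hypothetical.

```python
import time


def average_detection_time(detector, images, repeats=1):
    """Average wall-clock seconds per image for `detector` (a hypothetical callable)."""
    if not images or repeats < 1:
        raise ValueError("need at least one image and one repeat")
    start = time.perf_counter()
    for _ in range(repeats):
        for image in images:
            detector(image)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(images))
```

Running the same function on two detectors with the same images and hardware would give directly comparable numbers; repeating the pass several times reduces the influence of one-off start-up costs.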

Therefore, this embodiment addresses the small amount of training data in infrared face data sets by training the model according to transfer learning theory: exploiting the similarity between the tasks of face detection under visible light and face detection under infrared light, the model is pre-trained on a visible-light face data set and then retrained on the collected infrared face detection data set to obtain the final infrared face detection model, which is then used to realize cross-library face detection.

The other infrared face detection method provided in this embodiment may be implemented according to the infrared face detection method embodiment described in the first embodiment, and its implementation principle and technical effects are similar, and will not be described herein again.

Example three

On the basis of the first embodiment, please refer to fig. 8, fig. 8 is a schematic structural diagram of an infrared face detection device according to an embodiment of the present invention. The embodiment provides an infrared face detection device, which comprises:

the first data acquisition module is used for acquiring an infrared face image set.

The first data preprocessing module is used for preprocessing the infrared face image set to obtain an infrared face preprocessing image set.

Specifically, in the first data preprocessing module of this embodiment, preprocessing is performed on the infrared face image set to obtain an infrared face preprocessed image set, including:

and carrying out image enhancement and normalization processing on the infrared face image set to obtain an infrared face preprocessing image set.

And the data model construction module is used for constructing an infrared face detection model.

Specifically, the infrared face detection model constructed in the data model construction module of the embodiment comprises a main body part of the infrared face detection model, a characteristic pyramid module and a multi-branch cavity convolution module which are connected in sequence, wherein,

the main body part of the infrared face detection model comprises a first convolution block, a second convolution block, a third convolution layer and a fourth convolution layer which are sequentially connected, wherein each first convolution block comprises a plurality of sub convolution layers and a pooling layer which are sequentially connected;

The feature pyramid module comprises a first feature fusion layer, a second feature fusion layer, a third feature fusion layer, a fourth feature fusion layer, a fifth feature fusion layer, a sixth feature fusion layer, a seventh feature fusion layer and an eighth feature fusion layer, wherein the first feature fusion layer, the second feature fusion layer, the third feature fusion layer, the fourth feature fusion layer, the fifth feature fusion layer, the sixth feature fusion layer, the seventh feature fusion layer, the fourth convolution layer and the eighth feature fusion layer are sequentially connected;

the multi-branch hole convolution module comprises a first hole convolution layer, a second hole convolution layer, a third hole convolution layer, a fourth hole convolution layer, a fifth hole convolution layer, a seventh feature fusion layer and a sixth hole convolution layer, wherein the first hole convolution layer, the second hole convolution layer, the third hole convolution layer, the fifth hole convolution layer, the seventh feature fusion layer and the eighth feature fusion layer are sequentially connected, the second hole convolution layer, the eighth feature fusion layer and the sixth convolution layer are connected.

The first model training module is used for training the infrared face detection model according to the infrared face preprocessing image set to obtain a trained infrared face detection model.

Specifically, in the first model training module of this embodiment, training the infrared face detection model according to the infrared face preprocessing image set to obtain a trained infrared face detection model includes:

constructing a multi-task loss function;

and training the infrared face detection model on the infrared face preprocessing image set by using the multi-task loss function to obtain a trained infrared face detection model.

And the data detection module is used for carrying out generalization capability test on the trained infrared face detection model according to the plurality of infrared face image sets to obtain a face detection result so as to realize face detection.

The infrared face detection device provided in this embodiment can carry out the infrared face detection method embodiment of the first embodiment; its implementation principle and technical effects are similar and are not described here again.

Example four

On the basis of the second embodiment, please refer to fig. 9, fig. 9 is a schematic structural diagram of another infrared face detection device according to an embodiment of the present invention, and the embodiment provides another infrared face detection device, which includes:

And the second data acquisition module is used for acquiring the visible light face image set.

And the second data preprocessing module is used for preprocessing the visible light face image set to obtain a visible light face preprocessing image set.

Specifically, in the second data preprocessing module of this embodiment, preprocessing is performed on the visible light face image set to obtain a visible light face preprocessed image set, including:

and carrying out gray conversion and normalization processing on the visible light face image set to obtain a visible light face preprocessing image set.

The model pre-training module is used for pre-training the infrared face detection model according to the visible light face pre-processing image set to obtain an infrared face detection pre-training model.

And the second model training module is used for retraining the infrared face detection pre-training model according to the infrared face preprocessing image set to obtain a trained infrared face detection model.

Specifically, in the second model training module of this embodiment, retraining the infrared face detection pre-training model according to the infrared face preprocessing image set to obtain a trained infrared face detection model includes:

retraining the infrared face detection pre-training model according to the migrated infrared face preprocessing image set to obtain a trained infrared face detection model.

The other infrared face detection device provided in this embodiment may execute the infrared face detection method embodiment described in the second embodiment, and its implementation principle and technical effects are similar, and will not be described herein.

Example five

On the basis of the third embodiment, please refer to fig. 10, fig. 10 is a schematic structural diagram of an infrared face detection electronic device according to an embodiment of the present invention. The embodiment provides an infrared face detection electronic device, which comprises an infrared image collector, a display, a processor, a communication interface, a memory and a communication bus, wherein the infrared image collector, the display, the processor, the communication interface and the memory are in communication with each other through the communication bus;

The infrared image collector is used for collecting image data;

the display is used for displaying the image identification data;

a memory for storing a computer program;

a processor for executing a computer program stored on a memory, the computer program when executed by the processor performing the steps of:

Step 1, controlling the infrared image collector to collect face images to obtain an infrared face image set.

Specifically, in step 2 of this embodiment, preprocessing an infrared face image set to obtain an infrared face preprocessed image set includes:

Step 3, constructing an infrared face detection model.

Specifically, the infrared face detection model constructed in step 3 of this embodiment includes a main body portion of the infrared face detection model, a feature pyramid module, and a multi-branch hole convolution module that are sequentially connected, wherein,

Specifically, in step 4 of the present embodiment, training the infrared face detection model according to the infrared face preprocessing image set to obtain a trained infrared face detection model includes:

constructing a multi-task loss function;

Step 5, performing a generalization capability test on the trained infrared face detection model according to the plurality of infrared face image sets to obtain a face detection result so as to realize face detection. Finally, the face detection result is output to the display.

The infrared face detection electronic device provided in this embodiment may perform the infrared face detection method embodiment of the first embodiment and the infrared face detection device embodiment of the third embodiment, and its implementation principle and technical effects are similar, and are not described herein again.

Example six

On the basis of the fourth embodiment, please refer to fig. 10 again. The embodiment provides an infrared face detection electronic device, which comprises an image collector, a display, a processor, a communication interface, a memory and a communication bus, wherein the image collector, the display, the processor, the communication interface and the memory are in communication with each other through the communication bus;

The image collector is used for collecting image data;

the display is used for displaying the image identification data;

a memory for storing a computer program;

step 1, obtaining a visible light face image set.

Specifically, in step 2 of this embodiment, preprocessing a visible light face image set to obtain a visible light face preprocessed image set includes:

Step 3, constructing an infrared face detection model.

Specifically, in step 5 of this embodiment, retraining the infrared face detection pre-training model according to the infrared face preprocessing image set to obtain a trained infrared face detection model includes:

Obtaining the infrared face preprocessing image set used in step 5 includes:

controlling an image collector to collect face images to obtain an infrared face image set; and preprocessing the infrared face image set to obtain an infrared face preprocessing image set. The method for preprocessing the infrared face image set to obtain the infrared face preprocessing image set comprises the following steps:

Step 6, performing a generalization capability test on the trained infrared face detection model according to the plurality of infrared face image sets to obtain a face detection result so as to realize face detection. Finally, the face detection result is output to the display.

The other infrared face detection electronic device provided in this embodiment may perform the infrared face detection method embodiment described in the second embodiment and the infrared face detection device embodiment described in the fourth embodiment, and its implementation principle and technical effects are similar, and are not described here again.

The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims

1. An infrared face detection method, comprising:

acquiring an infrared face image set;

constructing an infrared face detection model;

performing generalization capability test on the trained infrared face detection model according to a plurality of infrared face image sets to obtain a face detection result so as to realize face detection;

the constructed infrared face detection model comprises a main body part, a characteristic pyramid module and a multi-branch cavity convolution module which are sequentially connected, wherein,

the feature pyramid module comprises a first feature fusion layer, a second feature fusion layer, a third convolution block and a fourth feature fusion layer, wherein the first feature fusion layer, the third convolution block and the fourth feature fusion layer are sequentially connected;

2. The method according to claim 1, further comprising, before training the infrared face detection model according to the set of infrared face pretreatment images to obtain a trained infrared face detection model:

acquiring a visible light face image set;

3. The method of claim 2, wherein preprocessing the set of visible face images to obtain a set of visible face preprocessed images comprises:

4. The method according to claim 2, wherein retraining the infrared face detection pre-training model according to the infrared face pretreatment image set to obtain the trained infrared face detection model includes:

5. The method of claim 1, wherein preprocessing the set of infrared face images to obtain a set of infrared face preprocessed images comprises:

6. The method of claim 1, wherein training the infrared face detection model according to the set of infrared face pretreatment images to obtain a trained infrared face detection model comprises:

constructing a multi-task loss function;

7. An infrared face detection apparatus, the apparatus comprising:

the data detection module is used for carrying out generalization capability test on the trained infrared face detection model according to a plurality of infrared face image sets to obtain a face detection result so as to realize face detection;

8. The infrared face detection apparatus as defined in claim 7, further comprising:

9. An infrared face detection electronic device, characterized by comprising an image collector, a display, a processor, a communication interface, a memory and a communication bus, wherein the image collector, the display, the processor, the communication interface and the memory communicate with each other through the communication bus;

the image collector is used for collecting image data;

the display is used for displaying the image identification data;

the memory is used for storing a computer program;

the processor is configured to implement the infrared face detection method according to any one of claims 1 to 6 when executing the computer program stored on the memory.