Disclosure of Invention
In order to fill the gap in multispectral face detection and to meet the requirements of image enhancement and detection speed in complex environments, the invention provides a GPU-based multispectral face detection method.
The technical scheme of the method is as follows: a GPU based on the CUDA framework is used to process synchronously recorded infrared-light and visible-light videos; the features of human faces are detected separately in the infrared-light image and the visible-light image; the infrared-light and visible-light detection results are then fused frame-synchronously, and the fused result is output as the detected face features.
The human face feature detection step in the visible light video comprises the following steps:
(1) extracting feature points of the images in the video using an LBP (Local Binary Pattern) operator;
(2) dividing the feature points extracted in step (1) into three subclasses, frontal pose, left pose and right pose, with an SVM classifier, according to the face pose each feature point represents;
(3) classifying the feature points of the three subclasses from step (2) with an SVM once more, distinguishing the face features from all the feature points by using the difference between human skin-colour chromaticity and non-skin chromaticity.
The human face feature detection step in the infrared light video comprises the following steps:
firstly, extracting feature points of the images in the video using an LBP (Local Binary Pattern) operator;
secondly, dividing the feature points extracted in the first step into three subclasses, frontal pose, left pose and right pose, with an SVM classifier, according to the face pose each feature point represents;
thirdly, running the adaboost algorithm on the three subclasses divided in the second step, distinguishing and identifying the face features.
The rule for dividing the face pose in the visible-light or infrared-light image is as follows: taking the vertical axis of the image as 0 degrees and clockwise as the positive direction, the interval from -30 to 30 degrees is classed as the frontal pose, the interval from -90 to -30 degrees as the left pose, and the interval from 30 to 90 degrees as the right pose.
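As a quick illustration, the division rule above can be sketched in Python (the function name and the handling of the ±30-degree boundaries are choices of this sketch, not specified by the invention):

```python
def classify_pose(angle_deg):
    """Map a face rotation angle to one of the three pose subclasses.

    angle_deg: rotation in degrees, measured from the vertical axis of
    the image, clockwise positive. Boundary angles (+/-30) are assigned
    to the frontal pose here; the invention does not fix this choice.
    """
    if -30 <= angle_deg <= 30:
        return "frontal"
    if -90 <= angle_deg < -30:
        return "left"
    if 30 < angle_deg <= 90:
        return "right"
    raise ValueError("angle outside the [-90, 90] degree range")
```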
When the infrared-light and visible-light detection results are fused, the detection results of the infrared image and the visible-light image recorded synchronously at the same instant are fused with reference to the recording times of the two videos. The fusion rule is:

    F = F_vis,  if F_vis = F_ir;
    F = F_ir,   if F_vis is empty, F_ir is not empty, and L < T_low or L > T_high;

where F_vis and F_ir are the face features detected in the visible-light image and the infrared image respectively, L is the average luminance of the image region corresponding to the face feature in the visible-light image, T_low is the threshold below which the average brightness of the image block is judged low, and T_high is the threshold above which it is judged high. In the above formula, if the detection results in the visible-light image and the infrared image are the same, the visible-light detection result is taken; if no face is detected in the visible-light image but a face is detected in the infrared image, the brightness of the corresponding region of the visible-light image is examined, and if the visible-light detection failed because the brightness was too high or too low, the infrared detection result is taken as the final detection result.
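The fusion rule described above can be sketched as follows (a minimal illustration; the argument names and the use of None for "no face detected" are assumptions of this sketch):

```python
def fuse_detections(face_vis, face_ir, avg_luminance, t_low, t_high):
    """Fuse visible-light and infrared detections for one synchronized frame.

    face_vis, face_ir: detected face feature in each spectrum (None if
    no face was found). avg_luminance: mean brightness of the visible-light
    region corresponding to the infrared detection. t_low, t_high:
    brightness thresholds (values must be chosen by the user).
    """
    if face_vis is not None and face_vis == face_ir:
        # Both spectra agree: take the (more accurate) visible-light result.
        return face_vis
    if face_vis is None and face_ir is not None:
        # Visible-light detection failed; accept the infrared result only
        # when the failure is explained by too-low or too-high brightness.
        if avg_luminance < t_low or avg_luminance > t_high:
            return face_ir
    return face_vis
```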
The multispectral face detection method disclosed by the invention fuses the face detection results of visible-light and infrared images. The detection method is not affected by illumination, and the detected face image is an accurate visible-light image, so faces can be detected in images captured in harsh environments. During detection, an SVM pose classifier is built for the face poses; classification is performed by this pose classifier, and adaboost-based face detection is then run on the subclass of each pose. This technique can detect faces in various poses in the infrared image, breaking through the previous limitation that only frontal faces could be detected in infrared images.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1, as shown in the flowchart of an embodiment of the method, the method adopts a GPU based on a CUDA framework to operate on synchronously recorded infrared light videos and visible light videos, respectively detects features of a human face under an infrared light image and a visible light image, synchronously fuses an infrared light detection result and a visible light detection result, and outputs the fused result as a human face feature.
In the method, the step of detecting the face features in the visible light video comprises the following steps:
(1) extracting feature points of the images in the video using an LBP (Local Binary Pattern) operator;
(2) dividing the feature points extracted in step (1) into three subclasses, frontal pose, left pose and right pose, with an SVM classifier, according to the face pose each feature point represents;
(3) classifying the feature points of the three subclasses from step (2) with an SVM once more, distinguishing the face features from all the feature points by using the difference between human skin-colour chromaticity and non-skin chromaticity.
Referring to fig. 2, the detection step of the face feature in the infrared light video in the above method includes:
firstly, extracting feature points of the images in the video using an LBP (Local Binary Pattern) operator;
secondly, dividing the feature points extracted in the first step into three subclasses, frontal pose, left pose and right pose, with an SVM classifier, according to the face pose each feature point represents;
thirdly, running the adaboost algorithm on the three subclasses divided in the second step, distinguishing and identifying the face features.
The rule for dividing the face pose in the visible-light or infrared-light image is as follows: taking the vertical axis of the image as 0 degrees and clockwise as the positive direction, the interval from -30 to 30 degrees is classed as the frontal pose, the interval from -90 to -30 degrees as the left pose, and the interval from 30 to 90 degrees as the right pose.
In the method, when the infrared-light and visible-light detection results are fused, the detection results of the infrared image and the visible-light image recorded synchronously at the same instant are fused with reference to the recording times of the two videos. The fusion rule is:

    F = F_vis,  if F_vis = F_ir;
    F = F_ir,   if F_vis is empty, F_ir is not empty, and L < T_low or L > T_high;

where F_vis and F_ir are the face features detected in the visible-light image and the infrared image respectively, L is the average luminance of the image region corresponding to the face feature in the visible-light image, T_low is the threshold below which the average brightness of the image block is judged low, and T_high is the threshold above which it is judged high. In the above formula, if the detection results in the visible-light image and the infrared image are the same, the visible-light detection result is taken; if no face is detected in the visible-light image but a face is detected in the infrared image, the brightness of the corresponding region of the visible-light image is examined, and if the visible-light detection failed because the brightness was too high or too low, the infrared detection result is taken as the final detection result.
The method is a face detection method, realised on the CUDA platform, that combines the visible-light image and the infrared image. It integrates the advantage that the infrared image is unaffected by the illumination environment with the high accuracy of the visible-light image, ensuring that faces are detected quickly and accurately.
The whole process of the invention comprises three parts of infrared image face detection, visible light image face detection and detection result fusion.
First, one frame is extracted from the infrared video and from the visible-light video respectively, and the feature points of each image are extracted with the LBP operator.
LBP describes the texture features of an image using the joint distribution T of each pixel and the P pixels on the annular neighbourhood of radius R around it:

    T = t(g_c, g_0, g_1, ..., g_(P-1)),

where g_c is the grey value of the centre point of the local neighbourhood and g_0, ..., g_(P-1) are the grey values of the P equally spaced points on a circle of radius R. Different (P, R) combinations give different LBP operators; see figs. 3A, 3B and 3C for three different LBP operators.
To make the texture features invariant to grey level, the grey value g_c of the centre point is subtracted from the grey values of the P equally spaced points on the annular neighbourhood, and the joint distribution T becomes

    T = t(g_c, g_0 - g_c, ..., g_(P-1) - g_c).

Assuming that g_c and the differences g_p - g_c are independent of each other, this can be approximately factored as

    T ≈ t(g_c) t(g_0 - g_c, ..., g_(P-1) - g_c).

In this formula, t(g_c) describes the grey-level distribution of the whole image and does not affect the local texture feature distribution, so the texture features of the image can be described by the joint distribution of the differences alone:

    T ≈ t(g_0 - g_c, ..., g_(P-1) - g_c).

When the illumination of the image changes additively, the relative order of the grey value of the centre pixel and the grey values of the pixels in its annular neighbourhood generally does not change; that is, the signs of the differences are independent of additive illumination changes. The texture of the image can therefore be described by the sign function of the difference between the centre pixel and the neighbouring pixels instead of its specific value:

    T ≈ t(s(g_0 - g_c), ..., s(g_(P-1) - g_c)),

where s is the sign function

    s(x) = 1 if x >= 0, and 0 otherwise.

The results obtained from the joint distribution T are ordered according to a fixed order of the pixels on the annular neighbourhood to form a 0/1 sequence. In this embodiment, the calculation starts from the right neighbour of the centre pixel and proceeds counter-clockwise. Each term s(g_p - g_c) is given the binomial factor 2^p, so the local spatial texture structure of a pixel is represented as a unique decimal number, called the LBP(P,R) number; this is why the texture operator is called the Local Binary Pattern. It is computed by the formula

    LBP(P,R) = sum over p = 0, ..., P-1 of s(g_p - g_c) * 2^p.
A specific LBP texture feature calculation is described with reference to fig. 4 (in the figure, P = 8, R = 1).
The template on the left of fig. 4 is thresholded: each neighbourhood pixel is compared with the centre pixel (131), pixels whose difference is greater than or equal to 0 are set to 1 and those whose difference is less than 0 are set to 0, giving the 0/1 table in the middle. A 0/1 sequence (10100101) is then constructed counter-clockwise starting from the lower-right corner, and the corresponding decimal number (165) is computed; the LBP texture feature value of this pixel is therefore 165. Computing the LBP feature value of every pixel in the image yields the LBP texture feature map of the image. Because pixels at the edge of the image lack a complete neighbourhood, the original grey values are retained for the pixels on the image border.
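The thresholding-and-weighting step above can be sketched for the P = 8, R = 1 case as follows (the neighbour ordering is one of several equivalent conventions and is an assumption of this sketch):

```python
def lbp_3x3(patch):
    """Compute the LBP code of the centre pixel of a 3x3 grey-value patch.

    Neighbours are visited starting at the right neighbour and proceeding
    counter-clockwise; neighbour p contributes 2**p when its grey value is
    greater than or equal to the centre value (the sign function s(x)).
    """
    c = patch[1][1]
    # (row, col) offsets: right, top-right, top, top-left,
    # left, bottom-left, bottom, bottom-right
    offsets = [(1, 2), (0, 2), (0, 1), (0, 0),
               (1, 0), (2, 0), (2, 1), (2, 2)]
    code = 0
    for p, (r, col) in enumerate(offsets):
        if patch[r][col] >= c:
            code |= 1 << p
    return code
```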
After all image features have been extracted with the LBP operator, an SVM classifier is used to classify the feature points in the infrared image and the visible-light image respectively.
In the invention, the SVM classifier solves the problem of finding an optimal separating hyperplane in the original space. The mathematical model of the problem is:

    minimise (1/2) ||w||^2
    subject to y_i (w · x_i + b) >= 1, i = 1, ..., n,

where 2/||w|| is the margin, n is the number of training samples, x_i is a training sample vector, w is the weight vector, b is the threshold, and y_i ∈ {+1, -1} is the sample label, y_i = +1 denoting the first class and y_i = -1 the second.
Constructing the Lagrangian gives

    L(w, b, α) = (1/2) ||w||^2 - Σ_i α_i [ y_i (w · x_i + b) - 1 ].

Differentiating with respect to w and b respectively and substituting back into the Lagrangian gives the optimality conditions

    w = Σ_i α_i y_i x_i,   Σ_i α_i y_i = 0,

and the dual Lagrangian

    W(α) = Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i · x_j).

The original problem is thus converted into the following optimisation problem:

    maximise W(α) subject to α_i >= 0 and Σ_i α_i y_i = 0.

According to optimisation theory and the KKT conditions, only a few samples, the support vectors, have non-zero Lagrange multipliers; these are the most informative data points in the data set. With this, the SVM classifier classifies the feature points in the image.
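As an illustration of the separating-hyperplane idea (not the dual solver the invention would use in practice), a minimal linear SVM can be trained by subgradient descent on the regularised hinge loss; all names and hyperparameters below are choices of this sketch:

```python
def train_linear_svm(xs, ys, lam=0.01, epochs=200, lr=0.1):
    """Minimal 2-D linear SVM: subgradient descent on the regularised
    hinge loss lam*||w||^2/2 + max(0, 1 - y(w.x + b)).

    xs: list of 2-D points; ys: labels in {+1, -1}.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            if margin < 1:
                # Sample violates the margin: move towards y*x.
                w[0] += lr * (y * x[0] - lam * w[0])
                w[1] += lr * (y * x[1] - lam * w[1])
                b += lr * y
            else:
                # Only the regulariser acts: shrink w slightly.
                w[0] -= lr * lam * w[0]
                w[1] -= lr * lam * w[1]
    return w, b

def svm_predict(w, b, x):
    """Classify a point by the sign of the decision function."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1
```

On linearly separable toy data the learned hyperplane separates the two classes after a few epochs.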
After the first classification by the SVM, the infrared image is detected using the adaboost algorithm.
The infrared image avoids the influence of illumination on the detection algorithm, and the invention provides a multi-view face detection algorithm based on the continuous adaboost algorithm for the infrared image; the accompanying drawings show a flow diagram of one embodiment of infrared image detection. First, view estimation is performed on the image: a statistical learning method divides the face pose into 3 subclasses. With the y-axis of the image as the 0-degree reference, the poses are divided into frontal, left and right subclasses, the frontal angle range being [-30°, 30°], the right angle range [30°, 90°] and the left angle range [-90°, -30°]. Local Binary Pattern (LBP) features are extracted from the face samples of the three subclasses to train a support vector machine (SVM), which performs pose classification on the input face image. The face is divided into viewpoint subclasses according to its three-dimensional pose; for each subclass, a look-up-table weak-classifier form with continuous confidence output is designed from the LBP features to construct the weak-classifier space, and a view-based cascaded (waterfall) face detector is learned with the continuous Adaboost algorithm. During face detection, after pose classification is completed, the adaboost face detector of the corresponding subclass is called for detection.
The core of the AdaBoost algorithm is to automatically screen a number of key weak classifiers from the weak-classifier space by adjusting the sample distribution and the weak-classifier weights, and to integrate them into a strong classifier. The Adaboost learning algorithm is as follows:

Initialisation: for each of the m training samples (x_i, y_i), set the weight

    w_(1,i) = 1/m.

For t = 1, ..., T: obtain a basic classification rule (weak classifier)

    h_t : X -> {-1, +1}

trained on the current distribution w_t, compute its weighted error ε_t, and set its weight

    α_t = (1/2) ln((1 - ε_t)/ε_t).

Update:

    w_(t+1,i) = w_(t,i) exp(-α_t y_i h_t(x_i)) / Z_t,

where Z_t is a normalising constant such that Σ_i w_(t+1,i) = 1.
After the iteration ends, the cascade classifier finally formed is:

    H(x) = sign( Σ_t α_t h_t(x) ).

In the above algorithm, w_(t,i) are the sample weights in the iteration, h_t are the weak classifiers, and H is the integrated strong classifier.
The image is then processed by the AdaBoost algorithm to detect the face features.
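The weight-update loop above can be sketched with one-dimensional threshold "stumps" as a toy weak-classifier space (the real detector uses LBP-based look-up-table weak classifiers; the stump form here is only illustrative):

```python
import math

def adaboost_stumps(xs, ys, rounds=5):
    """Discrete AdaBoost over 1-D threshold stumps h(x) = pol if x >= thr.

    xs: 1-D sample values; ys: labels in {+1, -1}.
    Returns a list of (alpha, threshold, polarity) weak classifiers.
    """
    n = len(xs)
    w = [1.0 / n] * n                       # sample weights
    ensemble = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for thr in xs:
            for pol in (1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if (pol if x >= thr else -pol) != y)
                if best is None or err < best[0]:
                    best = (err, thr, pol)
        err, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, pol))
        # Re-weight: emphasise the samples this stump got wrong.
        for i, (x, y) in enumerate(zip(xs, ys)):
            h = pol if x >= thr else -pol
            w[i] *= math.exp(-alpha * y * h)
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    """Strong classifier: sign of the alpha-weighted vote."""
    s = sum(a * (p if x >= t else -p) for a, t, p in ensemble)
    return 1 if s >= 0 else -1
```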
The visible-light image is detected on the basis of a skin-colour model and an SVM classifier.
A large number of experiments have shown that the range of variation of human skin chromaticity differs markedly from that of non-skin colours (such as hair and surrounding objects). Many common colour spaces (RGB, HSV, YCbCr, etc.) can express skin colour; the YCbCr colour space is chosen here for skin-colour detection.
This space has the property of separating chrominance from luminance: it separates Y (luminance), Cb (blue chrominance) and Cr (red chrominance). Skin colour clusters well in the YCbCr colour space, is little affected by luminance changes, and its distribution region can be delimited well. The coordinate correspondence for mutual conversion between RGB and YCbCr (in the common BT.601 form) is:

    Y  =  0.299 R + 0.587 G + 0.114 B
    Cb = -0.169 R - 0.331 G + 0.500 B + 128
    Cr =  0.500 R - 0.419 G - 0.081 B + 128
selecting a large number of skin color samples for statistics, wherein the statistical distribution is selected to meet the requirement
A skin tone segmentation threshold. On a two-dimensional chromaticity plane, the area of skin color is more concentrated, and skin color pixels are subject to mean
Variance (variance)
A gaussian distribution of (a). A gaussian skin tone model can thus be built in YCbCr space. According to the Gaussian distribution of skin color in the chromaticity space, for each pixel in the color image, after the pixel is converted from the RGB color space to the YCbCr space, the probability that the point belongs to the skin area can be calculated. The formula is as follows:
wherein
And calculating the skin color likelihood of each pixel point by using the formula, multiplying each point by 255 to obtain the gray value of the point, and setting a proper threshold value to segment a skin area in the gray image to obtain a binary image. The divided binary image needs to be processed by a mathematical morphology method, and the application of the mathematical morphology can simplify the image data, keep the basic shape characteristics and remove irrelevant structures. And carrying out face detection based on SVM classification on the basis of the image obtained by the skin color model segmentation.
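The conversion and likelihood computation can be sketched as follows. The mean and variance values below are illustrative placeholders, not values from the invention (which would estimate them from labelled skin samples), and a diagonal covariance is assumed for brevity:

```python
import math

def rgb_to_ycbcr(r, g, b):
    """Full-range RGB -> YCbCr conversion (common BT.601-style form)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.4187 * g - 0.0813 * b + 128
    return y, cb, cr

def skin_likelihood(cb, cr, mean=(110.0, 155.0), var=(160.0, 100.0)):
    """Gaussian skin-colour likelihood on the (Cb, Cr) chrominance plane.

    mean/var are placeholder parameters of a diagonal-covariance Gaussian;
    the likelihood is 1 at the mean and decays with Mahalanobis distance.
    """
    d = (cb - mean[0]) ** 2 / (2 * var[0]) + (cr - mean[1]) ** 2 / (2 * var[1])
    return math.exp(-d)
```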
The specific method is as follows: the training images are normalised to a standard 64 × 64 size, and each sample image is flattened into a one-dimensional vector, from which the sample mean and covariance matrix are obtained:

    μ = (1/M) Σ_i x_i,   C = (1/M) Σ_i (x_i - μ)(x_i - μ)^T.

The covariance matrix is decomposed by singular value decomposition, and the computed eigenvalues λ_1 >= λ_2 >= ... are arranged in monotonically decreasing order together with their corresponding eigenvectors u_1, u_2, .... The subspace spanned by these eigenvectors is called the eigenface space. Any given face image x may be projected into the subspace:

    y_k = u_k · (x - μ).

This set of coefficients, which indicates the position of the image in the subspace, is used as a new feature vector for training the SVM. The projection coefficients are called the principal features of the face pattern, and the space formed by these principal features is called the feature space. The obtained coefficients are input to the SVM as feature vectors for training. The radial basis function is selected as the kernel function when training the support vector machine:

    K(x, x_i) = exp( -||x - x_i||^2 / (2 σ^2) ).

For training, any face pattern is labelled +1 and any non-face pattern is labelled -1. The trained decision function is

    f(x) = sign( Σ_(i=1)^(N_s) α_i y_i K(s_i, x) + b ),

where s_i are the support vectors, α_i are the corresponding weights, y_i is the class label corresponding to s_i, and N_s is the number of support vectors.
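The projection onto the eigenface subspace and the RBF kernel can be sketched as follows (the eigenvectors are passed in directly here; computing them via SVD of the covariance matrix is omitted, and all function names are choices of this sketch):

```python
import math

def project(face, mean, eigenfaces):
    """Project a flattened face vector onto the eigenface subspace:
    coefficient k is the dot product u_k . (x - mean)."""
    diff = [a - b for a, b in zip(face, mean)]
    return [sum(u_i * d_i for u_i, d_i in zip(u, diff)) for u in eigenfaces]

def rbf_kernel(x, z, sigma=1.0):
    """Radial basis function kernel used when training the SVM."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2 * sigma ** 2))
```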
After the image features have been detected in the infrared image and the visible-light image, the detection results are fused. When the infrared-light and visible-light detection results are fused, the detection results of the infrared image and the visible-light image recorded synchronously at the same instant are fused with reference to the recording times of the two videos. The fusion rule is:

    F = F_vis,  if F_vis = F_ir;
    F = F_ir,   if F_vis is empty, F_ir is not empty, and L < T_low or L > T_high;

where F_vis and F_ir are the face features detected in the visible-light image and the infrared image respectively, L is the average luminance of the image region corresponding to the face feature in the visible-light image, T_low is the threshold below which the average brightness of the image block is judged low, and T_high is the threshold above which it is judged high. In the above formula, if the detection results in the visible-light image and the infrared image are the same, the visible-light detection result is taken; if no face is detected in the visible-light image but a face is detected in the infrared image, the brightness of the corresponding region of the visible-light image is examined, and if the visible-light detection failed because the brightness was too high or too low, the infrared detection result is taken as the final detection result.
The hardware platform for detection is a GPU with the CUDA framework. The method has a wide range of applications: it can be applied to face recognition, new-generation human-computer interfaces, secure access, visual surveillance, content-based retrieval and other fields, and has received broad attention from researchers in recent years.
For face detection to be put to practical use, precision and speed are two key problems that must be solved. Through more than a decade of development since the 1990s, the precision of face detection has improved greatly, but speed has always been the stumbling block preventing practical use, and researchers have worked hard on it. NVIDIA introduced CUDA, a parallel computing architecture that enables GPUs to solve complex computational problems; it contains the CUDA instruction set architecture (ISA) and the parallel computing engine inside the GPU. The present system implements the face detection algorithm on the CUDA framework: on the parallel processing architecture, the image is divided into grids, the divided image data are sent to the GPU in parallel, and the GPU evaluates the divided image data simultaneously.
The multispectral face detection method based on the GPU fuses the face detection results of the visible light image and the infrared image. The method combines the characteristics of high accuracy and clear image of the visible light image and the advantage that the infrared image is not influenced by illumination, and accelerates the detection algorithm on the GPU based on the CUDA. In terms of algorithm, the problem of human face posture change is solved to a certain extent through a multi-angle human face detection algorithm. The problem of illumination change is solved by a multispectral face detection method.
The invention proposes and realises, for the first time, a multi-pose face detection method in infrared images. Illumination variation is a difficult problem in face detection research, and infrared images are of interest because they are not affected by illumination. Frontal face images are comparatively little affected by illumination change, while multi-pose faces, especially profile faces, are easily affected. A survey of the literature shows that no multi-pose method had previously been provided for face detection in infrared images. The method of the invention performs multi-pose face detection in the infrared image: the face poses are classified by the SVM, and face detection is carried out for each class with the adaboost algorithm.
The invention combines the human face detection method of visible light image and infrared image, and fuses the detection result in the result layer. The method not only avoids the influence of illumination transformation on the human face detection algorithm, but also keeps the advantage of high accuracy of visible light images, and achieves the purpose of improving the human face detection accuracy.
The multispectral face detection method is implemented on the GPU based on the CUDA framework based on the GPU serving as an operation platform, and the purpose of high-speed detection of face images is achieved.
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.