Disclosure of Invention
In order to fill the gap in multispectral face detection and to meet the requirements of image enhancement and detection speed in complex environments, the invention provides a GPU-based multispectral face detection method.
The technical scheme of the method is as follows: a GPU based on the CUDA framework is used to process synchronously recorded infrared-light and visible-light videos; the features of human faces are detected separately in the infrared-light image and the visible-light image; the infrared-light and visible-light detection results are then fused frame-synchronously, and the fused result is output as the detected face features.
The human face feature detection step in the visible light video comprises the following steps:
(1) extracting feature points of the images in the video using an LBP (Local Binary Pattern) operator;
(2) dividing the feature points extracted in step (1) into three subclasses, frontal pose, left pose and right pose, with an SVM classifier, according to the face pose each feature point represents;
(3) classifying the feature points of the three subclasses from step (2) with an SVM once more, distinguishing the face features from all the feature points by using the difference between human skin-colour chromaticity and non-skin chromaticity.
The human face feature detection step in the infrared light video comprises the following steps:
firstly, extracting feature points of the images in the video using an LBP (Local Binary Pattern) operator;
secondly, dividing the feature points extracted in the first step into three subclasses, frontal pose, left pose and right pose, with an SVM classifier, according to the face pose each feature point represents;
thirdly, running the adaboost algorithm on the three subclasses divided in the second step, distinguishing and identifying the face features.
The rule for dividing the face pose in the visible-light or infrared-light image is as follows: taking the vertical axis of the image as 0 degrees and clockwise as the positive direction, the interval from -30 to 30 degrees is classed as the frontal pose, the interval from -90 to -30 degrees as the left pose, and the interval from 30 to 90 degrees as the right pose.
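As a quick illustration, the division rule above can be sketched in Python (the function name and the handling of the ±30-degree boundaries are choices of this sketch, not specified by the invention):

```python
def classify_pose(angle_deg):
    """Map a face rotation angle to one of the three pose subclasses.

    angle_deg: rotation in degrees, measured from the vertical axis of
    the image, clockwise positive. Boundary angles (+/-30) are assigned
    to the frontal pose here; the invention does not fix this choice.
    """
    if -30 <= angle_deg <= 30:
        return "frontal"
    if -90 <= angle_deg < -30:
        return "left"
    if 30 < angle_deg <= 90:
        return "right"
    raise ValueError("angle outside the [-90, 90] degree range")
```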
When the infrared-light and visible-light detection results are fused, the detection results of the infrared image and the visible-light image recorded synchronously at the same instant are fused with reference to the recording times of the two videos. The fusion rule is:

    F = F_vis,  if F_vis = F_ir;
    F = F_ir,   if F_vis is empty, F_ir is not empty, and L < T_low or L > T_high;

where F_vis and F_ir are the face features detected in the visible-light image and the infrared image respectively, L is the average luminance of the image region corresponding to the face feature in the visible-light image, T_low is the threshold below which the average brightness of the image block is judged low, and T_high is the threshold above which it is judged high. In the above formula, if the detection results in the visible-light image and the infrared image are the same, the visible-light detection result is taken; if no face is detected in the visible-light image but a face is detected in the infrared image, the brightness of the corresponding region of the visible-light image is examined, and if the visible-light detection failed because the brightness was too high or too low, the infrared detection result is taken as the final detection result.
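The fusion rule described above can be sketched as follows (a minimal illustration; the argument names and the use of None for "no face detected" are assumptions of this sketch):

```python
def fuse_detections(face_vis, face_ir, avg_luminance, t_low, t_high):
    """Fuse visible-light and infrared detections for one synchronized frame.

    face_vis, face_ir: detected face feature in each spectrum (None if
    no face was found). avg_luminance: mean brightness of the visible-light
    region corresponding to the infrared detection. t_low, t_high:
    brightness thresholds (values must be chosen by the user).
    """
    if face_vis is not None and face_vis == face_ir:
        # Both spectra agree: take the (more accurate) visible-light result.
        return face_vis
    if face_vis is None and face_ir is not None:
        # Visible-light detection failed; accept the infrared result only
        # when the failure is explained by too-low or too-high brightness.
        if avg_luminance < t_low or avg_luminance > t_high:
            return face_ir
    return face_vis
```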
The multispectral face detection method disclosed by the invention fuses the face detection results of visible-light and infrared images. The detection method is not affected by illumination, and the detected face image is an accurate visible-light image, so faces can be detected in images captured in harsh environments. During detection, an SVM pose classifier is built for the face poses; classification is performed by this pose classifier, and adaboost-based face detection is then run on the subclass of each pose. This technique can detect faces in various poses in the infrared image, breaking through the previous limitation that only frontal faces could be detected in infrared images.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1, as shown in the flowchart of an embodiment of the method, the method adopts a GPU based on a CUDA framework to operate on synchronously recorded infrared light videos and visible light videos, respectively detects features of a human face under an infrared light image and a visible light image, synchronously fuses an infrared light detection result and a visible light detection result, and outputs the fused result as a human face feature.
In the method, the step of detecting the face features in the visible light video comprises the following steps:
(1) extracting feature points of the images in the video using an LBP (Local Binary Pattern) operator;
(2) dividing the feature points extracted in step (1) into three subclasses, frontal pose, left pose and right pose, with an SVM classifier, according to the face pose each feature point represents;
(3) classifying the feature points of the three subclasses from step (2) with an SVM once more, distinguishing the face features from all the feature points by using the difference between human skin-colour chromaticity and non-skin chromaticity.
Referring to fig. 2, the detection step of the face feature in the infrared light video in the above method includes:
firstly, extracting feature points of the images in the video using an LBP (Local Binary Pattern) operator;
secondly, dividing the feature points extracted in the first step into three subclasses, frontal pose, left pose and right pose, with an SVM classifier, according to the face pose each feature point represents;
thirdly, running the adaboost algorithm on the three subclasses divided in the second step, distinguishing and identifying the face features.
The rule for dividing the face pose in the visible-light or infrared-light image is as follows: taking the vertical axis of the image as 0 degrees and clockwise as the positive direction, the interval from -30 to 30 degrees is classed as the frontal pose, the interval from -90 to -30 degrees as the left pose, and the interval from 30 to 90 degrees as the right pose.
In the method, when the infrared-light and visible-light detection results are fused, the detection results of the infrared image and the visible-light image recorded synchronously at the same instant are fused with reference to the recording times of the two videos. The fusion rule is:

    F = F_vis,  if F_vis = F_ir;
    F = F_ir,   if F_vis is empty, F_ir is not empty, and L < T_low or L > T_high;

where F_vis and F_ir are the face features detected in the visible-light image and the infrared image respectively, L is the average luminance of the image region corresponding to the face feature in the visible-light image, T_low is the threshold below which the average brightness of the image block is judged low, and T_high is the threshold above which it is judged high. In the above formula, if the detection results in the visible-light image and the infrared image are the same, the visible-light detection result is taken; if no face is detected in the visible-light image but a face is detected in the infrared image, the brightness of the corresponding region of the visible-light image is examined, and if the visible-light detection failed because the brightness was too high or too low, the infrared detection result is taken as the final detection result.
The method is a face detection method, realised on the CUDA platform, that combines the visible-light image and the infrared image. It integrates the advantage that the infrared image is unaffected by the illumination environment with the high accuracy of the visible-light image, ensuring that faces are detected quickly and accurately.
The whole process of the invention comprises three parts of infrared image face detection, visible light image face detection and detection result fusion.
First, one frame is extracted from the infrared video and from the visible-light video respectively, and the feature points of each image are extracted with the LBP operator.
LBP describes the texture features of an image using the joint distribution T of each pixel and the P pixels on the annular neighbourhood of radius R around it:

    T = t(g_c, g_0, g_1, ..., g_(P-1)),

where g_c is the grey value of the centre point of the local neighbourhood and g_0, ..., g_(P-1) are the grey values of the P equally spaced points on a circle of radius R. Different (P, R) combinations give different LBP operators; see figs. 3A, 3B and 3C for three different LBP operators.
To make the texture features invariant to grey level, the grey value g_c of the centre point is subtracted from the grey values of the P equally spaced points on the annular neighbourhood, and the joint distribution T becomes

    T = t(g_c, g_0 - g_c, ..., g_(P-1) - g_c).

Assuming that g_c and the differences g_p - g_c are independent of each other, this can be approximately factored as

    T ≈ t(g_c) t(g_0 - g_c, ..., g_(P-1) - g_c).

In this formula, t(g_c) describes the grey-level distribution of the whole image and does not affect the local texture feature distribution, so the texture features of the image can be described by the joint distribution of the differences alone:

    T ≈ t(g_0 - g_c, ..., g_(P-1) - g_c).

When the illumination of the image changes additively, the relative order of the grey value of the centre pixel and the grey values of the pixels in its annular neighbourhood generally does not change; that is, the signs of the differences are independent of additive illumination changes. The texture of the image can therefore be described by the sign function of the difference between the centre pixel and the neighbouring pixels instead of its specific value:

    T ≈ t(s(g_0 - g_c), ..., s(g_(P-1) - g_c)),

where s is the sign function

    s(x) = 1 if x >= 0, and 0 otherwise.

The results obtained from the joint distribution T are ordered according to a fixed order of the pixels on the annular neighbourhood to form a 0/1 sequence. In this embodiment, the calculation starts from the right neighbour of the centre pixel and proceeds counter-clockwise. Each term s(g_p - g_c) is given the binomial factor 2^p, so the local spatial texture structure of a pixel is represented as a unique decimal number, called the LBP(P,R) number; this is why the texture operator is called the Local Binary Pattern. It is computed by the formula

    LBP(P,R) = sum over p = 0, ..., P-1 of s(g_p - g_c) * 2^p.
A specific LBP texture feature calculation is described with reference to fig. 4 (in the figure, P = 8, R = 1).
The template on the left of fig. 4 is thresholded: each neighbourhood pixel is compared with the centre pixel (131), pixels whose difference is greater than or equal to 0 are set to 1 and those whose difference is less than 0 are set to 0, giving the 0/1 table in the middle. A 0/1 sequence (10100101) is then constructed counter-clockwise starting from the lower-right corner, and the corresponding decimal number (165) is computed; the LBP texture feature value of this pixel is therefore 165. Computing the LBP feature value of every pixel in the image yields the LBP texture feature map of the image. Because pixels at the edge of the image lack a complete neighbourhood, the original grey values are retained for the pixels on the image border.
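The thresholding-and-weighting step above can be sketched for the P = 8, R = 1 case as follows (the neighbour ordering is one of several equivalent conventions and is an assumption of this sketch):

```python
def lbp_3x3(patch):
    """Compute the LBP code of the centre pixel of a 3x3 grey-value patch.

    Neighbours are visited starting at the right neighbour and proceeding
    counter-clockwise; neighbour p contributes 2**p when its grey value is
    greater than or equal to the centre value (the sign function s(x)).
    """
    c = patch[1][1]
    # (row, col) offsets: right, top-right, top, top-left,
    # left, bottom-left, bottom, bottom-right
    offsets = [(1, 2), (0, 2), (0, 1), (0, 0),
               (1, 0), (2, 0), (2, 1), (2, 2)]
    code = 0
    for p, (r, col) in enumerate(offsets):
        if patch[r][col] >= c:
            code |= 1 << p
    return code
```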
After all image features have been extracted with the LBP operator, an SVM classifier is used to classify the feature points in the infrared image and the visible-light image respectively.
In the invention, the SVM classifier solves the problem of finding an optimal separating hyperplane in the original space. The mathematical model of the problem is:

    minimise (1/2) ||w||^2
    subject to y_i (w · x_i + b) >= 1, i = 1, ..., n,

where 2/||w|| is the margin, n is the number of training samples, x_i is a training sample vector, w is the weight vector, b is the threshold, and y_i ∈ {+1, -1} is the sample label, y_i = +1 denoting the first class and y_i = -1 the second.
Constructing the Lagrangian gives

    L(w, b, α) = (1/2) ||w||^2 - Σ_i α_i [ y_i (w · x_i + b) - 1 ].

Differentiating with respect to w and b respectively and substituting back into the Lagrangian gives the optimality conditions

    w = Σ_i α_i y_i x_i,   Σ_i α_i y_i = 0,

and the dual Lagrangian

    W(α) = Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i · x_j).

The original problem is thus converted into the following optimisation problem:

    maximise W(α) subject to α_i >= 0 and Σ_i α_i y_i = 0.

According to optimisation theory and the KKT conditions, only a few samples, the support vectors, have non-zero Lagrange multipliers; these are the most informative data points in the data set. With this, the SVM classifier classifies the feature points in the image.
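As an illustration of the separating-hyperplane idea (not the dual solver the invention would use in practice), a minimal linear SVM can be trained by subgradient descent on the regularised hinge loss; all names and hyperparameters below are choices of this sketch:

```python
def train_linear_svm(xs, ys, lam=0.01, epochs=200, lr=0.1):
    """Minimal 2-D linear SVM: subgradient descent on the regularised
    hinge loss lam*||w||^2/2 + max(0, 1 - y(w.x + b)).

    xs: list of 2-D points; ys: labels in {+1, -1}.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            if margin < 1:
                # Sample violates the margin: move towards y*x.
                w[0] += lr * (y * x[0] - lam * w[0])
                w[1] += lr * (y * x[1] - lam * w[1])
                b += lr * y
            else:
                # Only the regulariser acts: shrink w slightly.
                w[0] -= lr * lam * w[0]
                w[1] -= lr * lam * w[1]
    return w, b

def svm_predict(w, b, x):
    """Classify a point by the sign of the decision function."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1
```

On linearly separable toy data the learned hyperplane separates the two classes after a few epochs.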
After the first classification by the SVM, the infrared image is detected using the adaboost algorithm.
The infrared image avoids the influence of illumination on the detection algorithm, and the invention provides a multi-view face detection algorithm based on the continuous adaboost algorithm for the infrared image; the accompanying drawings show a flow diagram of one embodiment of infrared image detection. First, view estimation is performed on the image: a statistical learning method divides the face pose into 3 subclasses. With the y-axis of the image as the 0-degree reference, the poses are divided into frontal, left and right subclasses, the frontal angle range being [-30°, 30°], the right angle range [30°, 90°] and the left angle range [-90°, -30°]. Local Binary Pattern (LBP) features are extracted from the face samples of the three subclasses to train a support vector machine (SVM), which performs pose classification on the input face image. The face is divided into viewpoint subclasses according to its three-dimensional pose; for each subclass, a look-up-table weak-classifier form with continuous confidence output is designed from the LBP features to construct the weak-classifier space, and a view-based cascaded (waterfall) face detector is learned with the continuous Adaboost algorithm. During face detection, after pose classification is completed, the adaboost face detector of the corresponding subclass is called for detection.
The core of the AdaBoost algorithm is to automatically screen a number of key weak classifiers from the weak-classifier space by adjusting the sample distribution and the weak-classifier weights, and to integrate them into a strong classifier. The Adaboost learning algorithm is as follows:

Initialisation: for each of the m training samples (x_i, y_i), set the weight

    w_(1,i) = 1/m.

For t = 1, ..., T: obtain a basic classification rule (weak classifier)

    h_t : X -> {-1, +1}

trained on the current distribution w_t, compute its weighted error ε_t, and set its weight

    α_t = (1/2) ln((1 - ε_t)/ε_t).

Update:

    w_(t+1,i) = w_(t,i) exp(-α_t y_i h_t(x_i)) / Z_t,

where Z_t is a normalising constant such that Σ_i w_(t+1,i) = 1.
After the iteration ends, the cascade classifier finally formed is:

    H(x) = sign( Σ_t α_t h_t(x) ).

In the above algorithm, w_(t,i) are the sample weights in the iteration, h_t are the weak classifiers, and H is the integrated strong classifier.
The image is then processed by the AdaBoost algorithm to detect the face features.
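The weight-update loop above can be sketched with one-dimensional threshold "stumps" as a toy weak-classifier space (the real detector uses LBP-based look-up-table weak classifiers; the stump form here is only illustrative):

```python
import math

def adaboost_stumps(xs, ys, rounds=5):
    """Discrete AdaBoost over 1-D threshold stumps h(x) = pol if x >= thr.

    xs: 1-D sample values; ys: labels in {+1, -1}.
    Returns a list of (alpha, threshold, polarity) weak classifiers.
    """
    n = len(xs)
    w = [1.0 / n] * n                       # sample weights
    ensemble = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for thr in xs:
            for pol in (1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if (pol if x >= thr else -pol) != y)
                if best is None or err < best[0]:
                    best = (err, thr, pol)
        err, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, pol))
        # Re-weight: emphasise the samples this stump got wrong.
        for i, (x, y) in enumerate(zip(xs, ys)):
            h = pol if x >= thr else -pol
            w[i] *= math.exp(-alpha * y * h)
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    """Strong classifier: sign of the alpha-weighted vote."""
    s = sum(a * (p if x >= t else -p) for a, t, p in ensemble)
    return 1 if s >= 0 else -1
```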
The visible-light image is detected on the basis of a skin-colour model and an SVM classifier.
A large number of experiments have shown that the range of variation of human skin chromaticity differs markedly from that of non-skin colours (such as hair and surrounding objects). Many common colour spaces (RGB, HSV, YCbCr, etc.) can express skin colour; the YCbCr colour space is chosen here for skin-colour detection.
This space has the property of separating chrominance from luminance: it separates Y (luminance), Cb (blue chrominance) and Cr (red chrominance). Skin colour clusters well in the YCbCr colour space, is little affected by luminance changes, and its distribution region can be delimited well. The coordinate correspondence for mutual conversion between RGB and YCbCr (in the common BT.601 form) is:

    Y  =  0.299 R + 0.587 G + 0.114 B
    Cb = -0.169 R - 0.331 G + 0.500 B + 128
    Cr =  0.500 R - 0.419 G - 0.081 B + 128
selecting a large number of skin color samples for statistics, wherein the statistical distribution is selected to meet the requirement
A skin tone segmentation threshold. On a two-dimensional chromaticity plane, the area of skin color is more concentrated, and skin color pixels are subject to mean
Variance (variance)
A gaussian distribution of (a). A gaussian skin tone model can thus be built in YCbCr space. According to the Gaussian distribution of skin color in the chromaticity space, for each pixel in the color image, after the pixel is converted from the RGB color space to the YCbCr space, the probability that the point belongs to the skin area can be calculated. The formula is as follows:
wherein
And calculating the skin color likelihood of each pixel point by using the formula, multiplying each point by 255 to obtain the gray value of the point, and setting a proper threshold value to segment a skin area in the gray image to obtain a binary image. The divided binary image needs to be processed by a mathematical morphology method, and the application of the mathematical morphology can simplify the image data, keep the basic shape characteristics and remove irrelevant structures. And carrying out face detection based on SVM classification on the basis of the image obtained by the skin color model segmentation.
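The conversion and likelihood computation can be sketched as follows. The mean and variance values below are illustrative placeholders, not values from the invention (which would estimate them from labelled skin samples), and a diagonal covariance is assumed for brevity:

```python
import math

def rgb_to_ycbcr(r, g, b):
    """Full-range RGB -> YCbCr conversion (common BT.601-style form)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.4187 * g - 0.0813 * b + 128
    return y, cb, cr

def skin_likelihood(cb, cr, mean=(110.0, 155.0), var=(160.0, 100.0)):
    """Gaussian skin-colour likelihood on the (Cb, Cr) chrominance plane.

    mean/var are placeholder parameters of a diagonal-covariance Gaussian;
    the likelihood is 1 at the mean and decays with Mahalanobis distance.
    """
    d = (cb - mean[0]) ** 2 / (2 * var[0]) + (cr - mean[1]) ** 2 / (2 * var[1])
    return math.exp(-d)
```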
The specific method is as follows: the training images are normalised to a standard 64 × 64 size, and each sample image is flattened into a one-dimensional vector, from which the sample mean and covariance matrix are obtained:

    μ = (1/M) Σ_i x_i,   C = (1/M) Σ_i (x_i - μ)(x_i - μ)^T.

The covariance matrix is decomposed by singular value decomposition, and the computed eigenvalues λ_1 >= λ_2 >= ... are arranged in monotonically decreasing order together with their corresponding eigenvectors u_1, u_2, .... The subspace spanned by these eigenvectors is called the eigenface space. Any given face image x may be projected into the subspace:

    y_k = u_k · (x - μ).

This set of coefficients, which indicates the position of the image in the subspace, is used as a new feature vector for training the SVM. The projection coefficients are called the principal features of the face pattern, and the space formed by these principal features is called the feature space. The obtained coefficients are input to the SVM as feature vectors for training. The radial basis function is selected as the kernel function when training the support vector machine:

    K(x, x_i) = exp( -||x - x_i||^2 / (2 σ^2) ).

For training, any face pattern is labelled +1 and any non-face pattern is labelled -1. The trained decision function is

    f(x) = sign( Σ_(i=1)^(N_s) α_i y_i K(s_i, x) + b ),

where s_i are the support vectors, α_i are the corresponding weights, y_i is the class label corresponding to s_i, and N_s is the number of support vectors.
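The projection onto the eigenface subspace and the RBF kernel can be sketched as follows (the eigenvectors are passed in directly here; computing them via SVD of the covariance matrix is omitted, and all function names are choices of this sketch):

```python
import math

def project(face, mean, eigenfaces):
    """Project a flattened face vector onto the eigenface subspace:
    coefficient k is the dot product u_k . (x - mean)."""
    diff = [a - b for a, b in zip(face, mean)]
    return [sum(u_i * d_i for u_i, d_i in zip(u, diff)) for u in eigenfaces]

def rbf_kernel(x, z, sigma=1.0):
    """Radial basis function kernel used when training the SVM."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2 * sigma ** 2))
```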
After the image features have been detected in the infrared image and the visible-light image, the detection results are fused. When the infrared-light and visible-light detection results are fused, the detection results of the infrared image and the visible-light image recorded synchronously at the same instant are fused with reference to the recording times of the two videos. The fusion rule is:

    F = F_vis,  if F_vis = F_ir;
    F = F_ir,   if F_vis is empty, F_ir is not empty, and L < T_low or L > T_high;

where F_vis and F_ir are the face features detected in the visible-light image and the infrared image respectively, L is the average luminance of the image region corresponding to the face feature in the visible-light image, T_low is the threshold below which the average brightness of the image block is judged low, and T_high is the threshold above which it is judged high. In the above formula, if the detection results in the visible-light image and the infrared image are the same, the visible-light detection result is taken; if no face is detected in the visible-light image but a face is detected in the infrared image, the brightness of the corresponding region of the visible-light image is examined, and if the visible-light detection failed because the brightness was too high or too low, the infrared detection result is taken as the final detection result.
The hardware platform for detection is a GPU with the CUDA framework. The method has a wide range of applications: it can be applied to face recognition, new-generation human-computer interfaces, secure access, visual surveillance, content-based retrieval and other fields, and has received broad attention from researchers in recent years.
For face detection to be put to practical use, precision and speed are two key problems that must be solved. Through more than a decade of development since the 1990s, the precision of face detection has improved greatly, but speed has always been the stumbling block preventing practical use, and researchers have worked hard on it. NVIDIA introduced CUDA, a parallel computing architecture that enables GPUs to solve complex computational problems; it contains the CUDA instruction set architecture (ISA) and the parallel computing engine inside the GPU. The present system implements the face detection algorithm on the CUDA framework: on the parallel processing architecture, the image is divided into grids, the divided image data are sent to the GPU in parallel, and the GPU evaluates the divided image data simultaneously.
The multispectral face detection method based on the GPU fuses the face detection results of the visible light image and the infrared image. The method combines the characteristics of high accuracy and clear image of the visible light image and the advantage that the infrared image is not influenced by illumination, and accelerates the detection algorithm on the GPU based on the CUDA. In terms of algorithm, the problem of human face posture change is solved to a certain extent through a multi-angle human face detection algorithm. The problem of illumination change is solved by a multispectral face detection method.
The invention proposes and realises, for the first time, a multi-pose face detection method in infrared images. Illumination variation is a difficult problem in face detection research, and infrared images are of interest because they are not affected by illumination. Frontal face images are comparatively little affected by illumination change, while multi-pose faces, especially profile faces, are easily affected. A survey of the literature shows that no multi-pose method had previously been provided for face detection in infrared images. The method of the invention performs multi-pose face detection in the infrared image: the face poses are classified by the SVM, and face detection is carried out for each class with the adaboost algorithm.
The invention combines the human face detection method of visible light image and infrared image, and fuses the detection result in the result layer. The method not only avoids the influence of illumination transformation on the human face detection algorithm, but also keeps the advantage of high accuracy of visible light images, and achieves the purpose of improving the human face detection accuracy.
The multispectral face detection method is implemented on the GPU based on the CUDA framework based on the GPU serving as an operation platform, and the purpose of high-speed detection of face images is achieved.
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.