CN106886763B - System and method for detecting human face in real time - Google Patents

System and method for detecting human face in real time

Info

Publication number
CN106886763B
CN106886763B
Authority
CN
China
Prior art keywords
classifier
face
branch
human face
end main
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201710065482.2A
Other languages
Chinese (zh)
Other versions
CN106886763A (en)
Inventor
陈杰春
赵丽萍
田景
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeast Electric Power University
Original Assignee
Northeast Dianli University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeast Dianli University filed Critical Northeast Dianli University
Priority to CN201710065482.2A priority Critical patent/CN106886763B/en
Publication of CN106886763A publication Critical patent/CN106886763A/en
Application granted granted Critical
Publication of CN106886763B publication Critical patent/CN106886763B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation

Abstract

The invention discloses a system and a method for detecting a human face in real time. The face detection system comprises a main detector and a branch detector. The main detector comprises a front-end main classifier and a back-end main classifier, both of which are two-class classifiers; the face output end of the front-end main classifier is connected with the input end of the back-end main classifier, and the non-face output end of the back-end main classifier is connected with the input end of the branch detector. The branch detector in turn includes a front-end branch classifier and a back-end branch classifier. Correspondingly, the invention also provides a face detection method. The face detection system and method disclosed by the invention ensure a sufficiently fast detection speed while keeping the recall rate sufficiently high and the false detection rate sufficiently low.

Description

System and method for detecting human face in real time
Technical Field
The invention relates to the field of digital image processing, in particular to a face detection technology.
Background
In recent years, face detection has been one of the research hotspots in the field of digital image processing, because it plays an important role in a variety of applications. For example, when a digital camera is used to take a picture, automatic focusing can be realized by detecting faces in real time, so that the faces in the captured picture are relatively sharp. In addition, face detection is a technology that must be used in face recognition: only when the region containing the human face has been accurately located in the image can the feature information of the facial organs be extracted and face recognition be further realized.
The CVPR 2001 international conference paper "Rapid Object Detection using a Boosted Cascade of Simple Features" introduced a method for detecting human faces using a cascade classifier, proposed by Paul Viola and Michael Jones. The cascade classifier is formed by connecting a plurality of strong classifiers in series, where each stage is a strong classifier trained from weak classifiers using the AdaBoost method. The detection speed of this face detection method is high, because most detection windows are filtered out by the first several stages of the cascade classifier during detection. In addition, when computing Haar feature values, the method uses the integral image technique, so that the Haar feature values can be computed very efficiently. However, the recall rate of this face detection method is not high enough; in particular, the effect of detecting partially occluded faces and side faces is not ideal. Since Paul Viola and Michael Jones proposed this face detection method, further improvements have been attempted from two directions: (1) adopting different image features; (2) changing the structure of the cascade classifier.
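To make the integral image technique mentioned above concrete, the following Python sketch (not part of the patent; the array shapes and the example rectangles are illustrative) computes an integral image with NumPy and evaluates a simple two-rectangle Haar-like feature with four lookups per rectangle:

import numpy as np

def integral_image(gray):
    """Cumulative row/column sums so that any rectangle sum costs four lookups."""
    # Pad with a leading row/column of zeros to simplify corner indexing.
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(gray, axis=0), axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of gray values inside the w x h rectangle whose top-left corner is (x, y)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

# Example: a two-rectangle Haar-like feature value (left half minus right half).
gray = np.random.randint(0, 256, (24, 24))
ii = integral_image(gray)
haar_value = rect_sum(ii, 0, 0, 12, 24) - rect_sum(ii, 12, 0, 12, 24)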
The ICB 2007 international conference paper "Face Detection Based on Multi-Block LBP Representation" introduced a face detection method that brings MB-LBP image features into the cascade classifier. Another face detection method was introduced in the CVPR 2008 international conference paper "Locally Assembled Binary (LAB) Feature with Feature-centric Cascade for Fast and Accurate Face Detection", which brings LAB image features into the cascade classifier. The recall rate of these two face detection methods is somewhat improved, but it is still less than ideal. The face detection method introduced in "A Fast and Accurate Unconstrained Face Detector", published in issue 2 of the 2016 volume of IEEE Transactions on Pattern Analysis and Machine Intelligence, is also a face detection method based on a cascade classifier. This method adopts NPD image features and at the same time improves the decision trees in the cascade classifier. Its detection speed and recall rate are ideal, but its false alarm rate is relatively high. The Chinese patent publication No. CN105718868A, published on June 29, 2016 and entitled "A face detection system and method for multi-pose faces", discloses a face detection method that combines a cascade classifier based on LAB image features with a multilayer perceptron based on SURF image features to construct a face detector with a funnel structure. The recall rate and false alarm rate of this method are ideal, but its detection speed is slow.
In addition to cascade-classifier-based face detection methods, other types of face detection methods have also been explored. For example, the CVPR 2015 international conference paper "A Convolutional Neural Network Cascade for Face Detection" describes a face detection method based on a convolutional neural network (CNN). A face detection method based on a deformable part model was introduced in a BTAS 2015 international conference paper. Both methods are highly influential; their recall rate is high and their false alarm rate is low, but their detection speed is very slow, so their practical value is limited.
In summary, none of the existing face detection methods is fully satisfactory: some are slow; some have an unsatisfactory recall rate; some have a high false detection rate.
Disclosure of Invention
It is an object of the present invention to provide a face detection system and method thereof that overcomes the above-mentioned problems.
The technical scheme adopted for realizing one of the purposes of the invention is as follows: a face detection system comprises a main detector and a branch detector, wherein the main detector comprises a front-end main classifier and a rear-end main classifier, the front-end main classifier and the rear-end main classifier are two-class classifiers, a face output end of the front-end main classifier is connected with an input end of the rear-end main classifier, and a non-face output end of the rear-end main classifier is connected with an input end of the branch detector.
Preferably, the branch detector comprises a front-end branch classifier and a rear-end branch classifier, the front-end branch classifier and the rear-end branch classifier are both two types of classifiers, and a face output end of the front-end branch classifier is connected with an input end of the rear-end branch classifier.
Preferably, the correct rejection rate of the front-end main classifier is greater than or equal to 98.00% and less than or equal to 99.98%, the correct recognition rate is greater than or equal to 98.50% and less than or equal to 99.5%, the correct rejection rate of the rear-end main classifier is greater than or equal to 99.60% and less than or equal to 99.99%, the correct recognition rate is greater than or equal to 86.00% and less than or equal to 99.20%, and both the correct rejection rate and the correct recognition rate of the branch detector are greater than or equal to 99.9%.
Preferably, the correct rejection rate of the front-end branch classifier is greater than or equal to 80.00% and less than or equal to 99.50%, the correct recognition rate is greater than or equal to 99.20% and less than or equal to 99.80%, and both the correct rejection rate and the correct recognition rate of the back-end branch classifier are greater than or equal to 99.9%.
Preferably, the main detector is an n-th order deep cascade classifier, wherein 1 st to mth order classifiers are used as the front-end main classifier, m +1 st to nth order classifiers are used as the back-end main classifier, m and n are two integers, and m < n.
Preferably, the main detector is an n-th order deep cascade classifier, wherein 1 st to m-th order classifiers are used as the front-end main classifier, m +1 st to n-th order classifiers are used as the back-end main classifier, m and n are two integers, and m < n, and the front-end branch classifier includes 1 shallow cascade classifier or more than 2 shallow cascade classifiers connected in series.
Preferably, the front-end main classifier and the back-end main classifier employ image features that can be computed quickly, including Haar features, LBP features, LAB features, or global binary features.
Preferably, the front-end main classifier, the back-end main classifier and the front-end branch classifier adopt image features capable of being rapidly calculated, the image features capable of being rapidly calculated comprise Haar features, LBP features, LAB features or global binary features, and the front-end main classifier and the front-end branch classifier adopt different types of image features.
Preferably, the global binary feature is an image feature based on gray-scale values of pixels of a gray-scale image, and the numerical calculation step is as follows:
step 1, obtaining the gray values of 1 threshold pixel and more than 2 binarization pixels from a gray level image, wherein the threshold pixel is any one pixel in the image, and the binarization pixels are pixels which are sequentially connected in the image;
step 2, calculating the numerical value of the global binary characteristic according to the following formula:
GBF = Σ_{k=1}^{m} 2^(k-1) · s(I_bk, I_t), where s(I_bk, I_t) = 1 if I_bk ≥ I_t and s(I_bk, I_t) = 0 otherwise,
in the formula: GBF represents the value of the global binary feature, m represents the number of binarized pixels, I_bk represents the gray value of the k-th binarized pixel, and I_t represents the gray value of the threshold pixel.
the second technical scheme for realizing the purpose of the invention is as follows: a face detection method comprises the following steps:
step 1101, zooming an image to form an image pyramid;
step 1102, moving a detection window in each image of the image pyramid according to a specified step length, and establishing a detection window set;
step 1103, judging whether each detection window in the detection window set contains a human face by using the human face detection system of the invention;
step 1104, placing a detection window containing a human face in a human face window set;
step 1105, combining the detection windows in the face window set,
the step 1103 includes the steps of:
step 1201, judging whether the detection window contains a human face by the front-end main classifier;
step 1202, if the detection window contains a face, step 1203 is executed, otherwise step 1209 is executed;
step 1203, judging whether the detection window contains a human face or not by the rear-end main classifier;
step 1204, if the detection window contains a human face, step 1210 is executed, otherwise step 1205 is executed;
step 1205, judging whether the detection window contains a human face or not by the front-end branch classifier;
Step 1206, if the detection window contains a face, executing step 1207, otherwise executing step 1209;
step 1207, judging whether the detection window contains a human face or not by the rear-end branch classifier;
step 1208, if the detection window contains a face, executing step 1210, otherwise executing step 1209;
step 1209, filtering out the detection window, and executing step 1211;
step 1210, placing the detection window in a face window set;
and step 1211, ending.
Due to the adoption of the technical scheme, the face detection system and the face detection method provided by the invention have the beneficial effects that: the method can ensure that the detection speed is fast enough, and can also ensure that the recall rate is high enough and the false alarm rate is low enough.
Drawings
FIG. 1 shows a schematic diagram of a face detection system according to an embodiment of the invention;
FIG. 2 shows a schematic diagram of a face detection system according to another embodiment of the invention;
FIG. 3 is a schematic diagram of a face detection system according to yet another embodiment of the invention;
FIG. 4 is a schematic diagram illustrating the global binary feature of the present invention;
FIG. 5 is a schematic diagram of the four-bit square global binary feature of the present invention;
FIG. 6 is a schematic diagram of the four-bit horizontal line segment global binary feature of the present invention;
FIG. 7 is a diagram illustrating the four-bit vertical line segment global binary feature of the present invention;
FIG. 8 is a schematic diagram of the four-bit oblique line segment global binary feature of the present invention;
FIG. 9 is a diagram illustrating the four-bit backward-slanted line segment global binary feature of the present invention;
FIG. 10 is a schematic diagram of a face detection system according to yet another embodiment of the invention;
FIG. 11 is a flow diagram illustrating a face detection method according to an embodiment of the invention;
fig. 12 is a flowchart of a method for determining whether each detection window in the detection window set contains a human face by using the human face detection system of the present invention.
Detailed Description
The technical scheme in the embodiments of the invention is described clearly and completely below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of its full scope. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 shows a schematic diagram of a face detection system according to an embodiment of the invention. As shown in fig. 1, the face detection system of the present invention includes a main detector 11 and a branch detector 12. Further, the main detector 11 comprises one front-end main classifier 111 and one back-end main classifier 112. The front-end main classifier 111 and the back-end main classifier 112 are both two-class classifiers, and their two outputs are a face output (outputting detection windows containing a face) and a non-face output (outputting detection windows containing no face). The face output of the front-end main classifier 111 is connected to an input of the back-end main classifier 112, and the non-face output of the back-end main classifier 112 is connected to an input of the branch detector 12.
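The connections just described can be summarized by the following Python sketch (illustrative only; the class and method names, in particular contains_face, are assumptions rather than an API defined by the patent). The key point is that the non-face output of the back-end main classifier 112 is what feeds the branch detector 12:

class FaceDetectionSystem:
    """Main detector (111 + 112) whose non-face output feeds the branch detector (12)."""

    def __init__(self, front_main, back_main, branch_detector):
        self.front_main = front_main    # front-end main classifier 111: fast, high recognition
        self.back_main = back_main      # back-end main classifier 112: fast, high rejection
        self.branch = branch_detector   # branch detector 12: accurate, sees few windows

    def contains_face(self, window):
        if not self.front_main.contains_face(window):
            return False                        # non-face output of 111: window is filtered out
        if self.back_main.contains_face(window):
            return True                         # face output of 112: window kept
        # Non-face output of 112 is routed to the branch detector for a second opinion.
        return self.branch.contains_face(window)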
When the face detection system is used, the image is first scaled according to a certain proportion to form an image pyramid. Then, in each image of the image pyramid, the detection window is moved with a specified step size in order from top to bottom and from left to right. While the detection window is being moved, the face detection system is used to detect whether each window contains a human face. Finally, the detection windows are merged using methods such as non-maximum suppression.
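The pyramid-and-sliding-window procedure described in the previous paragraph might be sketched as follows; the window size, step length and scale factor are illustrative values (not taken from the patent), and OpenCV is used only for resizing:

import cv2

def detection_windows(gray, win=24, step=4, scale=1.2):
    """Collect (x, y, level_scale) sliding-window positions over an image pyramid.

    gray is assumed to be a single-channel (e.g. uint8) grayscale image.
    """
    windows = []
    img, level_scale = gray, 1.0
    while min(img.shape[:2]) >= win:
        h, w = img.shape[:2]
        for y in range(0, h - win + 1, step):        # top to bottom
            for x in range(0, w - win + 1, step):    # left to right
                windows.append((x, y, level_scale))  # multiply by level_scale to map back
        level_scale *= scale
        img = cv2.resize(img, (int(w / scale), int(h / scale)))
    return windows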
The front-end main classifier 111 is used for filtering most of detection windows not including human faces, and is a classifier with high operation speed, moderate correct rejection rate and high correct recognition rate, for example, the correct rejection rate is greater than or equal to 98.00% and less than or equal to 99.98%, and the correct recognition rate is greater than or equal to 98.50% and less than or equal to 99.5%. Wherein, the correct rejection rate and the correct recognition rate are calculated according to the following formulas:
correct rejection rate = (number of non-face detection windows correctly rejected) / (total number of non-face detection windows)
correct recognition rate = (number of face detection windows correctly recognized) / (total number of face detection windows)
the rear-end main classifier 112 is a classifier having a high operation speed, a high correct rejection rate and a moderate correct recognition rate, for example, the correct rejection rate is greater than or equal to 99.60% and less than or equal to 99.99%, and the correct recognition rate is greater than or equal to 86.00% and less than or equal to 99.20%. It further filters the detection window without human face, and then correctly outputs the detection window with human face. The detection windows that are filtered out by back-end main classifier 112 may include both non-face windows and face windows. Typically, the faces contained in these face windows filtered out by the back-end main classifier 112 are difficult to detect, such as partially occluded faces or side faces. The branch detector 12 is a classifier with a high correct rejection rate and a high correct recognition rate, for example, the correct rejection rate is greater than or equal to 99.9%, and the correct recognition rate is greater than or equal to 99.9%, and is used for further selecting detection windows containing human faces from the detection windows filtered by the back-end main classifier 112.
As mentioned above, the front-end main classifier 111 is a classifier with a fast operation speed, a moderate correct rejection rate and a high correct recognition rate, so it can let most of the detection windows containing human faces pass while filtering out most of the detection windows not containing human faces. The back-end main classifier 112 is a classifier with a fast operation speed, a high correct rejection rate and a moderate correct recognition rate; it can detect most of the face windows that are easy to detect while keeping the proportion of incorrectly detected windows (i.e. detection windows that do not contain a face but are regarded as containing one) small. The branch detector 12 is a classifier with a high correct rejection rate and a high correct recognition rate. Its role is to detect the windows containing faces from among those filtered out by the back-end main classifier 112; the faces in these windows are mostly difficult to detect, such as partially occluded faces and side faces. Normally, classifiers with a high correct rejection rate and a high correct recognition rate run slowly, but the number of windows that need to be examined by the branch detector 12 is small, for example fewer than 200, because the front-end main classifier 111 has already filtered out most of the detection windows that do not contain human faces. Therefore, a face detection system designed according to this scheme can ensure that the detection speed is fast enough while the correct rejection rate and the correct recognition rate are high enough.
As shown in fig. 2, according to another embodiment of the present invention, the branch detector 12 further includes one front-end branch classifier 121 and one back-end branch classifier 122. The front-end branch classifier 121 and the back-end branch classifier 122 are both two-class classifiers, and their two outputs are a face output (outputting detection windows containing a face) and a non-face output (outputting detection windows not containing a face). The face output of the front-end branch classifier 121 is connected to the input of the back-end branch classifier 122.
The front-end main classifier 111 and the front-end branch classifier 121 have the same function, and are used for filtering out detection windows that do not contain human faces, thereby reducing the number of detection windows processed by subsequent classifiers. They may or may not be the same type of classifier. If they are the same type of classifier, different sets of image features need to be used in order to examine the detection window from different angles.
The front-end branch classifier 121 is a classifier with a fast operation speed, a moderate correct rejection rate and a high correct recognition rate, for example, a correct rejection rate greater than or equal to 80.00% and less than or equal to 99.50% and a correct recognition rate greater than or equal to 99.20% and less than or equal to 99.80%, and is used for further filtering the detection windows filtered out by the back-end main classifier 112. The back-end branch classifier 122 is a classifier with a high correct rejection rate and a high correct recognition rate (for example, both greater than or equal to 99.9%). For example, the following classifiers may be used as the back-end branch classifier 122: (1) a convolutional neural network; (2) a multilayer perceptron based on SURF features. A face detector with a high correct recognition rate and a high correct rejection rate, such as a face detector based on a deformable part model, may also be used as the back-end branch classifier 122.
As shown in fig. 3, according to still another embodiment of the face detection system of the present invention, the main detector 11 is an n-order deep cascade classifier, in which the 1st to m-th order classifiers serve as the front-end main classifier 111 (m < n) and the (m+1)-th to n-th order classifiers serve as the back-end main classifier 112. The front-end branch classifier 121 may be one shallow cascade classifier, or a classifier formed by connecting two or more shallow cascade classifiers in series. A deep cascade classifier is a cascade classifier with a relatively large number of orders, for example 17 to 30; conversely, a shallow cascade classifier is a cascade classifier with a small number of orders, for example 3 to 10.
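For reference, both deep and shallow cascade classifiers evaluate their stages in order and reject a window as soon as one stage's score falls below that stage's threshold. The sketch below assumes each stage is given as a list of weak classifiers plus a threshold, which is the usual AdaBoost-cascade arrangement and not a structure prescribed verbatim by the patent:

def cascade_contains_face(window, stages):
    """stages: list of (weak_classifiers, threshold); each weak classifier maps a window to a score."""
    for weak_classifiers, threshold in stages:
        stage_score = sum(weak(window) for weak in weak_classifiers)
        if stage_score < threshold:
            return False   # early rejection: most non-face windows stop in the first few stages
    return True            # the window survived every stage of the cascade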
In order to increase the speed of face detection, the front-end main classifier, the back-end main classifier and the front-end branch classifier in the face detection system of the present invention all use image features that can be quickly calculated, such as Haar features, LBP features, LAB features or Global Binary Features (GBF). A global binary feature is an image feature based on the gray values of gray-scale image pixels, and it involves two types of pixels: (1) 1 threshold pixel; (2) m binarized pixels (m >= 2). The threshold pixel is any pixel in the image, and the binarized pixels are pixels that are connected in sequence in the image. The value of the global binary feature is calculated according to the following steps:
step 1, obtaining the gray values of 1 threshold pixel and m binarized pixels (m >= 2) from the gray-scale image.
As shown in fig. 4, a threshold pixel 401 is any one pixel in an image, and binarized pixels 402a through 402f are pixels connected in sequence in the image.
Step 2, calculating the numerical value of the global binary feature according to the following formula:
GBF = Σ_{k=1}^{m} 2^(k-1) · s(I_bk, I_t), where s(I_bk, I_t) = 1 if I_bk ≥ I_t and s(I_bk, I_t) = 0 otherwise,
in the formula: GBF denotes the value of the global binary feature; I_bk denotes the gray value of the k-th binarized pixel; I_t denotes the gray value of the threshold pixel.
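A direct transcription of steps 1 and 2 into Python might look as follows; the pixel coordinates in the example are illustrative, and the 2^(k-1) weighting follows the binary-pattern reading of the formula above:

import numpy as np

def gbf_value(gray, threshold_pixel, binarized_pixels):
    """Global binary feature: compare each binarized pixel with the threshold pixel."""
    it = int(gray[threshold_pixel])                       # I_t, gray value of the threshold pixel
    value = 0
    for k, (r, c) in enumerate(binarized_pixels, start=1):
        bit = 1 if int(gray[r, c]) >= it else 0           # s(I_bk, I_t)
        value += bit << (k - 1)                           # weight 2^(k-1)
    return value

# Example with m = 4 binarized pixels forming a square (positions are illustrative).
gray = np.random.randint(0, 256, (24, 24))
print(gbf_value(gray, (12, 12), [(5, 5), (5, 6), (6, 6), (6, 5)]))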
when training the cascade classifier, firstly, an image feature set is constructed, and then image features with strong resolving power are selected from the image feature set and used for establishing each stage of classifier in the cascade classifier. As can be seen from the calculation method of the global binary feature value, the difference between different global binary features is embodied in 3 aspects: (1) the location of the threshold pixel; (2) the number of binarized pixels; (3) the position of the pixel is binarized. When a global binary image feature set is constructed, if a limiting condition is not specified, the number of image features contained in the global binary image feature set is very large, so that the training time of a cascade classifier is too long. Therefore, in constructing the global binary image feature set, the number of binarized pixels and the relative positional relationship between the binarized pixels need to be defined in advance. Fig. 5 to 9 show 5 types of global binary features. As shown in fig. 5, the binarized pixels 402a through 402d are connected in sequence to form a square, and thus this global binary feature is referred to as a four-bit square global binary feature, or QGBF _4 feature for short. As shown in fig. 6, the binarized pixels 402a through 402d are connected in sequence to form a horizontal line segment, and thus this global binary feature is referred to as a four-bit horizontal line segment global binary feature, HLGBF _4 feature for short. As shown in fig. 7, the binarized pixels 402a through 402d are sequentially connected to form a vertical line segment, and thus the global binary feature is referred to as a four-bit vertical line segment global binary feature, which is abbreviated as VLGBF _4 feature. As shown in fig. 8, the binarized pixels 402a through 402d are connected in sequence to form an oblique line segment, and thus this global binary feature is referred to as a four-bit oblique-line segment global binary feature, SLGBF _4 for short. As shown in fig. 9, the binarized pixels 402a to 402d are sequentially connected to form a backward-slanted line segment, and thus such a global binary feature is referred to as a four-bit backward-slanted line segment global binary feature, which is referred to as a BSLGBF _4 feature for short.
In yet another embodiment of the face detection system according to the invention, as shown in fig. 10, the main detector 11 is a 20-order deep cascade classifier based on the QGBF_4 feature, in which the 1st to 9th order classifiers serve as the front-end main classifier 111 and the 10th to 20th order classifiers serve as the back-end main classifier 112. The front-end branch classifier 121 includes 2 cascade classifiers connected in series, namely a first-layer front-end branch classifier 1211 and a second-layer front-end branch classifier 1212. The first-layer front-end branch classifier 1211 is a 5-order shallow cascade classifier based on the VLGBF_4 feature, and the second-layer front-end branch classifier 1212 is a 5-order shallow cascade classifier based on the HLGBF_4 feature. The back-end branch classifier 122 is a multilayer perceptron based on SURF features. Each stage of the main detector 11, the first-layer front-end branch classifier 1211 and the second-layer front-end branch classifier 1212 is trained from binary decision trees using the AdaBoost method. After training, the correct rejection rate of each stage classifier is greater than or equal to 45.00% and less than or equal to 55.00%, and the correct recognition rate is greater than or equal to 99.50% and less than or equal to 99.90%.
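As a rough sanity check on these per-stage targets (a back-of-the-envelope product approximation, not a figure stated in the patent): with 20 stages whose per-stage correct recognition rate is 99.50% to 99.90% and whose per-stage correct rejection rate is at least 45.00%, the overall rates of the main detector come out roughly as follows.

stages = 20
recog_low, recog_high = 0.9950, 0.9990   # per-stage correct recognition rate
reject_low = 0.45                        # per-stage correct rejection rate (worst case)

overall_recognition = (recog_low ** stages, recog_high ** stages)   # roughly (0.905, 0.980)
overall_false_accept = (1 - reject_low) ** stages                   # roughly 6.4e-6

print(overall_recognition, overall_false_accept)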
As shown in fig. 11, according to an embodiment of the present invention, the face detection method of the present invention includes the following steps:
step 1101, zooming an image to form an image pyramid;
step 1102, moving a detection window in each image of the image pyramid according to a specified step length, and establishing a detection window set;
step 1103, judging whether each detection window in the detection window set contains a human face by using the human face detection system of the invention;
step 1104, placing a detection window containing a human face in a human face window set;
and step 1105, merging the detection windows in the face window set.
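Step 1105 merges overlapping detections; a minimal greedy non-maximum-suppression sketch is shown below. The (x, y, w, h, score) window representation and the overlap threshold are assumptions for illustration, not a format prescribed by the patent:

def merge_windows(boxes, iou_thresh=0.3):
    """Greedy non-maximum suppression over (x, y, w, h, score) detection windows."""
    def iou(a, b):
        ax, ay, aw, ah, _ = a
        bx, by, bw, bh, _ = b
        ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
        iy = max(0, min(ay + ah, by + bh) - max(ay, by))
        inter = ix * iy
        return inter / float(aw * ah + bw * bh - inter)

    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept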
As shown in fig. 12, further, step 1103 includes:
step 1201, judging whether the detection window contains a human face by the front-end main classifier;
step 1202, if the detection window contains a face, step 1203 is executed, otherwise step 1209 is executed;
step 1203, judging whether the detection window contains a human face or not by the rear-end main classifier;
step 1204, if the detection window contains a human face, step 1210 is executed, otherwise step 1205 is executed;
step 1205, judging whether the detection window contains a human face or not by the front-end branch classifier;
Step 1206, if the detection window contains a face, executing step 1207, otherwise executing step 1209;
step 1207, judging whether the detection window contains a human face or not by the rear-end branch classifier;
step 1208, if the detection window contains a face, executing step 1210, otherwise executing step 1209;
step 1209, filtering out the detection window, and executing step 1211;
step 1210, placing the detection window in a face window set;
and step 1211, ending.
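The decision flow of steps 1201 to 1211 maps directly onto the following sketch; the four classifier objects and their contains_face method are illustrative assumptions, with the corresponding step numbers noted in the comments:

def classify_window(window, front_main, back_main, front_branch, back_branch):
    """Return True if the detection window should be placed in the face window set."""
    if not front_main.contains_face(window):         # steps 1201-1202
        return False                                  # step 1209: filter out the window
    if back_main.contains_face(window):               # steps 1203-1204
        return True                                   # step 1210: keep the window
    if not front_branch.contains_face(window):        # steps 1205-1206
        return False                                  # step 1209
    return back_branch.contains_face(window)          # steps 1207-1208, then 1210 or 1209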

Claims (9)

1. The system for detecting the human face in real time is characterized by comprising a main detector and a branch detector, wherein the main detector comprises a front-end main classifier and a rear-end main classifier, the front-end main classifier and the rear-end main classifier are both two classes of classifiers, the human face output end of the front-end main classifier is connected with the input end of the rear-end main classifier, the non-human face output end of the rear-end main classifier is connected with the input end of the branch detector, the branch detector comprises a front-end branch classifier and a rear-end branch classifier, the front-end branch classifier and the rear-end branch classifier are both two classes of classifiers, and the human face output end of the front-end branch classifier is connected with the input end of the rear-end branch classifier.
2. The system for detecting human face in real time according to claim 1, wherein the front-end main classifier has a correct rejection rate of 98.00% or more and 99.98% or less, a correct recognition rate of 98.50% or more and 99.5% or less, the rear-end main classifier has a correct rejection rate of 99.60% or more and 99.99% or less, a correct recognition rate of 86.00% or more and 99.20% or less, and both the correct rejection rate and the correct recognition rate of the branch detector are 99.9% or more.
3. The system for real-time human face detection according to claim 1, wherein the correct rejection rate of the front-end branch classifier is greater than or equal to 80.00% and less than or equal to 99.50%, the correct recognition rate is greater than or equal to 99.20% and less than or equal to 99.80%, and both the correct rejection rate and the correct recognition rate of the back-end branch classifier are greater than or equal to 99.9%.
4. The system of claim 1, wherein the main detector is an n-th order deep cascade classifier, wherein 1 st to mth order classifiers are used as the front-end main classifier, m +1 st to nth order classifiers are used as the back-end main classifier, m and n are two integers, and m < n.
5. The system of claim 1, wherein the main detector is an n-th order deep cascade classifier, wherein 1 st to mth order classifiers are used as the front-end main classifier, m +1 st to nth order classifiers are used as the back-end main classifier, m and n are two integers, and m < n, and the front-end branch classifier comprises 1 shallow cascade classifier or more than 2 shallow cascade classifiers connected in series.
6. The system for detecting human faces in real time according to claim 1, wherein the front-end main classifier and the back-end main classifier adopt image features capable of being rapidly calculated, and the image features capable of being rapidly calculated comprise Haar features, LBP features, LAB features or global binary features.
7. The system for detecting human faces in real time according to claim 1, wherein the front-end main classifier, the back-end main classifier and the front-end branch classifier adopt image features capable of being rapidly calculated, the image features capable of being rapidly calculated comprise Haar features, LBP features, LAB features or global binary features, and the front-end main classifier and the front-end branch classifier adopt different types of image features.
8. The system for real-time human face detection according to claim 6 or 7, wherein the global binary feature is an image feature based on gray-scale values of pixels of a gray-scale image, and the numerical calculation step is as follows:
step 1, obtaining the gray values of 1 threshold pixel and more than 2 binarization pixels from a gray level image, wherein the threshold pixel is any one pixel in the image, and the binarization pixels are pixels which are sequentially connected in the image;
step 2, calculating the numerical value of the global binary characteristic according to the following formula:
GBF = Σ_{k=1}^{m} 2^(k-1) · s(I_bk, I_t), where s(I_bk, I_t) = 1 if I_bk ≥ I_t and s(I_bk, I_t) = 0 otherwise,
in the formula: GBF denotes the value of the global binary feature, m denotes the number of binarized pixels, I_bk denotes the gray value of the k-th binarized pixel, and I_t denotes the gray value of the threshold pixel.
9. a face detection method comprises the following steps:
step 1101, zooming an image to form an image pyramid;
step 1102, moving a detection window in each image of the image pyramid according to a specified step length, and establishing a detection window set;
1103, judging whether each detection window in the detection window set contains a human face by using the human face detection system of claim 1;
1104, placing a detection window containing a human face in a human face window set;
step 1105, combining the detection windows in the face window set,
characterized in that the step 1103 comprises the steps of:
step 1201, judging whether the detection window contains a human face by the front-end main classifier;
step 1202, if the detection window contains a face, step 1203 is executed, otherwise step 1209 is executed;
step 1203, judging whether the detection window contains a human face or not by the rear-end main classifier;
step 1204, if the detection window contains a human face, step 1210 is executed, otherwise step 1205 is executed;
step 1205, the front-end branch classifier judges whether the detection window contains a human face or not;
step 1206, if the detection window contains a face, executing step 1207, otherwise executing step 1209;
step 1207, judging whether the detection window contains a human face or not by the rear-end branch classifier;
step 1208, if the detection window contains a face, executing step 1210, otherwise executing step 1209;
step 1209, filtering out the detection window, and executing step 1211;
step 1210, placing the detection window in a face window set;
and step 1211, ending.
CN201710065482.2A 2017-01-20 2017-01-20 System and method for detecting human face in real time Expired - Fee Related CN106886763B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710065482.2A CN106886763B (en) 2017-01-20 2017-01-20 System and method for detecting human face in real time

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710065482.2A CN106886763B (en) 2017-01-20 2017-01-20 System and method for detecting human face in real time

Publications (2)

Publication Number Publication Date
CN106886763A CN106886763A (en) 2017-06-23
CN106886763B true CN106886763B (en) 2020-02-18

Family

ID=59178939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710065482.2A Expired - Fee Related CN106886763B (en) 2017-01-20 2017-01-20 System and method for detecting human face in real time

Country Status (1)

Country Link
CN (1) CN106886763B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341477A (en) * 2017-07-10 2017-11-10 北京联合大学 A kind of fast cascaded formula is without constraint method for detecting human face
CN108256451B (en) * 2018-01-05 2022-09-27 百度在线网络技术(北京)有限公司 Method and device for detecting human face
CN109214328A (en) * 2018-08-29 2019-01-15 成都睿码科技有限责任公司 Face grasping system based on face recognition engine
CN109993061B (en) * 2019-03-01 2021-12-07 珠海亿智电子科技有限公司 Face detection and recognition method, system and terminal equipment
CN110472570A (en) * 2019-08-14 2019-11-19 旭辉卓越健康信息科技有限公司 A kind of recognition of face multipath deep neural network method based on adaptive weighting
CN112801233B (en) * 2021-04-07 2021-07-23 杭州海康威视数字技术股份有限公司 Internet of things equipment honeypot system attack classification method, device and equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049733A (en) * 2011-10-11 2013-04-17 株式会社理光 Human face detection method and human face detection equipment
CN104850818A (en) * 2014-02-17 2015-08-19 华为技术有限公司 Face detector training method, face detection method and device
CN105138956A (en) * 2015-07-22 2015-12-09 小米科技有限责任公司 Face detection method and device
CN105335684A (en) * 2014-06-25 2016-02-17 小米科技有限责任公司 Face detection method and device
CN105718868A (en) * 2016-01-18 2016-06-29 中国科学院计算技术研究所 Face detection system and method for multi-pose faces
CN105912990A (en) * 2016-04-05 2016-08-31 深圳先进技术研究院 Face detection method and face detection device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140286527A1 (en) * 2013-03-20 2014-09-25 Qualcomm Incorporated Systems and methods for accelerated face detection

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049733A (en) * 2011-10-11 2013-04-17 株式会社理光 Human face detection method and human face detection equipment
CN104850818A (en) * 2014-02-17 2015-08-19 华为技术有限公司 Face detector training method, face detection method and device
CN105335684A (en) * 2014-06-25 2016-02-17 小米科技有限责任公司 Face detection method and device
CN105138956A (en) * 2015-07-22 2015-12-09 小米科技有限责任公司 Face detection method and device
CN105718868A (en) * 2016-01-18 2016-06-29 中国科学院计算技术研究所 Face detection system and method for multi-pose faces
CN105912990A (en) * 2016-04-05 2016-08-31 深圳先进技术研究院 Face detection method and face detection device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Feature Extraction Method Based on Global Binary Pattern and Its Application; Xu Ke et al.; Pattern Recognition and Artificial Intelligence (模式识别与人工智能); 2013-09-30; Vol. 26, No. 9; Section 3 of the main text *

Also Published As

Publication number Publication date
CN106886763A (en) 2017-06-23

Similar Documents

Publication Publication Date Title
CN106886763B (en) System and method for detecting human face in real time
CN108830252B (en) Convolutional neural network human body action recognition method fusing global space-time characteristics
CN108446617B (en) Side face interference resistant rapid human face detection method
Zhang et al. Improving multiview face detection with multi-task deep convolutional neural networks
JP4767595B2 (en) Object detection device and learning device thereof
US9053358B2 (en) Learning device for generating a classifier for detection of a target
US20180114071A1 (en) Method for analysing media content
CN112464807A (en) Video motion recognition method and device, electronic equipment and storage medium
CN109063626B (en) Dynamic face recognition method and device
CN110532959B (en) Real-time violent behavior detection system based on two-channel three-dimensional convolutional neural network
CN112154476A (en) System and method for rapid object detection
Giraldo et al. Graph CNN for moving object detection in complex environments from unseen videos
KR20200027887A (en) Learning method, learning device for optimizing parameters of cnn by using multiple video frames and testing method, testing device using the same
JP6448212B2 (en) Recognition device and recognition method
T'Jampens et al. Automatic detection, tracking and counting of birds in marine video content
CN115035159A (en) Video multi-target tracking method based on deep learning and time sequence feature enhancement
CN111027440A (en) Crowd abnormal behavior detection device and method based on neural network
CN109002808B (en) Human behavior recognition method and system
CN111931572B (en) Target detection method for remote sensing image
Yang et al. Combining Gaussian mixture model and HSV model with deep convolution neural network for detecting smoke in videos
Abesinghe et al. Developing a selective tea plucking mechanism using image processing for a drone-based tea harvesting machine
CN111881803A (en) Livestock face recognition method based on improved YOLOv3
CN107403192B (en) Multi-classifier-based rapid target detection method and system
CN107341456B (en) Weather sunny and cloudy classification method based on single outdoor color image
AU2021102692A4 (en) A multidirectional feature fusion network-based system for efficient object detection

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200218