CN100336070C - Method of robust human face detection in complicated background image - Google Patents

Method of robust human face detection in complicated background image

Info

Publication number
CN100336070C
CN100336070C
Authority
CN
China
Prior art keywords
face
sample
people
image
training
Prior art date
Legal status
Expired - Fee Related
Application number
CNB2005100862485A
Other languages
Chinese (zh)
Other versions
CN1731417A (en)
Inventor
丁晓青
马勇
方驰
刘长松
彭良瑞
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CNB2005100862485A
Publication of CN1731417A
Application granted
Publication of CN100336070C

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention relates to face detection under complex backgrounds and belongs to the field of face recognition. It provides a face detection method whose performance is robust for images with complex backgrounds. Efficient, highly redundant microstructure features are used to represent the gray-level distribution of regions such as the eyes and mouth in the face pattern, and a risk-sensitive AdaBoost algorithm selects from them the microstructure features that best separate faces from non-faces and combines them into strong classifiers. Each classifier obtained by training reduces the false acceptance rate on non-face samples as far as possible while guaranteeing a low rejection rate on faces, so that high-performance face detection in complex background images is achieved with a simple structure; a post-processing algorithm further reduces the false detection rate. Results on several public databases and in competition evaluations demonstrate the excellent performance of the invention.

Description

Method of robust human face detection in complex background images
Technical field
The method for detecting human faces in complex background images belongs to the field of face recognition technology.
Background technology
Face detection is the task of determining information such as the positions and sizes of human faces in an image or image sequence. It is now widely used in systems such as face recognition, video surveillance, and intelligent human-machine interfaces. Detecting faces, especially under complex backgrounds, remains a difficult problem: external factors such as illumination, together with properties of the face itself such as appearance, skin color, expression, motion in three-dimensional space, beard, hair, and glasses, cause enormous variation within the face pattern class, while background objects are so varied that they are hard to distinguish from faces.
The mainstream face detection methods at present are based on statistical learning from samples. These methods generally introduce a "non-face" class and obtain the model parameters — the features that distinguish the "face" class from the "non-face" class — by statistical learning over collected samples, rather than by deriving surface rules from visual intuition. This is more reliable in the statistical sense: it avoids the mistakes brought by incomplete or imprecise observation, the detection scope can be extended by adding training samples, and the robustness of the detection system improves. In addition, these methods mostly adopt a multi-layer classifier structure going from simple to complex: most background windows are first excluded by structurally simple classifiers, and the remaining windows are then judged further by complex classifiers, which yields a faster detection speed. However, these methods do not take into account that in real images the misclassification risks of the face and non-face pattern classes are extremely unbalanced (the prior probability of a face appearing in an image is far below that of non-faces, and since the fundamental purpose of face detection is to find the positions of faces, the risk of misclassifying a face as non-face is much larger than that of misjudging a non-face as a face). Training every classifier layer only by the minimum-classification-error criterion and then adjusting the classifier threshold to reach a low False Rejection Rate (FRR) on faces cannot at the same time achieve a low False Acceptance Rate (FAR) on non-face patterns; the number of classifier layers then becomes too large and the structure too complex, the detection speed drops, and the overall performance of the algorithm declines. Aiming at the defects of this type of algorithm, the present invention proposes a face detection method based on the cost-sensitive AdaBoost algorithm (Cost Sensitive AdaBoost, abbreviated CS-AdaBoost). By adopting the principle of minimizing the classification risk, every trained classifier layer reduces the false acceptance rate of the non-face class as far as possible while guaranteeing an extremely low rejection rate on the face pattern, so that higher-performance face detection in complex background images is achieved with fewer classifier layers and simpler classifier structures — an approach not used in any other current literature.
Summary of the invention
The objective of the invention is to realize a face detector that can locate faces robustly under complex backgrounds. The realization of this face detector comprises two stages: training and detection.
In the training stage, samples are first collected, including face and non-face samples, and then normalized in size and illumination. The training samples are then used to extract microstructure features, yielding a feature library. The feature library is then used with the CS-AdaBoost algorithm to train one layer of a face/non-face strong classifier. The above process is repeated to obtain multi-layer classifiers whose structure goes from simple to complex. Finally these classifiers are cascaded to obtain a complete face detector.
In the detection stage, the input image is first scaled repeatedly at a fixed ratio, and every window of a certain size in the resulting image series is examined (a window is defined as a rectangular subimage of the input image). Each window is first gray-normalized, its microstructure features are then extracted, and the trained face detector judges the window: if the output of any classifier layer is below the assigned threshold, the window is considered non-face and no further judgement is made; only windows that pass the decisions of all layers are considered faces. A high face detection accuracy is thereby obtained. The method has been applied in systems such as face-based attendance registration.
The present invention consists of the following parts: sample collection and normalization; integral-image computation and microstructure feature extraction; feature selection and classifier design; and the cascading of the multi-layer classifiers.
1. Sample collection and normalization
1.1 Sample collection
Face images are cut out manually from pictures containing faces, and non-face images are cut out at random from scenery pictures containing no faces. The face images and non-face images serve respectively as positive and negative samples for training the classifiers. The collection process is shown in Fig. 2.
1.2 Size normalization
Each collected face and non-face image is normalized to the specified size. Let the original sample image be [F(x,y)]_{M×N}, with width M and height N, where F(x,y) (0 ≤ x < M, 0 ≤ y < N) is the gray value of the pixel at row x, column y; let the size-normalized image be [G(x,y)]_{W×H}, with width W and height H (W = H = 20 in the experiments). Size normalization can thus be regarded as mapping the source lattice [F(x,y)]_{M×N} onto the target lattice [G(x,y)]_{W×H}. The present invention uses back projection and linear interpolation to transform the original sample image to the standard-size sample image; the correspondence between the input image [F(x,y)]_{M×N} and the normalized image [G(x,y)]_{W×H} is:
G(x,y) = F(x/r_x, y/r_y)
where r_x and r_y are the scale factors in the x and y directions: r_x = N/H, r_y = M/W.
By this formula, a point (x,y) of the output lattice corresponds to the point (x/r_x, y/r_y) of the input image. Since x/r_x and y/r_y are generally not integers, the value F(x/r_x, y/r_y) must be estimated from the values at nearby known lattice points. Following the linear interpolation method, for a given (x,y) let:
x/r_x = x_0 + Δ_x,  y/r_y = y_0 + Δ_y,  0 ≤ Δ_x, Δ_y < 1
where x_0 = [x/r_x], Δ_x = x/r_x − x_0, y_0 = [y/r_y], Δ_y = y/r_y − y_0, and [·] is the integer-part function. The interpolation can then be expressed as:
G(x,y) = F(x_0+Δ_x, y_0+Δ_y)
       = F(x_0,y_0)(1−Δ_x)(1−Δ_y) + F(x_0+1,y_0)Δ_x(1−Δ_y)
       + F(x_0,y_0+1)(1−Δ_x)Δ_y + F(x_0+1,y_0+1)Δ_xΔ_y
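As an illustration of this mapping, the size normalization can be sketched in NumPy as follows (a minimal sketch; the function name, array layout, and float output are our own choices, not code from the patent):

```python
import numpy as np

def normalize_size(src: np.ndarray, out_w: int = 20, out_h: int = 20) -> np.ndarray:
    """Resize a grayscale image to out_w x out_h by back projection
    with bilinear interpolation, as in the size-normalization step."""
    src_h, src_w = src.shape
    rx = src_w / out_w          # scale factor along columns
    ry = src_h / out_h          # scale factor along rows
    dst = np.empty((out_h, out_w), dtype=np.float64)
    for y in range(out_h):
        for x in range(out_w):
            # back-project the target pixel into the source lattice
            fx, fy = x * rx, y * ry
            x0, y0 = int(fx), int(fy)
            dx, dy = fx - x0, fy - y0
            x1, y1 = min(x0 + 1, src_w - 1), min(y0 + 1, src_h - 1)
            dst[y, x] = (src[y0, x0] * (1 - dx) * (1 - dy)
                         + src[y0, x1] * dx * (1 - dy)
                         + src[y1, x0] * (1 - dx) * dy
                         + src[y1, x1] * dx * dy)
    return dst
```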
1.3 gray scale normalization
Factors such as ambient illumination and the imaging device may make the image brightness or contrast abnormal, producing strong shadows or reflections, so the geometrically normalized samples must also be gray-balanced to improve their intensity distribution and strengthen the consistency between patterns. The present invention gray-balances the samples by normalizing the gray mean and variance, adjusting the mean μ̄ and variance σ̄ of the sample picture's gray levels to the given values μ_0 and σ_0.
First compute the mean and variance of the sample image G(x,y) (0 ≤ x < W, 0 ≤ y < H):
μ̄ = (1/(WH)) Σ_{y=0}^{H−1} Σ_{x=0}^{W−1} G(x,y)
σ̄ = sqrt( (1/(WH)) Σ_{y=0}^{H−1} Σ_{x=0}^{W−1} (G(x,y) − μ̄)² )
Then transform the gray value of every pixel as:
I(x,y) = (σ_0/σ̄)(G(x,y) − μ̄) + μ_0
The mean and variance of the image gray levels are thereby adjusted to the given values μ_0 and σ_0, completing the gray-level normalization of the sample.
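In code, this gray-level normalization is a per-pixel affine transform; a minimal sketch (the set-points μ_0 and σ_0 are not specified numerically in the text, so the defaults below are assumptions):

```python
def normalize_gray(img: np.ndarray, mu0: float = 127.0, sigma0: float = 32.0) -> np.ndarray:
    """Shift the sample's gray mean and standard deviation to the
    preset values mu0 and sigma0 (the concrete set-points are illustrative)."""
    mu = img.mean()
    sigma = img.std()
    return sigma0 / sigma * (img - mu) + mu0
```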
2. Fast extraction of microstructure features
The present invention uses the five types of microstructure templates in Fig. 5 to extract high-dimensional microstructure features from the face and non-face samples. Each microstructure feature is obtained as the difference between the sums of the image pixel grays covered by the black region and by the white region of the template (the two colors serve only to distinguish the two regions, here and below); both the position of the template in the image and the size of the template may vary. The concrete feature extraction is as follows:
Define S(x_1,y_1; x_2,y_2) as the sum of pixel gray values over the region (x_1 ≤ x' ≤ x_2, y_1 ≤ y' ≤ y_2):
S(x_1,y_1; x_2,y_2) = Σ_{x_1≤x'≤x_2} Σ_{y_1≤y'≤y_2} I(x',y')
If the pixel coordinate of the upper-left corner of the microstructure template is (x,y), then the five types of microstructure features (in the first four types the black and white regions have equal area; in the fifth type the black region is placed symmetrically inside the white region) are, as shown in Fig. 5:
(a):S(x,y;x+w-1,y+h-1)-S(x+w,y;x+2w-1,y+h-1)
(b):S(x,y;x+w-1,y+h-1)-S(x,y+h;x+w-1,y+2h-1)
(c):2S(x+w,y;x+2w-1,y+h-1)-S(x,y;x+3w-1,y+h-1)
(d):S(x,y;x+2w-1,y+2h-1)-2S(x,y;x+w-1,y+h-1)-2S(x+w,y+h;x+2w-1,y+2h-1)
(e):S(x,y;x+w-1,y+h-1)-S(x+2,y+2;x+w-3,y+h-3)
Since extracting each feature involves only sums of pixels over rectangular areas, the integral image (Integral Image) of the whole image can be used to compute any of the microstructure features, at any scale and position, quickly.
2.1 integral image
For an image I(x,y) (x ≥ 0, y ≥ 0), define its integral image II(x,y) as the sum of all pixels in the range from (0,0) to (x,y):
II(x,y) = Σ_{0≤x'≤x} Σ_{0≤y'≤y} I(x',y')
and define II(−1,y) = 0, II(x,−1) = 0. It follows that:
S(x_1,y_1; x_2,y_2) = II(x_2,y_2) + II(x_1−1,y_1−1) − II(x_2,y_1−1) − II(x_1−1,y_2).
That is, the pixel sum S(x_1,y_1;x_2,y_2) over any rectangular area of the original image I(x,y) can be computed from the integral image with three additions/subtractions.
Similarly, define the squared integral image SqrII(x,y) as the sum of the squares of all pixels in the range from (0,0) to (x,y):
SqrII(x,y) = Σ_{0≤x'≤x} Σ_{0≤y'≤y} I(x',y')·I(x',y'), with SqrII(−1,y) = 0, SqrII(x,−1) = 0.
The squared integral image is used to compute the variance of each rectangular area (see Section 2.3).
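A sketch of both integral images and of the three-addition rectangle sum; the arrays are padded with a leading zero row and column so that the II(−1,·) = II(·,−1) = 0 convention becomes an ordinary index (names are ours):

```python
def integral_images(img: np.ndarray):
    """Return the integral image II and the squared integral image SqrII,
    each padded with a leading row/column of zeros."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    sqr = np.zeros_like(ii)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    sqr[1:, 1:] = (img.astype(np.float64) ** 2).cumsum(axis=0).cumsum(axis=1)
    return ii, sqr

def rect_sum(ii: np.ndarray, x1: int, y1: int, x2: int, y2: int) -> float:
    """Pixel sum over the rectangle x1..x2, y1..y2 (inclusive) in three
    additions/subtractions; indices are shifted by the zero padding."""
    return ii[y2 + 1, x2 + 1] + ii[y1, x1] - ii[y1, x2 + 1] - ii[y2 + 1, x1]
```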
2.2 Fast extraction of the microstructure features
Since extracting each feature involves only sums of pixels over rectangular areas, any of the above microstructure features can be computed quickly with a few additions and subtractions of integral-image values. The formula for a type-(a) microstructure feature (illustrated in Fig. 6) is:
g(x,y,w,h)=2·II(x+w-1,y-1)+II(x+2·w-1,y+h-1)
+II(x-1,y+h-1)-2·II(x+w-1,y+h-1)
-II(x+2·w-1,y-1)-II(x-1,y-1)
(b) type microstructure features:
g(x,y,w,h)=2II(x+w-1,y+h-1)+II(x-1,y-1)-II(x+w-1,y-1)
-2II(x-1,y+h-1)-II(x+w-1,y+2h-1)+II(x-1,y+2h-1)
(c) type microstructure features:
g(x,y,w,h)=2II(x+2w-1,y+h-1)+2II(x+w-1,y-1)-2II(x+2w-1,y-1)
-2II(x+w-1,y+h-1)-II(x+3w-1,y+h-1)-II(x-1,y-1)
+II(x-1,y+h-1)+II(x+3w-1,y-1)
(d) type microstructure features:
g(x,y,w,h)=-II(x-1,y-1)-II(x+2w-1,y-1)-II(x-1,y+2h-1)
-4II(x+w-1,y+h-1)+2II(x+w-1,y-1)+2II(x-1,y+h-1)
-II(x+2w-1,y+2h-1)+2II(x+2w-1,y+h-1)+2II(x+w-1,y+2h-1)
(e) type microstructure features:
g(x,y,w,h)=II(x+w-1,y+h-1)+II(x-1,y-1)-II(x+w-1,y-1)-II(x-1,y+h-1)
-II(x+w-3,y+h-3)-II(x+1,y+1)+II(x+1,y+h-3)+II(x+w-1,y+1)
Varying the parameters x, y, w, h extracts features at different positions and scales. For a sample image of 20×20 pixels, 92267 microstructure features of the five types can be obtained in total, forming the sample's feature vector FV(j), 1 ≤ j ≤ 92267.
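For example, under the conventions of the integral-image sketch above, a type-(a) feature is just two rectangle sums and one subtraction (feature_a is an illustrative name; the sign follows the S(·) definition of type (a)):

```python
def feature_a(ii: np.ndarray, x: int, y: int, w: int, h: int) -> float:
    """Type-(a) microstructure feature: difference between the pixel sums
    of two horizontally adjacent w x h rectangles (left minus right)."""
    left = rect_sum(ii, x, y, x + w - 1, y + h - 1)
    right = rect_sum(ii, x + w, y, x + 2 * w - 1, y + h - 1)
    return left - right
```

Enumerating all admissible (x, y, w, h) combinations for the five template types over a 20×20 window is what yields the 92267-dimensional feature vector.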
2.3 Feature normalization
To reduce the influence of illumination on face detection, the gray mean and variance of every 20×20-pixel sample image must be normalized, so the mean μ and standard deviation σ of the window are first computed quickly and each feature dimension is then normalized. The pixel-gray mean μ and standard deviation σ of a 20×20-pixel window region (x_0 ≤ x' ≤ x_0+19, y_0 ≤ y' ≤ y_0+19) are respectively (as shown in Fig. 6):
μ = [II(x_0+19,y_0+19) + II(x_0−1,y_0−1) − II(x_0−1,y_0+19) − II(x_0+19,y_0−1)]/400
σ = {[SqrII(x_0+19,y_0+19) + SqrII(x_0−1,y_0−1) − SqrII(x_0−1,y_0+19) − SqrII(x_0+19,y_0−1)]/400 − μ²}^{1/2}
Each microstructure feature dimension can then be normalized as:
FV(j) = (σ_0/σ)·FV̄(j)
where FV̄(j) is the raw feature value and σ the standard deviation of the window.
For a sample image of 20×20 pixels, 92267 microstructure features FV(j), 1 ≤ j ≤ 92267, are thus obtained in total.
3. Feature selection and classifier design
To reach a sufficiently fast detection speed, a face detector must adopt a hierarchical structure (as shown in Fig. 7), built by cascading strong classifiers that go from simple to complex. Background windows in the image are first excluded by structurally simple strong classifiers, and the remaining windows are then judged by structurally complex strong classifiers (a strong classifier here means a classifier that reaches sufficiently high performance on the training set; a weak classifier below means a classifier whose error rate on the training set is slightly below 0.5).
The present invention trains every strong-classifier layer with the CS-AdaBoost algorithm. CS-AdaBoost is a weak-classifier ensemble algorithm that combines weak classifiers into a strong classifier on the training set; it treats the risks brought by the two kinds of classification errors differently, minimizing the total misclassification risk on the training set. For the face detection problem, the strong classifier obtained by training reduces the classification error on the non-face class (FAR) as far as possible while keeping the classification error on the face class (FRR) sufficiently low.
3.1 Structure of the weak classifiers
In the present invention a weak classifier is a tree classifier built on a single feature dimension:
h_j(sub) = 1 if g_j(sub) < θ_j (or, for some features, g_j(sub) > θ_j); h_j(sub) = 0 otherwise
where sub is a sample of 20×20 pixels, g_j(sub) is the j-th feature extracted from the sample, θ_j is the decision threshold for the j-th feature (obtained by collecting the j-th feature of all face and non-face samples and requiring the FRR of the face samples to meet the specified value), and h_j(sub) is the decision output of the tree classifier built on the j-th feature. Each weak classifier thus needs only one threshold comparison to reach its decision; 92267 weak classifiers are obtained in all.
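A decision stump of this kind can be sketched as follows (the flip flag stands in for the per-feature inequality direction; names and the example threshold are illustrative):

```python
def make_stump(j: int, theta: float, flip: bool = False):
    """Single-feature tree classifier: output 1 (face side) when
    fv[j] < theta, or when fv[j] > theta if `flip` is set."""
    def h(fv) -> int:
        return int((fv[j] < theta) != flip)
    return h

# usage: h42 = make_stump(42, theta=0.73); label = h42(fv)
```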
3.2 strong classifier design based on the CS-AdaBoost algorithm
The CS-AdaBoost algorithm is combined with the above weak-classifier construction to train each face/non-face strong classifier. The training steps are as follows (denote the training sample set L = {(sub_i, l_i)}, i = 1,…,n, where l_i = 0,1 is the class label of sample image sub_i, corresponding to the non-face and face classes respectively; there are n_face face samples and n_nonface non-face samples):
3.2.1 Parameter initialization
Initialization of the training samples' misclassification risks. The misclassification risk of each face sample is C(i) = 2c/(c+1), and that of each non-face sample is C(i) = 2/(c+1), where c is the misclassification-risk multiple of the face class relative to the non-face class; the value of c should be greater than 1 and should decrease gradually toward 1 as the number of strong-classifier layers grows (see Table 1 for the concrete values).
Initialization of the training sample weights. The initial weight of each sample is D_1(i) = (c+1)·C(i) / (2c·n_face + 2·n_nonface).
Choose the iteration count T (T is the number of weak classifiers to be used); T should increase gradually with the number of strong-classifier layers (see Table 1 for the concrete values).
Compute the maximum Fmax(j) and minimum Fmin(j) of every feature dimension over the sample set (j is the feature index, 1 ≤ j ≤ 92267): Fmax(j) = max_{1≤i≤n} FV_i(j), Fmin(j) = min_{1≤i≤n} FV_i(j).
3.2.2 Repeat the following process T times (t = 1,…,T):
3.2.2.1 Build a weak classifier h_j from the j-th feature (1 ≤ j ≤ 92267), then exhaustively search the threshold θ_j between Fmin(j) and Fmax(j) so that the error rate ε_j of h_j is minimized, where ε_j = Σ_{i=1}^{n} D_t(i)·|h_j(sub_i) − l_i|;
3.2.2.2 Let ε_t = min_{1≤j≤92267} ε_j, and take the corresponding weak classifier as h_t;
3.2.2.3 Compute the parameter α_t = (1/2)·ln((1 − ε_t)/ε_t);
3.2.2.4 Update the sample weights: D_{t+1}(i) = D_t(i)·exp(−α_t l_i h_t(sub_i))·exp(λ α_t l_i) / Z_t, where λ = (c−1)/(c+1), i = 1,…,n, and Z_t = Σ_{i=1}^{n} D_t(i)·exp(−α_t l_i h_t(sub_i))·exp(λ α_t l_i).
3.2.3 Output the final strong classifier:
H(sub) = 1 if Σ_{t=1}^{T} α_t h_t(sub) ≥ b; H(sub) = 0 otherwise
where b is the layer's decision threshold, adjusted so that the layer meets its FRR target on the face training set.
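The training loop of Sections 3.2.1-3.2.3 can be sketched as below. This is a simplified illustration, not the patent's implementation: the exhaustive threshold scan is approximated by a 32-point grid, and the update follows the printed formulas exactly, with labels l_i in {0,1}:

```python
import numpy as np

def train_cs_adaboost(FV: np.ndarray, labels: np.ndarray, c: float, T: int):
    """One CS-AdaBoost layer. FV: (n, d) feature matrix; labels: 0 = non-face,
    1 = face; c: misclassification-risk multiple of the face class."""
    n, d = FV.shape
    n_face = int(labels.sum())
    n_nonface = n - n_face
    # risk-weighted initial sample weights, normalized to sum to 1
    C = np.where(labels == 1, 2 * c / (c + 1), 2 / (c + 1))
    D = (c + 1) * C / (2 * c * n_face + 2 * n_nonface)
    lam = (c - 1) / (c + 1)
    stumps, alphas = [], []
    for _ in range(T):
        best = (0, 0.0, False, np.inf)          # (j, theta, flip, eps)
        for j in range(d):
            col = FV[:, j]
            for theta in np.linspace(col.min(), col.max(), 32):
                for flip in (False, True):
                    pred = ((col < theta) != flip).astype(float)
                    eps = float(np.sum(D * np.abs(pred - labels)))
                    if eps < best[3]:
                        best = (j, theta, flip, eps)
        j, theta, flip, eps = best
        eps = min(max(eps, 1e-10), 1 - 1e-10)   # guard the logarithm
        alpha = 0.5 * np.log((1 - eps) / eps)
        pred = ((FV[:, j] < theta) != flip).astype(float)
        # cost-sensitive update: the exp(lam*alpha*l) factor keeps extra
        # pressure on face samples (l = 1) relative to non-face samples
        D = D * np.exp(-alpha * labels * pred) * np.exp(lam * alpha * labels)
        D /= D.sum()
        stumps.append((j, theta, flip))
        alphas.append(alpha)
    return stumps, alphas
```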
3.3 Cascading the multi-layer strong classifiers
Since a single-layer strong classifier can hardly achieve at the same time a high classification speed, an extremely low FRR, and an extremely low FAR, the whole face detector must adopt a hierarchical structure in which the multi-layer strong classifiers are cascaded from simple to complex, as shown in Fig. 7. During detection, as soon as an image window fails any one layer it is excluded immediately without further judgement; otherwise it is judged further by the subsequent, more complex strong classifiers. Windows that obviously do not look like faces are thus excluded within the first few layers without subsequent computation, which greatly reduces the amount of calculation.
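The rejection logic of the cascade can be sketched in a few lines (a layer is assumed to be represented as the (stumps, alphas, b) triple produced by the training sketch above):

```python
def cascade_detect(fv, layers) -> bool:
    """Run a window's feature vector through the cascaded layers;
    reject at the first layer whose weighted vote is below its
    threshold b (layers = [(stumps, alphas, b), ...])."""
    for stumps, alphas, b in layers:
        score = sum(a * ((fv[j] < theta) != flip)
                    for (j, theta, flip), a in zip(stumps, alphas))
        if score < b:
            return False   # rejected as background; later layers never run
    return True            # accepted by every layer: a face
```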
Using 11580 face samples and 2000000 non-face samples as the training sample set, the concrete steps for training the cascade of strong classifiers are as follows:
(1) Initialize i = 1. Define the training target of each strong-classifier layer as FRR ≤ 0.02% on the face training set and FAR ≤ 60% on the non-face training set; define the targets of the whole face detector as FRR ≤ 0.5% on the face training set and FAR ≤ 3.2×10⁻⁶ on the non-face training set, where FAR and FRR are defined as:
FAR = (number of non-face samples judged to be faces ÷ total number of non-face samples) × 100%
FRR = (number of face samples judged to be non-faces ÷ total number of face samples) × 100%
(2) Train the i-th strong-classifier layer on the training set using the method of Section 3.2;
(3) Run the first i trained layers over the sample set;
(4) If FRR and FAR have not reached the target values, set i ← i+1 and return to step (2) to continue training; otherwise stop.
The finally trained face detector comprises 19 strong-classifier layers using 3139 weak classifiers in total. The FRR of the whole detector on the face validation set is about 0.15%, and its FAR on the non-face training set is about 3.2×10⁻⁶. Table 1 gives the training results of several of its layers.
Table 1. Training results of several face/non-face strong classifiers

Layer i   c      T     FRR (face training set)   FAR (non-face validation set)
1         100    1     0.10%                     64.2%
2         60     1     0.0%                      83.5%
3         3.5    5     0.0%                      75.4%
7         1.5    65    0.0%                      42.5%
8         1.4    87    0.0%                      40.1%
9         1.4    120   0.0%                      35.4%
17        1.2    355   0.01%                     67.6%
18        1.15   361   0.02%                     60.2%
19        1.10   397   0.02%                     68.3%
During detection, a window is considered to contain a face only if it passes the decisions of all classifier layers.
The invention is characterized in that it is a technology that robustly detects various faces under complex backgrounds and illumination, and reaches real-time detection speed on ordinary video. It first normalizes the collected samples in size and illumination, so as to eliminate as far as possible the within-class differences of the input samples caused by illumination and size; it then efficiently extracts microstructure features that finely distinguish the structural characteristics of face and non-face patterns; on this basis the CS-AdaBoost algorithm trains strong classifiers having extremely low FRR and extremely low FAR; the multi-layer strong classifiers are then cascaded into a complete face detector, which yields the final face positions.
In a system composed of an image acquisition device and a computer, the detection method comprises a training stage and a detection stage. The training stage contains in turn the following steps:
1. Sample collection
Acquire images with a camera, digital camera, scanner, or similar device, manually mark and cut out the faces, and build the face training sample database; cut out non-face training images at random from scenery pictures containing no faces. 11580 face samples and 2000000 non-face samples are obtained in total as the training sample set.
2. Normalization, comprising linear normalization of sample illumination and size
(2.1) Size normalization
Let the original sample image be [F(x,y)]_{M×N}, with width M and height N, and let the size-normalized image be [G(x,y)]_{W×H} (W = H = 20 in the experiments). The normalized sample image is obtained from the original sample image by back projection and linear interpolation; the correspondence between the input image and the normalized image is:
G(x,y) = F(x/r_x, y/r_y)
where r_x and r_y are the scale factors in the x and y directions: r_x = N/H, r_y = M/W. Since x/r_x and y/r_y are generally not integers, F(x/r_x, y/r_y) must be estimated from the values at nearby known lattice points; the present invention uses linear interpolation. For a given (x,y) let:
x/r_x = x_0 + Δ_x,  y/r_y = y_0 + Δ_y,  0 ≤ Δ_x, Δ_y < 1
where x_0 = [x/r_x], Δ_x = x/r_x − x_0, y_0 = [y/r_y], Δ_y = y/r_y − y_0, and [·] is the integer-part function; then:
G(x,y) = F(x_0+Δ_x, y_0+Δ_y)
       = F(x_0,y_0)(1−Δ_x)(1−Δ_y) + F(x_0+1,y_0)Δ_x(1−Δ_y)
       + F(x_0,y_0+1)(1−Δ_x)Δ_y + F(x_0+1,y_0+1)Δ_xΔ_y
(2.2) Gray-level normalization
Transform the gray value of each pixel of the size-normalized sample image G(x,y), adjusting the mean μ̄ and variance σ̄ to the given values μ_0 and σ_0 to obtain the sample image I(x,y):
I(x,y) = (σ_0/σ̄)(G(x,y) − μ̄) + μ_0
where μ̄ = (1/(WH)) Σ_{y=0}^{H−1} Σ_{x=0}^{W−1} G(x,y) and σ̄ = sqrt( (1/(WH)) Σ_{y=0}^{H−1} Σ_{x=0}^{W−1} (G(x,y) − μ̄)² ).
3. Building the sample feature library
Compute the integral image for fast extraction of the microstructure features, in the following steps:
(3.1) Compute the integral image of each sample
By definition, compute each sample's integral image II(x,y) = Σ_{0≤x'≤x} Σ_{0≤y'≤y} I(x',y'), with II(−1,y) = 0 and II(x,−1) = 0.
(3.2) Extraction of the microstructure feature library
Using the definition of each microstructure feature and the integral image above, quickly extract the 92267 features of each sample, thereby building the feature library of the face samples and the feature library of the non-face samples.
4. classifier design
Train each face/non-face strong-classifier layer with the above training sample set and the CS-AdaBoost algorithm, and cascade the multi-layer strong classifiers into a complete face detector, in the following steps:
(4.1) Initialize i = 1; define the training target of each strong-classifier layer as FRR ≤ 0.02% on the face training set and FAR ≤ 60% on the non-face training set; define the targets of the whole face detector as FRR ≤ 0.5% on the face training set and FAR ≤ 3.2×10⁻⁶ on the non-face training set;
(4.2) Train the i-th strong-classifier layer;
(4.3) Run the first i trained layers over the sample set;
(4.4) If FRR and FAR have not reached the target values, set i ← i+1 and return to step (4.2) to continue training; otherwise stop.
Step (4.2) contains in turn the following sub-steps:
(4.2.1) Parameter initialization
Initialization of the training samples' misclassification risks. The misclassification risk of each face sample is C(i) = 2c/(c+1), and that of each non-face sample is C(i) = 2/(c+1), where c is the misclassification-risk multiple of the face class relative to the non-face class; the value of c should be greater than 1 and should decrease gradually toward 1 as the number of strong-classifier layers grows (see Table 1 for the concrete values).
Initialization of the training sample weights. The initial weight of each sample is D_1(i) = (c+1)·C(i) / (2c·n_face + 2·n_nonface).
Choose the iteration count T (T is the number of weak classifiers to be used); T should increase gradually with the number of strong-classifier layers (see Table 1 for the concrete values).
Compute the maximum Fmax(j) and minimum Fmin(j) of every feature dimension over the sample set (j is the feature index, 1 ≤ j ≤ 92267): Fmax(j) = max_{1≤i≤n} FV_i(j), Fmin(j) = min_{1≤i≤n} FV_i(j).
(4.2.2) Repeat the following process T times (t = 1,…,T):
(4.2.2.1) Build a weak classifier h_j from the j-th feature (1 ≤ j ≤ 92267), then exhaustively search the threshold θ_j between Fmin(j) and Fmax(j) so that the error rate ε_j of h_j is minimized, where ε_j = Σ_{i=1}^{n} D_t(i)·|h_j(sub_i) − l_i|;
(4.2.2.2) Let ε_t = min_{1≤j≤92267} ε_j, and take the corresponding weak classifier as h_t;
(4.2.2.3) Compute the parameter α_t = (1/2)·ln((1 − ε_t)/ε_t);
(4.2.2.4) Update the sample weights: D_{t+1}(i) = D_t(i)·exp(−α_t l_i h_t(sub_i))·exp(λ α_t l_i) / Z_t, where λ = (c−1)/(c+1), i = 1,…,n, and Z_t = Σ_{i=1}^{n} D_t(i)·exp(−α_t l_i h_t(sub_i))·exp(λ α_t l_i).
(4.2.3) Output the final strong classifier:
H(sub) = 1 if Σ_{t=1}^{T} α_t h_t(sub) ≥ b; H(sub) = 0 otherwise, where b is the layer's decision threshold.
Training through the above steps yields a complete face detector.
In the detection stage, the invention uses the following steps to judge whether the input image contains faces (an actual detection process is illustrated in Fig. 9):
(1) Acquisition of the input image
Acquire images with a camera, digital camera, scanner, or similar device.
(2) Scaling of the input image and fast judgement of every window in it
To detect faces of different sizes, the input image is reduced 12 times in succession at a fixed ratio (1.25 in the present invention) using the linear interpolation method described earlier, giving input images of 13 different sizes in total; all 20×20-pixel windows of each input image are judged, so faces from 20×20 pixels up to 280×280 pixels can be detected. The concrete steps are:
(2.1) Scaling of the input image
Using the linear interpolation method described earlier, reduce the input image I(x,y) 12 times in succession at the ratio q = 1.25, obtaining the input image sequence {I_i(x,y)} (i = 0,…,12);
(2.2) Computation of the integral images
Using the formulas above, compute for each image I_i(x,y) its integral image II_i(x,y) and squared integral image SqrII_i(x,y) (i = 0,…,12);
(2.3) Exhaustive judgement of the windows
Starting from the upper-left corner of each image I_i(x,y), exhaustively examine all windows of 20×20 pixels. Any window [x_0, y_0; x_0+19, y_0+19] is processed as follows:
(2.3.1) Using the integral image II_i(x,y) and squared integral image SqrII_i(x,y) of the whole image, compute the mean μ and standard deviation σ of the window:
μ = [II_i(x_0+19,y_0+19) + II_i(x_0−1,y_0−1) − II_i(x_0−1,y_0+19) − II_i(x_0+19,y_0−1)]/400
σ = {[SqrII_i(x_0+19,y_0+19) + SqrII_i(x_0−1,y_0−1) − SqrII_i(x_0−1,y_0+19) − SqrII_i(x_0+19,y_0−1)]/400 − μ²}^{1/2}
(2.3.2) Quickly extract the microstructure features of the window by the method introduced earlier, and normalize the features;
(2.3.3) Judge the window with the trained multi-layer face/non-face strong classifiers; if it passes the decisions of all layers, the window is considered to contain a face and its position is output; otherwise the window is discarded without further processing.
With the above steps every face in the input image can be detected quickly and robustly.
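Putting the pieces together, the multi-scale window scan of steps (2.1)-(2.3) can be sketched as follows; it reuses the earlier sketches, and window_features is a hypothetical helper standing in for the feature extraction and normalization of step (2.3.2):

```python
def detect_faces(img: np.ndarray, layers, n_scales: int = 13, q: float = 1.25):
    """Judge every 20x20 window of an image pyramid built with ratio q and
    keep the windows accepted by the whole cascade; hit positions are
    mapped back to original-image coordinates as (x, y, size)."""
    hits, scale = [], 1.0
    for _ in range(n_scales):
        h, w = img.shape
        if h < 20 or w < 20:
            break
        ii, sqr = integral_images(img)
        for y0 in range(h - 19):
            for x0 in range(w - 19):
                fv = window_features(ii, sqr, x0, y0)   # hypothetical helper
                if cascade_detect(fv, layers):
                    hits.append((int(x0 * scale), int(y0 * scale),
                                 int(20 * scale)))
        img = normalize_size(img, out_w=int(w / q), out_h=int(h / q))
        scale *= q
    return hits
```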
To verify the validity of the present invention, we tested it on several public databases; a concrete implementation example is also given below.
On the CMU test set we compared the performance of the present invention with the best currently acknowledged algorithms. The CMU test set contains 130 pictures with complex backgrounds, with 507 faces in total. In the experiment each image was scaled up to 13 times at a ratio of 1.25, and 71040758 image windows were judged in all. The comparison results are given in Table 2. The overall performance of our algorithm is better than the methods of Viola [Viola P, Jones M. Rapid object detection using a boosted cascade of simple features. Proc. on Computer Vision and Pattern Recognition, 2001], Schneiderman [Schneiderman H, Kanade T. Probabilistic modeling of local appearance and spatial relationships for object recognition. Proc. on CVPR, 1998], and Rowley [Rowley H A, Baluja S, Kanade T. Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(1): 23-38], particularly at low false-alarm counts: at 10 false alarms, for example, the face detection rate of our algorithm is 90.1%, which is 7%~14% higher than the detection rates of the other algorithms. Compared in particular with Viola's detector trained by the conventional AdaBoost algorithm, our detector uses 3139 weak classifiers in 19 strong-classifier layers, whereas Viola's uses more than 6000 weak classifiers in 38 layers; our detector is structurally much simpler, so the present invention achieves better performance and a faster detection speed. For ordinary 386×288 video images our algorithm reaches a detection speed above 18 frames/second (PIII 1.8 GHz CPU, 512 MB memory).
Table 2. Performance comparison with other detection methods on the CMU upright frontal face test set
[the comparison table appears as an image in the original document]
In addition, on the BANCA database we compared the detection performance with FaceIt, the well-known product of Identix Inc. The BANCA database contains 6540 pictures with complex backgrounds and illumination; each picture contains an upright frontal face, with large pitch variations. The correct detection rate of the present invention is 98.8%, against 94.9% for FaceIt. In a test on an image set collected by a third party (China Aerospace Information Co.), in which every image contains a face, the detection accuracy of our algorithm is 98.6%, against 98.0% for FaceIt.
Description of drawings
Fig. 1 Hardware configuration of a typical face detection system.
Fig. 2 Acquisition process of the training samples.
Fig. 3 Examples of typical face samples.
Fig. 4 Structure of the face detection system.
Fig. 5 The five types of microstructure feature templates.
Fig. 6 Computation of the integral image and an example of microstructure feature extraction.
Fig. 7 Cascade of the multi-stage strong classifiers.
Fig. 8 Training process of a strong classifier.
Fig. 9 Example of an actual face detection process in an image.
Fig. 10 A face-recognition check-in system based on this algorithm.
Embodiment
To realize a face detection system, the face detector must first be obtained by training on sufficiently many collected samples; the detector can then be used to detect any input image. The hardware configuration of the whole system is shown in Fig. 1, and its training and detection processes in Fig. 4. The parts of the system are introduced in detail below:
A) Implementation of the training system
A.1 Acquisition of the training samples
Acquire images with a camera, digital camera, scanner, or similar device, manually mark and cut out the faces, and build the face training sample database; non-face training samples are extracted at random from scenery pictures and the like containing no faces. In this example 11580 face samples and 2000000 non-face samples were collected and used as the training set.
A.2 Sample normalization
A.2.1 Size normalization
Let the original sample image be [F(x,y)]_{M×N}, with width M and height N, and let the size-normalized image be [G(x,y)]_{W×H} (W = H = 20 in the experiments). The normalized sample image is obtained from the original sample image by back projection and linear interpolation; the correspondence between the input image and the normalized image is:
G(x,y) = F(x/r_x, y/r_y)
where r_x and r_y are the scale factors in the x and y directions: r_x = N/H, r_y = M/W. For a given (x,y) let:
x/r_x = x_0 + Δ_x,  y/r_y = y_0 + Δ_y,  0 ≤ Δ_x, Δ_y < 1
where x_0 = [x/r_x], Δ_x = x/r_x − x_0, y_0 = [y/r_y], Δ_y = y/r_y − y_0, and [·] is the integer-part function; then:
G(x,y) = F(x_0+Δ_x, y_0+Δ_y)
       = F(x_0,y_0)(1−Δ_x)(1−Δ_y) + F(x_0+1,y_0)Δ_x(1−Δ_y)
       + F(x_0,y_0+1)(1−Δ_x)Δ_y + F(x_0+1,y_0+1)Δ_xΔ_y
A.2.2 Illumination normalization
Transform the gray value of each pixel of the size-normalized sample image G(x,y), adjusting the mean μ̄ and variance σ̄ to the given values μ_0 and σ_0 to obtain the sample image I(x,y):
I(x,y) = (σ_0/σ̄)(G(x,y) − μ̄) + μ_0
where μ̄ = (1/(WH)) Σ_{y=0}^{H−1} Σ_{x=0}^{W−1} G(x,y) and σ̄ = sqrt( (1/(WH)) Σ_{y=0}^{H−1} Σ_{x=0}^{W−1} (G(x,y) − μ̄)² ).
A.3 Building the sample feature library
A.3.1 Computation of the sample integral images
By definition, compute each sample's integral image II(x,y) = Σ_{0≤x'≤x} Σ_{0≤y'≤y} I(x',y'), with II(−1,y) = 0 and II(x,−1) = 0.
A.3.2 Extraction of the microstructure feature library
Using the definition of each microstructure feature and the integral image above, quickly extract the 92267 features of each sample and normalize them, thereby building the feature library of the face samples and the feature library of the non-face samples.
A.4 Training of the face detector
Train each face/non-face strong-classifier layer with the above training sample set and the CS-AdaBoost algorithm, and cascade the multi-layer strong classifiers into a complete face detector, in the following steps:
A.4.1 Initialize i = 1; define the training target of each strong-classifier layer as FRR ≤ 0.02% on the face training set and FAR ≤ 60% on the non-face training set; define the targets of the whole face detector as FRR ≤ 0.5% on the face training set and FAR ≤ 3.2×10⁻⁶ on the non-face training set;
A.4.2 Train the i-th strong-classifier layer;
A.4.3 Run the first i trained layers over the sample set;
A.4.4 If FRR and FAR have not reached the target values, set i ← i+1 and return to step A.4.2 to continue training; otherwise stop.
The foregoing step A.4.2 contains in turn the following sub-steps:
A.4.2.1 Parameter initialization
Initialization of the training samples' misclassification risks. The misclassification risk of each face sample is C(i) = 2c/(c+1), and that of each non-face sample is C(i) = 2/(c+1), where c is the misclassification-risk multiple of the face class relative to the non-face class; the value of c should be greater than 1 and should decrease gradually toward 1 as the number of strong-classifier layers grows (see Table 1 for the concrete values).
Initialization of the training sample weights. The initial weight of each sample is D_1(i) = (c+1)·C(i) / (2c·n_face + 2·n_nonface).
Choose the iteration count T (T is the number of weak classifiers to be used); T should increase gradually with the number of strong-classifier layers (see Table 1 for the concrete values).
Compute the maximum Fmax(j) and minimum Fmin(j) of every feature dimension over the sample set (j is the feature index, 1 ≤ j ≤ 92267): Fmax(j) = max_{1≤i≤n} FV_i(j), Fmin(j) = min_{1≤i≤n} FV_i(j).
A.4.2.2 Repeat the following process T times (t = 1,…,T):
A.4.2.2.1 Build a weak classifier h_j from the j-th feature (1 ≤ j ≤ 92267), then exhaustively search the threshold θ_j between Fmin(j) and Fmax(j) so that the error rate ε_j of h_j is minimized, where ε_j = Σ_{i=1}^{n} D_t(i)·|h_j(sub_i) − l_i|;
A.4.2.2.2 Let ε_t = min_{1≤j≤92267} ε_j, and take the corresponding weak classifier as h_t;
A.4.2.2.3 Compute the parameter α_t = (1/2)·ln((1 − ε_t)/ε_t);
A.4.2.2.4 Update the sample weights: D_{t+1}(i) = D_t(i)·exp(−α_t l_i h_t(sub_i))·exp(λ α_t l_i) / Z_t, where λ = (c−1)/(c+1), i = 1,…,n, and Z_t = Σ_{i=1}^{n} D_t(i)·exp(−α_t l_i h_t(sub_i))·exp(λ α_t l_i).
A.4.2.3 Output the final strong classifier:
H(sub) = 1 if Σ_{t=1}^{T} α_t h_t(sub) ≥ b; H(sub) = 0 otherwise, where b is the layer's decision threshold.
B) Implementation of the detection system
At the detection stage the invention comprises the following steps:
B.1 Image acquisition
Acquire images with a camera, digital camera, scanner, or similar device.
B.2 Construction of the input-image pyramid and computation of the integral images
To detect faces of different sizes, the input image is reduced 12 times in succession at a fixed ratio (1.25 in the present invention) using the linear interpolation method described earlier, giving input images of 13 different sizes in total; every 20×20-pixel window of each image (a window being a rectangular subimage of the input image) is judged, so faces from 20×20 pixels up to 280×280 pixels can be detected. The concrete steps are:
B.2.1 Scaling of the input image
Using the linear interpolation method described earlier, reduce the input image I(x,y) 12 times in succession at the ratio q = 1.25, obtaining the input image sequence {I_i(x,y)} (i = 0,…,12);
B.2.2 Computation of the integral images
Using the formulas above, compute for each image I_i(x,y) its integral image II_i(x,y) and squared integral image SqrII_i(x,y) (i = 0,…,12);
B.2.3 Exhaustive judgement of the windows
Starting from the upper-left corner of each image I_i(x,y), exhaustively examine all windows of 20×20 pixels. Any window [x_0, y_0; x_0+19, y_0+19] is processed as follows:
B.2.3.1 Using the integral image II_i(x,y) and squared integral image SqrII_i(x,y) of the whole image, compute the mean μ and standard deviation σ of the window:
μ = [II_i(x_0+19,y_0+19) + II_i(x_0−1,y_0−1) − II_i(x_0−1,y_0+19) − II_i(x_0+19,y_0−1)]/400
σ = {[SqrII_i(x_0+19,y_0+19) + SqrII_i(x_0−1,y_0−1) − SqrII_i(x_0−1,y_0+19) − SqrII_i(x_0+19,y_0−1)]/400 − μ²}^{1/2}
B.2.3.2 Quickly extract the microstructure features of the window by the method introduced earlier, and normalize the features;
B.2.3.3 Judge the window with the trained multi-layer face/non-face strong classifiers; if it passes the decisions of all layers, the window is considered to contain a face and its position is output; otherwise the window is discarded without further processing.
With the above steps every face in the input image can be detected quickly and robustly.
Embodiment 1: a face-recognition check-in system (Fig. 10)
Face authentication, the most user-friendly mode among the biometric authentication technologies that have recently received wide attention, aims at automatic personal identification by computer from facial images, replacing traditional identity authentication by passwords, certificates, or seals; it has the advantages of being hard to forge, impossible to lose, and convenient. This system verifies a person's identity automatically from face information; its face detection module is the research result described here. The system also took part in the FAT2004 contest organized under ICPR 2004, in which 13 face recognition algorithms participated from 11 academic and commercial institutions, including Carnegie Mellon University of the USA, the Institut für Neuroinformatik of Germany, and the University of Surrey of Britain. The system submitted by our laboratory ranked first on all three evaluation indexes, with an error rate about 50% lower than the second-place result. The research result described here is applied in the face detection module of the submitted system, helping its overall performance reach an advanced international level.
In summary, the present invention detects faces robustly in images with complex backgrounds, has obtained excellent detection results in experiments, and has broad application prospects.

Claims (1)

1. A method of robust human face detection in complex background images, characterized in that the method designs the face detector on the basis of the misclassification risks of the face and non-face patterns. To build the detector, the collected samples are first normalized in size and illumination so as to eliminate the within-class differences of the input samples caused by illumination and size; the CS-AdaBoost algorithm then selects the microstructure features that reflect the difference between face and non-face patterns, and combines them into strong-classifier layers having a false rejection rate below 10⁻³ and a false acceptance rate below 10⁻⁶; the multi-layer strong classifiers are then cascaded into a complete face detector, whose processing yields the final face positions;
In a system composed of an image acquisition device and a computer, the face detection method comprises a training stage and a detection stage, the training stage containing in turn the following steps:
Step 1. Sample collection
Acquire images with any device including a camera, digital camera, or scanner, manually mark and cut out the faces, and build the face training sample database; cut out non-face training images at random from scenery pictures containing no faces; 11580 face samples and 2000000 non-face samples are obtained in total as the training sample set;
Step 2. Normalization, comprising linear normalization of sample illumination and size;
Step 2.1 Size normalization: normalize the face and non-face images obtained in step 1 to the specified size;
Let the original sample image be [F(x,y)]_{M×N}, with width M and height N, and let the size-normalized image be [G(x,y)]_{W×H}; the correspondence between the input image and the normalized image is:
G(x,y) = F(x/r_x, y/r_y)
where r_x and r_y are the scale factors in the x and y directions: r_x = N/H, r_y = M/W; F(x/r_x, y/r_y) is the estimated pixel value at the point (x/r_x, y/r_y); let:
x/r_x = x_0 + Δ_x,  y/r_y = y_0 + Δ_y,  0 ≤ Δ_x, Δ_y < 1
where x_0 = [x/r_x], Δ_x = x/r_x − x_0, y_0 = [y/r_y], Δ_y = y/r_y − y_0, and [·] is the integer-part function; then:
G(x,y) = F(x_0+Δ_x, y_0+Δ_y)
       = F(x_0,y_0)(1−Δ_x)(1−Δ_y) + F(x_0+1,y_0)Δ_x(1−Δ_y)
       + F(x_0,y_0+1)(1−Δ_x)Δ_y + F(x_0+1,y_0+1)Δ_xΔ_y;
Step 2.2 Gray-level normalization
Transform the gray value of each pixel of the size-normalized sample image G(x,y), adjusting the mean μ̄ and variance σ̄ to the given values μ_0 and σ_0 to obtain the sample image I(x,y):
I(x,y) = (σ_0/σ̄)(G(x,y) − μ̄) + μ_0
where μ̄ = (1/(WH)) Σ_{y=0}^{H−1} Σ_{x=0}^{W−1} G(x,y), σ̄ = sqrt( (1/(WH)) Σ_{y=0}^{H−1} Σ_{x=0}^{W−1} (G(x,y) − μ̄)² );
Step 3. Building the sample feature library
Compute the integral image in order to extract the microstructure features, in the following steps:
Step 3.1 Compute the integral image of each sample
By definition, compute each sample's integral image II(x,y) = Σ_{0≤x'≤x} Σ_{0≤y'≤y} I(x',y'), with II(−1,y) = 0 and II(x,−1) = 0;
Step 3.2 Extraction of the microstructure feature library
Five kinds of microstructure features are extracted from the face samples with the following five types of microstructure templates; each microstructure feature is obtained as the difference between the pixel-gray sums of the image regions corresponding to the black and white parts of the template; the five kinds of features g(x,y,w,h) are expressed respectively as follows:
(a) class: the black and white regions are left-right symmetric and of equal area; w denotes the width and h the height of each region:
g(x,y,w,h)=2·II(x+w-1,y-1)+II(x+2·w-1,y+h-1)
+II(x-1,y+h-1)-2·II(x+w-1,y+h-1)
-II(x+2·w-1,y-1)-II(x-1,y-1)
(b) class: the black and white regions are vertically symmetric and of equal area; w and h are defined as in class (a):
g(x,y,w,h)=2II(x+w-1,y+h-1)+II(x-1,y-1)-II(x+w-1,y-1)
-2II(x-1,y+h-1)-II(x+w-1,y+2h-1)+II(x-1,y+2h-1)
(c) class: horizontally, the black region lies between two white regions, and its area equals that of each white region; w and h are defined as in class (a):
g(x,y,w,h)=2II(x+2w-1,y+h-1)+2II(x+w-1,y-1)-2II(x+2w-1,y-1)
-2II(x+w-1,y+h-1)-II(x+3w-1,y+h-1)-II(x-1,y-1)
+II(x-1,y+h-1)+II(x+3w-1,y-1)
(d) class: the two black regions occupy the first and third quadrants and the two white regions the second and fourth quadrants; each black region has the same area as each white region; w and h are defined as in class (a):
g(x,y,w,h)=-II(x-1,y-1)-II(x+2w-1,y-1)-II(x-1,y+2h-1)
-4II(x+w-1,y+h-1)+2II(x+w-1,y-1)+2II(x-1,y+h-1)
-II(x+2w-1,y+2h-1)+2II(x+2w-1,y+h-1)+2II(x+w-1,y+2h-1)
(e) class: the black region lies at the center of the white region, its top, bottom, left and right sides each 2 pixels away from the corresponding sides of the white region; w and h denote the width and height of the white region:
g(x,y,w,h)=II(x+w-1,y+h-1)+II(x-1,y-1)-II(x+w-1,y-1)-II(x-1,y+h-1)
-II(x+w-3,y+h-3)-II(x+1,y+1)+II(x+1,y+h-3)+II(x+w-1,y+1)
For a 20×20-pixel sample image and the above five types of microstructure templates, there are 92267 combinations of the parameters x, y, w, h, from which the feature quantities FV(j), 1 ≤ j ≤ 92267, of the sample image are extracted;
Step 3.3 Feature normalization, i.e. normalizing the gray mean and variance of each sample image:
Let μ be the mean and σ the standard deviation of the pixel grays in each 20×20-pixel window region, i.e. (x_0 ≤ x' ≤ x_0+19, y_0 ≤ y' ≤ y_0+19); then:
μ = [II(x_0+19,y_0+19) + II(x_0−1,y_0−1) − II(x_0−1,y_0+19) − II(x_0+19,y_0−1)]/400
σ = {[SqrII(x_0+19,y_0+19) + SqrII(x_0−1,y_0−1) − SqrII(x_0−1,y_0+19) − SqrII(x_0+19,y_0−1)]/400 − μ²}^{1/2}
and each microstructure feature is normalized as:
FV(j) = (σ_0/σ)·FV̄(j)
For a 20×20-pixel sample image, 92267 microstructure features FV(j), 1 ≤ j ≤ 92267, are obtained in total;
Step 4. Feature selection and classifier design
Train each face/non-face strong-classifier layer with the above training sample set and the CS-AdaBoost algorithm, and cascade the multi-layer strong classifiers into a complete face detector, in the following steps:
Step 4.1 Initialize i = 1; define the training target of each strong-classifier layer as false rejection rate FRR ≤ 0.02% on the face training set and false acceptance rate FAR ≤ 60% on the non-face training set; define the targets of the whole face detector as FRR ≤ 0.5% on the face training set and FAR ≤ 3.2×10⁻⁶ on the non-face training set;
Step 4.2 Train the i-th strong-classifier layer;
Step 4.3 Run the first i trained layers over the sample set and compute FRR and FAR:
FAR = (number of non-face samples judged to be faces ÷ total number of non-face samples) × 100%
FRR = (number of face samples judged to be non-faces ÷ total number of face samples) × 100%
Step 4.4 If FRR and FAR have not reached the predetermined values set in step 4.1, increase i by 1 and return to step 4.2 to continue training; otherwise stop;
Step 4.2 contains in turn the following sub-steps:
Step 4.2.1 Parameter initialization
Initialization of the training samples' misclassification risks: the risk of each face sample is C_i = 2c/(c+1) and that of each non-face sample C_i = 2/(c+1), where c, the misclassification-risk multiple of the face class relative to the non-face class, is greater than 1 and decreases gradually toward 1 as the number of strong-classifier layers grows;
Initialization of the training sample weights: the initial weight of each sample is D_1(i) = C_i / Σ_j C_j;
Choose the iteration count T, the number of weak classifiers to be used; T increases with the number of strong-classifier layers;
Compute the maximum Fmax(j) and minimum Fmin(j) of each feature over the sample set, where j is the feature index, 1 ≤ j ≤ 92267;
Step 4.2.2 repeats following process T time, t=1 ..., T:
Step 4.2.2.1 uses j feature, 1≤j≤92267, structure Weak Classifier h j, exhaustive search threshold parameter θ between Fmin (j) and Fmax (j) then j, make h jError rate ε jMinimum, definition &epsiv; j = &Sigma; i = 1 n D i ( i ) &CenterDot; | h j ( sub i ) - l i | ;
The weak classifier is denoted h_j(sub), abbreviated h_j:
h_j(sub) = 1, if g_j(sub) ≥ θ_j; h_j(sub) = 0, otherwise;
Wherein:
sub is a sample of 20 × 20 pixels,
g_j(sub) is the j-th feature extracted from the sample;
θ_j is the decision threshold based on the j-th feature, obtained by collecting statistics of the j-th feature over all face and non-face samples so that the FRR on the face samples satisfies the specified requirement;
h_j(sub) is the decision output of the tree classifier constructed from the j-th feature; 92267 weak classifiers are obtained correspondingly;
l_i ∈ {0, 1} is the category label of the sample image sub_i, corresponding to the non-face class and the face class respectively; there are n_face face samples and n_nonface non-face samples, which together form the training sample set L = {(sub_i, l_i)}, i = 1, ..., n, l_i = 0, 1;
Step 4.2.2.2 Let ε_t = min_{1 ≤ j ≤ 92267} ε_j, and take the weak classifier achieving this minimum as h_t;
Step 4.2.2.3 Compute the parameter α_t = (1/2) ln((1 - ε_t)/ε_t);
Step 4.2.2.4 Update the sample weights: D_{t+1}(i) = D_t(i) exp(-α_t l_i h_t(sub_i)) exp(λ α_t l_i) / Z_t, where λ = (c-1)/(c+1), i = 1, ..., n, and Z_t = Σ_{i=1}^{n} D_t(i) exp(-α_t l_i h_t(sub_i)) exp(λ α_t l_i);
Step 4.2.3 Output the final strong classifier:
H(sub) = 1, if Σ_{t=1}^{T} α_t h_t(sub) ≥ θ; H(sub) = 0, otherwise, where θ is the decision threshold of this layer, set so that the strong classifier meets the FRR and FAR targets of step 4.1;
A complete face detector is obtained by training through each of the above steps;
In the detection phase, the following steps are used to determine whether the input image contains faces:
Step 1. Acquisition of the input image
Acquire the image with any device such as a camera, digital camera, or scanner;
Step 2. Scaling of the input image and judgment of each sub-window therein
To detect faces of different sizes, the linear interpolation method used in step 2.1 of the training stage shrinks the input image at a fixed ratio 12 consecutive times, yielding input images at 13 different scales in total; every 20 × 20 pixel window in each scaled image is then judged, so faces from 20 × 20 pixels up to 280 × 280 pixels in size can be detected; this comprises the following steps:
Step 2.1 Scaling of the input image
Using the linear interpolation method of step 2.1 of the training stage, shrink the input image I(x, y) 12 consecutive times at the ratio q = 1.25 to obtain the input image sequence {I_i(x, y)}, i = 0, ..., 12;
Step 2.2 Computation of the integral images
Using the iterative formulas of step 3.1 of the training stage, compute for each image I_i(x, y) its corresponding integral image II_i(x, y) and squared integral image SqrII_i(x, y), i = 0, ..., 12;
Step 2.3 Judgment of each sub-window in the image
Starting from the upper-left corner of each image I_i(x, y), every sub-window of 20 × 20 pixels in the image is examined; for any sub-window [x_0, y_0; x_0+19, y_0+19] the processing steps are as follows:
Step 2.3.1 Using the integral image II_i(x, y) and squared integral image SqrII_i(x, y) of the whole image, compute the mean μ and variance σ of the sub-window:
μ = [II_i(x_0+19, y_0+19) + II_i(x_0-1, y_0-1) - II_i(x_0-1, y_0+19) - II_i(x_0+19, y_0-1)]/400
σ = {[SqrII_i(x_0+19, y_0+19) + SqrII_i(x_0-1, y_0-1) - SqrII_i(x_0-1, y_0+19) - SqrII_i(x_0+19, y_0-1)]/400 - μ²}^{1/2}
Step 2.3.2 Extract the microstructure features of the sub-window using the method introduced in step 3.2 of the training stage, and normalize the features;
Step 2.3.3 Judge the sub-window with the trained multilayer face/non-face strong classifiers; if it passes the judgment of all layers of strong classifiers, the sub-window is considered to contain a face and its position is output; otherwise the sub-window is discarded with no further processing;
Through the above steps, every face in the input image can be detected.
CNB2005100862485A 2005-08-19 2005-08-19 Method of robust human face detection in complicated background image Expired - Fee Related CN100336070C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2005100862485A CN100336070C (en) 2005-08-19 2005-08-19 Method of robust human face detection in complicated background image

Publications (2)

Publication Number Publication Date
CN1731417A CN1731417A (en) 2006-02-08
CN100336070C true CN100336070C (en) 2007-09-05

Family

ID=35963765

Country Status (1)

Country Link
CN (1) CN100336070C (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010072153A1 (en) * 2008-12-25 2010-07-01 南京壹进制信息技术有限公司 Computer intelligent energy saving method

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100389429C (en) * 2006-06-01 2008-05-21 北京中星微电子有限公司 AdaBoost based characteristic extracting method for pattern recognition
CN101196984B (en) * 2006-12-18 2010-05-19 北京海鑫科金高科技股份有限公司 Fast face detecting method
JP5058681B2 (en) * 2007-05-31 2012-10-24 キヤノン株式会社 Information processing method and apparatus, program, and storage medium
CN101315670B (en) 2007-06-01 2010-08-11 清华大学 Specific shot body detection device, learning device and method thereof
WO2008151470A1 (en) * 2007-06-15 2008-12-18 Tsinghua University A robust human face detecting method in complicated background image
CN101406390B (en) * 2007-10-10 2012-07-18 三星电子株式会社 Method and apparatus for detecting part of human body and human, and method and apparatus for detecting objects
CN101196990B (en) * 2007-10-11 2011-04-27 北京海鑫科金高科技股份有限公司 Network built-in type multiplex face detecting system and method thereof
CN101178770B (en) * 2007-12-11 2011-02-16 北京中星微电子有限公司 Image detection method and apparatus
CN101655914B (en) * 2008-08-18 2014-10-22 索尼(中国)有限公司 Training device, training method and detection method
CN101350063B (en) * 2008-09-03 2011-12-28 北京中星微电子有限公司 Method and apparatus for locating human face characteristic point
CN101360246B (en) * 2008-09-09 2010-06-02 西南交通大学 Video error masking method combined with 3D human face model
CN101751551B (en) * 2008-12-05 2013-03-20 比亚迪股份有限公司 Method, device, system and device for identifying face based on image
CN101447023B (en) * 2008-12-23 2013-03-27 北京中星微电子有限公司 Method and system for detecting human head
CN101872477B (en) 2009-04-24 2014-07-16 索尼株式会社 Method and device for detecting object in image and system containing device
CN102024149B (en) * 2009-09-18 2014-02-05 北京中星微电子有限公司 Method of object detection and training method of classifier in hierarchical object detector
JP2011198268A (en) * 2010-03-23 2011-10-06 Sony Corp Information processing apparatus, method, and program
CN102004904B (en) * 2010-11-17 2013-06-19 东软集团股份有限公司 Automatic teller machine-based safe monitoring device and method and automatic teller machine
CN102136075B (en) * 2011-03-04 2013-05-15 杭州海康威视数字技术股份有限公司 Multiple-viewing-angle human face detecting method and device thereof under complex scene
CN102170563A (en) * 2011-03-24 2011-08-31 杭州海康威视软件有限公司 Intelligent person capture system and person monitoring management method
CN102436578B (en) * 2012-01-16 2014-06-04 宁波江丰生物信息技术有限公司 Formation method for dog face characteristic detector as well as dog face detection method and device
CN102693417A (en) * 2012-05-16 2012-09-26 清华大学 Method for collecting and optimizing face image sample based on heterogeneous active visual network
CN104318049A (en) * 2014-10-30 2015-01-28 济南大学 Coronal mass ejection event identification method
CN106295668A (en) * 2015-05-29 2017-01-04 中云智慧(北京)科技有限公司 Robust gun detection method
CN105205453B (en) * 2015-08-28 2019-01-08 中国科学院自动化研究所 Human eye detection and localization method based on depth self-encoding encoder
CN105488456B (en) * 2015-11-23 2019-04-23 中国科学院自动化研究所 Method for detecting human face based on adaptive threshold adjustment rejection sub-space learning
CN105488472B (en) * 2015-11-30 2019-04-09 华南理工大学 A kind of digital cosmetic method based on sample form
CN106725341A (en) * 2017-01-09 2017-05-31 燕山大学 A kind of enhanced lingual diagnosis system
CN106874867A (en) * 2017-02-14 2017-06-20 江苏科技大学 A kind of face self-adapting detecting and tracking for merging the colour of skin and profile screening
CN107784263B (en) * 2017-04-28 2021-03-30 新疆大学 Planar rotation face detection method based on improved accelerated robust features
CN109961455B (en) 2017-12-22 2022-03-04 杭州萤石软件有限公司 Target detection method and device
CN108108724B (en) * 2018-01-19 2020-05-08 浙江工商大学 Vehicle detector training method based on multi-subregion image feature automatic learning
CN108537143B (en) * 2018-03-21 2019-02-15 光控特斯联(上海)信息科技有限公司 A kind of face identification method and system based on key area aspect ratio pair
CN109344868B (en) * 2018-08-28 2021-11-16 广东奥普特科技股份有限公司 General method for distinguishing different types of objects which are mutually axisymmetric
CN110502992B (en) * 2019-07-18 2021-06-15 武汉科技大学 Relation graph based fast face recognition method for fixed scene video
CN110674690B (en) * 2019-08-21 2022-06-14 成都华为技术有限公司 Detection method, detection device and detection equipment
CN113822105B (en) * 2020-07-07 2024-04-19 湖北亿立能科技股份有限公司 Artificial intelligence water level monitoring system based on online two classifiers of SVM water scale
CN113283378B (en) * 2021-06-10 2022-09-27 合肥工业大学 Pig face detection method based on trapezoidal region normalized pixel difference characteristics

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09101579A (en) * 1995-10-05 1997-04-15 Fuji Photo Film Co Ltd Face area extraction method and copying condition determination method
US5781650A (en) * 1994-02-18 1998-07-14 University Of Central Florida Automatic feature detection and age classification of human faces in digital images
CN1508752A (en) * 2002-12-13 2004-06-30 佳能株式会社 Image processing method and apparatus

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Robust Precise Eye Location Under Probabilistic Framework, Yong Ma, Xiaoqing Ding, Zhenger Wang, Ning Wang, IEEE FGR'04, 2004 *
Face Detection Based on Hierarchical Support Vector Machines, Yong Ma, Xiaoqing Ding, Journal of Tsinghua University (Science and Technology), Vol. 43, No. 1, 2003 *


Similar Documents

Publication Publication Date Title
CN100336070C (en) Method of robust human face detection in complicated background image
CN1214340C (en) Multi-neural net imaging appts. and method
CN1794266A (en) Biocharacteristics fusioned identity distinguishing and identification method
CN100336071C (en) Method of robust accurate eye positioning in complicated background image
CN1811793A (en) Automatic positioning method for characteristic point of human faces
CN100345165C (en) Method and apparatus for image-based photorealistic 3D face modeling
He et al. Real-time human face detection in color image
CN1552041A (en) Face meta-data creation and face similarity calculation
US8582897B2 (en) Information processing apparatus and method, program, and recording medium
US8306282B2 (en) Hierarchical face recognition training method and hierarchical face recognition method thereof
CN1664846A (en) On-line hand-written Chinese characters recognition method based on statistic structural features
CN1828632A (en) Object detection apparatus, learning apparatus, object detection system, object detection method
CN1973757A (en) Computerized disease sign analysis system based on tongue picture characteristics
CN110781829A (en) Light-weight deep learning intelligent business hall face recognition method
CN1818927A (en) Fingerprint identifying method and system
CN1975759A (en) Human face identifying method based on structural principal element analysis
CN1885310A (en) Human face model training module and method, human face real-time certification system and method
CN1932847A (en) Method for detecting colour image human face under complex background
CN1200387C (en) Statistic handwriting identification and verification method based on separate character
Ryu et al. Coarse-to-fine classification for image-based face detection
CN1305002C (en) Multiple registered fingerprint fusing method
CN1041773C (en) Character recognition method and apparatus based on 0-1 pattern representation of histogram of character image
WO2008151471A1 (en) A robust precise eye positioning method in complicated background image
Babu et al. Handwritten digit recognition using structural, statistical features and k-nearest neighbor classifier
CN1588424A (en) Finger print identifying method based on broken fingerprint detection

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20070905

Termination date: 20190819