CN100336070C - Method of robust human face detection in complicated background image - Google Patents

Method of robust human face detection in complicated background image

Info

Publication number
CN100336070C
CN100336070C
Authority
CN
China
Prior art keywords
face
sample
people
image
training
Prior art date
Legal status
Expired - Fee Related
Application number
CNB2005100862485A
Other languages
Chinese (zh)
Other versions
CN1731417A (en)
Inventor
丁晓青
马勇
方驰
刘长松
彭良瑞
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CNB2005100862485A
Publication of CN1731417A
Application granted
Publication of CN100336070C

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention relates to face detection under complex backgrounds and belongs to the field of face recognition. It provides a face detection method whose performance is robust for images with complex backgrounds. Efficient, highly redundant microstructure features are used to represent the gray-level distribution of regions such as the eyes and mouth in the face pattern, and a risk-sensitive AdaBoost algorithm selects from them the microstructure features that best separate faces from non-faces and combines them into strong classifiers. Each classifier obtained by training reduces the false acceptance rate on non-face samples as far as possible while guaranteeing a low rejection rate on faces, so that high-performance face detection in complex background images is achieved with a simple structure; a post-processing algorithm further reduces the false detection rate. Results on several public databases and in competition evaluations demonstrate the excellent performance of the invention.

Description

Method of robust human face detection in complex background images
Technical field
The method for detecting human faces in complex background images belongs to the field of face recognition technology.
Background technology
Face detection is the task of determining information such as the positions and sizes of human faces in an image or image sequence. It is now widely used in systems such as face recognition, video surveillance, and intelligent human-machine interfaces. Detecting faces, especially under complex backgrounds, remains a difficult problem: external factors such as illumination, together with properties of the face itself such as appearance, skin color, expression, motion in three-dimensional space, beard, hair, and glasses, cause enormous variation within the face pattern class, while background objects are so varied that they are hard to distinguish from faces.
The mainstream face detection methods at present are based on statistical learning from samples. These methods generally introduce a "non-face" class and obtain the model parameters — the features that distinguish the "face" class from the "non-face" class — by statistical learning over collected samples, rather than by deriving surface rules from visual intuition. This is more reliable in the statistical sense: it avoids the mistakes brought by incomplete or imprecise observation, the detection scope can be extended by adding training samples, and the robustness of the detection system improves. In addition, these methods mostly adopt a multi-layer classifier structure going from simple to complex: most background windows are first excluded by structurally simple classifiers, and the remaining windows are then judged further by complex classifiers, which yields a faster detection speed. However, these methods do not take into account that in real images the misclassification risks of the face and non-face pattern classes are extremely unbalanced (the prior probability of a face appearing in an image is far below that of non-faces, and since the fundamental purpose of face detection is to find the positions of faces, the risk of misclassifying a face as non-face is much larger than that of misjudging a non-face as a face). Training every classifier layer only by the minimum-classification-error criterion and then adjusting the classifier threshold to reach a low False Rejection Rate (FRR) on faces cannot at the same time achieve a low False Acceptance Rate (FAR) on non-face patterns; the number of classifier layers then becomes too large and the structure too complex, the detection speed drops, and the overall performance of the algorithm declines. Aiming at the defects of this type of algorithm, the present invention proposes a face detection method based on the cost-sensitive AdaBoost algorithm (Cost Sensitive AdaBoost, abbreviated CS-AdaBoost). By adopting the principle of minimizing the classification risk, every trained classifier layer reduces the false acceptance rate of the non-face class as far as possible while guaranteeing an extremely low rejection rate on the face pattern, so that higher-performance face detection in complex background images is achieved with fewer classifier layers and simpler classifier structures — an approach not used in any other current literature.
Summary of the invention
The objective of the invention is to realize a face detector that can locate faces robustly under complex backgrounds. The realization of this face detector comprises two stages: training and detection.
In the training stage, samples are first collected, including face and non-face samples, and then normalized in size and illumination. The training samples are then used to extract microstructure features, yielding a feature library. The feature library is then used with the CS-AdaBoost algorithm to train one layer of a face/non-face strong classifier. The above process is repeated to obtain multi-layer classifiers whose structure goes from simple to complex. Finally these classifiers are cascaded to obtain a complete face detector.
In the detection stage, the input image is first scaled repeatedly at a fixed ratio, and every window of a certain size in the resulting image series is examined (a window is defined as a rectangular subimage of the input image). Each window is first gray-normalized, its microstructure features are then extracted, and the trained face detector judges the window: if the output of any classifier layer is below the assigned threshold, the window is considered non-face and no further judgement is made; only windows that pass the decisions of all layers are considered faces. A high face detection accuracy is thereby obtained. The method has been applied in systems such as face-based attendance registration.
The present invention consists of the following parts: sample collection and normalization; integral-image computation and microstructure feature extraction; feature selection and classifier design; and the cascading of the multi-layer classifiers.
1. Sample collection and normalization
1.1 Sample collection
Face images are cut out manually from pictures containing faces, and non-face images are cut out at random from scenery pictures containing no faces. The face images and non-face images serve respectively as positive and negative samples for training the classifiers. The collection process is shown in Fig. 2.
1.2 Size normalization
Each collected face and non-face image is normalized to the specified size. Let the original sample image be [F(x,y)]_{M×N}, with width M and height N, where F(x,y) (0 ≤ x < M, 0 ≤ y < N) is the gray value of the pixel at row x, column y; let the size-normalized image be [G(x,y)]_{W×H}, with width W and height H (W = H = 20 in the experiments). Size normalization can thus be regarded as mapping the source lattice [F(x,y)]_{M×N} onto the target lattice [G(x,y)]_{W×H}. The present invention uses back projection and linear interpolation to transform the original sample image to the standard-size sample image; the correspondence between the input image [F(x,y)]_{M×N} and the normalized image [G(x,y)]_{W×H} is:
G(x,y) = F(x/r_x, y/r_y)
where r_x and r_y are the scale factors in the x and y directions: r_x = N/H, r_y = M/W.
By this formula, a point (x,y) of the output lattice corresponds to the point (x/r_x, y/r_y) of the input image. Since x/r_x and y/r_y are generally not integers, the value F(x/r_x, y/r_y) must be estimated from the values at nearby known lattice points. Following the linear interpolation method, for a given (x,y) let:
x/r_x = x_0 + Δ_x,  y/r_y = y_0 + Δ_y,  0 ≤ Δ_x, Δ_y < 1
where x_0 = [x/r_x], Δ_x = x/r_x − x_0, y_0 = [y/r_y], Δ_y = y/r_y − y_0, and [·] is the integer-part function. The interpolation can then be expressed as:
G(x,y) = F(x_0+Δ_x, y_0+Δ_y)
       = F(x_0,y_0)(1−Δ_x)(1−Δ_y) + F(x_0+1,y_0)Δ_x(1−Δ_y)
       + F(x_0,y_0+1)(1−Δ_x)Δ_y + F(x_0+1,y_0+1)Δ_xΔ_y
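As an illustration of this mapping, the size normalization can be sketched in NumPy as follows (a minimal sketch; the function name, array layout, and float output are our own choices, not code from the patent):

```python
import numpy as np

def normalize_size(src: np.ndarray, out_w: int = 20, out_h: int = 20) -> np.ndarray:
    """Resize a grayscale image to out_w x out_h by back projection
    with bilinear interpolation, as in the size-normalization step."""
    src_h, src_w = src.shape
    rx = src_w / out_w          # scale factor along columns
    ry = src_h / out_h          # scale factor along rows
    dst = np.empty((out_h, out_w), dtype=np.float64)
    for y in range(out_h):
        for x in range(out_w):
            # back-project the target pixel into the source lattice
            fx, fy = x * rx, y * ry
            x0, y0 = int(fx), int(fy)
            dx, dy = fx - x0, fy - y0
            x1, y1 = min(x0 + 1, src_w - 1), min(y0 + 1, src_h - 1)
            dst[y, x] = (src[y0, x0] * (1 - dx) * (1 - dy)
                         + src[y0, x1] * dx * (1 - dy)
                         + src[y1, x0] * (1 - dx) * dy
                         + src[y1, x1] * dx * dy)
    return dst
```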
1.3 gray scale normalization
Factors such as ambient illumination and the imaging device may make the image brightness or contrast abnormal, producing strong shadows or reflections, so the geometrically normalized samples must also be gray-balanced to improve their intensity distribution and strengthen the consistency between patterns. The present invention gray-balances the samples by normalizing the gray mean and variance, adjusting the mean μ̄ and variance σ̄ of the sample picture's gray levels to the given values μ_0 and σ_0.
First compute the mean and variance of the sample image G(x,y) (0 ≤ x < W, 0 ≤ y < H):
μ̄ = (1/(WH)) Σ_{y=0}^{H−1} Σ_{x=0}^{W−1} G(x,y)
σ̄ = sqrt( (1/(WH)) Σ_{y=0}^{H−1} Σ_{x=0}^{W−1} (G(x,y) − μ̄)² )
Then transform the gray value of every pixel as:
I(x,y) = (σ_0/σ̄)(G(x,y) − μ̄) + μ_0
The mean and variance of the image gray levels are thereby adjusted to the given values μ_0 and σ_0, completing the gray-level normalization of the sample.
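In code, this gray-level normalization is a per-pixel affine transform; a minimal sketch (the set-points μ_0 and σ_0 are not specified numerically in the text, so the defaults below are assumptions):

```python
def normalize_gray(img: np.ndarray, mu0: float = 127.0, sigma0: float = 32.0) -> np.ndarray:
    """Shift the sample's gray mean and standard deviation to the
    preset values mu0 and sigma0 (the concrete set-points are illustrative)."""
    mu = img.mean()
    sigma = img.std()
    return sigma0 / sigma * (img - mu) + mu0
```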
2. Fast extraction of microstructure features
The present invention uses the five types of microstructure templates in Fig. 5 to extract high-dimensional microstructure features from the face and non-face samples. Each microstructure feature is obtained as the difference between the sums of the image pixel grays covered by the black region and by the white region of the template (the two colors serve only to distinguish the two regions, here and below); both the position of the template in the image and the size of the template may vary. The concrete feature extraction is as follows:
Define S(x_1,y_1; x_2,y_2) as the sum of pixel gray values over the region (x_1 ≤ x' ≤ x_2, y_1 ≤ y' ≤ y_2):
S(x_1,y_1; x_2,y_2) = Σ_{x_1≤x'≤x_2} Σ_{y_1≤y'≤y_2} I(x',y')
If the pixel coordinate of the upper-left corner of the microstructure template is (x,y), then the five types of microstructure features (in the first four types the black and white regions have equal area; in the fifth type the black region is placed symmetrically inside the white region) are, as shown in Fig. 5:
(a):S(x,y;x+w-1,y+h-1)-S(x+w,y;x+2w-1,y+h-1)
(b):S(x,y;x+w-1,y+h-1)-S(x,y+h;x+w-1,y+2h-1)
(c):2S(x+w,y;x+2w-1,y+h-1)-S(x,y;x+3w-1,y+h-1)
(d):S(x,y;x+2w-1,y+2h-1)-2S(x,y;x+w-1,y+h-1)-2S(x+w,y+h;x+2w-1,y+2h-1)
(e):S(x,y;x+w-1,y+h-1)-S(x+2,y+2;x+w-3,y+h-3)
Since extracting each feature involves only sums of pixels over rectangular areas, the integral image (Integral Image) of the whole image can be used to compute any of the microstructure features, at any scale and position, quickly.
2.1 integral image
For an image I(x,y) (x ≥ 0, y ≥ 0), define its integral image II(x,y) as the sum of all pixels in the range from (0,0) to (x,y):
II(x,y) = Σ_{0≤x'≤x} Σ_{0≤y'≤y} I(x',y')
and define II(−1,y) = 0, II(x,−1) = 0. It follows that:
S(x_1,y_1; x_2,y_2) = II(x_2,y_2) + II(x_1−1,y_1−1) − II(x_2,y_1−1) − II(x_1−1,y_2).
That is, the pixel sum S(x_1,y_1;x_2,y_2) over any rectangular area of the original image I(x,y) can be computed from the integral image with three additions/subtractions.
Similarly, define the squared integral image SqrII(x,y) as the sum of the squares of all pixels in the range from (0,0) to (x,y):
SqrII(x,y) = Σ_{0≤x'≤x} Σ_{0≤y'≤y} I(x',y')·I(x',y'), with SqrII(−1,y) = 0, SqrII(x,−1) = 0.
The squared integral image is used to compute the variance of each rectangular area (see Section 2.3).
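A sketch of both integral images and of the three-addition rectangle sum; the arrays are padded with a leading zero row and column so that the II(−1,·) = II(·,−1) = 0 convention becomes an ordinary index (names are ours):

```python
def integral_images(img: np.ndarray):
    """Return the integral image II and the squared integral image SqrII,
    each padded with a leading row/column of zeros."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    sqr = np.zeros_like(ii)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    sqr[1:, 1:] = (img.astype(np.float64) ** 2).cumsum(axis=0).cumsum(axis=1)
    return ii, sqr

def rect_sum(ii: np.ndarray, x1: int, y1: int, x2: int, y2: int) -> float:
    """Pixel sum over the rectangle x1..x2, y1..y2 (inclusive) in three
    additions/subtractions; indices are shifted by the zero padding."""
    return ii[y2 + 1, x2 + 1] + ii[y1, x1] - ii[y1, x2 + 1] - ii[y2 + 1, x1]
```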
2.2 Fast extraction of the microstructure features
Since extracting each feature involves only sums of pixels over rectangular areas, any of the above microstructure features can be computed quickly with a few additions and subtractions of integral-image values. The formula for a type-(a) microstructure feature (illustrated in Fig. 6) is:
g(x,y,w,h)=2·II(x+w-1,y-1)+II(x+2·w-1,y+h-1)
+II(x-1,y+h-1)-2·II(x+w-1,y+h-1)
-II(x+2·w-1,y-1)-II(x-1,y-1)
(b) type microstructure features:
g(x,y,w,h)=2II(x+w-1,y+h-1)+II(x-1,y-1)-II(x+w-1,y-1)
-2II(x-1,y+h-1)-II(x+w-1,y+2h-1)+II(x-1,y+2h-1)
(c) type microstructure features:
g(x,y,w,h)=2II(x+2w-1,y+h-1)+2II(x+w-1,y-1)-2II(x+2w-1,y-1)
-2II(x+w-1,y+h-1)-II(x+3w-1,y+h-1)-II(x-1,y-1)
+II(x-1,y+h-1)+II(x+3w-1,y-1)
(d) type microstructure features:
g(x,y,w,h)=-II(x-1,y-1)-II(x+2w-1,y-1)-II(x-1,y+2h-1)
-4II(x+w-1,y+h-1)+2II(x+w-1,y-1)+2II(x-1,y+h-1)
-II(x+2w-1,y+2h-1)+2II(x+2w-1,y+h-1)+2II(x+w-1,y+2h-1)
(e) type microstructure features:
g(x,y,w,h)=II(x+w-1,y+h-1)+II(x-1,y-1)-II(x+w-1,y-1)-II(x-1,y+h-1)
-II(x+w-3,y+h-3)-II(x+1,y+1)+II(x+1,y+h-3)+II(x+w-1,y+1)
Varying the parameters x, y, w, h extracts features at different positions and scales. For a sample image of 20×20 pixels, 92267 microstructure features of the five types can be obtained in total, forming the sample's feature vector FV(j), 1 ≤ j ≤ 92267.
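For example, under the conventions of the integral-image sketch above, a type-(a) feature is just two rectangle sums and one subtraction (feature_a is an illustrative name; the sign follows the S(·) definition of type (a)):

```python
def feature_a(ii: np.ndarray, x: int, y: int, w: int, h: int) -> float:
    """Type-(a) microstructure feature: difference between the pixel sums
    of two horizontally adjacent w x h rectangles (left minus right)."""
    left = rect_sum(ii, x, y, x + w - 1, y + h - 1)
    right = rect_sum(ii, x + w, y, x + 2 * w - 1, y + h - 1)
    return left - right
```

Enumerating all admissible (x, y, w, h) combinations for the five template types over a 20×20 window is what yields the 92267-dimensional feature vector.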
2.3 Feature normalization
To reduce the influence of illumination on face detection, the gray mean and variance of every 20×20-pixel sample image must be normalized, so the mean μ and standard deviation σ of the window are first computed quickly and each feature dimension is then normalized. The pixel-gray mean μ and standard deviation σ of a 20×20-pixel window region (x_0 ≤ x' ≤ x_0+19, y_0 ≤ y' ≤ y_0+19) are respectively (as shown in Fig. 6):
μ = [II(x_0+19,y_0+19) + II(x_0−1,y_0−1) − II(x_0−1,y_0+19) − II(x_0+19,y_0−1)]/400
σ = {[SqrII(x_0+19,y_0+19) + SqrII(x_0−1,y_0−1) − SqrII(x_0−1,y_0+19) − SqrII(x_0+19,y_0−1)]/400 − μ²}^{1/2}
Each microstructure feature dimension can then be normalized as:
FV(j) = (σ_0/σ)·FV̄(j)
where FV̄(j) is the raw feature value and σ the standard deviation of the window.
For a sample image of 20×20 pixels, 92267 microstructure features FV(j), 1 ≤ j ≤ 92267, are thus obtained in total.
3. Feature selection and classifier design
To reach a sufficiently fast detection speed, a face detector must adopt a hierarchical structure (as shown in Fig. 7), built by cascading strong classifiers that go from simple to complex. Background windows in the image are first excluded by structurally simple strong classifiers, and the remaining windows are then judged by structurally complex strong classifiers (a strong classifier here means a classifier that reaches sufficiently high performance on the training set; a weak classifier below means a classifier whose error rate on the training set is slightly below 0.5).
The present invention trains every strong-classifier layer with the CS-AdaBoost algorithm. CS-AdaBoost is a weak-classifier ensemble algorithm that combines weak classifiers into a strong classifier on the training set; it treats the risks brought by the two kinds of classification errors differently, minimizing the total misclassification risk on the training set. For the face detection problem, the strong classifier obtained by training reduces the classification error on the non-face class (FAR) as far as possible while keeping the classification error on the face class (FRR) sufficiently low.
3.1 Structure of the weak classifiers
In the present invention a weak classifier is a tree classifier built on a single feature dimension:
h_j(sub) = 1 if g_j(sub) < θ_j (or, for some features, g_j(sub) > θ_j); h_j(sub) = 0 otherwise
where sub is a sample of 20×20 pixels, g_j(sub) is the j-th feature extracted from the sample, θ_j is the decision threshold for the j-th feature (obtained by collecting the j-th feature of all face and non-face samples and requiring the FRR of the face samples to meet the specified value), and h_j(sub) is the decision output of the tree classifier built on the j-th feature. Each weak classifier thus needs only one threshold comparison to reach its decision; 92267 weak classifiers are obtained in all.
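A decision stump of this kind can be sketched as follows (the flip flag stands in for the per-feature inequality direction; names and the example threshold are illustrative):

```python
def make_stump(j: int, theta: float, flip: bool = False):
    """Single-feature tree classifier: output 1 (face side) when
    fv[j] < theta, or when fv[j] > theta if `flip` is set."""
    def h(fv) -> int:
        return int((fv[j] < theta) != flip)
    return h

# usage: h42 = make_stump(42, theta=0.73); label = h42(fv)
```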
3.2 strong classifier design based on the CS-AdaBoost algorithm
The CS-AdaBoost algorithm is combined with the above weak-classifier construction to train each face/non-face strong classifier. The training steps are as follows (denote the training sample set L = {(sub_i, l_i)}, i = 1,…,n, where l_i = 0,1 is the class label of sample image sub_i, corresponding to the non-face and face classes respectively; there are n_face face samples and n_nonface non-face samples):
3.2.1 Parameter initialization
Initialization of the training samples' misclassification risks. The misclassification risk of each face sample is C(i) = 2c/(c+1), and that of each non-face sample is C(i) = 2/(c+1), where c is the misclassification-risk multiple of the face class relative to the non-face class; the value of c should be greater than 1 and should decrease gradually toward 1 as the number of strong-classifier layers grows (see Table 1 for the concrete values).
Initialization of the training sample weights. The initial weight of each sample is D_1(i) = (c+1)·C(i) / (2c·n_face + 2·n_nonface).
Choose the iteration count T (T is the number of weak classifiers to be used); T should increase gradually with the number of strong-classifier layers (see Table 1 for the concrete values).
Compute the maximum Fmax(j) and minimum Fmin(j) of every feature dimension over the sample set (j is the feature index, 1 ≤ j ≤ 92267): Fmax(j) = max_{1≤i≤n} FV_i(j), Fmin(j) = min_{1≤i≤n} FV_i(j).
3.2.2 Repeat the following process T times (t = 1,…,T):
3.2.2.1 Build a weak classifier h_j from the j-th feature (1 ≤ j ≤ 92267), then exhaustively search the threshold θ_j between Fmin(j) and Fmax(j) so that the error rate ε_j of h_j is minimized, where ε_j = Σ_{i=1}^{n} D_t(i)·|h_j(sub_i) − l_i|;
3.2.2.2 Let ε_t = min_{1≤j≤92267} ε_j, and take the corresponding weak classifier as h_t;
3.2.2.3 Compute the parameter α_t = (1/2)·ln((1 − ε_t)/ε_t);
3.2.2.4 Update the sample weights: D_{t+1}(i) = D_t(i)·exp(−α_t l_i h_t(sub_i))·exp(λ α_t l_i) / Z_t, where λ = (c−1)/(c+1), i = 1,…,n, and Z_t = Σ_{i=1}^{n} D_t(i)·exp(−α_t l_i h_t(sub_i))·exp(λ α_t l_i).
3.2.3 Output the final strong classifier:
H(sub) = 1 if Σ_{t=1}^{T} α_t h_t(sub) ≥ b; H(sub) = 0 otherwise
where b is the layer's decision threshold, adjusted so that the layer meets its FRR target on the face training set.
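The training loop of Sections 3.2.1-3.2.3 can be sketched as below. This is a simplified illustration, not the patent's implementation: the exhaustive threshold scan is approximated by a 32-point grid, and the update follows the printed formulas exactly, with labels l_i in {0,1}:

```python
import numpy as np

def train_cs_adaboost(FV: np.ndarray, labels: np.ndarray, c: float, T: int):
    """One CS-AdaBoost layer. FV: (n, d) feature matrix; labels: 0 = non-face,
    1 = face; c: misclassification-risk multiple of the face class."""
    n, d = FV.shape
    n_face = int(labels.sum())
    n_nonface = n - n_face
    # risk-weighted initial sample weights, normalized to sum to 1
    C = np.where(labels == 1, 2 * c / (c + 1), 2 / (c + 1))
    D = (c + 1) * C / (2 * c * n_face + 2 * n_nonface)
    lam = (c - 1) / (c + 1)
    stumps, alphas = [], []
    for _ in range(T):
        best = (0, 0.0, False, np.inf)          # (j, theta, flip, eps)
        for j in range(d):
            col = FV[:, j]
            for theta in np.linspace(col.min(), col.max(), 32):
                for flip in (False, True):
                    pred = ((col < theta) != flip).astype(float)
                    eps = float(np.sum(D * np.abs(pred - labels)))
                    if eps < best[3]:
                        best = (j, theta, flip, eps)
        j, theta, flip, eps = best
        eps = min(max(eps, 1e-10), 1 - 1e-10)   # guard the logarithm
        alpha = 0.5 * np.log((1 - eps) / eps)
        pred = ((FV[:, j] < theta) != flip).astype(float)
        # cost-sensitive update: the exp(lam*alpha*l) factor keeps extra
        # pressure on face samples (l = 1) relative to non-face samples
        D = D * np.exp(-alpha * labels * pred) * np.exp(lam * alpha * labels)
        D /= D.sum()
        stumps.append((j, theta, flip))
        alphas.append(alpha)
    return stumps, alphas
```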
3.3 Cascading the multi-layer strong classifiers
Since a single-layer strong classifier can hardly achieve at the same time a high classification speed, an extremely low FRR, and an extremely low FAR, the whole face detector must adopt a hierarchical structure in which the multi-layer strong classifiers are cascaded from simple to complex, as shown in Fig. 7. During detection, as soon as an image window fails any one layer it is excluded immediately without further judgement; otherwise it is judged further by the subsequent, more complex strong classifiers. Windows that obviously do not look like faces are thus excluded within the first few layers without subsequent computation, which greatly reduces the amount of calculation.
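The rejection logic of the cascade can be sketched in a few lines (a layer is assumed to be represented as the (stumps, alphas, b) triple produced by the training sketch above):

```python
def cascade_detect(fv, layers) -> bool:
    """Run a window's feature vector through the cascaded layers;
    reject at the first layer whose weighted vote is below its
    threshold b (layers = [(stumps, alphas, b), ...])."""
    for stumps, alphas, b in layers:
        score = sum(a * ((fv[j] < theta) != flip)
                    for (j, theta, flip), a in zip(stumps, alphas))
        if score < b:
            return False   # rejected as background; later layers never run
    return True            # accepted by every layer: a face
```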
Using 11580 face samples and 2000000 non-face samples as the training sample set, the concrete steps for training the cascade of strong classifiers are as follows:
(1) Initialize i = 1. Define the training target of each strong-classifier layer as FRR ≤ 0.02% on the face training set and FAR ≤ 60% on the non-face training set; define the targets of the whole face detector as FRR ≤ 0.5% on the face training set and FAR ≤ 3.2×10⁻⁶ on the non-face training set, where FAR and FRR are defined as:
FAR = (number of non-face samples judged to be faces ÷ total number of non-face samples) × 100%
FRR = (number of face samples judged to be non-faces ÷ total number of face samples) × 100%
(2) Train the i-th strong-classifier layer on the training set using the method of Section 3.2;
(3) Run the first i trained layers over the sample set;
(4) If FRR and FAR have not reached the target values, set i ← i+1 and return to step (2) to continue training; otherwise stop.
The finally trained face detector comprises 19 strong-classifier layers using 3139 weak classifiers in total. The FRR of the whole detector on the face validation set is about 0.15%, and its FAR on the non-face training set is about 3.2×10⁻⁶. Table 1 gives the training results of several of its layers.
Table 1. Training results of several face/non-face strong classifiers

Layer i   c      T     FRR (face training set)   FAR (non-face validation set)
1         100    1     0.10%                     64.2%
2         60     1     0.0%                      83.5%
3         3.5    5     0.0%                      75.4%
7         1.5    65    0.0%                      42.5%
8         1.4    87    0.0%                      40.1%
9         1.4    120   0.0%                      35.4%
17        1.2    355   0.01%                     67.6%
18        1.15   361   0.02%                     60.2%
19        1.10   397   0.02%                     68.3%
During detection, a window is considered to contain a face only if it passes the decisions of all classifier layers.
The invention is characterized in that it is a technology that robustly detects various faces under complex backgrounds and illumination, and reaches real-time detection speed on ordinary video. It first normalizes the collected samples in size and illumination, so as to eliminate as far as possible the within-class differences of the input samples caused by illumination and size; it then efficiently extracts microstructure features that finely distinguish the structural characteristics of face and non-face patterns; on this basis the CS-AdaBoost algorithm trains strong classifiers having extremely low FRR and extremely low FAR; the multi-layer strong classifiers are then cascaded into a complete face detector, which yields the final face positions.
In a system composed of an image acquisition device and a computer, the detection method comprises a training stage and a detection stage. The training stage contains in turn the following steps:
1. Sample collection
Acquire images with a camera, digital camera, scanner, or similar device, manually mark and cut out the faces, and build the face training sample database; cut out non-face training images at random from scenery pictures containing no faces. 11580 face samples and 2000000 non-face samples are obtained in total as the training sample set.
2. Normalization, comprising linear normalization of sample illumination and size
(2.1) Size normalization
Let the original sample image be [F(x,y)]_{M×N}, with width M and height N, and let the size-normalized image be [G(x,y)]_{W×H} (W = H = 20 in the experiments). The normalized sample image is obtained from the original sample image by back projection and linear interpolation; the correspondence between the input image and the normalized image is:
G(x,y) = F(x/r_x, y/r_y)
where r_x and r_y are the scale factors in the x and y directions: r_x = N/H, r_y = M/W. Since x/r_x and y/r_y are generally not integers, F(x/r_x, y/r_y) must be estimated from the values at nearby known lattice points; the present invention uses linear interpolation. For a given (x,y) let:
x/r_x = x_0 + Δ_x,  y/r_y = y_0 + Δ_y,  0 ≤ Δ_x, Δ_y < 1
where x_0 = [x/r_x], Δ_x = x/r_x − x_0, y_0 = [y/r_y], Δ_y = y/r_y − y_0, and [·] is the integer-part function; then:
G(x,y) = F(x_0+Δ_x, y_0+Δ_y)
       = F(x_0,y_0)(1−Δ_x)(1−Δ_y) + F(x_0+1,y_0)Δ_x(1−Δ_y)
       + F(x_0,y_0+1)(1−Δ_x)Δ_y + F(x_0+1,y_0+1)Δ_xΔ_y
(2.2) Gray-level normalization
Transform the gray value of each pixel of the size-normalized sample image G(x,y), adjusting the mean μ̄ and variance σ̄ to the given values μ_0 and σ_0 to obtain the sample image I(x,y):
I(x,y) = (σ_0/σ̄)(G(x,y) − μ̄) + μ_0
where μ̄ = (1/(WH)) Σ_{y=0}^{H−1} Σ_{x=0}^{W−1} G(x,y) and σ̄ = sqrt( (1/(WH)) Σ_{y=0}^{H−1} Σ_{x=0}^{W−1} (G(x,y) − μ̄)² ).
3. Building the sample feature library
Compute the integral image for fast extraction of the microstructure features, in the following steps:
(3.1) Compute the integral image of each sample
By definition, compute each sample's integral image II(x,y) = Σ_{0≤x'≤x} Σ_{0≤y'≤y} I(x',y'), with II(−1,y) = 0 and II(x,−1) = 0.
(3.2) Extraction of the microstructure feature library
Using the definition of each microstructure feature and the integral image above, quickly extract the 92267 features of each sample, thereby building the feature library of the face samples and the feature library of the non-face samples.
4. classifier design
Train each face/non-face strong-classifier layer with the above training sample set and the CS-AdaBoost algorithm, and cascade the multi-layer strong classifiers into a complete face detector, in the following steps:
(4.1) Initialize i = 1; define the training target of each strong-classifier layer as FRR ≤ 0.02% on the face training set and FAR ≤ 60% on the non-face training set; define the targets of the whole face detector as FRR ≤ 0.5% on the face training set and FAR ≤ 3.2×10⁻⁶ on the non-face training set;
(4.2) Train the i-th strong-classifier layer;
(4.3) Run the first i trained layers over the sample set;
(4.4) If FRR and FAR have not reached the target values, set i ← i+1 and return to step (4.2) to continue training; otherwise stop.
Step (4.2) contains in turn the following sub-steps:
(4.2.1) Parameter initialization
Initialization of the training samples' misclassification risks. The misclassification risk of each face sample is C(i) = 2c/(c+1), and that of each non-face sample is C(i) = 2/(c+1), where c is the misclassification-risk multiple of the face class relative to the non-face class; the value of c should be greater than 1 and should decrease gradually toward 1 as the number of strong-classifier layers grows (see Table 1 for the concrete values).
Initialization of the training sample weights. The initial weight of each sample is D_1(i) = (c+1)·C(i) / (2c·n_face + 2·n_nonface).
Choose the iteration count T (T is the number of weak classifiers to be used); T should increase gradually with the number of strong-classifier layers (see Table 1 for the concrete values).
Compute the maximum Fmax(j) and minimum Fmin(j) of every feature dimension over the sample set (j is the feature index, 1 ≤ j ≤ 92267): Fmax(j) = max_{1≤i≤n} FV_i(j), Fmin(j) = min_{1≤i≤n} FV_i(j).
(4.2.2) Repeat the following process T times (t = 1,…,T):
(4.2.2.1) Build a weak classifier h_j from the j-th feature (1 ≤ j ≤ 92267), then exhaustively search the threshold θ_j between Fmin(j) and Fmax(j) so that the error rate ε_j of h_j is minimized, where ε_j = Σ_{i=1}^{n} D_t(i)·|h_j(sub_i) − l_i|;
(4.2.2.2) Let ε_t = min_{1≤j≤92267} ε_j, and take the corresponding weak classifier as h_t;
(4.2.2.3) Compute the parameter α_t = (1/2)·ln((1 − ε_t)/ε_t);
(4.2.2.4) Update the sample weights: D_{t+1}(i) = D_t(i)·exp(−α_t l_i h_t(sub_i))·exp(λ α_t l_i) / Z_t, where λ = (c−1)/(c+1), i = 1,…,n, and Z_t = Σ_{i=1}^{n} D_t(i)·exp(−α_t l_i h_t(sub_i))·exp(λ α_t l_i).
(4.2.3) Output the final strong classifier:
H(sub) = 1 if Σ_{t=1}^{T} α_t h_t(sub) ≥ b; H(sub) = 0 otherwise, where b is the layer's decision threshold.
Training through the above steps yields a complete face detector.
In the detection stage, the invention uses the following steps to judge whether the input image contains faces (an actual detection process is illustrated in Fig. 9):
(1) Acquisition of the input image
Acquire images with a camera, digital camera, scanner, or similar device.
(2) Scaling of the input image and fast judgement of every window in it
To detect faces of different sizes, the input image is reduced 12 times in succession at a fixed ratio (1.25 in the present invention) using the linear interpolation method described earlier, giving input images of 13 different sizes in total; all 20×20-pixel windows of each input image are judged, so faces from 20×20 pixels up to 280×280 pixels can be detected. The concrete steps are:
(2.1) Scaling of the input image
Using the linear interpolation method described earlier, reduce the input image I(x,y) 12 times in succession at the ratio q = 1.25, obtaining the input image sequence {I_i(x,y)} (i = 0,…,12);
(2.2) Computation of the integral images
Using the formulas above, compute for each image I_i(x,y) its integral image II_i(x,y) and squared integral image SqrII_i(x,y) (i = 0,…,12);
(2.3) Exhaustive judgement of the windows
Starting from the upper-left corner of each image I_i(x,y), exhaustively examine all windows of 20×20 pixels. Any window [x_0, y_0; x_0+19, y_0+19] is processed as follows:
(2.3.1) Using the integral image II_i(x,y) and squared integral image SqrII_i(x,y) of the whole image, compute the mean μ and standard deviation σ of the window:
μ = [II_i(x_0+19,y_0+19) + II_i(x_0−1,y_0−1) − II_i(x_0−1,y_0+19) − II_i(x_0+19,y_0−1)]/400
σ = {[SqrII_i(x_0+19,y_0+19) + SqrII_i(x_0−1,y_0−1) − SqrII_i(x_0−1,y_0+19) − SqrII_i(x_0+19,y_0−1)]/400 − μ²}^{1/2}
(2.3.2) Quickly extract the microstructure features of the window by the method introduced earlier, and normalize the features;
(2.3.3) Judge the window with the trained multi-layer face/non-face strong classifiers; if it passes the decisions of all layers, the window is considered to contain a face and its position is output; otherwise the window is discarded without further processing.
With the above steps every face in the input image can be detected quickly and robustly.
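Putting the pieces together, the multi-scale window scan of steps (2.1)-(2.3) can be sketched as follows; it reuses the earlier sketches, and window_features is a hypothetical helper standing in for the feature extraction and normalization of step (2.3.2):

```python
def detect_faces(img: np.ndarray, layers, n_scales: int = 13, q: float = 1.25):
    """Judge every 20x20 window of an image pyramid built with ratio q and
    keep the windows accepted by the whole cascade; hit positions are
    mapped back to original-image coordinates as (x, y, size)."""
    hits, scale = [], 1.0
    for _ in range(n_scales):
        h, w = img.shape
        if h < 20 or w < 20:
            break
        ii, sqr = integral_images(img)
        for y0 in range(h - 19):
            for x0 in range(w - 19):
                fv = window_features(ii, sqr, x0, y0)   # hypothetical helper
                if cascade_detect(fv, layers):
                    hits.append((int(x0 * scale), int(y0 * scale),
                                 int(20 * scale)))
        img = normalize_size(img, out_w=int(w / q), out_h=int(h / q))
        scale *= q
    return hits
```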
To verify the validity of the present invention, we tested it on several public databases; a concrete implementation example is also given below.
On the CMU test set we compared the performance of the present invention with the best currently acknowledged algorithms. The CMU test set contains 130 pictures with complex backgrounds, with 507 faces in total. In the experiment each image was scaled up to 13 times at a ratio of 1.25, and 71040758 image windows were judged in all. The comparison results are given in Table 2. The overall performance of our algorithm is better than the methods of Viola [Viola P, Jones M. Rapid object detection using a boosted cascade of simple features. Proc. on Computer Vision and Pattern Recognition, 2001], Schneiderman [Schneiderman H, Kanade T. Probabilistic modeling of local appearance and spatial relationships for object recognition. Proc. on CVPR, 1998], and Rowley [Rowley H A, Baluja S, Kanade T. Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(1): 23-38], particularly at low false-alarm counts: at 10 false alarms, for example, the face detection rate of our algorithm is 90.1%, which is 7%~14% higher than the detection rates of the other algorithms. Compared in particular with Viola's detector trained by the conventional AdaBoost algorithm, our detector uses 3139 weak classifiers in 19 strong-classifier layers, whereas Viola's uses more than 6000 weak classifiers in 38 layers; our detector is structurally much simpler, so the present invention achieves better performance and a faster detection speed. For ordinary 386×288 video images our algorithm reaches a detection speed above 18 frames/second (PIII 1.8 GHz CPU, 512 MB memory).
Table 2. Performance comparison with other detection methods on the CMU upright frontal face test set
[the comparison table appears as an image in the original document]
In addition, on the BANCA database we compared the detection performance with FaceIt, the well-known product of Identix Inc. The BANCA database contains 6540 pictures with complex backgrounds and illumination; each picture contains an upright frontal face, with large pitch variations. The correct detection rate of the present invention is 98.8%, against 94.9% for FaceIt. In a test on an image set collected by a third party (China Aerospace Information Co.), in which every image contains a face, the detection accuracy of our algorithm is 98.6%, against 98.0% for FaceIt.
Description of drawings
Fig. 1 Hardware configuration of a typical face detection system.
Fig. 2 Acquisition process of the training samples.
Fig. 3 Examples of typical face samples.
Fig. 4 Structure of the face detection system.
Fig. 5 The five types of microstructure feature templates.
Fig. 6 Computation of the integral image and an example of microstructure feature extraction.
Fig. 7 Cascade of the multi-stage strong classifiers.
Fig. 8 Training process of a strong classifier.
Fig. 9 Example of an actual face detection process in an image.
Fig. 10 A face-recognition check-in system based on this algorithm.
Embodiment
To realize a face detection system, the face detector must first be obtained by training on sufficiently many collected samples; the detector can then be used to detect any input image. The hardware configuration of the whole system is shown in Fig. 1, and its training and detection processes in Fig. 4. The parts of the system are introduced in detail below:
A) Implementation of the training system
A.1 Acquisition of the training samples
Acquire images with a camera, digital camera, scanner, or similar device, manually mark and cut out the faces, and build the face training sample database; non-face training samples are extracted at random from scenery pictures and the like containing no faces. In this example 11580 face samples and 2000000 non-face samples were collected and used as the training set.
A.2 Sample normalization
A.2.1 Size normalization
Let the original sample image be [F(x,y)]_{M×N}, with width M and height N, and let the size-normalized image be [G(x,y)]_{W×H} (W = H = 20 in the experiments). The normalized sample image is obtained from the original sample image by back projection and linear interpolation; the correspondence between the input image and the normalized image is:
G(x,y) = F(x/r_x, y/r_y)
where r_x and r_y are the scale factors in the x and y directions: r_x = N/H, r_y = M/W. For a given (x,y) let:
x/r_x = x_0 + Δ_x,  y/r_y = y_0 + Δ_y,  0 ≤ Δ_x, Δ_y < 1
where x_0 = [x/r_x], Δ_x = x/r_x − x_0, y_0 = [y/r_y], Δ_y = y/r_y − y_0, and [·] is the integer-part function; then:
G(x,y) = F(x_0+Δ_x, y_0+Δ_y)
       = F(x_0,y_0)(1−Δ_x)(1−Δ_y) + F(x_0+1,y_0)Δ_x(1−Δ_y)
       + F(x_0,y_0+1)(1−Δ_x)Δ_y + F(x_0+1,y_0+1)Δ_xΔ_y
A.2.2 Illumination normalization
Transform the gray value of each pixel of the size-normalized sample image G(x,y), adjusting the mean μ̄ and variance σ̄ to the given values μ_0 and σ_0 to obtain the sample image I(x,y):
I(x,y) = (σ_0/σ̄)(G(x,y) − μ̄) + μ_0
where μ̄ = (1/(WH)) Σ_{y=0}^{H−1} Σ_{x=0}^{W−1} G(x,y) and σ̄ = sqrt( (1/(WH)) Σ_{y=0}^{H−1} Σ_{x=0}^{W−1} (G(x,y) − μ̄)² ).
A.3 Building the sample feature library
A.3.1 Computation of the sample integral images
By definition, compute each sample's integral image II(x,y) = Σ_{0≤x'≤x} Σ_{0≤y'≤y} I(x',y'), with II(−1,y) = 0 and II(x,−1) = 0.
A.3.2 Extraction of the microstructure feature library
Using the definition of each microstructure feature and the integral image above, quickly extract the 92267 features of each sample and normalize them, thereby building the feature library of the face samples and the feature library of the non-face samples.
A.4 Training of the face detector
Train each face/non-face strong-classifier layer with the above training sample set and the CS-AdaBoost algorithm, and cascade the multi-layer strong classifiers into a complete face detector, in the following steps:
A.4.1 Initialize i = 1; define the training target of each strong-classifier layer as FRR ≤ 0.02% on the face training set and FAR ≤ 60% on the non-face training set; define the targets of the whole face detector as FRR ≤ 0.5% on the face training set and FAR ≤ 3.2×10⁻⁶ on the non-face training set;
A.4.2 Train the i-th strong-classifier layer;
A.4.3 Run the first i trained layers over the sample set;
A.4.4 If FRR and FAR have not reached the target values, set i ← i+1 and return to step A.4.2 to continue training; otherwise stop.
The foregoing step A.4.2 contains in turn the following sub-steps:
A.4.2.1 Parameter initialization
Initialization of the training samples' misclassification risks. The misclassification risk of each face sample is C(i) = 2c/(c+1), and that of each non-face sample is C(i) = 2/(c+1), where c is the misclassification-risk multiple of the face class relative to the non-face class; the value of c should be greater than 1 and should decrease gradually toward 1 as the number of strong-classifier layers grows (see Table 1 for the concrete values).
Initialization of the training sample weights. The initial weight of each sample is D_1(i) = (c+1)·C(i) / (2c·n_face + 2·n_nonface).
Choose the iteration count T (T is the number of weak classifiers to be used); T should increase gradually with the number of strong-classifier layers (see Table 1 for the concrete values).
Compute the maximum Fmax(j) and minimum Fmin(j) of every feature dimension over the sample set (j is the feature index, 1 ≤ j ≤ 92267): Fmax(j) = max_{1≤i≤n} FV_i(j), Fmin(j) = min_{1≤i≤n} FV_i(j).
A.4.2.2 Repeat the following process T times (t = 1,…,T):
A.4.2.2.1 Build a weak classifier h_j from the j-th feature (1 ≤ j ≤ 92267), then exhaustively search the threshold θ_j between Fmin(j) and Fmax(j) so that the error rate ε_j of h_j is minimized, where ε_j = Σ_{i=1}^{n} D_t(i)·|h_j(sub_i) − l_i|;
A.4.2.2.2 Let ε_t = min_{1≤j≤92267} ε_j, and take the corresponding weak classifier as h_t;
A.4.2.2.3 Compute the parameter α_t = (1/2)·ln((1 − ε_t)/ε_t);
A.4.2.2.4 Update the sample weights: D_{t+1}(i) = D_t(i)·exp(−α_t l_i h_t(sub_i))·exp(λ α_t l_i) / Z_t, where λ = (c−1)/(c+1), i = 1,…,n, and Z_t = Σ_{i=1}^{n} D_t(i)·exp(−α_t l_i h_t(sub_i))·exp(λ α_t l_i).
A.4.2.3 Output the final strong classifier:
H(sub) = 1 if Σ_{t=1}^{T} α_t h_t(sub) ≥ b; H(sub) = 0 otherwise, where b is the layer's decision threshold.
B) Implementation of the detection system
At the detection stage the invention comprises the following steps:
B.1 Image acquisition
Acquire images with a camera, digital camera, scanner, or similar device.
B.2 Construction of the input-image pyramid and computation of the integral images
To detect faces of different sizes, the input image is reduced 12 times in succession at a fixed ratio (1.25 in the present invention) using the linear interpolation method described earlier, giving input images of 13 different sizes in total; every 20×20-pixel window of each image (a window being a rectangular subimage of the input image) is judged, so faces from 20×20 pixels up to 280×280 pixels can be detected. The concrete steps are:
B.2.1 Scaling of the input image
Using the linear interpolation method described earlier, reduce the input image I(x,y) 12 times in succession at the ratio q = 1.25, obtaining the input image sequence {I_i(x,y)} (i = 0,…,12);
B.2.2 Computation of the integral images
Using the formulas above, compute for each image I_i(x,y) its integral image II_i(x,y) and squared integral image SqrII_i(x,y) (i = 0,…,12);
B.2.3 Exhaustive judgement of the windows
Starting from the upper-left corner of each image I_i(x,y), exhaustively examine all windows of 20×20 pixels. Any window [x_0, y_0; x_0+19, y_0+19] is processed as follows:
B.2.3.1 Using the integral image II_i(x,y) and squared integral image SqrII_i(x,y) of the whole image, compute the mean μ and standard deviation σ of the window:
μ = [II_i(x_0+19,y_0+19) + II_i(x_0−1,y_0−1) − II_i(x_0−1,y_0+19) − II_i(x_0+19,y_0−1)]/400
σ = {[SqrII_i(x_0+19,y_0+19) + SqrII_i(x_0−1,y_0−1) − SqrII_i(x_0−1,y_0+19) − SqrII_i(x_0+19,y_0−1)]/400 − μ²}^{1/2}
B.2.3.2 Quickly extract the microstructure features of the window by the method introduced earlier, and normalize the features;
B.2.3.3 Judge the window with the trained multi-layer face/non-face strong classifiers; if it passes the decisions of all layers, the window is considered to contain a face and its position is output; otherwise the window is discarded without further processing.
With the above steps every face in the input image can be detected quickly and robustly.
Embodiment 1: a face-recognition check-in system (Fig. 10)
Face authentication, the most user-friendly mode among the biometric authentication technologies that have recently received wide attention, aims at automatic personal identification by computer from facial images, replacing traditional identity authentication by passwords, certificates, or seals; it has the advantages of being hard to forge, impossible to lose, and convenient. This system verifies a person's identity automatically from face information; its face detection module is the research result described here. The system also took part in the FAT2004 contest organized under ICPR 2004, in which 13 face recognition algorithms participated from 11 academic and commercial institutions, including Carnegie Mellon University of the USA, the Institut für Neuroinformatik of Germany, and the University of Surrey of Britain. The system submitted by our laboratory ranked first on all three evaluation indexes, with an error rate about 50% lower than the second-place result. The research result described here is applied in the face detection module of the submitted system, helping its overall performance reach an advanced international level.
In summary, the present invention detects faces robustly in images with complex backgrounds, has obtained excellent detection results in experiments, and has broad application prospects.

Claims (1)

1. A method of robust human face detection in complex background images, characterized in that the method designs the face detector on the basis of the misclassification risks of the face and non-face patterns. To build the detector, the collected samples are first normalized in size and illumination so as to eliminate the within-class differences of the input samples caused by illumination and size; the CS-AdaBoost algorithm then selects the microstructure features that reflect the difference between face and non-face patterns, and combines them into strong-classifier layers having a false rejection rate below 10⁻³ and a false acceptance rate below 10⁻⁶; the multi-layer strong classifiers are then cascaded into a complete face detector, whose processing yields the final face positions;
In a system composed of an image acquisition device and a computer, the face detection method comprises a training stage and a detection stage, the training stage containing in turn the following steps:
Step 1. Sample collection
Acquire images with any device including a camera, digital camera, or scanner, manually mark and cut out the faces, and build the face training sample database; cut out non-face training images at random from scenery pictures containing no faces; 11580 face samples and 2000000 non-face samples are obtained in total as the training sample set;
Step 2. Normalization, comprising linear normalization of sample illumination and size;
Step 2.1 Size normalization: normalize the face and non-face images obtained in step 1 to the specified size;
Let the original sample image be [F(x,y)]_{M×N}, with width M and height N, and let the size-normalized image be [G(x,y)]_{W×H}; the correspondence between the input image and the normalized image is:
G(x,y) = F(x/r_x, y/r_y)
where r_x and r_y are the scale factors in the x and y directions: r_x = N/H, r_y = M/W; F(x/r_x, y/r_y) is the estimated pixel value at the point (x/r_x, y/r_y); let:
x/r_x = x_0 + Δ_x,  y/r_y = y_0 + Δ_y,  0 ≤ Δ_x, Δ_y < 1
where x_0 = [x/r_x], Δ_x = x/r_x − x_0, y_0 = [y/r_y], Δ_y = y/r_y − y_0, and [·] is the integer-part function; then:
G(x,y) = F(x_0+Δ_x, y_0+Δ_y)
       = F(x_0,y_0)(1−Δ_x)(1−Δ_y) + F(x_0+1,y_0)Δ_x(1−Δ_y)
       + F(x_0,y_0+1)(1−Δ_x)Δ_y + F(x_0+1,y_0+1)Δ_xΔ_y;
Step 2.2 Gray-level normalization
Transform the gray value of each pixel of the size-normalized sample image G(x,y), adjusting the mean μ̄ and variance σ̄ to the given values μ_0 and σ_0 to obtain the sample image I(x,y):
I(x,y) = (σ_0/σ̄)(G(x,y) − μ̄) + μ_0
where μ̄ = (1/(WH)) Σ_{y=0}^{H−1} Σ_{x=0}^{W−1} G(x,y), σ̄ = sqrt( (1/(WH)) Σ_{y=0}^{H−1} Σ_{x=0}^{W−1} (G(x,y) − μ̄)² );
Step 3. Building the sample feature library
Compute the integral image in order to extract the microstructure features, in the following steps:
Step 3.1 Compute the integral image of each sample
By definition, compute each sample's integral image II(x,y) = Σ_{0≤x'≤x} Σ_{0≤y'≤y} I(x',y'), with II(−1,y) = 0 and II(x,−1) = 0;
Step 3.2 Extraction of the microstructure feature library
Five kinds of microstructure features are extracted from the face samples with the following five types of microstructure templates; each microstructure feature is obtained as the difference between the pixel-gray sums of the image regions corresponding to the black and white parts of the template; the five kinds of features g(x,y,w,h) are expressed respectively as follows:
(a) class: the black and white regions are left-right symmetric and of equal area; w denotes the width and h the height of each region:
g(x,y,w,h)=2·II(x+w-1,y-1)+II(x+2·w-1,y+h-1)
+II(x-1,y+h-1)-2·II(x+w-1,y+h-1)
-II(x+2·w-1,y-1)-II(x-1,y-1)
(b) class: the black and white regions are vertically symmetric and of equal area; w and h are defined as in class (a):
g(x,y,w,h)=2II(x+w-1,y+h-1)+II(x-1,y-1)-II(x+w-1,y-1)
-2II(x-1,y+h-1)-II(x+w-1,y+2h-1)+II(x-1,y+2h-1)
(c) class: horizontally, the black region lies between two white regions, and its area equals that of each white region; w and h are defined as in class (a):
g(x,y,w,h)=2II(x+2w-1,y+h-1)+2II(x+w-1,y-1)-2II(x+2w-1,y-1)
-2II(x+w-1,y+h-1)-II(x+3w-1,y+h-1)-II(x-1,y-1)
+II(x-1,y+h-1)+II(x+3w-1,y-1)
(d) class: the two black regions occupy the first and third quadrants and the two white regions the second and fourth quadrants; each black region has the same area as each white region; w and h are defined as in class (a):
g(x,y,w,h)=-II(x-1,y-1)-II(x+2w-1,y-1)-II(x-1,y+2h-1)
-4II(x+w-1,y+h-1)+2II(x+w-1,y-1)+2II(x-1,y+h-1)
-II(x+2w-1,y+2h-1)+2II(x+2w-1,y+h-1)+2II(x+w-1,y+2h-1)
(e) class: the black region lies at the center of the white region, its top, bottom, left and right sides each 2 pixels away from the corresponding sides of the white region; w and h denote the width and height of the white region:
g(x,y,w,h)=II(x+w-1,y+h-1)+II(x-1,y-1)-II(x+w-1,y-1)-II(x-1,y+h-1)
-II(x+w-3,y+h-3)-II(x+1,y+1)+II(x+1,y+h-3)+II(x+w-1,y+1)
For a 20×20-pixel sample image and the above five types of microstructure templates, there are 92267 combinations of the parameters x, y, w, h, from which the feature quantities FV(j), 1 ≤ j ≤ 92267, of the sample image are extracted;
Step 3.3 Feature normalization, i.e. normalizing the gray mean and variance of each sample image:
Let μ be the mean and σ the standard deviation of the pixel grays in each 20×20-pixel window region, i.e. (x_0 ≤ x' ≤ x_0+19, y_0 ≤ y' ≤ y_0+19); then:
μ = [II(x_0+19,y_0+19) + II(x_0−1,y_0−1) − II(x_0−1,y_0+19) − II(x_0+19,y_0−1)]/400
σ = {[SqrII(x_0+19,y_0+19) + SqrII(x_0−1,y_0−1) − SqrII(x_0−1,y_0+19) − SqrII(x_0+19,y_0−1)]/400 − μ²}^{1/2}
and each microstructure feature is normalized as:
FV(j) = (σ_0/σ)·FV̄(j)
For a 20×20-pixel sample image, 92267 microstructure features FV(j), 1 ≤ j ≤ 92267, are obtained in total;
Step 4. Feature selection and classifier design
Train each face/non-face strong-classifier layer with the above training sample set and the CS-AdaBoost algorithm, and cascade the multi-layer strong classifiers into a complete face detector, in the following steps:
Step 4.1 Initialize i = 1; define the training target of each strong-classifier layer as false rejection rate FRR ≤ 0.02% on the face training set and false acceptance rate FAR ≤ 60% on the non-face training set; define the targets of the whole face detector as FRR ≤ 0.5% on the face training set and FAR ≤ 3.2×10⁻⁶ on the non-face training set;
Step 4.2 Train the i-th strong-classifier layer;
Step 4.3 Run the first i trained layers over the sample set and compute FRR and FAR:
FAR = (number of non-face samples judged to be faces ÷ total number of non-face samples) × 100%
FRR = (number of face samples judged to be non-faces ÷ total number of face samples) × 100%
Step 4.4 If FRR and FAR have not reached the predetermined values set in step 4.1, increase i by 1 and return to step 4.2 to continue training; otherwise stop;
Step 4.2 contains in turn the following sub-steps:
Step 4.2.1 Parameter initialization
Initialization of the training samples' misclassification risks: the risk of each face sample is C_i = 2c/(c+1) and that of each non-face sample C_i = 2/(c+1), where c, the misclassification-risk multiple of the face class relative to the non-face class, is greater than 1 and decreases gradually toward 1 as the number of strong-classifier layers grows;
Initialization of the training sample weights: the initial weight of each sample is D_1(i) = C_i / Σ_j C_j;
Choose the iteration count T, the number of weak classifiers to be used; T increases with the number of strong-classifier layers;
Compute the maximum Fmax(j) and minimum Fmin(j) of each feature over the sample set, where j is the feature index, 1 ≤ j ≤ 92267;
Step 4.2.2 repeats following process T time, t=1 ..., T:
Step 4.2.2.1 uses j feature, 1≤j≤92267, structure Weak Classifier h j, exhaustive search threshold parameter θ between Fmin (j) and Fmax (j) then j, make h jError rate ε jMinimum, definition &epsiv; j = &Sigma; i = 1 n D i ( i ) &CenterDot; | h j ( sub i ) - l i | ;
The weak classifier is denoted h_j(sub), abbreviated h_j:
h_j(sub) = 1, if g_j(sub) ≥ θ_j; h_j(sub) = 0, otherwise;
Wherein:
sub is a sample of 20 × 20 pixels,
g_j(sub) is the j-th feature extracted from the sample;
θ_j is the decision threshold based on the j-th feature, obtained by collecting statistics of the j-th feature over all face and non-face samples so that the FRR on the face samples satisfies the specified requirement;
h_j(sub) is the decision output of the tree classifier constructed from the j-th feature; 92267 weak classifiers are obtained correspondingly;
l_i ∈ {0, 1} is the category label of the sample image sub_i, corresponding to the non-face class and the face class respectively; there are n_face face samples and n_nonface non-face samples, which together form the training sample set L = {(sub_i, l_i)}, i = 1, ..., n, l_i = 0, 1;
Step 4.2.2.2 Let ε_t = min_{1 ≤ j ≤ 92267} ε_j, and take the weak classifier achieving this minimum as h_t;
Step 4.2.2.3 Compute the parameter α_t = (1/2) ln((1 - ε_t)/ε_t);
Step 4.2.2.4 Update the sample weights: D_{t+1}(i) = D_t(i) exp(-α_t l_i h_t(sub_i)) exp(λ α_t l_i) / Z_t, where λ = (c-1)/(c+1), i = 1, ..., n, and Z_t = Σ_{i=1}^{n} D_t(i) exp(-α_t l_i h_t(sub_i)) exp(λ α_t l_i);
Step 4.2.3 Output the final strong classifier:
H(sub) = 1, if Σ_{t=1}^{T} α_t h_t(sub) ≥ θ; H(sub) = 0, otherwise, where θ is the decision threshold of this layer, set so that the strong classifier meets the FRR and FAR targets of step 4.1;
A complete face detector is obtained by training through each of the above steps;
In the detection phase, the following steps are used to determine whether the input image contains faces:
Step 1. Acquisition of the input image
Acquire the image with any device such as a camera, digital camera, or scanner;
Step 2. Scaling of the input image and judgment of each sub-window therein
To detect faces of different sizes, the linear interpolation method used in step 2.1 of the training stage shrinks the input image at a fixed ratio 12 consecutive times, yielding input images at 13 different scales in total; every 20 × 20 pixel window in each scaled image is then judged, so faces from 20 × 20 pixels up to 280 × 280 pixels in size can be detected; this comprises the following steps:
Step 2.1 Scaling of the input image
Using the linear interpolation method of step 2.1 of the training stage, shrink the input image I(x, y) 12 consecutive times at the ratio q = 1.25 to obtain the input image sequence {I_i(x, y)}, i = 0, ..., 12;
Step 2.2 Computation of the integral images
Using the iterative formulas of step 3.1 of the training stage, compute for each image I_i(x, y) its corresponding integral image II_i(x, y) and squared integral image SqrII_i(x, y), i = 0, ..., 12;
Step 2.3 Judgment of each sub-window in the image
Starting from the upper-left corner of each image I_i(x, y), every sub-window of 20 × 20 pixels in the image is examined; for any sub-window [x_0, y_0; x_0+19, y_0+19] the processing steps are as follows:
Step 2.3.1 Using the integral image II_i(x, y) and squared integral image SqrII_i(x, y) of the whole image, compute the mean μ and variance σ of the sub-window:
μ = [II_i(x_0+19, y_0+19) + II_i(x_0-1, y_0-1) - II_i(x_0-1, y_0+19) - II_i(x_0+19, y_0-1)]/400
σ = {[SqrII_i(x_0+19, y_0+19) + SqrII_i(x_0-1, y_0-1) - SqrII_i(x_0-1, y_0+19) - SqrII_i(x_0+19, y_0-1)]/400 - μ²}^{1/2}
Step 2.3.2 Extract the microstructure features of the sub-window using the method introduced in step 3.2 of the training stage, and normalize the features;
Step 2.3.3 Judge the sub-window with the trained multilayer face/non-face strong classifiers; if it passes the judgment of all layers of strong classifiers, the sub-window is considered to contain a face and its position is output; otherwise the sub-window is discarded with no further processing;
Through the above steps, every face in the input image can be detected.
CNB2005100862485A 2005-08-19 2005-08-19 Method of robust human face detection in complicated background image Expired - Fee Related CN100336070C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2005100862485A CN100336070C (en) 2005-08-19 2005-08-19 Method of robust human face detection in complicated background image

Publications (2)

Publication Number Publication Date
CN1731417A CN1731417A (en) 2006-02-08
CN100336070C true CN100336070C (en) 2007-09-05

Family

ID=35963765

Country Status (1)

Country Link
CN (1) CN100336070C (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010072153A1 (en) * 2008-12-25 2010-07-01 南京壹进制信息技术有限公司 Computer intelligent energy saving method

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100389429C (en) * 2006-06-01 2008-05-21 北京中星微电子有限公司 AdaBoost based characteristic extracting method for pattern recognition
CN101196984B (en) * 2006-12-18 2010-05-19 北京海鑫科金高科技股份有限公司 Fast face detecting method
JP5058681B2 (en) * 2007-05-31 2012-10-24 キヤノン株式会社 Information processing method and apparatus, program, and storage medium
CN101315670B (en) 2007-06-01 2010-08-11 清华大学 Specific shot body detection device, learning device and method thereof
WO2008151470A1 (en) * 2007-06-15 2008-12-18 Tsinghua University A robust human face detecting method in complicated background image
CN101406390B (en) * 2007-10-10 2012-07-18 三星电子株式会社 Method and apparatus for detecting part of human body and human, and method and apparatus for detecting objects
CN101196990B (en) * 2007-10-11 2011-04-27 北京海鑫科金高科技股份有限公司 Network built-in type multiplex face detecting system and method thereof
CN101178770B (en) * 2007-12-11 2011-02-16 北京中星微电子有限公司 Image detection method and apparatus
CN101655914B (en) * 2008-08-18 2014-10-22 索尼(中国)有限公司 Training device, training method and detection method
CN101350063B (en) * 2008-09-03 2011-12-28 北京中星微电子有限公司 Method and apparatus for locating human face characteristic point
CN101360246B (en) * 2008-09-09 2010-06-02 西南交通大学 Video error masking method combined with 3D human face model
CN101751551B (en) * 2008-12-05 2013-03-20 比亚迪股份有限公司 Method, device, system and device for identifying face based on image
CN101447023B (en) * 2008-12-23 2013-03-27 北京中星微电子有限公司 Method and system for detecting human head
CN101872477B (en) 2009-04-24 2014-07-16 索尼株式会社 Method and device for detecting object in image and system containing device
CN102024149B (en) * 2009-09-18 2014-02-05 北京中星微电子有限公司 Method of object detection and training method of classifier in hierarchical object detector
JP2011198268A (en) * 2010-03-23 2011-10-06 Sony Corp Information processing apparatus, method, and program
CN102004904B (en) * 2010-11-17 2013-06-19 东软集团股份有限公司 Automatic teller machine-based safe monitoring device and method and automatic teller machine
CN102136075B (en) * 2011-03-04 2013-05-15 杭州海康威视数字技术股份有限公司 Multiple-viewing-angle human face detecting method and device thereof under complex scene
CN102170563A (en) * 2011-03-24 2011-08-31 杭州海康威视软件有限公司 Intelligent person capture system and person monitoring management method
CN102436578B (en) * 2012-01-16 2014-06-04 宁波江丰生物信息技术有限公司 Formation method for dog face characteristic detector as well as dog face detection method and device
CN102693417A (en) * 2012-05-16 2012-09-26 清华大学 Method for collecting and optimizing face image sample based on heterogeneous active visual network
CN104318049A (en) * 2014-10-30 2015-01-28 济南大学 Coronal mass ejection event identification method
CN106295668A (en) * 2015-05-29 2017-01-04 中云智慧(北京)科技有限公司 Robust gun detection method
CN105205453B (en) * 2015-08-28 2019-01-08 中国科学院自动化研究所 Human eye detection and localization method based on depth self-encoding encoder
CN105488456B (en) * 2015-11-23 2019-04-23 中国科学院自动化研究所 Method for detecting human face based on adaptive threshold adjustment rejection sub-space learning
CN105488472B (en) * 2015-11-30 2019-04-09 华南理工大学 A kind of digital cosmetic method based on sample form
CN106725341A (en) * 2017-01-09 2017-05-31 燕山大学 A kind of enhanced lingual diagnosis system
CN106874867A (en) * 2017-02-14 2017-06-20 江苏科技大学 A kind of face self-adapting detecting and tracking for merging the colour of skin and profile screening
CN107784263B (en) * 2017-04-28 2021-03-30 新疆大学 Planar rotation face detection method based on improved accelerated robust features
CN109961455B (en) 2017-12-22 2022-03-04 杭州萤石软件有限公司 Target detection method and device
CN108108724B (en) * 2018-01-19 2020-05-08 浙江工商大学 Vehicle detector training method based on multi-subregion image feature automatic learning
CN108537143B (en) * 2018-03-21 2019-02-15 光控特斯联(上海)信息科技有限公司 A kind of face identification method and system based on key area aspect ratio pair
CN109344868B (en) * 2018-08-28 2021-11-16 广东奥普特科技股份有限公司 General method for distinguishing different types of objects which are mutually axisymmetric
CN110502992B (en) * 2019-07-18 2021-06-15 武汉科技大学 Relation graph based fast face recognition method for fixed scene video
CN110674690B (en) * 2019-08-21 2022-06-14 成都华为技术有限公司 Detection method, detection device and detection equipment
CN113822105B (en) * 2020-07-07 2024-04-19 湖北亿立能科技股份有限公司 Artificial intelligence water level monitoring system based on online two classifiers of SVM water scale
CN113283378B (en) * 2021-06-10 2022-09-27 合肥工业大学 Pig face detection method based on trapezoidal region normalized pixel difference characteristics

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09101579A (en) * 1995-10-05 1997-04-15 Fuji Photo Film Co Ltd Face area extraction method and copying condition determination method
US5781650A (en) * 1994-02-18 1998-07-14 University Of Central Florida Automatic feature detection and age classification of human faces in digital images
CN1508752A (en) * 2002-12-13 2004-06-30 佳能株式会社 Image processing method and apparatus

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Robust Precise Eye Location Under Probabilistic Framework, Yong Ma, Xiaoqing Ding, Zhenger Wang, Ning Wang, IEEE FGR'04, 2004 *
Face Detection Based on Hierarchical Support Vector Machines, Yong Ma, Xiaoqing Ding, Journal of Tsinghua University (Science and Technology), Vol. 43, No. 1, 2003 *


Similar Documents

Publication Publication Date Title
CN100336070C (en) Method of robust human face detection in complicated background image
CN1214340C (en) Multi-neural net imaging appts. and method
CN1794266A (en) Biocharacteristics fusioned identity distinguishing and identification method
CN100336071C (en) Method of robust accurate eye positioning in complicated background image
CN1811793A (en) Automatic positioning method for characteristic point of human faces
CN100345165C (en) Method and apparatus for image-based photorealistic 3D face modeling
He et al. Real-time human face detection in color image
CN1552041A (en) Face meta-data creation and face similarity calculation
US8582897B2 (en) Information processing apparatus and method, program, and recording medium
US8306282B2 (en) Hierarchical face recognition training method and hierarchical face recognition method thereof
CN1664846A (en) On-line hand-written Chinese characters recognition method based on statistic structural features
CN1828632A (en) Object detection apparatus, learning apparatus, object detection system, object detection method
CN1973757A (en) Computerized disease sign analysis system based on tongue picture characteristics
CN110781829A (en) Light-weight deep learning intelligent business hall face recognition method
CN1818927A (en) Fingerprint identifying method and system
CN1975759A (en) Human face identifying method based on structural principal element analysis
CN1885310A (en) Human face model training module and method, human face real-time certification system and method
CN1932847A (en) Method for detecting colour image human face under complex background
CN1200387C (en) Statistic handwriting identification and verification method based on separate character
Ryu et al. Coarse-to-fine classification for image-based face detection
CN1305002C (en) Multiple registered fingerprint fusing method
CN1041773C (en) Character recognition method and apparatus based on 0-1 pattern representation of histogram of character image
WO2008151471A1 (en) A robust precise eye positioning method in complicated background image
Babu et al. Handwritten digit recognition using structural, statistical features and k-nearest neighbor classifier
CN1588424A (en) Finger print identifying method based on broken fingerprint detection

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20070905

Termination date: 20190819