Summary of the invention:
The purpose of this invention is to provide a masked-face detection method that can judge whether a masked face is present in a video image, so as to prevent crimes committed under a masked disguise.
The present invention comprises the following steps:
1. Convert the color video image obtained from the monitored scene into a grayscale image;
2. Scale the grayscale image;
3. Perform head detection on the grayscale image; when a head has been detected, enter the steps below; when no head is detected, repeat steps 1 to 3;
4. Match each head between frames;
5. Perform face detection;
6. Perform the masked-face judgment, mark any masked face found in the original color video image, and raise an alarm.
In step 3 a sliding-window method is adopted: the sliding window is moved over the pixels from left to right and from top to bottom, dividing the grayscale image into window images, one for each window position, and head detection is performed on each window image. When the sliding window is at the first window image:
(1) Compute the horizontal gradient G_x[i,j] and vertical gradient G_y[i,j] of each pixel of the window image:
A. Initialization of G_x[i,j] and G_y[i,j]:
The value of every pixel of G_x[i,j] and G_y[i,j] is initialized to 0, with [i,j] traversing all pixels of the window image; i is a variable denoting the horizontal position of a pixel in the window image, taking values i = 1, 2, ..., W_0; j is a variable denoting the vertical position of a pixel in the window image, taking values j = 1, 2, ..., H_0; W_0 and H_0 are the width and height of the window image, respectively;
B. Compute the horizontal gradient G_x[i,j] and vertical gradient G_y[i,j] of each pixel on the window image:
With the Sobel horizontal edge operator as the computing template, the template center is translated to each pixel; each pixel of the image region covered by the template is multiplied by the corresponding template element, and the sum of all products is taken as the horizontal gradient G_x[i,j]; with the Sobel vertical edge operator as the template, the vertical gradient G_y[i,j] is obtained in the same way:

G_x[i,j] = Σ_{k=1..3} Σ_{l=1..3} I[i+k-2, j+l-2] · S_x[k,l]
G_y[i,j] = Σ_{k=1..3} Σ_{l=1..3} I[i+k-2, j+l-2] · S_y[k,l]

where i = 2, 3, ..., W_0-1, j = 2, 3, ..., H_0-1; I[i,j] denotes the gray value of each pixel of the window image; S_x[k,l] denotes the value at row k, column l of the Sobel horizontal edge operator, and S_y[k,l] the value at row k, column l of the Sobel vertical edge operator;
(2) Compute the gradient magnitude G_1[i,j] and gradient direction G_2[i,j] of each pixel of the window image:

G_1[i,j] = sqrt( G_x[i,j]^2 + G_y[i,j]^2 )
G_2[i,j] = ⌊ 9 · (arctan(G_y[i,j]/G_x[i,j]) + π/2) / π ⌋ + 1

where i = 1, 2, ..., W_0, j = 1, 2, ..., H_0; arctan(·) is the arctangent function; ⌊·⌋ is the round-down (floor) operator, ⌊x⌋ denoting the largest integer not greater than x;
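For illustration, a minimal Python/NumPy sketch of steps (1) and (2) follows. It is a sketch under stated assumptions, not part of the claimed method: the standard 3 × 3 Sobel kernels are assumed, the direction is folded to an unsigned orientation in [0, π) before binning into 9 channels, and all names are illustrative.

```python
import numpy as np

# Assumed standard 3x3 Sobel templates (the operator is named in the text:
# S_x for horizontal and S_y for vertical gradients).
SX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SY = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float64)

def gradients(win, P=9):
    """win: 2-D grayscale window image (arrays indexed [row, col]).
    Returns gradient magnitude G1 and direction channel G2 (values 1..P);
    the four border rows/columns keep their initial value 0."""
    H0, W0 = win.shape
    win = win.astype(np.float64)
    Gx = np.zeros_like(win)               # step (1)A: initialize to 0
    Gy = np.zeros_like(win)
    for r in range(1, H0 - 1):            # step (1)B: template weighting
        for c in range(1, W0 - 1):
            patch = win[r - 1:r + 2, c - 1:c + 2]
            Gx[r, c] = np.sum(patch * SX)
            Gy[r, c] = np.sum(patch * SY)
    G1 = np.sqrt(Gx ** 2 + Gy ** 2)                    # gradient magnitude
    theta = np.mod(np.arctan2(Gy, Gx), np.pi)          # orientation in [0, pi)
    G2 = np.floor(P * theta / np.pi).astype(int) + 1   # channels 1..P
    return G1, np.clip(G2, 1, P)
```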
(3) Use the gradient magnitude and direction of each pixel of the window image to compute the gradient-orientation-histogram statistics and obtain the feature vector of the window image:
The window image is divided into connected regions of identical size; each connected region consists of 8 × 8 pixels and is called a cell. Square blocks are formed from 2 × 2 cells, with a 50% overlap between adjacent blocks;
A. Compute the gradient-orientation-histogram statistic of each cell of the window image:
The gradient direction of each pixel of the window image takes values 1 to 9, so each cell has 9 channels. Using the gradient magnitude and direction of each pixel of a cell, the gradient magnitudes of the pixels whose direction falls within each channel's range are accumulated, giving the histogram statistics of that cell. H[m][p] denotes the histogram statistic of channel p of cell m of the window image; m is a variable, the cell label, increasing from 1 in left-to-right, top-to-bottom order; p is a variable, the channel label; L denotes the number of cells per row of the window image and M the total number of cells of the window image; L and M are constants that depend only on the window-image size. Taking m = 1 and p = 1, H[1][1], the histogram statistic of channel 1 of cell 1, is computed as:

H[1][1] = Σ_{[i,j] ∈ cell 1, G_2[i,j] = 1} G_1[i,j]

p then increases by 1, from p = 2 up to p = 9, giving H[1][2], H[1][3], ..., H[1][9]:

H[1][p] = Σ_{[i,j] ∈ cell 1, G_2[i,j] = p} G_1[i,j]

m then increases by 1, from m = 2 up to m = M, each cell again having 9 histogram statistics:

H[m][p] = Σ_{[i,j] ∈ cell m, G_2[i,j] = p} G_1[i,j]

where cell m is the 8 × 8 pixel region in cell row ⌊(m-1)/L⌋ and cell column (m-1) mod L, ⌊(m-1)/L⌋ (for L = 5, ⌊(m-1)/5⌋) denoting the largest integer not greater than (m-1)/L;
B. Normalize the gradient-orientation-histogram statistics of the cells within each block of the window image and extract the feature vector of the window image:
S[n] denotes the normalization factor of the histogram statistics of block n of the window image; n is a variable, the block label, increasing from 1 in left-to-right, top-to-bottom order; each row of the window image contains L-1 blocks; N denotes the total number of blocks of the window image; N is a constant that depends on the window-image size. Taking n = 1, the normalization factor S[1] of the histogram statistics of block 1 of the window image is the sum of the histogram statistics of all channels of all cells of block 1:

S[1] = Σ_{p=1..9} ( H[1][p] + H[2][p] + H[L+1][p] + H[L+2][p] )

Dividing the histogram statistic of each channel of the 1st cell of block 1, which is also the 1st cell of the window image, by the normalization factor of block 1 gives 9 values, H[1][1]/S[1], ..., H[1][9]/S[1], taken in order as the 1st to 9th values of the window-image feature vector.
Dividing the histogram statistic of each channel of the 2nd cell of block 1, which is also the 2nd cell of the window image, by the normalization factor of block 1 gives 9 values, taken in order as the 10th to 18th values of the window-image feature vector.
Dividing the histogram statistic of each channel of the 3rd cell of block 1, which is also cell L+1 of the window image, by the normalization factor of block 1 gives 9 values, taken in order as the 19th to 27th values of the window-image feature vector.
Dividing the histogram statistic of each channel of the 4th cell of block 1, which is also cell L+2 of the window image, by the normalization factor of block 1 gives 9 values, taken in order as the 28th to 36th values of the window-image feature vector.
n then increases by 1, from n = 2 up to n = N, and the normalization factor of every block is computed in the same way:

S[n] = Σ_{p=1..9} ( H[m_n][p] + H[m_n+1][p] + H[m_n+L][p] + H[m_n+L+1][p] ),  m_n = L·⌊(n-1)/(L-1)⌋ + ((n-1) mod (L-1)) + 1

where m_n is the label of the top-left cell of block n and ⌊·⌋ again denotes the largest integer not greater than its argument.
Dividing the histogram statistic of each channel of each cell of block n of the window image by the normalization factor of block n gives the remaining 36 × (N-1) values of the window-image feature vector; the feature vector of each window image thus has 36 × N dimensions in total;
C. Feed the 36 × N-dimensional feature vector of each window image, together with the head classification model trained in advance, into the support-vector-machine software library, classify using the ONE_CLASS classification mode and the LINEAR kernel function, and judge whether the window image is a head image; if it is, take this window image as a head window image.
Steps (1) to (3) are repeated in the same way until all window images have been traversed, giving all head window images in the current-frame grayscale image; the head window images are labeled in traversal order, following the left-to-right, top-to-bottom principle.
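As illustration of step (3), a sketch of the feature extraction follows, assuming 8 × 8-pixel cells, 2 × 2-cell blocks stepping one cell (50% overlap), and sum normalization as described above; gradients() is the sketch given earlier, and the classifier handle is an illustrative stand-in for the SVM library call.

```python
import numpy as np

def hog_feature(G1, G2, cell=8, P=9):
    """Build the 36*N-dimensional feature vector of one window image from
    its per-pixel gradient magnitude G1 and direction channel G2 (1..P)."""
    H0, W0 = G1.shape
    rows, cols = H0 // cell, W0 // cell      # cols = L cells per row
    hist = np.zeros((rows, cols, P))         # H[m][p] laid out on a grid
    for r in range(rows):
        for c in range(cols):
            g1 = G1[r*cell:(r+1)*cell, c*cell:(c+1)*cell]
            g2 = G2[r*cell:(r+1)*cell, c*cell:(c+1)*cell]
            for p in range(1, P + 1):        # accumulate magnitudes per channel
                hist[r, c, p - 1] = g1[g2 == p].sum()
    feat = []
    for r in range(rows - 1):                # blocks step one cell: 50% overlap
        for c in range(cols - 1):
            block = hist[r:r+2, c:c+2].reshape(-1)   # 4 cells x P channels
            s = block.sum()                  # normalization factor S[n]
            feat.extend(block / s if s > 0 else block)
    return np.asarray(feat)

# Illustrative use with a pre-trained head classifier `head_model`:
#   is_head = head_model.predict([hog_feature(G1, G2)])[0] == 1
```

For a 40 × 40 window this gives 5 × 5 cells, 16 blocks, and a 576-dimensional vector, consistent with the embodiment below.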
The flow of step 4 is as follows:
(1) Match head window image n of the current-frame grayscale image against all head window images of the previous-frame grayscale image by position and area:
Position-matching parameter: T(m) = sqrt( (p_n - x_m)^2 + (q_n - y_m)^2 )
Area-matching parameter: A(m) = |Q_n - S_m|
n and m are variables, the labels of the head window images in the current-frame and previous-frame grayscale images respectively; the center of head window image n of the current frame is [p_n, q_n] and its area Q_n; the center of head window image m of the previous frame is [x_m, y_m] and its area S_m; n = 1, 2, ..., N, m = 1, 2, ..., M; N denotes the number of head window images in the current-frame grayscale image and M the number in the previous-frame grayscale image. Take n = 1, traverse all values of m, and compute the value of m that minimizes the position-matching parameter T(m); denote it J_1, meaning the best match of the first head window image of the current frame is head window image J_1 of the previous frame. If T(J_1) ≤ th_1 and A(J_1) ≤ th_2, with th_1 = 15 and th_2 = 100, then the first head window image of the current frame has found its matching head window image in the previous-frame grayscale image, namely number J_1; if T(J_1) > th_1 or A(J_1) > th_2, J_1 is set to 0, i.e. J_1 = 0, meaning the first head window image of the current frame matches no head window image of the previous frame and is a newly appearing head window image;
(2) n increases by 1, from n = 2 up to n = N, repeating step (1) to find all matching head window images J_2, J_3, ..., J_N. If J_1, J_2, ..., J_N do not include some value K from 1 to M, head window image K of the previous frame has been lost;
(3) Head window image n of the current-frame grayscale image is likewise matched by position and area against all head window images of each of the 2nd through H-th preceding grayscale frames, repeating steps (1) and (2) above. If no loss of the head window image occurs, head window image n of the current-frame grayscale image is an analysis window image; H is an arbitrary integer between 5 and 10.
The flow of step 5 is as follows:
On each analysis window image, the cascade AdaBoost method based on Haar features is applied, using the face classification model trained in advance to perform face detection, yielding the head window images in which a face is detected and the head window images in which no face is detected.
The flow of step 6 is as follows:
(1) On a head window image in which a face was detected, the head window image is divided into six equal parts in the vertical direction; in the horizontal direction the left-side and right-side portions of the window are not analyzed, and only the middle region is analyzed. Of the six parts, regions B_1 and B_2 are taken as the analyzed region images, B_1 being the 2nd region counted from top to bottom and B_2 the 5th region counted from top to bottom. B_1[i,j] and B_2[i,j] denote the gray values of the pixel at horizontal position i, vertical position j of region images B_1 and B_2 respectively. The difference value D_1[i,j] of the pixel at horizontal position i, vertical position j of B_1 and B_2 is computed as:
D_1[i,j] = |B_1[i,j] - B_2[i,j]|
[i,j] traverses all pixels of the region images, i = 1, 2, ..., W_1, j = 1, 2, ..., H_1, where W_1 denotes the width of region images B_1 and B_2 and H_1 their height. The number of pixels with D_1[i,j] > th_1 is counted and denoted C_1; when C_1 exceeds a set proportion of the total number W_1 × H_1 of region pixels, it is judged that a masked face is present in the grayscale image; otherwise it is a normal face; th_1 = 18;
(2) On a head window image in which no face was detected, the head window image is divided into four equal parts in the horizontal direction, and the middle two regions B_3 and B_4 are taken as the analyzed regions. B_3[i,j] and B_4[i,j] denote the gray values of the pixel at horizontal position i, vertical position j of regions B_3 and B_4 respectively, and the means of the gray values are E_1 and E_2 respectively:

E_1 = (1/(W_2·H_2)) · Σ_{i=1..W_2} Σ_{j=1..H_2} B_3[i,j]
E_2 = (1/(W_2·H_2)) · Σ_{i=1..W_2} Σ_{j=1..H_2} B_4[i,j]

W_2 denotes the width of region images B_3 and B_4 and H_2 their height. The difference of the two region means is ΔE = |E_1 - E_2|; when ΔE > th_2, a side face or head-bowing behavior is indicated, not a masked face; th_2 = 25;
(3) On a head window image in which no face was detected, when ΔE ≤ th_2, the head window image is divided into six equal parts in the vertical direction; in the horizontal direction the left-side and right-side portions are again not analyzed, and only the middle region is analyzed. Of the six parts, regions B_5 and B_2 are taken as the analyzed regions, B_5 being the 3rd region counted from top to bottom and B_2 the 5th region counted from top to bottom. B_5[i,j] and B_2[i,j] denote the gray values of the pixel at horizontal position i, vertical position j of regions B_5 and B_2 respectively. The difference value D_2[i,j] of the pixel at horizontal position i, vertical position j of B_5 and B_2 is computed as:
D_2[i,j] = |B_5[i,j] - B_2[i,j]|
i = 1, 2, ..., W_3, j = 1, 2, ..., H_3, where W_3 denotes the width of regions B_5 and B_2 and H_3 their height. The number of pixels with D_2[i,j] > th_3 is counted and denoted C_2; when C_2 exceeds a set proportion of the total number W_3 × H_3 of region pixels, it is judged that a masked face is present in the grayscale image; otherwise it is a normal face; th_3 = 18.
When a masked face is judged to be present, the position of this head window image in the original color video image is located and marked with a picture frame, and the result is uploaded to the alarm-receiving center.
In step 3, the production process of the head classification model is as follows:
(1) Collect positive and negative head samples: 5000 grayscale pictures containing the head and shoulders are taken as positive samples, and 10000 grayscale pictures containing no head as negative samples; the sample sizes are kept uniform;
(2) Extract the gradient-orientation-histogram statistics of the positive and negative samples, normalize the histograms, and take the normalized values as the values of the feature vector; the method is identical to the method by which the sliding-window approach extracts feature vectors from the grayscale image converted from the color video image of the monitored scene;
(3) Input the feature vectors of the 5000 positive samples and 10000 negative samples into the support-vector-machine software library and train using the ONE_CLASS classification mode and the LINEAR kernel function, obtaining an optimal head classification model.
In step 5, the production process of the face classification model is as follows:
(1) Collect 5000 grayscale face pictures, uniformly scaled to 20 × 20 pixels, as positive samples, and 10000 grayscale pictures of arbitrary size containing no face as negative samples;
(2) Add 6 Haar feature classifiers; the feature used by each feature classifier is defined by its shape, its position in the region of interest, and a scale factor, as follows:
A. Feature classifier 1: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square region at the upper-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
B. Feature classifier 2: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square region at the lower-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
C. Feature classifier 3: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square region at the upper-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
D. Feature classifier 4: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square region at the lower-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
E. Feature classifier 5: the whole rectangular region is a 7 × 1 rectangle, with 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangular region is the region at the 5th and 6th horizontal pixels of the rectangle; the response is 2 times the pixel sum of the whole rectangular region minus 7 times the pixel sum of the black rectangular region;
F. Feature classifier 6: the whole rectangular region is a 7 × 1 rectangle, with 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangular region is the region at the 2nd and 3rd horizontal pixels of the rectangle; the response is 2 times the pixel sum of the whole rectangular region minus 7 times the pixel sum of the black rectangular region;
(3) Train the model using the haartraining library in OpenCV to obtain the face classification model.
The present invention uses image-processing and pattern-recognition techniques to analyze the human bodies appearing in a video image and to judge whether a face is present within each detected head, which serves as the basis of the masked-face decision. It is of great significance for research on masked-face recognition in video surveillance and can effectively prevent crimes committed under a masked disguise.
Embodiment:
The invention provides a masked-face detection method for video surveillance. The system architecture of the method is shown in Figure 1 and comprises a video acquisition unit, a masked-face detection unit, and an alarm unit.
The main function of the video acquisition unit is to film the monitored scene with a general-purpose analog camera to obtain an analog video image, and then convert it into digital image data through a general-purpose video capture card. Certain requirements are placed on the mounting height and angle of the camera: it must be installed so that the head-and-shoulder region of the human body appears fully in the video picture, and set up so that frontal face information is shown clearly in the picture; for best results, the camera should therefore film the face frontally at close range.
The main function of the masked-face detection unit is to convert the incoming color digital image data into a grayscale image and then detect on the grayscale image whether a masked face is present. To improve detection efficiency, the grayscale image is first reduced before detection to a standard detection grayscale image of 176 × 144 pixels. If a masked face is present, the position of the masked face is extracted, and the masked face is marked at the corresponding position of the original color image.
If the masked-face detection finds that a masked face is present, an alarm is raised and the image bearing the masked-face mark is uploaded to the alarm unit.
The invention provides a masked-face detection method for video surveillance; the method is shown in Figure 2 and specifically comprises the following steps:
One. Convert the color image into a grayscale image (s1)
Every step of the masked-face detection method described in the present invention is carried out on the basis of a grayscale image, so the color image must first be converted into a grayscale image.
Two. Scale the grayscale image (s2)
To improve detection efficiency, the grayscale image is reduced to 176 × 144 pixels. Image scaling must strike a balance between processing efficiency and the smoothness and sharpness of the result; image-scaling methods are by now fairly mature and are not within the scope of the present invention. Every step described below is carried out on the basis of the reduced grayscale image.
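For illustration, steps one and two can be sketched with OpenCV as follows (the file name is illustrative, and the interpolation choice is an assumption; the invention does not prescribe a particular scaling method):

```python
import cv2

frame = cv2.imread("frame.png")                    # one color frame (illustrative source)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)     # step one: color -> grayscale
small = cv2.resize(gray, (176, 144),               # step two: standard detection size
                   interpolation=cv2.INTER_AREA)
```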
Three. Perform head detection on the grayscale image (s3)
On the image scaled in step two, the sliding-window method is adopted with a sliding-window size of 40 × 40, a horizontal scan step of 3 pixels, and a vertical scan step of 2 pixels; the window is moved from left to right and top to bottom until the whole image has been scanned, and head detection is performed separately on the grayscale image under each window position, hereinafter referred to as the window image.
i denotes the horizontal coordinate of a point in the window image and j its vertical coordinate; I[i,j] denotes the gray value of the window image at pixel [i,j]; [i,j] traverses every pixel of the window image; the window-image width is W_0 = 40 and its height H_0 = 40. When the sliding window is at the first window image:
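A sketch of this scan, under the parameters just stated (names are illustrative):

```python
def windows(img, size=40, dx=3, dy=2):
    """Yield the top-left corner and pixels of each 40x40 sliding window,
    scanning left to right, top to bottom, with steps of 3 and 2 pixels."""
    H, W = img.shape
    for y in range(0, H - size + 1, dy):
        for x in range(0, W - size + 1, dx):
            yield x, y, img[y:y + size, x:x + size]

# Illustrative use on the 176x144 detection image `small`:
#   for x, y, win in windows(small): ...
```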
1. Compute the gradient direction and magnitude of each pixel of the window image, specifically:
(1) Compute the horizontal and vertical gradients of each pixel of the window image
First, the horizontal gradient G_x[i,j] and vertical gradient G_y[i,j] of every pixel of the window image are initialized to 0:
G_x[i,j] = 0, i = 1, 2, ..., W_0, j = 1, 2, ..., H_0   (1-a)
G_y[i,j] = 0, i = 1, 2, ..., W_0, j = 1, 2, ..., H_0   (1-b)
On the window image, with an edge operator as the computing template, the template is translated from left to right and top to bottom to each pixel [i,j]; to avoid crossing the image boundary, the pixels on the four outermost edges (top, bottom, left, and right) are not processed. The template is applied as a weighted sum of the gray values in the neighborhood of pixel [i,j], each weight multiplying its corresponding value, which gives the horizontal gradient G_x[i,j] and the vertical gradient G_y[i,j]. The edge operator may be the Roberts edge operator, the Sobel edge operator, the Prewitt edge operator, or the Kirsch edge operator; the present invention is described taking the Sobel edge operators

S_x = [ -1 0 1; -2 0 2; -1 0 1 ] and S_y = [ -1 -2 -1; 0 0 0; 1 2 1 ]

as the example. The horizontal gradient G_x[i,j] and vertical gradient G_y[i,j] are computed as:

G_x[i,j] = Σ_{k=1..3} Σ_{l=1..3} I[i+k-2, j+l-2] · S_x[k,l],  i = 2, 3, ..., W_0-1, j = 2, 3, ..., H_0-1   (2-a)
G_y[i,j] = Σ_{k=1..3} Σ_{l=1..3} I[i+k-2, j+l-2] · S_y[k,l],  i = 2, 3, ..., W_0-1, j = 2, 3, ..., H_0-1   (2-b)

where I[i,j] denotes the gray value of each pixel of the window image; S_x[k,l] denotes the value at row k, column l of the Sobel horizontal edge operator, and S_y[k,l] the value at row k, column l of the Sobel vertical edge operator;
(2) Compute the gradient magnitude G_1[i,j] and gradient direction G_2[i,j] at each pixel of the window image:

G_1[i,j] = sqrt( G_x[i,j]^2 + G_y[i,j]^2 ),  i = 1, 2, ..., W_0, j = 1, 2, ..., H_0   (3-a)
G_2[i,j] = ⌊ P · (arctan(G_y[i,j]/G_x[i,j]) + π/2) / π ⌋ + 1,  i = 1, 2, ..., W_0, j = 1, 2, ..., H_0   (3-b)

where arctan(·) is the arctangent function; ⌊·⌋ is the round-down (floor) operator, ⌊x⌋ denoting the largest integer not greater than x; and P denotes the number of channels, which may be an arbitrary integer between 2 and 180. The embodiment of the present invention is described with P = 9 as the example, so that the gradient direction G_2[i,j] takes the values 1, 2, ..., 9.
2. Use the obtained gradient magnitudes and directions to compute the gradient-histogram statistics, and take the normalized histogram statistics as the values of the feature vector:
The window image is first divided into connected regions of identical size; each connected region is a cell, and the gradient-orientation histogram of the pixels of each cell is then accumulated. To better withstand the influence of illumination changes and shadows, several cells are grouped into a block, and the histogram statistics of the cells within each block are normalized.
The number of pixels per cell row may be an arbitrary integer between 2 and 20, as may the number of pixels per cell column; the number of pixels per block row may be an arbitrary integer between 4 and 40, as may the number of pixels per block column. The embodiment of the present invention is described with each cell having 8 × 8 pixels and each block consisting of 2 × 2 = 4 cells. Figure 3 is a schematic diagram of one block of this embodiment. A 40 × 40 window image has 25 cells, and with 50% overlap between blocks the image has 4 × 4 = 16 blocks in total. Each cell has 9 direction channels and each block therefore has 4 × 9 = 36 features, giving 16 × 36 = 576 feature dimensions in total. The computation proceeds as follows:
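This bookkeeping can be checked in a few lines (illustrative only):

```python
W0 = H0 = 40; cell = 8; P = 9
L = W0 // cell                 # cells per row: 5
M = L * (H0 // cell)           # total cells: 25
N = (L - 1) ** 2               # blocks at one-cell steps (50% overlap): 16
dims = N * 4 * P               # 16 blocks x 4 cells x 9 channels = 576
print(L, M, N, dims)           # -> 5 25 16 576
```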
(1) Compute the gradient-orientation-histogram statistic of each cell of the window image
Every pixel of every cell of the window image votes for one histogram channel according to its gradient direction, with the gradient magnitude of the pixel as the voting weight. H[m][p] denotes the histogram statistic of channel p of cell m of the window image; m is a variable, the cell label, increasing from 1 in left-to-right, top-to-bottom order; p is a variable, the channel label; L denotes the number of cells per row of the window image and M the total number of cells of the window image; L and M are constants that depend only on the window-image size; in the embodiment of the present invention, L = 5 and M = 25;
Taking m = 1 and p = 1, H[1][1], the histogram statistic of channel 1 of cell 1, is computed as:

H[1][1] = Σ_{[i,j] ∈ cell 1, G_2[i,j] = 1} G_1[i,j]

p then increases by 1, from p = 2 up to p = 9, giving H[1][2], H[1][3], ..., H[1][9]:

H[1][p] = Σ_{[i,j] ∈ cell 1, G_2[i,j] = p} G_1[i,j]

m then increases by 1, from m = 2 up to m = M, each cell again having 9 histogram statistics:

H[m][p] = Σ_{[i,j] ∈ cell m, G_2[i,j] = p} G_1[i,j]

where cell m is the 8 × 8 pixel region in cell row ⌊(m-1)/5⌋ and cell column (m-1) mod 5, ⌊(m-1)/5⌋ denoting the largest integer not greater than (m-1)/5;
(2) Normalize the gradient-orientation-histogram statistics of the cells within each block of the window image and extract the feature vector of the window image.
The histogram statistics of the cells within each block of the window image are summed as the normalization factor. S[n] denotes the normalization factor of the histogram statistics of block n of the window image; n is a variable, the block label, increasing from 1 in left-to-right, top-to-bottom order; each row of the window image contains L-1 blocks; N denotes the total number of blocks of the window image; N is a constant that depends on the window-image size; in the embodiment of the present invention, the total number of blocks per window image is N = 16, with 4 blocks per row;
Taking n = 1, the normalization factor S[1] of the histogram statistics of block 1 of the window image is the sum of the histogram statistics of all channels of all cells of block 1 (block 1 consists of cells 1, 2, 6, and 7 of the window image):

S[1] = Σ_{p=1..9} ( H[1][p] + H[2][p] + H[6][p] + H[7][p] )

Dividing the histogram statistic of each channel of the 1st cell of block 1, which is also the 1st cell of the window image, by the normalization factor of block 1 gives 9 values, H[1][1]/S[1], ..., H[1][9]/S[1], taken in order as the 1st to 9th values of the window-image feature vector.
Dividing the histogram statistic of each channel of the 2nd cell of block 1, which is also the 2nd cell of the window image, by the normalization factor of block 1 gives 9 values, taken in order as the 10th to 18th values of the window-image feature vector.
Dividing the histogram statistic of each channel of the 3rd cell of block 1, which is also the 6th cell of the window image, by the normalization factor of block 1 gives 9 values, taken in order as the 19th to 27th values of the window-image feature vector.
Dividing the histogram statistic of each channel of the 4th cell of block 1, which is also the 7th cell of the window image, by the normalization factor of block 1 gives 9 values, taken in order as the 28th to 36th values of the window-image feature vector.
n then increases by 1, from n = 2 up to n = N, and the normalization factor of every block is computed in the same way:

S[n] = Σ_{p=1..9} ( H[m_n][p] + H[m_n+1][p] + H[m_n+5][p] + H[m_n+6][p] ),  m_n = 5·⌊(n-1)/4⌋ + ((n-1) mod 4) + 1

where m_n is the label of the top-left cell of block n and ⌊·⌋ denotes the round-down operator, ⌊x⌋ being the largest integer not greater than x.
Dividing the histogram statistic of each channel of each cell of block n of the window image by the normalization factor of block n gives the remaining 15 × 36 values of the window-image feature vector; the feature vector of each window image thus has 16 × 36 = 576 dimensions in total;
The 576-dimensional feature vector of each window image, together with the head classification model trained in advance, is fed into the support-vector-machine software library; classification uses the ONE_CLASS classification mode and the LINEAR kernel function to judge whether the window image is a head image; if it is, this window image is taken as a head window image.
Proceeding in the same way until all window images have been scanned gives all head window images in the current-frame grayscale image; the head window images are labeled in traversal order, following the left-to-right, top-to-bottom principle. The training of the head model is described in detail in part seven.
Four. Match each head between frames (s4)
A masked face that appears in a video cannot merely flash by. Therefore, on the basis of step three, the detected heads are tracked in order to exclude the interference caused by momentarily appearing objects; the concrete procedure is shown in Figure 4.
Suppose M head window images were detected in the previous-frame grayscale image and N head window images are detected in the current-frame grayscale image; n and m are variables, the labels of the head window images in the current-frame and previous-frame grayscale images respectively, n = 1, 2, ..., N, m = 1, 2, ..., M; the center of head window image m of the previous frame is [x_m, y_m] and its area S_m; the center of head window image n of the current frame is [p_n, q_n] and its area Q_n.
(1) Match head window image n of the current-frame grayscale image against all head window images of the previous-frame grayscale image by position and area. The center-position difference T(m) and area difference A(m) between head window image n of the current frame and head window image m of the previous frame are:

T(m) = sqrt( (p_n - x_m)^2 + (q_n - y_m)^2 )   (6-a)
A(m) = |Q_n - S_m|   (6-b)
Take n = 1, traverse all values of m, and compute the value of m that minimizes the position-matching parameter T(m); denote it J_1, meaning the best match of the first head window image of the current frame is head window image J_1 of the previous frame. If T(J_1) ≤ th_1 and A(J_1) ≤ th_2, where th_1 is the threshold on the inter-frame change of the head-window center (in the embodiment of the present invention, th_1 = 15) and th_2 is the threshold on the inter-frame change of the head-window area (in the embodiment of the present invention, th_2 = 100), then the first head window image of the current frame has found its matching head window image in the previous-frame grayscale image, namely number J_1; if T(J_1) > th_1 or A(J_1) > th_2, J_1 is set to 0, i.e. J_1 = 0, meaning the first head window image of the current frame matches no head window image of the previous frame and is a newly appearing head window image;
(2) n increases by 1, from n = 2 up to n = N, repeating step (1) to find all matching head window images J_2, J_3, ..., J_N. If J_1, J_2, ..., J_N do not include some value K from 1 to M, head window image K of the previous frame has been lost;
(3) Head window image n of the current-frame grayscale image is likewise matched by position and area against all head window images of each of the 2nd through H-th preceding grayscale frames, repeating steps (1) and (2) above. If no loss of the head window image occurs, head window image n of the current-frame grayscale image is an analysis window image; H is an arbitrary integer between 5 and 10; in the embodiment of the present invention, H = 8.
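A sketch of the per-frame matching follows; the Euclidean center distance is used for T(m), matching equation (6-a) above, and all names are illustrative:

```python
import numpy as np

def match_heads(cur, prev, th1=15.0, th2=100.0):
    """cur, prev: lists of (cx, cy, area) for the head window images of the
    current and previous frame. Returns J, where J[n-1] is the 1-based label
    of the matched previous-frame head window, or 0 if newly appeared."""
    J = []
    for (p, q, Q) in cur:
        T = [np.hypot(p - x, q - y) for (x, y, S) in prev]   # T(m), eq. (6-a)
        A = [abs(Q - S) for (x, y, S) in prev]               # A(m), eq. (6-b)
        m = int(np.argmin(T)) if T else -1
        if m >= 0 and T[m] <= th1 and A[m] <= th2:
            J.append(m + 1)
        else:
            J.append(0)                                      # newly appearing head
    return J
```

Previous-frame labels from 1 to M that never appear in J correspond to lost head windows; a current head window matched in each of the preceding H = 8 frames becomes an analysis window image.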
Five. Face detection (s5)
On the basis of step four, when a head window image persists across frames, the cascade AdaBoost method based on Haar features is applied on each analysis window image, using the face classification model trained in advance to perform face detection, yielding the head window images in which a face is detected and the head window images in which no face is detected; in the embodiment of the present invention, the face-detection method is identical on every analysis window image.
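For illustration, the detection call can be sketched with OpenCV's cascade classifier (the model file name and detection parameters are illustrative assumptions):

```python
import cv2

face_model = cv2.CascadeClassifier("face_model.xml")   # pre-trained cascade (illustrative path)

def has_face(analysis_win):
    """Cascade AdaBoost Haar face detection on one analysis window image;
    returns True when at least one face is found."""
    faces = face_model.detectMultiScale(analysis_win,
                                        scaleFactor=1.1, minNeighbors=3)
    return len(faces) > 0
```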
Six. Masked-face judgment (s6)
On the basis of the head detection and the face detection, the flow of the masked-face decision is as follows:
1. On a head window image in which a face was detected, according to the method shown in Figure 5, the head window image is divided into six equal parts in the vertical direction; in the horizontal direction the left-side and right-side portions are not analyzed and only the middle region is analyzed. Of the six parts, regions B_1 and B_2 are taken as the analyzed regions, B_1 being the 2nd region counted from top to bottom and B_2 the 5th region counted from top to bottom. Regions B_1 and B_2 have the same width, a fixed middle fraction of the head-window width, and the same height, one sixth of the head-window height. B_1[i,j] and B_2[i,j] denote the gray values of the pixel at horizontal position i, vertical position j of regions B_1 and B_2 respectively. Scanning from left to right and top to bottom, the gray values at corresponding positions of regions B_1 and B_2 are differenced; D_1[i,j] denotes the difference value at horizontal position i, vertical position j of regions B_1 and B_2, computed as:

D_1[i,j] = |B_1[i,j] - B_2[i,j]|   (7)

[i,j] traverses all pixels of regions B_1 and B_2, i = 1, 2, ..., W_1, j = 1, 2, ..., H_1, where W_1 denotes the width of regions B_1 and B_2 and H_1 their height. The number of pixels with D_1[i,j] > th_1 is counted and denoted C_1; when C_1 exceeds a set proportion of the total number W_1 × H_1 of region pixels, it is judged that a masked face is present in the grayscale image; otherwise it is a normal face; in the embodiment of the present invention, th_1 = 18;
2. On a head window image in which no face was detected, according to the method shown in Figure 6, the head window image is divided into four equal parts in the horizontal direction, and the middle two regions B_3 and B_4 are taken as the analyzed regions. Regions B_3 and B_4 have the same width, one quarter of the head-window width, and the same height, identical to the head-window height. B_3[i,j] and B_4[i,j] denote the gray values of the pixel at horizontal position i, vertical position j of regions B_3 and B_4 respectively. Scanning from left to right and top to bottom, the means E_1 and E_2 of the gray values of regions B_3 and B_4 are computed from the gray values of their pixels:

E_1 = (1/(W_2·H_2)) · Σ_{i=1..W_2} Σ_{j=1..H_2} B_3[i,j]   (8-a)
E_2 = (1/(W_2·H_2)) · Σ_{i=1..W_2} Σ_{j=1..H_2} B_4[i,j]   (8-b)

W_2 denotes the width of regions B_3 and B_4 and H_2 their height. The difference of the two region means is ΔE = |E_1 - E_2|; when ΔE > th_2, a side face or head-bowing behavior is indicated, not a masked face; otherwise, proceed to the next step; in the embodiment of the present invention, th_2 = 25;
3. On a head window image in which no face was detected, when ΔE ≤ th_2, according to the method shown in Figure 5, the head window image is divided into six equal parts in the vertical direction; in the horizontal direction the left-side and right-side portions are not analyzed and only the middle region is analyzed. Of the six parts, regions B_5 and B_2 are taken as the analyzed regions, B_5 being the 3rd region counted from top to bottom and B_2 the 5th region counted from top to bottom. Regions B_5 and B_2 have the same width, a fixed middle fraction of the head-window width, and the same height, one sixth of the head-window height. B_5[i,j] and B_2[i,j] denote the gray values of the pixel at horizontal position i, vertical position j of regions B_5 and B_2 respectively. Scanning from left to right and top to bottom, the gray values at corresponding positions of regions B_5 and B_2 are differenced; D_2[i,j] denotes the difference value at horizontal position i, vertical position j of regions B_5 and B_2, computed as:

D_2[i,j] = |B_5[i,j] - B_2[i,j]|   (9)

[i,j] traverses all pixels of regions B_5 and B_2, i = 1, 2, ..., W_3, j = 1, 2, ..., H_3, where W_3 denotes the width of regions B_5 and B_2 and H_3 their height. The number of pixels with D_2[i,j] > th_3 is counted and denoted C_2; when C_2 exceeds a set proportion of the total number W_3 × H_3 of region pixels, it is judged that a masked face is present in the grayscale image; otherwise it is a normal face; in the embodiment of the present invention, th_3 = 18.
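A sketch of the full decision of this step follows. Two quantities were not reproduced in the printed text and are assumptions here: the middle fraction of the window width kept for regions B_1, B_2, and B_5 (taken as the middle half) and the proportion of differing pixels beyond which C_1 or C_2 triggers the masked verdict (taken as one half):

```python
import numpy as np

def masked_judgement(win, face_found, th1=18, th2=25, th3=18, ratio=0.5):
    """Decision of step six on one 40x40 head window image `win`;
    `ratio` is an assumed count-proportion threshold for C1 and C2."""
    H, W = win.shape
    sixth = H // 6
    lo, hi = W // 4, 3 * W // 4               # assumed middle half of the width
    def band(k):                               # k-th of six vertical parts (1-based)
        return win[(k - 1) * sixth:k * sixth, lo:hi].astype(int)
    if face_found:                             # step (1): compare parts 2 and 5
        D1 = np.abs(band(2) - band(5))
        return bool((D1 > th1).sum() > ratio * D1.size)
    # Step (2): middle two of four horizontal quarters, full height.
    B3 = win[:, W // 4:W // 2].astype(int)
    B4 = win[:, W // 2:3 * W // 4].astype(int)
    if abs(B3.mean() - B4.mean()) > th2:
        return False                           # side face or bowed head, not masked
    # Step (3): compare parts 3 and 5.
    D2 = np.abs(band(3) - band(5))
    return bool((D2 > th3).sum() > ratio * D2.size)
```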
When a masked face is judged to be present on a head window image, the position of this head window image in the original color image is located and marked, and the result is uploaded to the alarm-receiving center.
Seven. Head-model training (s7)
The head detection on the grayscale image in step three uses a head classification model trained in advance. The training process, shown as unit s7 in Figure 2, comprises collecting positive and negative head samples, extracting histogram-of-oriented-gradients (HOG) features, and training the head model with the support-vector-machine software library svmlight.
1. Collect positive and negative head samples
5000 grayscale pictures containing the head and shoulders are collected as positive samples, and 10000 grayscale pictures containing no head are collected as negative samples; the samples are uniformly scaled to 40 × 40.
2. Extract the HOG feature vectors
The gradient-orientation-histogram statistics of the positive and negative samples are extracted and normalized, and the normalized values are taken as the values of the feature vector; the method is identical to the way the sliding-window method of step three extracts feature vectors on the window images.
3. Train the head model with the support-vector-machine software library svmlight
The feature vectors of the 5000 positive samples and 10000 negative samples are input into the support-vector-machine software library and trained using the ONE_CLASS classification mode and the LINEAR kernel function, obtaining an optimal head classification model.
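The text names svmlight's ONE_CLASS mode with a LINEAR kernel; since both positive and negative sample sets are fed in, the sketch below substitutes an ordinary two-class linear SVM from scikit-learn as a stand-in, with illustrative file names:

```python
import numpy as np
from sklearn.svm import LinearSVC   # stand-in for the svmlight library

# 576-dim HOG vectors of the 5000 head-and-shoulder samples and the
# 10000 non-head samples (illustrative file names).
X_pos = np.load("hog_pos.npy")
X_neg = np.load("hog_neg.npy")
X = np.vstack([X_pos, X_neg])
y = np.hstack([np.ones(len(X_pos)), np.zeros(len(X_neg))])

head_model = LinearSVC()            # linear kernel, as in the text
head_model.fit(X, y)
```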
Eight. Face-model training (s8)
The face detection performed in step five on the extracted head images uses a face model trained in advance. The training process, shown as unit s8 in Figure 2, comprises collecting positive and negative face samples, adding Haar features, and training the face model.
1. Collect positive and negative face samples
5000 grayscale face pictures are collected and uniformly scaled to 20 × 20 pixels as positive samples; 10000 grayscale pictures of arbitrary size containing no face are collected as negative samples.
2. Add 6 Haar feature classifiers
To better detect side faces, the embodiment of the present invention adds 6 Haar feature classifiers; the feature used by each feature classifier is defined by its shape, its position in the region of interest, and a scale factor, as follows:
(1) Feature classifier 1, shown in Figure 7: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square region at the upper-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
(2) Feature classifier 2, shown in Figure 8: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square region at the lower-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
(3) Feature classifier 3, shown in Figure 9: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square region at the upper-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
(4) Feature classifier 4, shown in Figure 10: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square region at the lower-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
(5) Feature classifier 5, shown in Figure 11: the whole rectangular region is a 7 × 1 rectangle, with 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangular region is the region at the 5th and 6th horizontal pixels of the rectangle; the response is 2 times the pixel sum of the whole rectangular region minus 7 times the pixel sum of the black rectangular region;
(6) Feature classifier 6, shown in Figure 12: the whole rectangular region is a 7 × 1 rectangle, with 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangular region is the region at the 2nd and 3rd horizontal pixels of the rectangle; the response is 2 times the pixel sum of the whole rectangular region minus 7 times the pixel sum of the black rectangular region;
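The responses of the six features can be computed from an integral image as sketched below; how the scale factor combines with the base 5 × 3 and 7 × 1 shapes is an assumption here (taken as independent horizontal and vertical multipliers), while the 4x/15x and 2x/7x weightings are those stated above:

```python
import numpy as np

def rect_sum(ii, x, y, w, h):
    """Pixel sum of the w x h rectangle with top-left corner (x, y),
    taken from the zero-padded integral image ii."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def feature_responses(patch, x, y, sx=1, sy=1):
    """Responses of the six added feature classifiers at (x, y) of `patch`,
    with assumed horizontal/vertical scale factors sx, sy."""
    p = np.pad(patch.astype(np.int64), ((1, 0), (1, 0)))
    ii = p.cumsum(axis=0).cumsum(axis=1)            # integral image
    w5, h3 = 5 * sx, 3 * sy
    whole53 = rect_sum(ii, x, y, w5, h3)
    r = []
    # Classifiers 1-4: 2x2 black square at each corner of the 5x3 rectangle.
    for bx, by in [(0, 0), (0, h3 - 2 * sy), (w5 - 2 * sx, 0),
                   (w5 - 2 * sx, h3 - 2 * sy)]:
        black = rect_sum(ii, x + bx, y + by, 2 * sx, 2 * sy)
        r.append(4 * whole53 - 15 * black)
    # Classifiers 5-6: 2-pixel black run at pixels 5-6 and 2-3 of the 7x1 row.
    whole71 = rect_sum(ii, x, y, 7 * sx, 1 * sy)
    for bx in (4 * sx, 1 * sx):
        black = rect_sum(ii, x + bx, y, 2 * sx, 1 * sy)
        r.append(2 * whole71 - 7 * black)
    return r
```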
3. Train the face model
The face model is trained using the relatively mature haartraining library in OpenCV.