Summary of the invention:
The object of this invention is to provide a masked-face detection method that can judge whether a masked human face is present in a video image, so as to prevent crimes committed under the disguise of a mask.
The present invention comprises the following steps:
1. Convert the color video image obtained from the monitoring site to a grayscale image;
2. Scale the grayscale image;
3. Perform human-head detection on the grayscale image; when a head is detected, proceed to the following steps, otherwise loop through steps 1 to 3;
4. Match each head between frames;
5. Perform face detection;
6. Perform the masked-face judgment; mark the original color video image in which a masked face is found, and raise an alarm.
In step 3, a sliding-window method is adopted: the window is moved from left to right and top to bottom, dividing the grayscale image into window images, one per window position, and head detection is performed on each window image. When the sliding window is at the first window image:
(1) Calculate the horizontal gradient G_x[i, j] and vertical gradient G_y[i, j] of each pixel of the window image:
A. Initialization of G_x[i, j] and G_y[i, j]:
In G_x[i, j] and G_y[i, j], the value of every pixel is initialized to 0; [i, j] traverses all pixels of the window image, where i is a variable giving the horizontal position of the pixel in the window image, with values i = 1, 2, ..., W_0, and j is a variable giving the vertical position of the pixel in the window image, with values j = 1, 2, ..., H_0; W_0 and H_0 are respectively the width and height of the window image;
B. Calculate the horizontal gradient G_x[i, j] and vertical gradient G_y[i, j] of each pixel on the window image:
Using the Sobel horizontal edge operator as the computation template, translate the template center to each pixel; multiply each pixel in the image region covered by the template by the corresponding template element, and take the sum of all the products as the horizontal gradient G_x[i, j] of that pixel. Using the Sobel vertical edge operator as the template in the same way gives the vertical gradient G_y[i, j]. To prevent crossing the image border, the pixels on the top, bottom, left-most and right-most edges are not processed; that is, when j = 1 or j = H_0 (i = 1, 2, ..., W_0), or i = 1 or i = W_0 (j = 1, 2, ..., H_0), G_x[i, j] and G_y[i, j] keep their initial value 0. Thus, for i = 2, 3, ..., W_0 − 1 and j = 2, 3, ..., H_0 − 1:
G_x[i, j] = Σ_{k=1..3} Σ_{l=1..3} S_x[k, l] · I[i + l − 2, j + k − 2]
G_y[i, j] = Σ_{k=1..3} Σ_{l=1..3} S_y[k, l] · I[i + l − 2, j + k − 2]
Wherein I[i, j] represents the gray value of each pixel of the window image, S_x[k, l] represents the value at row k, column l of the Sobel horizontal edge operator, and S_y[k, l] represents the value at row k, column l of the Sobel vertical edge operator;
(2) Calculate the gradient magnitude G_1[i, j] and gradient direction G_2[i, j] of each pixel of the window image:
G_1[i, j] = sqrt(G_x[i, j]^2 + G_y[i, j]^2)
G_2[i, j] = ⌊(arctan(G_y[i, j] / G_x[i, j]) + π/2) · 9/π⌋ + 1
Wherein i = 1, 2, ..., W_0, j = 1, 2, ..., H_0, arctan() is the arctangent function, and ⌊·⌋ is the downward rounding (floor) operator; ⌊x⌋ denotes the largest integer not greater than x;
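The gradient step above can be sketched in NumPy. The Sobel coefficients shown are the standard ones (assumed here), the border rows and columns keep gradient 0 as the text prescribes, and the direction quantization assumes the common mapping of arctan(G_y/G_x) from (−π/2, π/2) onto channels 1 to 9:

```python
import numpy as np

# Standard Sobel templates (assumed; the patent's exact templates are elided).
SX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SY = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)

def gradients(img):
    """Return (G1, G2): per-pixel gradient magnitude and direction channel 1..9.
    Border pixels are left at gradient 0, matching the no-border-processing rule."""
    img = img.astype(float)
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for j in range(1, h - 1):            # skip top/bottom edge rows
        for i in range(1, w - 1):        # skip left-most/right-most columns
            patch = img[j - 1:j + 2, i - 1:i + 2]
            gx[j, i] = (SX * patch).sum()
            gy[j, i] = (SY * patch).sum()
    g1 = np.sqrt(gx ** 2 + gy ** 2)
    # arctan(gy/gx) lies in (-pi/2, pi/2); map it onto channels 1..9
    theta = np.arctan(gy / np.where(gx == 0, 1e-12, gx))
    g2 = np.floor((theta + np.pi / 2) * 9 / np.pi).astype(int) + 1
    return g1, g2
```

A vertical step edge, for example, produces a purely horizontal gradient, which lands in the middle channel 5.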
(3) Use the gradient magnitude and direction of each pixel of the window image to compute the gradient orientation histogram statistics and obtain the feature vector of the window image:
The window image is divided into connected regions of identical size; each connected region consists of 8 × 8 pixels and is called a cell unit. Every 2 × 2 cell units form a square region called an interval; adjacent intervals overlap by 50%;
A. Calculate the gradient orientation histogram statistic of each cell unit of the window image:
In the window image, the gradient direction of each pixel takes a value from 1 to 9, and each cell unit consists of 9 channels. Using the gradient magnitude and gradient direction of each pixel of each cell unit, the gradient magnitudes falling in the direction range of each channel of the cell unit are accumulated to obtain the gradient orientation histogram statistic of the cell unit. H[m][p] denotes the histogram statistic of the p-th channel of the m-th cell unit of a window image; m is a variable, the cell-unit label, starting from 1 and increasing by 1 in left-to-right, top-to-bottom order; p is a variable, the channel label; L denotes the number of cell units per horizontal row of a window image, and M denotes the total number of cell units of a window image; L and M are constants that depend only on the size of the window image. Taking m = 1 and p = 1, H[1][1] denotes the histogram statistic of the 1st channel of the 1st cell unit, and the computing formula is:
H[1][1] = Σ_{i=1..8} Σ_{j=1..8} G_1[i, j] · δ(G_2[i, j], 1)
wherein δ(a, b) = 1 when a = b and δ(a, b) = 0 otherwise;
p increases by 1 successively; from p = 2 until p = 9, H[1][2], H[1][3], ..., H[1][9] are obtained in turn, with computing formula:
H[1][p] = Σ_{i=1..8} Σ_{j=1..8} G_1[i, j] · δ(G_2[i, j], p), p = 2, 3, ..., 9
m increases by 1 successively; from m = 2 to m = M, each increase yields 9 histogram statistics, with corresponding computing formula:
H[m][p] = Σ_{i=8·((m−1) mod 5)+1 .. 8·((m−1) mod 5)+8} Σ_{j=8·⌊(m−1)/5⌋+1 .. 8·⌊(m−1)/5⌋+8} G_1[i, j] · δ(G_2[i, j], p), p = 1, 2, ..., 9
wherein ⌊(m−1)/5⌋ denotes the largest integer not greater than (m−1)/5;
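The per-cell voting described above can be sketched as follows; the left-to-right, top-to-bottom cell numbering and the magnitude-weighted vote into the direction channel follow the text, while the helper name and signature are illustrative:

```python
import numpy as np

def cell_histograms(g1, g2, cell=8, channels=9):
    """H[m][p]: accumulate each pixel's gradient magnitude g1 into the
    channel given by its gradient direction g2 (values 1..channels).
    Cells are numbered left-to-right, top-to-bottom, one row per cell."""
    h, w = g1.shape
    rows, cols = h // cell, w // cell      # cols is L, rows*cols is M
    H = np.zeros((rows * cols, channels))
    for m in range(rows * cols):
        r, c = divmod(m, cols)             # r = floor(m/L), c = m mod L
        for j in range(r * cell, (r + 1) * cell):
            for i in range(c * cell, (c + 1) * cell):
                H[m, int(g2[j, i]) - 1] += g1[j, i]
    return H
```

Because every pixel votes its full magnitude into exactly one channel, the total histogram mass equals the total gradient magnitude of the image.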
B. Normalize the gradient orientation histogram statistics of the cell units of each interval of the window image, and extract the feature vector of the window image:
S[n] denotes the normalization factor of the gradient orientation histogram statistics of the n-th interval of a window image; n is a variable, the interval label, starting from 1 and increasing by 1 in left-to-right, top-to-bottom order; each horizontal row of a window image contains L − 1 intervals; N denotes the total number of intervals of a window image; N is a constant related to the window-image size. Taking n = 1, the normalization factor S[1] of the gradient orientation histogram statistics of the 1st interval of the window image is the sum of the histogram statistics of all channels of all cell units of the 1st interval;
Dividing the gradient orientation histogram statistic of each channel of the 1st cell unit of the 1st interval, which is also the 1st cell unit of the window image, by the normalization factor of the 1st interval gives 9 values, which serve in turn as the 1st to 9th values of the window-image feature vector;
Dividing the histogram statistic of each channel of the 2nd cell unit of the 1st interval, also the 2nd cell unit of the window image, by the normalization factor of the 1st interval gives 9 values, in turn the 10th to 18th values of the feature vector;
Dividing the histogram statistic of each channel of the 3rd cell unit of the 1st interval, also the (L+1)-th cell unit of the window image, by the normalization factor of the 1st interval gives 9 values, in turn the 19th to 27th values of the feature vector;
Dividing the histogram statistic of each channel of the 4th cell unit of the 1st interval, also the (L+2)-th cell unit of the window image, by the normalization factor of the 1st interval gives 9 values, in turn the 28th to 36th values of the feature vector;
n increases by 1 successively; from n = 2 until n = N, the normalization factors of the histogram statistics of all intervals are calculated: with c = (n−1) mod (L−1) and r = ⌊(n−1)/(L−1)⌋, where ⌊x⌋ denotes the largest integer not greater than x, the four cell units of the n-th interval are the cell units L·r + c + 1, L·r + c + 2, L·(r+1) + c + 1 and L·(r+1) + c + 2 of the window image, and S[n] is the sum of their histogram statistics over all channels;
Dividing the histogram statistics of each channel of each cell unit of the n-th interval by the normalization factor of the n-th interval further gives the other 36 × (N − 1) values of the window-image feature vector; the feature vector of each window image has 36 × N dimensions in total;
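The block-normalization step can be sketched as follows, assuming the sum-normalization stated in the text (each 36-value block divided by the sum of its entries) and 50% overlap between adjacent blocks:

```python
import numpy as np

def block_features(H, cols):
    """Concatenate sum-normalized 2x2-cell blocks (50% overlap) into one
    feature vector. H: (M, 9) per-cell histograms; cols: cells per row (L).
    There are (rows-1)*(cols-1) blocks, so the vector has 36*N entries."""
    rows = H.shape[0] // cols
    feats = []
    for br in range(rows - 1):
        for bc in range(cols - 1):
            m = br * cols + bc                       # top-left cell of block
            block = np.concatenate([H[m], H[m + 1],
                                    H[m + cols], H[m + cols + 1]])
            s = block.sum()                          # normalization factor S[n]
            feats.append(block / s if s > 0 else block)
    return np.concatenate(feats)
```

For a 5 × 5 grid of cells (L = 5) this yields 16 blocks and a 576-dimensional vector, matching the 36 × N count above.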
C. Send the 36 × N-dimensional feature vector of each window image, together with the head classification model trained in advance, into the support-vector-machine software library; classify with the ONE_CLASS classification mode and the LINEAR kernel function to judge whether the window image is a head image; if so, regard the window image as a head window image;
By analogy, repeat steps (1) to (3) until all window images have been traversed, obtaining all head window images in the current-frame grayscale image; the head window images are labelled in the same order as the traversal, from left to right and top to bottom.
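The scanning loop can be sketched as follows; the 40 × 40 window size and the horizontal/vertical step sizes 3 and 2 are taken from the embodiment described later, and `is_head` is a hypothetical stand-in for the SVM classification of each window's feature vector:

```python
def scan_windows(width, height, win=40, step_x=3, step_y=2):
    """Yield top-left corners of all sliding-window positions,
    left-to-right then top-to-bottom (steps assumed from the embodiment)."""
    for y in range(0, height - win + 1, step_y):
        for x in range(0, width - win + 1, step_x):
            yield x, y

def detect_heads(img_w, img_h, is_head):
    """Collect the windows accepted by the (stub) head classifier,
    labelled in traversal order as the text prescribes."""
    return [(x, y) for x, y in scan_windows(img_w, img_h) if is_head(x, y)]
```

On the 176 × 144 standard detection image this produces 46 × 53 window positions per frame.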
The flow of step 4 is as follows:
(1) Match the n-th head window image of the current-frame grayscale image against all head window images of the previous-frame grayscale image by position and area:
Position-matching parameter T(m) = sqrt((p_n − x_m)^2 + (q_n − y_m)^2)
Area-matching parameter A(m) = |Q_n − S_m|
n and m are variables, the labels of head window images in the current-frame and previous-frame grayscale images respectively; the center of the n-th head window image of the current frame is [p_n, q_n] and its area is Q_n; the center of the m-th head window image of the previous frame is [x_m, y_m] and its area is S_m; n = 1, 2, ..., N, m = 1, 2, ..., M, where N is the number of head window images in the current-frame grayscale image and M is the number in the previous-frame grayscale image. Take n = 1, traverse all values of m, and find the value of m that minimizes the position-matching parameter T(m), denoted J_1,
which means the best match of the first head window image of the current frame is the J_1-th head window image of the previous frame. If T(J_1) ≤ th_1 and A(J_1) ≤ th_2, with th_1 = 15 and th_2 = 100, the first head window image of the current frame has found a matching head window image, the J_1-th, in the previous-frame grayscale image; if T(J_1) > th_1 or A(J_1) > th_2, J_1 is set to zero, i.e. J_1 = 0, meaning that the first head window image of the current frame matches no head window image of the previous frame and is a newly appearing head window image;
(2) n increases by 1 successively; from n = 2 until n = N, repeat step (1) to find all matching head window images J_2, J_3, ..., J_N; if J_1, J_2, ..., J_N do not contain some value K from 1 to M, the K-th head window image of the previous frame has been lost;
(3) Match the n-th head window image of the current-frame grayscale image by position and area against all head window images of each of the 2nd through H-th preceding frame grayscale images, repeating steps (1) and (2) above; if no head-window-image loss occurs, the n-th head window image of the current-frame grayscale image is an analysis window image; H is an arbitrary integer between 5 and 10.
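The per-frame matching can be sketched as follows. The Euclidean center distance used for T(m) is an assumption (the source leaves the exact formula out); the thresholds th_1 = 15 and th_2 = 100 are taken from the text:

```python
import math

TH_POS, TH_AREA = 15, 100   # th_1, th_2 from the text

def match_heads(current, previous):
    """current/previous: lists of (center_x, center_y, area) per head window.
    For each current head, find the previous head minimizing the (assumed
    Euclidean) center distance T(m); accept only if T <= th_1 and the area
    difference A <= th_2, else label it 0 (newly appearing head).
    Returns (matches J_1..J_N, labels of lost previous-frame heads)."""
    matches = []
    for (p, q, area) in current:
        best, best_t, best_a = 0, float('inf'), 0
        for m, (x, y, s) in enumerate(previous, start=1):
            t = math.hypot(p - x, q - y)
            if t < best_t:
                best, best_t, best_a = m, t, abs(area - s)
        if previous and best_t <= TH_POS and best_a <= TH_AREA:
            matches.append(best)
        else:
            matches.append(0)
    lost = [k for k in range(1, len(previous) + 1) if k not in matches]
    return matches, lost
```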
The flow of step 5 is as follows:
On each analysis window image, use the cascade AdaBoost method based on Haar features, with the trained face classification model, to perform face detection, obtaining the head window images in which a face is detected and those in which no face is detected.
The flow of step 6 is as follows:
(1) On a head window image in which a face has been detected, divide the head window image into six equal parts in the vertical direction; in the horizontal direction, the left and right margin regions are not analyzed and only the middle region is analyzed. Take the regions B_1 and B_2 as the analysis region images, where B_1 is the 2nd region counted from top to bottom and B_2 is the 5th region counted from top to bottom. B_1[i, j] and B_2[i, j] denote the gray values of the pixel at horizontal position i and vertical position j of region images B_1 and B_2 respectively. Calculate the difference value D_1[i, j] of the pixel at horizontal position i and vertical position j of region images B_1 and B_2:
D_1[i, j] = |B_1[i, j] − B_2[i, j]|
[i, j] traverses all pixels of the region image, i = 1, 2, ..., W_1, j = 1, 2, ..., H_1, where W_1 is the width of region images B_1 and B_2 and H_1 is their height. Count the number of pixels with D_1[i, j] > th_1, denoted C_1; when C_1 exceeds a set proportion of the total number W_1 · H_1 of region pixels, judge that a masked face exists in the grayscale image, otherwise it is a normal face; th_1 = 18;
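This region-difference decision can be sketched as follows. The decision ratio of changed pixels is an assumption (the source does not state the exact proportion); th_1 = 18 is taken from the text:

```python
import numpy as np

TH_DIFF = 18   # th_1 from the text

def masked_by_region_difference(b1, b2, ratio=0.5):
    """Compare the eye band B_1 with the mouth band B_2 pixel by pixel.
    A masked face is declared when the count C_1 of pixels whose absolute
    gray difference exceeds th_1 is larger than `ratio` of the region's
    pixels (the exact ratio is an assumption)."""
    d = np.abs(b1.astype(int) - b2.astype(int))
    c1 = int((d > TH_DIFF).sum())
    return c1 > ratio * d.size
```

Identical bands (a fully visible face with similar texture) yield C_1 = 0, while a bright mask covering the mouth band drives almost every pixel past the threshold.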
(2) On a head window image in which no face has been detected, divide the head window image into four equal parts in the horizontal direction and take the two middle regions B_3 and B_4 as the analysis regions. B_3[i, j] and B_4[i, j] denote the gray values of the pixel at horizontal position i and vertical position j of regions B_3 and B_4 respectively, and the means of the gray values are E_1 and E_2:
E_1 = (Σ_{i=1..W_2} Σ_{j=1..H_2} B_3[i, j]) / (W_2 · H_2)
E_2 = (Σ_{i=1..W_2} Σ_{j=1..H_2} B_4[i, j]) / (W_2 · H_2)
W_2 is the width of region images B_3 and B_4 and H_2 is their height. The mean difference of the two regions is ΔE = |E_1 − E_2|; when ΔE > th_2, a side-face or head-lowering behavior is indicated, not a masked face; th_2 = 25;
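The mean-difference check above is a one-liner in practice; th_2 = 25 is taken from the text, and the helper name is illustrative:

```python
import numpy as np

TH_MEAN = 25   # th_2 from the text

def side_face_or_bowed(b3, b4):
    """Compare the mean gray levels E_1, E_2 of the two middle vertical
    strips B_3, B_4; a large difference indicates a side face or a
    lowered head rather than a mask."""
    e1 = float(np.mean(b3))
    e2 = float(np.mean(b4))
    return abs(e1 - e2) > TH_MEAN
```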
(3) On a head window image in which no face has been detected, when ΔE ≤ th_2, divide the head window image into six equal parts in the vertical direction; in the horizontal direction, the left and right margin regions are not analyzed and only the middle region is analyzed. Take the regions B_5 and B_2 as the analysis regions, where B_5 is the 3rd region counted from top to bottom and B_2 is the 5th region counted from top to bottom. B_5[i, j] and B_2[i, j] denote the gray values of the pixel at horizontal position i and vertical position j of regions B_5 and B_2 respectively. Calculate the difference value D_2[i, j] of the pixel at horizontal position i and vertical position j of regions B_5 and B_2:
D_2[i, j] = |B_5[i, j] − B_2[i, j]|
i = 1, 2, ..., W_3, j = 1, 2, ..., H_3, where W_3 is the width of region images B_5 and B_2 and H_3 is their height. Count the number of pixels with D_2[i, j] > th_3, denoted C_2; when C_2 exceeds a set proportion of the total number W_3 · H_3 of region pixels, judge that a masked face exists in the grayscale image, otherwise it is a normal face; th_3 = 18.
When a masked face is determined, locate the position of the corresponding head window image in the original color video image, mark the picture frame there, and upload it to the alarm center.
In step 3, the production process of the head classification model is as follows:
(1) Collect positive and negative head samples: 5000 grayscale pictures containing a head and shoulders serve as positive samples and 10000 grayscale pictures containing no head serve as negative samples; the sample sizes are kept consistent;
(2) Extract the gradient orientation histogram statistics of the positive and negative samples and normalize the histograms, using each normalized value as a value of the feature vector; the method is identical to the feature-vector extraction performed with the sliding-window method on the grayscale images converted from the monitoring-site video images;
(3) Input the feature vectors of the 5000 positive samples and 10000 negative samples into the support-vector-machine software library and train with the ONE_CLASS classification mode and the LINEAR kernel function to obtain an optimal head classification model.
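A minimal sketch of this training stage, with scikit-learn's `OneClassSVM` standing in for the "support vector machine software library" (libsvm's `-s 2 -t 0` would be the direct equivalent of ONE_CLASS with a LINEAR kernel). The synthetic feature vectors and the `is_head` helper are illustrative only; note also that a one-class SVM in fact learns its boundary from the positive class alone, so the negative samples would serve for validation rather than fitting:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
pos = rng.normal(0.5, 0.05, size=(200, 576))     # stand-in head feature vectors
model = OneClassSVM(kernel='linear', nu=0.1).fit(pos)

def is_head(feature):
    """+1 if the feature vector falls inside the learned head class, else -1."""
    return int(model.predict(feature.reshape(1, -1))[0])
```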
In step 5, the production process of the face classification model is as follows:
(1) Collect 5000 grayscale face pictures as positive samples, uniformly scaled to 20 × 20 pixels, and collect 10000 grayscale pictures of arbitrary size containing no face as negative samples;
(2) Add 6 Haar feature classifiers; for each feature classifier, the shape, the position of the feature within the region of interest and the scale factors are defined in turn as follows:
A. Feature classifier 1: the whole rectangular area is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangle is the 2 × 2 square area in the upper-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular area minus 15 times the pixel sum of the black rectangular area;
B. Feature classifier 2: the whole rectangular area is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangle is the 2 × 2 square area in the lower-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular area minus 15 times the pixel sum of the black rectangular area;
C. Feature classifier 3: the whole rectangular area is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangle is the 2 × 2 square area in the upper-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular area minus 15 times the pixel sum of the black rectangular area;
D. Feature classifier 4: the whole rectangular area is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangle is the 2 × 2 square area in the lower-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular area minus 15 times the pixel sum of the black rectangular area;
E. Feature classifier 5: the whole rectangular area is a 7 × 1 rectangle, with 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangle is the region of the 5th and 6th horizontal pixels of the rectangle; the response is 2 times the pixel sum of the whole rectangular area minus 7 times the pixel sum of the black rectangular area;
F. Feature classifier 6: the whole rectangular area is a 7 × 1 rectangle, with 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangle is the region of the 2nd and 3rd horizontal pixels of the rectangle; the response is 2 times the pixel sum of the whole rectangular area minus 7 times the pixel sum of the black rectangular area;
(3) Train with the haartraining library in OpenCV to obtain the face classification model.
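The custom responses can be sketched directly from the definitions above (classifiers 1 and 5 shown; the others differ only in the position of the black square). The coefficients 4/15 and 2/7 make each response zero on uniform regions, since 4 × 15 pixels = 15 × 4 pixels and 2 × 7 pixels = 7 × 2 pixels:

```python
import numpy as np

def region_sum(img, x, y, w, h):
    """Pixel sum of the w*h rectangle with top-left corner (x, y)."""
    return float(img[y:y + h, x:x + w].sum())

def haar_response_1(img, x, y):
    """Classifier 1: 5x3 whole rectangle, 2x2 black square in the upper-left
    corner; response = 4*(whole sum) - 15*(black sum)."""
    whole = region_sum(img, x, y, 5, 3)
    black = region_sum(img, x, y, 2, 2)
    return 4 * whole - 15 * black

def haar_response_5(img, x, y):
    """Classifier 5: 7x1 rectangle, black pixels at horizontal positions
    5 and 6; response = 2*(whole sum) - 7*(black sum)."""
    whole = region_sum(img, x, y, 7, 1)
    black = region_sum(img, x + 4, y, 2, 1)
    return 2 * whole - 7 * black
```

A production implementation would evaluate these over an integral image rather than re-summing rectangles per window; the direct sums here keep the sketch short.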
The present invention uses image-processing techniques and pattern-recognition methods to analyze the human bodies appearing in a video image, judging whether a face is present within the detected head and using this for the masked-face decision. It is of great significance to the research of masked-person identification in video surveillance and can effectively prevent crimes committed under masked disguise.
Embodiment:
The invention provides a masked-face detection method for video surveillance. The system architecture of the method, as shown in Figure 1, comprises a video acquisition unit, a masked-face detection unit and an alarm unit.
The main function of the video acquisition unit is to shoot the monitored scene with a general-purpose analog camera to obtain an analog video image, and then convert it to digital image data with a general-purpose video capture card. There are certain requirements on the mounting height and angle of the camera: the installation must keep the head-and-shoulder region of the human body fully inside the video picture, and the camera must be set up so that frontal face information is clearly shown in the picture, ideally facing the face directly, so a close-range frontal shot is required.
The main function of the masked-face detection unit is to convert the incoming color digital image data to a grayscale image and then detect on the grayscale image whether a masked face exists. To improve detection efficiency, the grayscale image is first reduced, before detection, to a standard detection grayscale map of 176 × 144 pixels. If a masked face exists, the position of the masked face is extracted and the masked face is marked at the corresponding position of the original color image.
If the masked-face detection unit detects that a masked face exists, an alarm is raised and the image carrying the mask mark is uploaded to the alarm unit.
The invention provides a masked-face detection method for video surveillance; the method, as shown in Figure 2, specifically comprises the following steps:
One, convert the color image to a grayscale image (s1)
Every step of the masked-face detection method mentioned in the present invention is carried out on the basis of a grayscale image, so the color image must first be converted to a grayscale image.
Two, scale the grayscale image (s2)
To improve detection efficiency, the grayscale image is reduced to 176 × 144 pixels. Image scaling needs to balance processing efficiency against the smoothness and sharpness of the result; current image-scaling methods are relatively mature and are not within the research scope of the present invention. Each step mentioned below is carried out on the basis of the reduced grayscale image.
Three, perform head detection on the grayscale image (s3)
On the image scaled in step 2, adopt the sliding-window method with a window size of 40 × 40, a horizontal scanning step of 3 and a vertical scanning step of 2; move the sliding window from left to right and top to bottom until the complete image has been scanned, and perform head detection on the window grayscale image under each window position, hereinafter referred to as the window image;
i denotes the horizontal coordinate of a point in the window image and j its vertical coordinate; I[i, j] denotes the gray value of the window image at pixel [i, j]; [i, j] traverses every pixel of the window image; the window-image width is W_0 = 40 and its height is H_0 = 40. When the sliding window is at the first window image:
1. Compute the gradient direction and magnitude of each pixel of the window image, specifically:
(1) Calculate the horizontal gradient and vertical gradient of each pixel of the window image
First, the horizontal gradient G_x[i, j] and vertical gradient G_y[i, j] of each pixel of the window image are all initialized to 0:
G_x[i, j] = 0, i = 1, 2, ..., W_0, j = 1, 2, ..., H_0 (1-a)
G_y[i, j] = 0, i = 1, 2, ..., W_0, j = 1, 2, ..., H_0 (1-b)
On the window image, using an edge operator as the computation template, translate the template to each pixel [i, j] in turn, from left to right and top to bottom; to prevent crossing the border, the pixels on the top, bottom, left-most and right-most edges are not processed. The template is applied as a weighted sum over the neighborhood gray values of pixel [i, j], each neighborhood value multiplied by the corresponding template element, giving the horizontal gradient G_x[i, j] and vertical gradient G_y[i, j]. The edge operator may be the Roberts, Sobel, Prewitt or Kirsch edge operator; the present invention is described taking the Sobel edge operators S_x and S_y (the standard templates S_x = [−1 0 1; −2 0 2; −1 0 1] and S_y = [−1 −2 −1; 0 0 0; 1 2 1]) as an example. The computing methods of the horizontal gradient G_x[i, j] and vertical gradient G_y[i, j] are:
G_x[i, j] = Σ_{k=1..3} Σ_{l=1..3} S_x[k, l] · I[i + l − 2, j + k − 2], i = 2, 3, ..., W_0 − 1, j = 2, 3, ..., H_0 − 1 (2-a)
G_y[i, j] = Σ_{k=1..3} Σ_{l=1..3} S_y[k, l] · I[i + l − 2, j + k − 2], i = 2, 3, ..., W_0 − 1, j = 2, 3, ..., H_0 − 1 (2-b)
Wherein I[i, j] denotes the gray value of each pixel of the window image, S_x[k, l] denotes the value at row k, column l of the Sobel horizontal edge operator, and S_y[k, l] denotes the value at row k, column l of the Sobel vertical edge operator;
(2) Calculate the gradient magnitude G_1[i, j] and gradient direction G_2[i, j] at each pixel of the window image:
G_1[i, j] = sqrt(G_x[i, j]^2 + G_y[i, j]^2), i = 1, 2, ..., W_0, j = 1, 2, ..., H_0 (3-a)
G_2[i, j] = ⌊(arctan(G_y[i, j] / G_x[i, j]) + π/2) · P/π⌋ + 1, i = 1, 2, ..., W_0, j = 1, 2, ..., H_0 (3-b)
Wherein arctan() is the arctangent function, ⌊·⌋ is the downward rounding (floor) operator, and ⌊x⌋ denotes the largest integer not greater than x; P denotes the number of channels and may be any integer between 2 and 180. In the embodiment of the present invention, P = 9 is taken as an example, so the gradient direction G_2[i, j] can be expressed as:
G_2[i, j] = ⌊(arctan(G_y[i, j] / G_x[i, j]) + π/2) · 9/π⌋ + 1
2. Use the obtained gradient magnitudes and directions to compute the gradient histogram statistics, using each normalized histogram statistic as a value of the feature vector:
First the window image is divided into connected regions of identical size; each connected region is a cell unit (cell); then the gradient orientation histogram of the pixels of each cell unit is accumulated. To better adapt to illumination changes and shadow effects, several cells are grouped into an interval (block), and the gradient orientation histogram statistics of the cells within each block are normalized.
The number of pixels per row of a cell may be any integer between 2 and 20, as may the number per column; the number of pixels per row of a block may be any integer between 4 and 40, as may the number per column. The embodiment of the present invention is described taking cells of 8 × 8 pixels, with each block consisting of 2 × 2 = 4 cells. Figure 3 is the schematic diagram of one block in this embodiment: for a 40 × 40 window image there are 25 cells; adjacent blocks overlap by 50%, so the image has 4 × 4 = 16 blocks in total. Each cell has 9 direction channels, each block therefore has 4 × 9 = 36 features, and there are 16 × 36 = 576 feature dimensions in total. The computation process is as follows:
(1) Calculate the gradient orientation histogram statistic of each cell unit of the window image
Each pixel of each cell unit in the window image votes for a histogram channel according to its gradient direction, with the gradient magnitude of the pixel as the voting weight. H[m][p] denotes the histogram statistic of the p-th channel of the m-th cell unit of a window image; m is a variable, the cell-unit label, starting from 1 and increasing by 1 in left-to-right, top-to-bottom order; p is a variable, the channel label; L denotes the number of cell units per horizontal row of a window image and M the total number of cell units of a window image; L and M are constants that depend only on the window-image size; in the embodiment of the present invention, L = 5 and M = 25;
Taking m = 1 and p = 1, H[1][1] denotes the histogram statistic of the 1st channel of the 1st cell unit, with computing formula:
H[1][1] = Σ_{i=1..8} Σ_{j=1..8} G_1[i, j] · δ(G_2[i, j], 1)
wherein δ(a, b) = 1 when a = b and δ(a, b) = 0 otherwise;
p increases by 1 successively; from p = 2 until p = 9, H[1][2], H[1][3], ..., H[1][9] are obtained in turn, with computing formula:
H[1][p] = Σ_{i=1..8} Σ_{j=1..8} G_1[i, j] · δ(G_2[i, j], p), p = 2, 3, ..., 9
m increases by 1 successively; from m = 2 to m = M, each increase yields 9 histogram statistics, with corresponding computing formula:
H[m][p] = Σ_{i=8·((m−1) mod 5)+1 .. 8·((m−1) mod 5)+8} Σ_{j=8·⌊(m−1)/5⌋+1 .. 8·⌊(m−1)/5⌋+8} G_1[i, j] · δ(G_2[i, j], p), p = 1, 2, ..., 9
wherein ⌊(m−1)/5⌋ denotes the largest integer not greater than (m−1)/5;
(2) Normalize the gradient orientation histogram statistics of the cell units of each interval of the window image, and extract the feature vector of the window image.
The gradient orientation histogram statistics of the cell units of each interval of the window image are summed to form the normalization factor. S[n] denotes the normalization factor of the histogram statistics of the n-th interval of a window image; n is a variable, the interval label, starting from 1 and increasing by 1 in left-to-right, top-to-bottom order; each horizontal row of a window image contains L − 1 intervals; N denotes the total number of intervals of a window image; N is a constant related to the window-image size; in the embodiment of the present invention, the total number of intervals is N = 16, with 4 intervals per horizontal row of a window image;
Taking n = 1, the normalization factor S[1] of the histogram statistics of the 1st interval of the window image is the sum of the histogram statistics of all channels of all cell units of the 1st interval:
S[1] = Σ_{p=1..9} (H[1][p] + H[2][p] + H[6][p] + H[7][p])
Dividing the histogram statistic of each channel of the 1st cell unit of the 1st interval, also the 1st cell unit of the window image, by the normalization factor of the 1st interval gives 9 values, in turn H[1][1]/S[1], ..., H[1][9]/S[1], which serve as the 1st to 9th values of the window-image feature vector;
Dividing the histogram statistic of each channel of the 2nd cell unit of the 1st interval, also the 2nd cell unit of the window image, by the normalization factor of the 1st interval gives 9 values, in turn H[2][1]/S[1], ..., H[2][9]/S[1], the 10th to 18th values of the feature vector;
Dividing the histogram statistic of each channel of the 3rd cell unit of the 1st interval, also the 6th cell unit of the window image, by the normalization factor of the 1st interval gives 9 values, in turn H[6][1]/S[1], ..., H[6][9]/S[1], the 19th to 27th values of the feature vector;
Dividing the histogram statistic of each channel of the 4th cell unit of the 1st interval, also the 7th cell unit of the window image, by the normalization factor of the 1st interval gives 9 values, in turn H[7][1]/S[1], ..., H[7][9]/S[1], the 28th to 36th values of the feature vector;
n increases by 1 successively; from n = 2 until n = N, the normalization factors of the histogram statistics of all intervals are calculated: with c = (n−1) mod 4 and r = ⌊(n−1)/4⌋, where ⌊x⌋ denotes the largest integer not greater than x, the n-th interval consists of the cell units 5r + c + 1, 5r + c + 2, 5r + c + 6 and 5r + c + 7 of the window image, and S[n] is the sum of their histogram statistics over all channels;
Dividing the histogram statistic of each channel of each cell unit of the n-th interval of the window image by the normalization factor of the n-th interval further gives the other 15 × 36 values of the window-image feature vector; the feature vector of each window image has 16 × 36 = 576 dimensions in total;
Send the 576-dimensional feature vector of each window image, together with the head classification model trained in advance, into the support-vector-machine software library; classify with the ONE_CLASS classification mode and the LINEAR kernel function to judge whether the window image is a head image; if so, regard the window image as a head window image;
By analogy, continue until all window images have been scanned, obtaining all head window images in the current-frame grayscale image; the head window images are labelled in the same order as the traversal, from left to right and top to bottom. The training of the head model is described in detail in part seven.
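The whole feature-extraction chain of this part can be sketched end to end for one 40 × 40 window, using the embodiment's parameters (L = 5 cells per row, 16 overlapping blocks, 9 channels; standard Sobel coefficients assumed):

```python
import numpy as np

def hog_576(win):
    """40x40 window -> 576-dim HOG-style vector following the embodiment:
    Sobel gradients, 9-channel direction voting in 8x8 cells, 2x2-cell
    blocks with 50% overlap, sum-normalized per block."""
    sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    sy = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], float)
    win = win.astype(float)
    gx = np.zeros_like(win); gy = np.zeros_like(win)
    for j in range(1, 39):                    # border pixels keep gradient 0
        for i in range(1, 39):
            patch = win[j - 1:j + 2, i - 1:i + 2]
            gx[j, i] = (sx * patch).sum()
            gy[j, i] = (sy * patch).sum()
    g1 = np.hypot(gx, gy)                     # magnitude G_1
    theta = np.arctan(gy / np.where(gx == 0, 1e-12, gx))
    g2 = np.floor((theta + np.pi / 2) * 9 / np.pi).astype(int) + 1  # G_2
    H = np.zeros((25, 9))                     # 5x5 cells of 8x8 pixels
    for m in range(25):
        r, c = divmod(m, 5)
        for j in range(8 * r, 8 * r + 8):
            for i in range(8 * c, 8 * c + 8):
                H[m, g2[j, i] - 1] += g1[j, i]
    feats = []
    for br in range(4):                       # 4x4 = 16 overlapping blocks
        for bc in range(4):
            m = br * 5 + bc
            block = np.concatenate([H[m], H[m + 1], H[m + 5], H[m + 6]])
            s = block.sum()
            feats.append(block / s if s > 0 else block)
    return np.concatenate(feats)
```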
Four, match each head between frames (s4)
A masked face that appears in a video cannot flash past in an instant. Therefore, on the basis of step 3, the detected heads are tracked to exclude the interference caused by momentarily appearing targets; the specific implementation process is shown in Figure 4.
Suppose M head window images were detected in the previous-frame grayscale image and N head window images are detected in the current-frame grayscale image; n and m are variables, the labels of the head window images in the current-frame and previous-frame grayscale images respectively, n = 1, 2, ..., N, m = 1, 2, ..., M; the center of the m-th head window image of the previous frame is [x_m, y_m] and its area is S_m; the center of the n-th head window image of the current frame is [p_n, q_n] and its area is Q_n.
(1) Match the n-th head window image of the current-frame grayscale image by position and area against all head window images of the previous-frame grayscale image. The center-position difference T(m) and area difference A(m) of the n-th head window image of the current frame and the m-th head window image of the previous frame are respectively:
T(m) = sqrt((p_n − x_m)^2 + (q_n − y_m)^2) (6-a)
A(m) = |Q_n − S_m| (6-b)
Take n = 1, traverse all values of m, and find the value of m that minimizes the position-matching parameter T(m); denote it J_1, meaning the best match of the first head window image of the current frame is the J_1-th head window image of the previous frame. If T(J_1) ≤ th_1 and A(J_1) ≤ th_2, where th_1 is the threshold on the inter-frame change of head-window centre position (th_1 = 15 in this embodiment of the invention) and th_2 is the threshold on the inter-frame change of head-window area (th_2 = 100 in this embodiment), then the first head window image of the current frame has found its match in the previous-frame gray-level image, namely the J_1-th head window image. If T(J_1) > th_1 or A(J_1) > th_2, set J_1 to zero, i.e. J_1 = 0, meaning the first head window image of the current frame matches no head window image of the previous frame and is a newly appearing head window image;
(2) Increase n by 1, from n = 2 up to n = N, repeating step (1) to find the matches J_2, J_3, ..., J_N of all head window images. If J_1, J_2, ..., J_N do not contain some value K between 1 and M, the K-th head window image of the previous frame has been lost;
(3) Match the n-th head window image of the current-frame gray-level image by position and area against all head window images of each of the previous frames, from the second previous frame back to the H-th previous frame, repeating steps (1) and (2). If no head window image is lost over these frames, the n-th head window image of the current-frame gray-level image is an analysis window image. H is any integer between 5 and 10; in this embodiment of the invention, H = 8.
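The matching of steps (1) and (2) can be sketched in Python as follows. Equation (6-a) for T(m) is not reproduced in the source, so the city-block distance between window centres is assumed here:

```python
TH1 = 15   # centre-shift threshold th_1
TH2 = 100  # area-change threshold th_2

def match_heads(prev, curr):
    """prev/curr: lists of (cx, cy, area) per head window.

    Returns J[n] for each current-frame window: the 1-based index of the
    matched previous-frame window, or 0 if the window is newly appearing.
    """
    J = []
    for (px, py, pq) in curr:
        best_m, best_t = 0, None
        for m, (sx, sy, sa) in enumerate(prev, start=1):
            t = abs(px - sx) + abs(py - sy)  # assumed form of T(m), eq. (6-a)
            if best_t is None or t < best_t:
                best_m, best_t = m, t
        if best_m and best_t <= TH1 and abs(pq - prev[best_m - 1][2]) <= TH2:
            J.append(best_m)     # matched: position and area both close
        else:
            J.append(0)          # no match: a newly appearing head window
    return J

def lost_heads(J, M):
    # Step (2): previous-frame windows whose index never appears in J are lost.
    return [k for k in range(1, M + 1) if k not in J]
```

Step (3) then simply repeats `match_heads` against each of the previous H frames and checks that `lost_heads` stays empty.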
Five, face detection s5
On the basis of step 4, when a head window image persists over consecutive frames, face detection is performed on each analysis window image using the Haar-based cascade AdaBoost method with the face classification model trained in advance, yielding the head window images in which a face is detected and those in which no face is detected. In this embodiment of the invention, the same face detection method is applied to every analysis window image.
Six, masked-face judgment s6
On the basis of head detection and face detection, the masked-face decision proceeds as follows:
1. On head window images in which a face has been detected, divide the head window image into six equal parts in the vertical direction according to the method shown in Fig. 5. In the horizontal direction, the left and right margin regions are not analysed; only the middle region is analysed. Take regions B_1 and B_2 as the analysis regions, where B_1 is the 2nd region counting from top to bottom and B_2 is the 5th region counting from top to bottom. Regions B_1 and B_2 have the same width, a fixed fraction of the head-window width, and the same height, one sixth of the head-window height.
B_1[i, j] and B_2[i, j] denote the gray values of the pixel at horizontal position i and vertical position j in regions B_1 and B_2 respectively. Compute the difference value D_1[i, j] for the pixel at horizontal position i, vertical position j by scanning left to right and top to bottom, taking the difference of the gray values at corresponding positions of regions B_1 and B_2 in the figure. D_1[i, j], the difference value at horizontal position i, vertical position j between regions B_1 and B_2, is computed as:

D_1[i, j] = |B_1[i, j] − B_2[i, j]|    (7)
[i, j] traverses all pixels of regions B_1 and B_2, i = 1, 2, ..., W_1, j = 1, 2, ..., H_1, where W_1 is the width and H_1 the height of regions B_1 and B_2. Count the number of pixels with D_1[i, j] > th_1 and denote it C_1. When C_1 reaches the set proportion of the region's pixels, judge that a masked face exists in the gray-level image; otherwise it is a normal face. In this embodiment of the invention, th_1 = 18;
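A minimal sketch of this counting test, assuming the elided condition is that C_1 must reach a set proportion of the region's pixels (the exact fraction is not given in the source, so it is a parameter here):

```python
def masked_by_region_diff(b1, b2, th=18, ratio=0.5):
    """Judgment step 1: count pixels where |B_1 - B_2| exceeds th_1.

    `b1` and `b2` are same-sized 2-D lists of gray values. Both the
    proportion `ratio` and the direction of the final test are assumptions,
    since the source elides the exact condition on C_1.
    """
    h, w = len(b1), len(b1[0])
    c = sum(1 for j in range(h) for i in range(w)
            if abs(b1[j][i] - b2[j][i]) > th)   # D_1[i, j] > th_1, eq. (7)
    return c >= ratio * w * h                   # masked if enough pixels differ
```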
2. On head window images in which no face has been detected, divide the head window image into four equal parts in the horizontal direction according to the method shown in Fig. 6 and take the two middle regions B_3 and B_4 as the analysis regions. Regions B_3 and B_4 have the same width, one quarter of the head-window width, and the same height, equal to the head-window height. B_3[i, j] and B_4[i, j] denote the gray values of the pixel at horizontal position i, vertical position j in regions B_3 and B_4 respectively. Scanning left to right and top to bottom, compute from the gray values of the pixels the mean values E_1 and E_2 of regions B_3 and B_4:

E_1 = (1 / (W_2 · H_2)) Σ_i Σ_j B_3[i, j],  E_2 = (1 / (W_2 · H_2)) Σ_i Σ_j B_4[i, j]    (8)

where W_2 is the width and H_2 the height of regions B_3 and B_4. The difference of the two region means is ΔE = |E_1 − E_2|. When ΔE > th_2, there is a side face or a bowed head, not a masked face; otherwise proceed to the next step. In this embodiment of the invention, th_2 = 25;
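The mean comparison of step 2 can be sketched as follows; `mean_gray` computes the region mean of equation (8) for a region given as a 2-D list of gray values:

```python
def mean_gray(region):
    # E = (1 / (W * H)) * sum of all gray values in the region, eq. (8)
    return sum(map(sum, region)) / (len(region) * len(region[0]))

def side_face_or_bowed(b3, b4, th=25):
    # A large gap between the two middle quarter-strips suggests a side
    # face or a bowed head rather than a masked face.
    return abs(mean_gray(b3) - mean_gray(b4)) > th   # ΔE > th_2
```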
3. On head window images in which no face has been detected, when ΔE ≤ th_2, divide the head window image into six equal parts in the vertical direction according to the method shown in Fig. 5. In the horizontal direction, the left and right margin regions are not analysed; only the middle region is analysed. Take regions B_5 and B_2 as the analysis regions, where B_5 is the 3rd region counting from top to bottom and B_2 is the 5th region counting from top to bottom. Regions B_5 and B_2 have the same width, a fixed fraction of the head-window width, and the same height, one sixth of the head-window height. B_5[i, j] and B_2[i, j] denote the gray values of the pixel at horizontal position i, vertical position j in regions B_5 and B_2 respectively. Scanning left to right and top to bottom, take the difference of the gray values at corresponding positions of regions B_5 and B_2 in the figure. D_2[i, j], the difference value at horizontal position i, vertical position j between regions B_5 and B_2, is computed as:

D_2[i, j] = |B_5[i, j] − B_2[i, j]|    (9)

[i, j] traverses all pixels of regions B_5 and B_2, i = 1, 2, ..., W_3, j = 1, 2, ..., H_3, where W_3 is the width and H_3 the height of regions B_5 and B_2. Count the number of pixels with D_2[i, j] > th_3 and denote it C_2. When C_2 reaches the set proportion of the region's pixels, judge that a masked face exists in the gray-level image; otherwise it is a normal face. In this embodiment of the invention, th_3 = 18.
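Putting steps 2 and 3 together, a minimal Python sketch of the decision on a no-face head window might look as follows. Region extraction is omitted (the four regions are passed in as 2-D lists of gray values), and two details elided in the source are made explicit assumptions here: the proportion of differing pixels C_2 must reach, and the direction of that final test.

```python
def judge_no_face_window(b3, b4, b5, b2, th2=25, th3=18, ratio=0.5):
    """Decision for a head window with no detected face (steps 2-3)."""
    # Step 2: compare the mean gray levels of B_3 and B_4 (eq. 8).
    e1 = sum(map(sum, b3)) / (len(b3) * len(b3[0]))
    e2 = sum(map(sum, b4)) / (len(b4) * len(b4[0]))
    if abs(e1 - e2) > th2:                       # ΔE > th_2
        return "side face or bowed head"         # not a masked face
    # Step 3: count large differences between B_5 and B_2 (eq. 9).
    h, w = len(b5), len(b5[0])
    c2 = sum(1 for j in range(h) for i in range(w)
             if abs(b5[j][i] - b2[j][i]) > th3)  # D_2[i, j] > th_3
    # Assumed condition: masked when C_2 reaches `ratio` of the pixels.
    return "masked face" if c2 >= ratio * w * h else "normal face"
```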
When a masked face is judged to exist on a head window image, the position of this head window image in the original colour image is located and marked, and an alarm is uploaded to the alarm centre.
Seven, head-model training s7
The head detection performed on the gray-level image in step 3 uses a head classification model trained in advance. The training process, unit s7 as shown in Figure 2, consists of collecting positive and negative head samples, extracting the histogram-of-oriented-gradients (HOG) features, and training the head model with the support vector machine software library svmlight.
1. Collect positive and negative head samples
Collect 5000 grayscale pictures containing a head and shoulders as positive samples and 10000 grayscale pictures containing no head as negative samples; uniformly scale all samples to 40 × 40 pixels.
2. Extract HOG feature vectors
Extract the gradient-orientation-histogram statistics of the positive and negative samples and normalize the histograms, using each normalized value as an element of the feature vector. The method is the same as that used in step 3 to extract feature vectors from window images by the sliding-window method.
3. Train the head model with the support vector machine software library svmlight
Input the feature vectors of the 5000 positive samples and 10000 negative samples into the support vector machine software library, and train with the ONE_CLASS classification mode and the LINEAR kernel function to obtain an optimal head classification model.
Eight, face-model training s8
The face detection performed on the extracted head images in step 5 uses a face model trained in advance. The training process, unit s8 as shown in Figure 2, consists of collecting positive and negative face samples, adding Haar feature classifiers, and training the face model.
1. Collect positive and negative face samples
Collect 5000 grayscale face pictures, uniformly scaled to 20 × 20 pixels, as positive samples. Collect 10000 grayscale pictures of arbitrary size containing no face as negative samples.
2. Add 6 Haar feature classifiers
To better detect side faces, 6 Haar feature classifiers are added in this embodiment of the invention. For each feature classifier, the shape, the position of the feature within the region of interest, and the scale factor are defined; they are, in order:
(1) Feature classifier 1, as shown in Figure 7: the whole rectangular area is a 5 × 3 rectangle, 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square at the upper-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular area minus 15 times the pixel sum of the black rectangular region;
(2) Feature classifier 2, as shown in Figure 8: the whole rectangular area is a 5 × 3 rectangle, 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square at the lower-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular area minus 15 times the pixel sum of the black rectangular region;
(3) Feature classifier 3, as shown in Figure 9: the whole rectangular area is a 5 × 3 rectangle, 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square at the upper-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular area minus 15 times the pixel sum of the black rectangular region;
(4) Feature classifier 4, as shown in Figure 10: the whole rectangular area is a 5 × 3 rectangle, 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square at the lower-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular area minus 15 times the pixel sum of the black rectangular region;
(5) Feature classifier 5, as shown in Figure 11: the whole rectangular area is a 7 × 1 rectangle, 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangular region covers the 5th and 6th horizontal pixels of the rectangle; the response is 2 times the pixel sum of the whole rectangular area minus 7 times the pixel sum of the black rectangular region;
(6) Feature classifier 6, as shown in Figure 12: the whole rectangular area is a 7 × 1 rectangle, 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangular region covers the 2nd and 3rd horizontal pixels of the rectangle; the response is 2 times the pixel sum of the whole rectangular area minus 7 times the pixel sum of the black rectangular region;
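The responses of the added classifiers can be sketched directly from the definitions above (no integral image is used in this sketch; `haar1_response` and `haar5_response` cover classifiers 1 and 5, the others differing only in the position of the black region). The weights 4/15 and 2/7 balance the 15- and 4-pixel (or 7- and 2-pixel) areas, so a uniform patch gives a response of zero:

```python
def rect_sum(img, x, y, w, h):
    # Pixel sum of the w x h rectangle whose top-left corner is (x, y);
    # `img` is a 2-D list of gray values indexed as img[row][col].
    return sum(img[y + j][x + i] for j in range(h) for i in range(w))

def haar1_response(img, x, y):
    """Classifier 1: 5x3 window, black 2x2 square at the upper-left corner."""
    whole = rect_sum(img, x, y, 5, 3)
    black = rect_sum(img, x, y, 2, 2)
    return 4 * whole - 15 * black   # 4*15 = 15*4 pixels: areas balance

def haar5_response(img, x, y):
    """Classifier 5: 7x1 window, black region at the 5th and 6th pixels."""
    whole = rect_sum(img, x, y, 7, 1)
    black = img[y][x + 4] + img[y][x + 5]
    return 2 * whole - 7 * black    # 2*7 = 7*2 pixels: areas balance
```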
3. Train the face model
Train the face model using the relatively mature haartraining library in OpenCV.