CN102622584B - Method for detecting masked faces in video surveillance

Method for detecting masked faces in video surveillance

Info

Publication number
CN102622584B
CN102622584B (granted publication of application CN201210052716.7A)
Authority
CN
China
Prior art keywords
window
video
people
pixel
region
Prior art date
Legal status
Expired - Fee Related
Application number
CN201210052716.7A
Other languages
Chinese (zh)
Other versions
CN102622584A (en)
Inventor
师改梅
胡入幻
白云
杨云
缪泽
补建
罗安
周聪俊
Current Assignee
Chengdu Santai Intelligent Technology Co ltd
Original Assignee
CHENGDU SANTAI ELECTRONIC INDUSTRY Co Ltd
Priority date
Filing date
Publication date
Application filed by CHENGDU SANTAI ELECTRONIC INDUSTRY Co Ltd
Priority to CN201210052716.7A
Publication of CN102622584A
Application granted
Publication of CN102622584B

Abstract

The invention relates to a method for detecting masked faces in video surveillance, which addresses the problem that a conventional video surveillance system cannot prevent crimes committed under masked disguise. The method comprises the following steps: (a) converting a color video image acquired at the monitored site into a gray-level image; (b) scaling the gray-level image; (c) performing head detection on the gray-level image; (d) matching each detected head across frames; (e) performing face detection on the detected heads; and (f) performing a mask judgment, marking the original color video image in which a masked face is found, and raising an alarm.

Description

Method for detecting masked faces in video surveillance
Technical field:
The present invention relates to the fields of image processing and pattern recognition, and in particular to a method for detecting faces in video surveillance.
Background art:
Face detection in video surveillance means detecting human faces against the background of a video image. Because it is affected by image background, illumination changes and other factors, face detection has become a complex research topic. At present, the cascade adaboost method trained on haar features is regarded as the most mature and most effective face detection method. A masked face, such as a face wearing dark glasses or a mask, is a special kind of face; detecting this class of faces differs from conventional face detection in the features used, and the two procedures are similar but not identical. The ideal outcome of masked-face detection is to fail to detect a complete face on a detected head, so face detection built on top of head detection has become the main line of research.
Patent 201010122033.5 describes a human head detection method that uses the contour features of the crown of the head to distinguish multiple human targets. That method suits cameras that look down from a distance, where the top of the head appears relatively completely. In certain applications of video surveillance, such as ATM monitoring, the peculiarity of the installation position of the image capture device means that the captured images of the detection area mostly contain frontal information of the human body, so facial detail features can be used for detection. In "Estimating the Number of People in Crowded Scenes by MID Based Foreground Segmentation and Head-shoulder Detection", Min Li, Zhaoxiang Zhang et al. proposed a head detection method based on the histogram of oriented gradients (HOG), using the shape features formed by the head and shoulders. That method can detect heads fairly accurately, but it does not analyze facial details and therefore cannot prevent crimes committed under masked disguise.
Summary of the invention:
The object of the invention is to provide a masked-face detection method that can judge whether a masked face is present in a video image and thereby prevent crimes committed under masked disguise.
The present invention comprises the following steps:
1. Convert the color video image obtained at the monitored site into a gray-level image.
2. Scale the gray-level image.
3. Perform head detection on the gray-level image; when a head is detected, proceed to the next step, otherwise repeat steps 1 to 3.
4. Match each head across frames.
5. Perform face detection.
6. Perform the mask judgment, mark the original color video image in which a masked face is found, and raise an alarm (a minimal pipeline sketch follows below).
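As an illustration only, here is a minimal sketch of the six-step pipeline in Python, assuming OpenCV (cv2) for capture, gray conversion and scaling; detect_heads, match_heads, detect_face, judge_masked and mark_and_alarm are placeholder names for steps 3 to 6, which the following sections specify in detail.

```python
# Minimal sketch of the six-step pipeline. detect_heads, match_heads,
# detect_face, judge_masked and mark_and_alarm are placeholders for the
# steps described in detail below.
import cv2

def process_stream(source=0, target_size=(176, 144)):
    cap = cv2.VideoCapture(source)
    history = []                                         # head windows of past frames
    while True:
        ok, color = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(color, cv2.COLOR_BGR2GRAY)   # step 1: to gray
        gray = cv2.resize(gray, target_size)             # step 2: scale
        heads = detect_heads(gray)                       # step 3: head detection
        stable = match_heads(heads, history)             # step 4: inter-frame match
        for win in stable:
            face_found = detect_face(gray, win)          # step 5: face detection
            if judge_masked(gray, win, face_found):      # step 6: mask judgment
                mark_and_alarm(color, win)
        history.append(heads)
    cap.release()
```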
Step 3 uses a sliding-window method: the window is moved pixel by pixel in left-to-right, top-to-bottom order, dividing the gray-level image into the window images corresponding to each window position, and head detection is performed on each window image. When the sliding window is at the first window image:
(1) Compute the horizontal gradient G_x[i,j] and vertical gradient G_y[i,j] of each pixel of the window image:

A. Initialization of G_x[i,j] and G_y[i,j]:

The value of every pixel in G_x[i,j] and G_y[i,j] is initialized to 0. [i,j] ranges over all pixels of the window image: i is a variable giving the horizontal position of the pixel in the window image, with values i = 1, 2, ..., W_0; j is a variable giving the vertical position, with values j = 1, 2, ..., H_0; W_0 and H_0 are the width and height of the window image.

B. Compute the horizontal gradient G_x[i,j] and vertical gradient G_y[i,j] of each pixel of the window image:

Take the Sobel horizontal edge operator as the computation template, translate the template center to each pixel, multiply each pixel in the image region covered by the template with the corresponding template element, and take the sum of all products as the horizontal gradient G_x[i,j] of that pixel. Taking the Sobel vertical edge operator as the template gives the vertical gradient G_y[i,j]. To avoid crossing the border, the pixels on the four sides (topmost, bottommost, leftmost, rightmost) are not processed; that is, when j = 1 or j = H_0 with i = 1, 2, ..., W_0, or i = 1 or i = W_0 with j = 1, 2, ..., H_0, G_x[i,j] and G_y[i,j] keep their initial value 0. Thus, for i = 2, 3, ..., W_0−1 and j = 2, 3, ..., H_0−1:

G_x[i,j] = Σ_{k=1..3} Σ_{l=1..3} I[i−2+k, j−2+l] · S_x[k,l],

G_y[i,j] = Σ_{k=1..3} Σ_{l=1..3} I[i−2+k, j−2+l] · S_y[k,l],

where I[i,j] is the gray value of each pixel of the window image, S_x[k,l] is the element in row k, column l of the Sobel horizontal edge operator, and S_y[k,l] is the element in row k, column l of the Sobel vertical edge operator.
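As a sketch of this computation, assuming numpy and the Sobel templates given in the embodiment below; border pixels keep their initial value 0 as required above.

```python
# Sketch of step (1) with numpy: Sobel gradients of a gray window image,
# leaving the four border rows/columns at their initial value 0.
import numpy as np

SX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
SY = np.array([[ 1, 2, 1], [ 0, 0, 0], [-1, -2, -1]], dtype=np.float32)

def sobel_gradients(I):
    """I: 2-D array of gray values; returns (Gx, Gy) of the same shape."""
    H, W = I.shape
    Gx = np.zeros_like(I, dtype=np.float32)
    Gy = np.zeros_like(I, dtype=np.float32)
    for j in range(1, H - 1):           # skip top/bottom border rows
        for i in range(1, W - 1):       # skip left/right border columns
            patch = I[j - 1:j + 2, i - 1:i + 2].astype(np.float32)
            Gx[j, i] = np.sum(patch * SX)
            Gy[j, i] = np.sum(patch * SY)
    return Gx, Gy
```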
(2) Compute the gradient magnitude G_1[i,j] and gradient direction G_2[i,j] of each pixel of the window image:

G_1[i,j] = √(G_x[i,j]² + G_y[i,j]²),

G_2[i,j] = ⌊(9/π) · (arctan(G_y[i,j]/G_x[i,j]) + π/2)⌋,

where i = 1, 2, ..., W_0, j = 1, 2, ..., H_0, arctan(·) is the arctangent function, and ⌊·⌋ is the floor operator, so that ⌊(9/π)(arctan(G_y[i,j]/G_x[i,j]) + π/2)⌋ is the largest integer not greater than (9/π)(arctan(G_y[i,j]/G_x[i,j]) + π/2).
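A sketch of the magnitude and direction quantization, assuming numpy; since the floor formula above yields values 0 to 8 while the text labels the channels 1 to 9, the sketch adds 1 to map onto those labels, which is an assumption about the intended indexing. A small epsilon guards against division by zero, which the text does not address.

```python
# Sketch of step (2): gradient magnitude and 9-bin direction quantization.
import numpy as np

def magnitude_direction(Gx, Gy, bins=9, eps=1e-9):
    G1 = np.sqrt(Gx ** 2 + Gy ** 2)                   # gradient magnitude
    theta = np.arctan(Gy / (Gx + eps))                # in (-pi/2, pi/2)
    G2 = np.floor(bins / np.pi * (theta + np.pi / 2)).astype(int) + 1
    # +1 maps the floor result 0..8 onto the channel labels 1..9 of the text
    return G1, G2
```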
(3) Use the gradient magnitude and direction of each pixel of the window image to compute gradient orientation histogram statistics and obtain the feature vector of the window image:

The window image is divided into connected regions of equal size; each connected region consists of 8 × 8 pixels and is called a cell. Every 2 × 2 cells form a square block, and adjacent blocks overlap by 50%.
A. Compute the gradient orientation histogram statistics of each cell of the window image:

In the window image the gradient direction of each pixel takes values from 1 to 9, so each cell has 9 channels. Using the gradient magnitude and direction of each pixel of a cell, the gradient magnitudes falling in the direction range of each channel are accumulated, giving the histogram statistics of the cell. H[m][p] denotes the histogram value of channel p of cell m of the window image; m is a variable, the cell label, starting at 1 and increasing in left-to-right, top-to-bottom order; p is a variable, the channel label; L is the number of cells per row of the window image and M the total number of cells; L and M are constants depending only on the size of the window image. For m = 1, p = 1, H[1][1] is the histogram value of channel 1 of cell 1, computed as:

H[1][1] = Σ_{i=1..8} Σ_{j=1..8} (G_1[i,j] · w[i,j]),

where w[i,j] = 1 if G_2[i,j] = 1, and 0 otherwise.

p then increases by 1 at a time; from p = 2 up to p = 9 this gives H[1][2], H[1][3], ..., H[1][9]:

H[1][p] = Σ_{i=1..8} Σ_{j=1..8} (G_1[i,j] · w[i,j]),  p = 2, 3, ..., 9,

where w[i,j] = 1 if G_2[i,j] = p, and 0 otherwise.

m then increases by 1 at a time from m = 2 to m = M, each increase contributing 9 histogram values:

H[m][p] = Σ_{i=1..8} Σ_{j=1..8} (G_1[i + 8·((m−1) mod L), j + 8·⌊(m−1)/L⌋] · w[i,j]),  p = 1, 2, ..., 9,

where w[i,j] = 1 if G_2[i + 8·((m−1) mod L), j + 8·⌊(m−1)/L⌋] = p, and 0 otherwise, and ⌊(m−1)/L⌋ denotes the largest integer not greater than (m−1)/L.
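A sketch of the per-cell histogram accumulation, assuming numpy and the cell layout described above (cells numbered left to right, top to bottom); G1 and G2 come from the functions sketched earlier.

```python
# Sketch of step (3)A: per-cell gradient orientation histograms for a window
# divided into 8 x 8 cells with 9 channels.
import numpy as np

def cell_histograms(G1, G2, cell=8, bins=9):
    H_img, W_img = G1.shape
    L = W_img // cell                       # cells per row
    rows = H_img // cell
    H = np.zeros((rows * L, bins))          # H[m-1][p-1] in the text's notation
    for m in range(rows * L):
        u = cell * (m % L)                  # horizontal pixel offset of cell m+1
        v = cell * (m // L)                 # vertical pixel offset
        for p in range(1, bins + 1):
            mask = G2[v:v + cell, u:u + cell] == p
            H[m, p - 1] = np.sum(G1[v:v + cell, u:u + cell][mask])
    return H, L
```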
B. Normalize the cell histogram values within each block of the window image and extract the feature vector of the window image:

S[n] denotes the normalization factor of the histogram values of block n of the window image; n is a variable, the block label, starting at 1 and increasing in left-to-right, top-to-bottom order; each row of the window image contains L−1 blocks; N is the total number of blocks of the window image, a constant related to the window size. For n = 1, the normalization factor S[1] of the first block is the sum of the histogram values of all channels of all cells of that block:

S[1] = Σ_{p=1..9} (H[1][p] + H[2][p] + H[L+1][p] + H[L+2][p]).

The histogram values of the channels of the 1st cell of block 1, which is cell 1 of the window image, divided by the normalization factor of block 1, give 9 values that become the 1st to 9th values of the window feature vector:

H[1][1]/S[1], H[1][2]/S[1], ..., H[1][9]/S[1].

The 2nd cell of block 1, which is cell 2 of the window image, likewise gives the 10th to 18th values:

H[2][1]/S[1], H[2][2]/S[1], ..., H[2][9]/S[1].

The 3rd cell of block 1, which is cell L+1 of the window image, gives the 19th to 27th values:

H[L+1][1]/S[1], H[L+1][2]/S[1], ..., H[L+1][9]/S[1].

The 4th cell of block 1, which is cell L+2 of the window image, gives the 28th to 36th values:

H[L+2][1]/S[1], H[L+2][2]/S[1], ..., H[L+2][9]/S[1].

n then increases by 1 at a time; from n = 2 up to n = N the normalization factors of all blocks are computed as:

S[n] = Σ_{p=1..9} (H[a][p] + H[a+1][p] + H[a+L][p] + H[a+L+1][p]),  a = n + ⌊(n−1)/(L−1)⌋,

where ⌊(n−1)/(L−1)⌋ denotes the largest integer not greater than (n−1)/(L−1).

Dividing the histogram values of each channel of each cell of block n by the normalization factor of block n yields the remaining 36 × (N−1) values of the window feature vector; the feature vector of each window image has 36 × N dimensions in total.
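A sketch of the block normalization, assuming numpy; the block layout follows the S[1] definition above (cells 1, 2, L+1, L+2 with 50% overlap), and a small epsilon guards against division by zero, which the text does not address.

```python
# Sketch of step (3)B: block normalization over 2 x 2 cell blocks with 50%
# overlap, producing the 36*N-dimensional window feature vector. H and L come
# from cell_histograms above.
import numpy as np

def block_features(H, L, eps=1e-9):
    rows = H.shape[0] // L                  # cell rows in the window
    feats = []
    for br in range(rows - 1):              # block rows
        for bc in range(L - 1):             # L-1 blocks per row
            a = br * L + bc                 # top-left cell index (0-based)
            cells = [a, a + 1, a + L, a + L + 1]
            s = sum(H[c].sum() for c in cells) + eps   # normalization factor
            for c in cells:
                feats.extend(H[c] / s)      # 9 normalized values per cell
    return np.asarray(feats)                # length 36 * N
```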
C. Feed the 36 × N-dimensional feature vector of each window image together with the pre-trained head classification model into the support vector machine library, classify with the ONE_CLASS classification mode and the LINEAR kernel function, and judge whether the window image is a head image; if so, treat the window image as a head window image.

Repeat steps (1) to (3) in the same way until all window images have been traversed, obtaining all head window images of the current gray-level frame. The head window images are labeled in traversal order, following the left-to-right, top-to-bottom principle.
Step 4 proceeds as follows:
(1) Match the n-th head window image of the current gray-level frame against all head window images of the previous gray-level frame by position and area:

position matching parameter T(m) = √((p_n − x_m)² + (q_n − y_m)²),

area matching parameter A(m) = |Q_n − S_m|.

n and m are variables, the labels of the head window images in the current and previous gray-level frames; the center of the n-th head window of the current frame is [p_n, q_n] and its area Q_n; the center of the m-th head window of the previous frame is [x_m, y_m] and its area S_m; n = 1, 2, ..., N, m = 1, 2, ..., M, where N is the number of head window images in the current frame and M the number in the previous frame. Take n = 1, traverse all values of m, and compute the value of m that minimizes the position matching parameter T(m), denoted

J_1 = argmin_m T(m),

meaning the best match of the first head window of the current frame is the J_1-th head window of the previous frame. If T(J_1) ≤ th_1 and A(J_1) ≤ th_2, with th_1 = 15 and th_2 = 100, the first head window of the current frame has found its matching head window, the J_1-th, in the previous frame. If T(J_1) > th_1 or A(J_1) > th_2, J_1 is set to 0, i.e. J_1 = 0, meaning the first head window of the current frame matches no head window of the previous frame and is a newly appearing head window.
(2) n increases by 1 at a time; from n = 2 up to n = N, repeat step (1) to find all matching head windows J_2, J_3, ..., J_N. If J_1, J_2, ..., J_N do not contain some value K between 1 and M, the K-th head window of the previous frame has been lost.

(3) Match the n-th head window image of the current gray-level frame by position and area against all head window images of each frame from the 2nd previous up to the H-th previous gray-level frame, repeating steps (1) and (2). If no head window loss occurs, the n-th head window image of the current frame is an analysis window image; H is an arbitrary integer between 5 and 10. A sketch of this matching follows below.
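As a sketch of the matching step; each head window is represented here as a dict with 'center' and 'area' keys, an assumed representation, since the text only defines the quantities themselves.

```python
# Sketch of step 4: greedy inter-frame matching of head windows by center
# distance and area difference, with the thresholds th1 = 15, th2 = 100.
import math

TH_POS, TH_AREA = 15, 100

def match_frame(current, previous):
    """Returns a list J where J[n] is the 1-based index of the matching head
    in `previous` for the n-th head of `current`, or 0 for a new head."""
    J = []
    for head in current:
        best_m, best_t = 0, float('inf')
        for m, prev in enumerate(previous, start=1):
            t = math.dist(head['center'], prev['center'])
            if t < best_t:
                best_m, best_t = m, t
        if best_m and best_t <= TH_POS and \
           abs(head['area'] - previous[best_m - 1]['area']) <= TH_AREA:
            J.append(best_m)
        else:
            J.append(0)            # newly appearing head window
    return J
```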
Step 5 proceeds as follows:

On each analysis window image, use the cascade adaboost method based on haar features, with the trained face classification model, to perform face detection, obtaining the head window images in which a face is detected and those in which no face is detected.
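A sketch of this detection step, assuming OpenCV's CascadeClassifier with a trained cascade file; the file name 'face_model.xml' is a placeholder, not a file the patent names.

```python
# Sketch of step 5: cascade face detection inside a head window.
import cv2

face_model = cv2.CascadeClassifier('face_model.xml')   # placeholder path

def detect_face(gray, win):
    """win = (x, y, w, h) of a head window in the scaled gray frame; returns
    True if the cascade finds at least one face inside the head window."""
    x, y, w, h = win
    roi = gray[y:y + h, x:x + w]
    faces = face_model.detectMultiScale(roi)
    return len(faces) > 0
```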
Step 6 proceeds as follows:
(1) On a head window image in which a face was detected, divide the head window image into six equal parts in the vertical direction. In the horizontal direction the left and right border regions, each a set fraction of the window width, are not analyzed; only the middle region is analyzed. Take regions B_1 and B_2 as the analysis region images, where B_1 is the 2nd region counted from top to bottom and B_2 the 5th region counted from top to bottom. B_1[i,j] and B_2[i,j] denote the gray values of the pixel at horizontal position i and vertical position j of region images B_1 and B_2. Compute the difference value D_1[i,j] of the pixel at horizontal position i and vertical position j of region images B_1 and B_2:

D_1[i,j] = |B_1[i,j] − B_2[i,j]|.

[i,j] traverses all pixels of the region images, i = 1, 2, ..., W_1, j = 1, 2, ..., H_1, where W_1 is the width and H_1 the height of region images B_1 and B_2. Count the number of pixels with D_1[i,j] > th_1 and denote it C_1; when C_1 exceeds a set fraction of the region area W_1 × H_1, a masked face is judged to be present in the gray-level image, otherwise the face is normal; th_1 = 18.
(2) On a head window image in which no face was detected, divide the head window image into four equal parts in the horizontal direction and take the two middle regions B_3 and B_4 as the analysis regions. B_3[i,j] and B_4[i,j] denote the gray values of the pixel at horizontal position i and vertical position j of regions B_3 and B_4. The means of the gray values are E_1 and E_2 respectively:

E_1 = (1/(W_2·H_2)) Σ_{i=1..W_2} Σ_{j=1..H_2} B_3[i,j],

E_2 = (1/(W_2·H_2)) Σ_{i=1..W_2} Σ_{j=1..H_2} B_4[i,j],

where W_2 is the width and H_2 the height of region images B_3 and B_4. The difference of the two region means is ΔE = |E_1 − E_2|; when ΔE > th_2, a side face or head-bowing behavior is indicated, not a masked face; th_2 = 25.
(3) On a head window image in which no face was detected, when ΔE ≤ th_2, divide the head window image into six equal parts in the vertical direction; in the horizontal direction the left and right border regions, each a set fraction of the window width, are not analyzed, only the middle region is analyzed. Take regions B_5 and B_2 as the analysis regions, where B_5 is the 3rd region counted from top to bottom and B_2 the 5th region counted from top to bottom. B_5[i,j] and B_2[i,j] denote the gray values of the pixel at horizontal position i and vertical position j of regions B_5 and B_2. Compute the difference value D_2[i,j] of the pixel at horizontal position i and vertical position j of regions B_5 and B_2:

D_2[i,j] = |B_5[i,j] − B_2[i,j]|,

i = 1, 2, ..., W_3, j = 1, 2, ..., H_3, where W_3 is the width and H_3 the height of region images B_5 and B_2. Count the number of pixels with D_2[i,j] > th_3 and denote it C_2; when C_2 exceeds a set fraction of the region area W_3 × H_3, a masked face is judged to be present in the gray-level image, otherwise the face is normal; th_3 = 18.
When a masked face is judged to be present, the position of the head window image is located in the original color video image, marked with a picture frame, and uploaded to the alarm center.
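A sketch of the band comparison underlying judgments (1) and (3), assuming numpy; the decision fraction of the band area appears only as a formula image in the original document, so AREA_FRACTION = 0.25 below is a labeled assumption, not the patent's value.

```python
# Sketch of the mask-judgment band comparison (difference threshold th = 18).
import numpy as np

DIFF_TH = 18
AREA_FRACTION = 0.25               # assumption; not legible in the source

def bands_differ(band_a, band_b):
    """band_a, band_b: equally sized 2-D gray arrays (e.g. the 2nd and 5th of
    six horizontal bands of a head window). Returns True when the number of
    strongly differing pixels exceeds the assumed fraction of the band area."""
    d = np.abs(band_a.astype(int) - band_b.astype(int))
    c = int(np.sum(d > DIFF_TH))
    return c > AREA_FRACTION * band_a.size
```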
The head classification model used in step 3 is produced as follows:
(1) Collect positive and negative head samples: 5000 gray pictures containing head and shoulders serve as positive samples and 10000 gray pictures containing no head as negative samples; the sample sizes are made uniform.
(2) Extract the gradient orientation histogram statistics of the positive and negative samples and normalize the histograms, each normalized value becoming one value of the feature vector; the method is the same as the feature vector extraction performed with the sliding-window method on the gray-level image converted from the monitored-site video image.
(3) Input the feature vectors of the 5000 positive and 10000 negative samples into the support vector machine library and train with the ONE_CLASS classification mode and the LINEAR kernel function, obtaining an optimal head classification model; a training sketch follows below.
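As a sketch of the training step, with scikit-learn's OneClassSVM standing in for the svmlight library named in the embodiment; this is a substitution for illustration, not the patent's toolchain. X_pos is the matrix of HOG feature vectors of the positive samples.

```python
# Sketch of the head-model training with a one-class linear SVM.
from sklearn.svm import OneClassSVM

def train_head_model(X_pos):
    # one-class training uses only the positive (head) feature vectors;
    # the negatives can be used afterwards to validate the decision threshold
    model = OneClassSVM(kernel='linear')
    model.fit(X_pos)
    return model

def is_head(model, feat):
    return model.predict(feat.reshape(1, -1))[0] == 1
```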
The face classification model used in step 5 is produced as follows:
(1) Collect 5000 gray face pictures, uniformly scaled to 20 × 20 pixels, as positive samples, and 10000 gray pictures of arbitrary size containing no face as negative samples.
(2) Add 6 haar feature classifiers. The shape of the feature, the position of the region of interest and the scale factor used by each feature classifier are defined in turn as:
A. Feature classifier 1: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangle is the 2 × 2 square region at the top-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
B. Feature classifier 2: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangle is the 2 × 2 square region at the bottom-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
C. Feature classifier 3: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangle is the 2 × 2 square region at the top-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
D. Feature classifier 4: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangle is the 2 × 2 square region at the bottom-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
E. Feature classifier 5: the whole rectangular region is a 7 × 1 rectangle, with 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangle is the region of the 5th and 6th horizontal pixels; the response is 2 times the pixel sum of the whole rectangular region minus 7 times the pixel sum of the black rectangular region;
F. Feature classifier 6: the whole rectangular region is a 7 × 1 rectangle, with 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangle is the region of the 2nd and 3rd horizontal pixels; the response is 2 times the pixel sum of the whole rectangular region minus 7 times the pixel sum of the black rectangular region.
(3) Train with the haartraining library in OpenCV to obtain the face classification model. A sketch of the classifier responses follows below.
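A sketch of the response computation of the six added feature classifiers, using an integral image for the rectangle sums; the 4×/15× and 2×/7× weights, as reconstructed above from the garbled translation, equalize the areas of the whole rectangle (15 or 7 pixels) and the black rectangle (4 or 2 pixels).

```python
# Sketch of the added corner haar feature responses via an integral image.
import numpy as np

def integral(img):
    return np.cumsum(np.cumsum(img.astype(np.int64), axis=0), axis=1)

def rect_sum(ii, x, y, w, h):
    """Pixel sum of the w x h rectangle with top-left (x, y), via 4 lookups."""
    pad = np.pad(ii, ((1, 0), (1, 0)))
    return int(pad[y + h, x + w] - pad[y, x + w] - pad[y + h, x] + pad[y, x])

def corner_feature_response(ii, x, y, corner='tl'):
    """Feature classifiers 1-4: 5 x 3 whole rectangle at (x, y), 2 x 2 black
    square at one corner ('tl', 'bl', 'tr', 'br')."""
    whole = rect_sum(ii, x, y, 5, 3)
    bx = x if corner in ('tl', 'bl') else x + 3
    by = y if corner in ('tl', 'tr') else y + 1
    black = rect_sum(ii, bx, by, 2, 2)
    return 4 * whole - 15 * black      # area-equalizing weights
```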
The present invention uses image processing and pattern recognition techniques to analyze the human bodies appearing in a video image, judging within detected heads whether a face is present and making the mask judgment on that basis. This research on masked-face recognition in video surveillance is of great significance and can effectively prevent crimes committed under masked disguise.
Description of the drawings:
Fig. 1 is the structural block diagram of the system used by the masked-face detection method provided by the invention.
Fig. 2 is the flow chart of the masked-face detection method provided by the invention.
Fig. 3 is a schematic diagram of the relation between blocks and cells in the embodiment of the invention.
Fig. 4 is the flow chart of head tracking in the embodiment of the invention.
Fig. 5 is schematic diagram 1 of the mask decision in the embodiment of the invention.
Fig. 6 is schematic diagram 2 of the mask decision in the embodiment of the invention.
Fig. 7 is a schematic diagram of added haar center feature 1 in the embodiment of the invention.
Fig. 8 is a schematic diagram of added haar center feature 2 in the embodiment of the invention.
Fig. 9 is a schematic diagram of added haar center feature 3 in the embodiment of the invention.
Fig. 10 is a schematic diagram of added haar center feature 4 in the embodiment of the invention.
Fig. 11 is a schematic diagram of added haar linear feature 1 in the embodiment of the invention.
Fig. 12 is a schematic diagram of added haar linear feature 2 in the embodiment of the invention.
Embodiment:
The invention provides a masked-face detection method for video surveillance. The system architecture of the method, shown in Fig. 1, comprises a video acquisition unit, a mask detection unit and an alarm unit.
The main function of the video acquisition unit is to film the monitored scene with an ordinary analog camera, obtain an analog video image, and convert it into digital image data with an ordinary video capture card. The mounting height and angle of the camera are subject to certain requirements: the camera must be installed so that the head and shoulder region of the human body appears fully in the video picture, and set up so that frontal face information shows clearly in the picture, ideally facing the face directly, which requires frontal filming at close range.
The main function of the mask detection unit is to convert the incoming color digital image data into a gray-level image and then detect on the gray-level image whether a masked face is present. To improve detection efficiency, the gray-level image is first reduced, to a standard detection gray image of 176 × 144 pixels. If a masked face is present, the position of the masked face is extracted and the masked face is marked at the corresponding position of the original color image.
If the mask detection unit detects a masked face, an alarm is raised and the image carrying the mask marking is uploaded to the alarm unit.
The invention provides a masked-face detection method for video surveillance; the method, shown in Fig. 2, specifically comprises the following steps:
One: convert the color image to a gray-level image (s1)
Every step of the masked-face detection method described in the invention is performed on the gray-level image, so the color image must first be converted to a gray-level image.
Two: scale the gray-level image (s2)
To improve detection efficiency, the gray-level image is reduced to 176 × 144 pixels. Image scaling must balance processing efficiency against the smoothness and sharpness of the result; current image scaling methods are relatively mature and outside the scope of the invention. All subsequent steps are performed on the reduced gray-level image.
Three: perform head detection on the gray-level image (s3)
On the image scaled in step two, the sliding-window method is used: the window size is 40 × 40, the horizontal scanning step is 3 and the vertical scanning step is 2; the window is moved in left-to-right, top-to-bottom order until the whole image has been scanned, and head detection is performed on the window gray image corresponding to each window position, called the window image below.
i denotes the horizontal coordinate of a point in the window image and j its vertical coordinate; I[i,j] is the gray value of the window image at pixel [i,j]; [i,j] traverses every pixel of the window image; the window image width is W_0 = 40 and its height H_0 = 40. When the sliding window is at the first window image:
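A sketch of this scan, assuming numpy arrays; classify_window is a placeholder for the feature extraction and classification of parts 1 to 3 below.

```python
# Sketch of the scan in step three: 40 x 40 windows over the 176 x 144 gray
# image with horizontal step 3 and vertical step 2.
def scan_for_heads(gray, win=40, step_x=3, step_y=2):
    H, W = gray.shape
    heads = []
    for y in range(0, H - win + 1, step_y):        # top-to-bottom
        for x in range(0, W - win + 1, step_x):    # left-to-right
            window = gray[y:y + win, x:x + win]
            if classify_window(window):            # HOG + one-class SVM
                heads.append({'center': (x + win / 2, y + win / 2),
                              'area': win * win})
    return heads
```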
1. Compute the gradient direction and magnitude of each pixel of the window image, specifically:

(1) Compute the horizontal and vertical gradients of each pixel of the window image.

First, the horizontal gradient G_x[i,j] and vertical gradient G_y[i,j] of each pixel of the window image are initialized to 0:

G_x[i,j] = 0, i = 1, 2, ..., W_0, j = 1, 2, ..., H_0,  (1-a)

G_y[i,j] = 0, i = 1, 2, ..., W_0, j = 1, 2, ..., H_0.  (1-b)

On the window image, with an edge operator as the computation template, the template is translated in turn, left to right and top to bottom, to each pixel [i,j]; to avoid crossing the border, the pixels on the four sides (topmost, bottommost, leftmost, rightmost) are not processed. The template weights the gray values of the neighborhood of pixel [i,j], each weight multiplying its corresponding value, giving the horizontal gradient G_x[i,j] and vertical gradient G_y[i,j] of each pixel. The edge operator may be the Roberts, Sobel, Prewitt or Kirsch edge operator; the invention is described, as an example, with the Sobel operators

S_x = [ −1 0 1 ; −2 0 2 ; −1 0 1 ]  and  S_y = [ 1 2 1 ; 0 0 0 ; −1 −2 −1 ].

The horizontal gradient G_x[i,j] and vertical gradient G_y[i,j] are computed as:

G_x[i,j] = Σ_{k=1..3} Σ_{l=1..3} I[i−2+k, j−2+l] · S_x[k,l]
         = I[i+1,j−1] + 2·I[i+1,j] + I[i+1,j+1] − I[i−1,j−1] − 2·I[i−1,j] − I[i−1,j+1],
           i = 2, 3, ..., W_0−1, j = 2, 3, ..., H_0−1,  (2-a)

G_y[i,j] = Σ_{k=1..3} Σ_{l=1..3} I[i−2+k, j−2+l] · S_y[k,l]
         = I[i−1,j−1] + 2·I[i,j−1] + I[i+1,j−1] − I[i−1,j+1] − 2·I[i,j+1] − I[i+1,j+1],
           i = 2, 3, ..., W_0−1, j = 2, 3, ..., H_0−1,  (2-b)

where I[i,j] is the gray value of each pixel of the window image, S_x[k,l] the element in row k, column l of the Sobel horizontal edge operator, and S_y[k,l] the element in row k, column l of the Sobel vertical edge operator.
(2) Compute the gradient magnitude G_1[i,j] and gradient direction G_2[i,j] at each pixel of the window image:

G_1[i,j] = √(G_x[i,j]² + G_y[i,j]²),  i = 1, 2, ..., W_0, j = 1, 2, ..., H_0,  (3-a)

G_2[i,j] = ⌊(P/π) · (arctan(G_y[i,j]/G_x[i,j]) + π/2)⌋,  i = 1, 2, ..., W_0, j = 1, 2, ..., H_0,  (3-b)

where arctan(·) is the arctangent function and ⌊·⌋ is the floor operator, denoting the largest integer not greater than its argument. P is the number of channels and may be any integer between 2 and 180; the embodiment of the invention is described with P = 9 as an example, in which case the gradient direction can be expressed as

G_2[i,j] = ⌊(9/π) · (arctan(G_y[i,j]/G_x[i,j]) + π/2)⌋.
2. Use the obtained gradient magnitudes and directions to compute gradient histogram statistics; each normalized histogram value becomes one value of the feature vector:

First the window image is divided into connected regions of equal size; each connected region is a cell, and the gradient orientation histogram of the pixels of each cell is computed. To adapt better to illumination changes and shadows, several cells are grouped into a block, and the histogram values of the cells in each block are normalized.

The number of pixels per cell row may be any integer between 2 and 20, as may the number per cell column; the number of pixels per block row may be any integer between 4 and 40, as may the number per block column. The embodiment of the invention is described with cells of 8 × 8 pixels and blocks of 2 × 2 = 4 cells. Fig. 3 shows a schematic diagram of one block of this embodiment. For the 40 × 40 window image there are 25 cells; adjacent blocks overlap by 50%, so the image has 4 × 4 = 16 blocks in total. Each cell has 9 direction channels and each block therefore contributes 4 × 9 = 36 features, for 16 × 36 = 576 feature dimensions in total. The computation proceeds as follows:
(1) Compute the gradient orientation histogram statistics of each cell of the window image.

Each pixel of each cell of the window image votes for a histogram channel according to its gradient direction, with its gradient magnitude as the voting weight. H[m][p] denotes the histogram value of channel p of cell m of the window image; m is a variable, the cell label, starting at 1 and increasing in left-to-right, top-to-bottom order; p is a variable, the channel label; L is the number of cells per row of the window image and M the total number of cells; L and M are constants depending only on the window size; in the embodiment of the invention, L = 5 and M = 25.

For m = 1, p = 1, H[1][1] is the histogram value of channel 1 of cell 1:

H[1][1] = Σ_{i=1..8} Σ_{j=1..8} (G_1[i,j] · w[i,j]),  (4-a)

where w[i,j] = 1 if G_2[i,j] = 1, and 0 otherwise.

p then increases by 1 at a time; from p = 2 up to p = 9 this gives H[1][2], H[1][3], ..., H[1][9]:

H[1][p] = Σ_{i=1..8} Σ_{j=1..8} (G_1[i,j] · w[i,j]),  p = 2, 3, ..., 9,  (4-b)

where w[i,j] = 1 if G_2[i,j] = p, and 0 otherwise.

m then increases by 1 at a time from m = 2 to m = M, each increase contributing 9 histogram values:

H[m][p] = Σ_{i=1..8} Σ_{j=1..8} (G_1[i + 8·((m−1) mod 5), j + 8·⌊(m−1)/5⌋] · w[i,j]),  p = 1, 2, ..., 9,  (4-c)

where w[i,j] = 1 if G_2[i + 8·((m−1) mod 5), j + 8·⌊(m−1)/5⌋] = p, and 0 otherwise, and ⌊(m−1)/5⌋ denotes the largest integer not greater than (m−1)/5.
(2) Normalize the cell histogram values within each block of the window image and extract the feature vector of the window image.

The histogram values of the cells in each block of the window image are summed to give the normalization factor. S[n] denotes the normalization factor of the histogram values of block n; n is a variable, the block label, starting at 1 and increasing in left-to-right, top-to-bottom order; each row of the window image contains L−1 blocks; N is the total number of blocks, a constant related to the window size. In the embodiment of the invention, the window image has N = 16 blocks in total, 4 blocks per row.

For n = 1, the normalization factor S[1] of the first block is the sum of the histogram values of all channels of all cells of that block:

S[1] = Σ_{p=1..9} (H[1][p] + H[2][p] + H[6][p] + H[7][p]).  (5-a)

The histogram values of the channels of the 1st cell of block 1, which is cell 1 of the window image, divided by the normalization factor of block 1, give 9 values that become the 1st to 9th values of the window feature vector:

H[1][1]/S[1], H[1][2]/S[1], ..., H[1][9]/S[1].

The 2nd cell of block 1, which is cell 2 of the window image, likewise gives the 10th to 18th values:

H[2][1]/S[1], H[2][2]/S[1], ..., H[2][9]/S[1].

The 3rd cell of block 1, which is cell 6 of the window image, gives the 19th to 27th values:

H[6][1]/S[1], H[6][2]/S[1], ..., H[6][9]/S[1].

The 4th cell of block 1, which is cell 7 of the window image, gives the 28th to 36th values:

H[7][1]/S[1], H[7][2]/S[1], ..., H[7][9]/S[1].

n then increases by 1 at a time; from n = 2 up to n = N the normalization factors of all blocks are computed as:

S[n] = Σ_{p=1..9} (H[a][p] + H[a+1][p] + H[a+5][p] + H[a+6][p]),  a = n + ⌊(n−1)/4⌋,  (5-b)

where ⌊(n−1)/4⌋ denotes the largest integer not greater than (n−1)/4.

Dividing the histogram values of each channel of each cell of block n by the normalization factor of block n yields the remaining 15 × 36 values of the window feature vector; the feature vector of each window image has 16 × 36 = 576 dimensions in total.
The 576-dimensional feature vector of each window image is fed, together with the pre-trained head classification model, into the support vector machine library; classification uses the ONE_CLASS mode and the LINEAR kernel function to judge whether the window image is a head image; if so, the window image is treated as a head window image.

This continues until all window images have been scanned, giving all head window images of the current gray-level frame. The head window images are labeled in traversal order, following the left-to-right, top-to-bottom principle. The training of the head model is described in detail in part seven.
Four: match each head across frames (s4)

A masked face that appears in the video cannot merely flash past. Therefore, on the basis of step three, the detected heads are tracked to exclude interference from targets that appear only momentarily; the implementation is shown in Fig. 4.

Suppose M head window images were detected in the previous gray-level frame and N head window images are detected in the current gray-level frame; n and m are variables, the labels of the head window images in the current and previous frames, n = 1, 2, ..., N, m = 1, 2, ..., M; the center of the m-th head window of the previous frame is [x_m, y_m] and its area S_m; the center of the n-th head window of the current frame is [p_n, q_n] and its area Q_n.

(1) Match the n-th head window image of the current gray-level frame against all head window images of the previous gray-level frame by position and area. The center distance T(m) and area difference A(m) between the n-th head window of the current frame and the m-th head window of the previous frame are:

T(m) = √((p_n − x_m)² + (q_n − y_m)²),  (6-a)

A(m) = |Q_n − S_m|.  (6-b)

Take n = 1, traverse all values of m, and compute the value of m that minimizes the position matching parameter T(m), denoted

J_1 = argmin_m T(m),

meaning the best match of the first head window of the current frame is the J_1-th head window of the previous frame. If T(J_1) ≤ th_1 and A(J_1) ≤ th_2, where th_1 is the threshold on inter-frame variation of the head window center (th_1 = 15 in the embodiment of the invention) and th_2 the threshold on inter-frame variation of the head window area (th_2 = 100 in the embodiment of the invention), then the first head window of the current frame has found its matching head window, the J_1-th, in the previous frame. If T(J_1) > th_1 or A(J_1) > th_2, J_1 is set to 0, i.e. J_1 = 0, meaning the first head window of the current frame matches no head window of the previous frame and is a newly appearing head window.

(2) n increases by 1 at a time; from n = 2 up to n = N, repeat step (1) to find all matching head windows J_2, J_3, ..., J_N. If J_1, J_2, ..., J_N do not contain some value K between 1 and M, the K-th head window of the previous frame has been lost.

(3) Match the n-th head window image of the current gray-level frame by position and area against all head window images of each frame from the 2nd previous up to the H-th previous gray-level frame, repeating steps (1) and (2). If no head window loss occurs, the n-th head window image of the current frame is an analysis window image; H is an arbitrary integer between 5 and 10; in the embodiment of the invention, H = 8.
Five: face detection (s5)

On the basis of step four, when a head window image persists across frames, the cascade adaboost method based on haar features is used on each analysis window image, with the trained face classification model, to perform face detection, obtaining the head window images in which a face is detected and those in which no face is detected. In the embodiment of the invention, the face detection method is the same for every analysis window image.
Six: mask judgment (s6)

The mask decision, made on the basis of the head detection and face detection, proceeds as follows:

1. On a head window image in which a face was detected, following the method shown in Fig. 5, divide the head window image into six equal parts in the vertical direction; in the horizontal direction the left and right border regions (the width fractions shown in Fig. 5) are not analyzed, only the middle region is analyzed. Take regions B_1 and B_2 as the analysis regions, where B_1 is the 2nd region counted from top to bottom and B_2 the 5th region counted from top to bottom. Regions B_1 and B_2 have the same width, a fixed fraction of the head window width (shown in Fig. 5), and the same height, one sixth of the head window height. B_1[i,j] and B_2[i,j] denote the gray values of the pixel at horizontal position i and vertical position j of regions B_1 and B_2. Scanning left to right and top to bottom, the gray values at corresponding positions of regions B_1 and B_2 are differenced; D_1[i,j] denotes the difference value at horizontal position i and vertical position j of regions B_1 and B_2, computed as:

D_1[i,j] = |B_1[i,j] − B_2[i,j]|.  (7)

[i,j] traverses all pixels of regions B_1 and B_2, i = 1, 2, ..., W_1, j = 1, 2, ..., H_1, where W_1 is the width and H_1 the height of regions B_1 and B_2. Count the number of pixels with D_1[i,j] > th_1 and denote it C_1; when C_1 exceeds a set fraction of the region area W_1 × H_1 (the threshold expression is given in the formula of Fig. 5), a masked face is judged to be present in the gray-level image, otherwise the face is normal; in the embodiment of the invention, th_1 = 18.
2. On a head window image in which no face was detected, following the method shown in Fig. 6, divide the head window image into four equal parts in the horizontal direction and take the two middle regions B_3 and B_4 as the analysis regions. Regions B_3 and B_4 have the same width, one quarter of the head window width, and the same height, equal to the head window height. B_3[i,j] and B_4[i,j] denote the gray values of the pixel at horizontal position i and vertical position j of regions B_3 and B_4. Scanning left to right and top to bottom, the means E_1 and E_2 of regions B_3 and B_4 are computed from the gray values of their pixels:

E_1 = (1/(W_2·H_2)) Σ_{i=1..W_2} Σ_{j=1..H_2} B_3[i,j],  (8-a)

E_2 = (1/(W_2·H_2)) Σ_{i=1..W_2} Σ_{j=1..H_2} B_4[i,j],  (8-b)

where W_2 is the width and H_2 the height of regions B_3 and B_4. The difference of the two region means is ΔE = |E_1 − E_2|; when ΔE > th_2, a side face or head-bowing behavior is indicated, not a masked face; otherwise the next step is entered; in the embodiment of the invention, th_2 = 25.
3. On a head window image in which no face was detected, when ΔE ≤ th_2, following the method shown in Fig. 5, divide the head window image into six equal parts in the vertical direction; in the horizontal direction the left and right border regions (the width fractions shown in Fig. 5) are not analyzed, only the middle region is analyzed. Take regions B_5 and B_2 as the analysis regions, where B_5 is the 3rd region counted from top to bottom and B_2 the 5th region counted from top to bottom. Regions B_5 and B_2 have the same width, a fixed fraction of the head window width (shown in Fig. 5), and the same height, one sixth of the head window height. B_5[i,j] and B_2[i,j] denote the gray values of the pixel at horizontal position i and vertical position j of regions B_5 and B_2. Scanning left to right and top to bottom, the gray values at corresponding positions of regions B_5 and B_2 are differenced; D_2[i,j] denotes the difference value at horizontal position i and vertical position j of regions B_5 and B_2, computed as:

D_2[i,j] = |B_5[i,j] − B_2[i,j]|.  (9)

[i,j] traverses all pixels of regions B_5 and B_2, i = 1, 2, ..., W_3, j = 1, 2, ..., H_3, where W_3 is the width and H_3 the height of regions B_5 and B_2. Count the number of pixels with D_2[i,j] > th_3 and denote it C_2; when C_2 exceeds a set fraction of the region area W_3 × H_3 (the threshold expression is given in the formula of Fig. 5), a masked face is judged to be present in the gray-level image, otherwise the face is normal; in the embodiment of the invention, th_3 = 18.
When a masked face is judged to be present on a head window image, the position of the head window image is located and marked in the original color image and uploaded to the alarm center.
Seven: head model training (s7)

The head detection on the gray-level image in step three uses a pre-trained head classification model. The training process, unit s7 in Fig. 2, comprises collecting positive and negative head samples, extracting gradient orientation histogram (HOG) features, and training the head model with the support vector machine library svmlight.

1. Collect positive and negative head samples

5000 gray pictures containing head and shoulders are collected as positive samples and 10000 gray pictures containing no head as negative samples; the samples are uniformly scaled to 40 × 40.

2. Extract HOG feature vectors

The gradient orientation histogram statistics of the positive and negative samples are extracted and normalized, each normalized value becoming one value of the feature vector; the method is the same as the feature vector extraction performed on window images with the sliding-window method in step three.

3. Train the head model with the support vector machine library svmlight

The feature vectors of the 5000 positive and 10000 negative samples are input into the support vector machine library and trained with the ONE_CLASS classification mode and the LINEAR kernel function, obtaining an optimal head classification model.
Eight: face model training (s8)

The face detection on the extracted head images in step five uses a pre-trained face model. The training process, unit s8 in Fig. 2, comprises collecting positive and negative face samples, adding haar features, and training the face model.

1. Collect positive and negative face samples

5000 gray face pictures, uniformly scaled to 20 × 20 pixels, are collected as positive samples, and 10000 gray pictures of arbitrary size containing no face as negative samples.

2. Add 6 haar feature classifiers

To better detect side faces, the embodiment of the invention adds 6 haar feature classifiers; the shape of the feature, the position of the region of interest and the scale factor used by each feature classifier are defined in turn as:

(1) Feature classifier 1, shown in Fig. 7: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangle is the 2 × 2 square region at the top-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region.

(2) Feature classifier 2, shown in Fig. 8: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangle is the 2 × 2 square region at the bottom-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region.

(3) Feature classifier 3, shown in Fig. 9: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangle is the 2 × 2 square region at the top-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region.

(4) Feature classifier 4, shown in Fig. 10: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangle is the 2 × 2 square region at the bottom-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region.

(5) Feature classifier 5, shown in Fig. 11: the whole rectangular region is a 7 × 1 rectangle, with 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangle is the region of the 5th and 6th horizontal pixels; the response is 2 times the pixel sum of the whole rectangular region minus 7 times the pixel sum of the black rectangular region.

(6) Feature classifier 6, shown in Fig. 12: the whole rectangular region is a 7 × 1 rectangle, with 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangle is the region of the 2nd and 3rd horizontal pixels; the response is 2 times the pixel sum of the whole rectangular region minus 7 times the pixel sum of the black rectangular region.

3. Train the face model

The face model is trained with the relatively mature haartraining library in OpenCV.

Claims (5)

1. A method for detecting masked faces in video surveillance, comprising the steps of:
(1) converting the color video image obtained at the monitored site into a gray-level image;
(2) scaling the gray-level image;
(3) performing head detection on the gray-level image; when a head is detected, proceeding to the next step, otherwise repeating steps (1) to (3);
(4) matching each head across frames;
(5) performing face detection;
(6) performing the mask judgment, marking the original color video image in which a masked face is found, and raising an alarm,
wherein step (3) uses a sliding-window method: the window is moved pixel by pixel in left-to-right, top-to-bottom order, dividing the gray-level image into the window images corresponding to each window position, and head detection is performed on each window image; when the sliding window is at the first window image:
(1) computing the horizontal gradient G_x[i,j] and vertical gradient G_y[i,j] of each pixel of the window image:

A. the value of every pixel in G_x[i,j] and G_y[i,j] is initialized to 0; [i,j] ranges over all pixels of the window image; i is a variable giving the horizontal position of the pixel in the window image, with values i = 1, 2, ..., W_0; j is a variable giving the vertical position, with values j = 1, 2, ..., H_0; W_0 and H_0 are the width and height of the window image;

B. on the window image, with the Sobel horizontal edge operator as the computation template, the template center is translated to each pixel, each pixel in the image region covered by the template is multiplied with the corresponding template element, and the sum of all products is taken as the horizontal gradient G_x[i,j] of that pixel; with the Sobel vertical edge operator as the template, the vertical gradient G_y[i,j] of each pixel is obtained; to avoid crossing the border, the pixels on the four sides (topmost, bottommost, leftmost, rightmost) are not processed, i.e. when j = 1 or j = H_0 with i = 1, 2, ..., W_0, or i = 1 or i = W_0 with j = 1, 2, ..., H_0, G_x[i,j] and G_y[i,j] keep their initial value 0; thus, for i = 2, 3, ..., W_0−1 and j = 2, 3, ..., H_0−1,

G_x[i,j] = Σ_{k=1..3} Σ_{l=1..3} I[i−2+k, j−2+l] · S_x[k,l],

G_y[i,j] = Σ_{k=1..3} Σ_{l=1..3} I[i−2+k, j−2+l] · S_y[k,l],

where I[i,j] is the gray value of each pixel of the window image, S_x[k,l] is the element in row k, column l of the Sobel horizontal edge operator, and S_y[k,l] is the element in row k, column l of the Sobel vertical edge operator;

(2) computing the gradient magnitude G_1[i,j] and gradient direction G_2[i,j] of each pixel of the window image:

G_1[i,j] = √(G_x[i,j]² + G_y[i,j]²),

G_2[i,j] = ⌊(9/π) · (arctan(G_y[i,j]/G_x[i,j]) + π/2)⌋,

where i = 1, 2, ..., W_0, j = 1, 2, ..., H_0, arctan(·) is the arctangent function, and ⌊·⌋ is the floor operator, so that ⌊(9/π)(arctan(G_y[i,j]/G_x[i,j]) + π/2)⌋ is the largest integer not greater than (9/π)(arctan(G_y[i,j]/G_x[i,j]) + π/2);
(3) Use the gradient magnitude and gradient direction of each pixel of the window image to compute gradient-orientation-histogram statistics and obtain the feature vector of the window image:
A. The window image is divided into connected regions of identical size, each consisting of 8 × 8 pixels and called a cell; every 2 × 2 cells form a square block, and adjacent blocks overlap by 50%;
In the window image, the gradient direction of each pixel of each cell takes values from 1 to 9, so each cell consists of 9 channels. Using the gradient magnitude and gradient direction of each pixel of each cell, the gradient magnitudes falling within the direction range of each channel are accumulated, giving the gradient-orientation-histogram statistics of the cell. H[m][p] denotes the histogram statistic of channel p of cell m of the window image; m is the cell label, counting from 1 from left to right and top to bottom; p is the channel label; L is the number of cells per row of the window image and M the total number of cells of the window image; L and M are constants determined only by the window size. For m = 1 and p = 1, H[1][1] is the histogram statistic of channel 1 of cell 1, computed as:
$$H[1][1] = \sum_{i=1}^{8}\sum_{j=1}^{8} G_1[i,j]\, w[i,j], \qquad w[i,j] = \begin{cases} 1, & G_2[i,j] = 1 \\ 0, & G_2[i,j] \neq 1, \end{cases}$$
p then increases by 1 at a time; from p = 2 to p = 9 this yields H[1][2], H[1][3], ..., H[1][9]:
$$H[1][p] = \sum_{i=1}^{8}\sum_{j=1}^{8} G_1[i,j]\, w[i,j], \qquad p = 2, 3, \ldots, 9,$$
where w[i, j] = 1 if G2[i, j] = p and 0 otherwise;
m then increases by 1 at a time; each step from m = 2 to m = M yields another 9 histogram statistics:
$$H[m][p] = \sum_{i=1}^{8}\sum_{j=1}^{8} G_1\big[\,8\,((m-1)\bmod L)+i,\ 8\left\lfloor\tfrac{m-1}{L}\right\rfloor+j\,\big]\, w[i,j], \qquad p = 1, 2, \ldots, 9,$$
where w[i, j] = 1 if the G2 value at the same pixel equals p and 0 otherwise, and ⌊(m − 1)/L⌋ is the largest integer not greater than (m − 1)/L (the surviving fragment of the printed formula shows this divisor as 5, i.e. a window five cells wide);
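The per-cell statistics can be sketched as follows; vectorized NumPy indexing replaces the claim's explicit double sums, but the values computed are the same H[m][p]:

```python
import numpy as np

def cell_histograms(g1, g2, cell=8, nbins=9):
    """H[m][p]: per-cell orientation histograms. Cells are cell x cell
    pixels, numbered left to right, top to bottom; channels are 1..nbins."""
    h, w = g1.shape
    rows, cols = h // cell, w // cell          # cols is the claim's L
    hist = np.zeros((rows * cols, nbins))
    for m in range(rows * cols):
        r, c = divmod(m, cols)
        cg1 = g1[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell]
        cg2 = g2[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell]
        for p in range(1, nbins + 1):
            hist[m, p - 1] = cg1[cg2 == p].sum()   # magnitudes of channel p
    return hist   # shape (M, 9); hist[m-1, p-1] == H[m][p] (1-based m, p)
```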
B. Normalize the gradient-orientation-histogram statistics of the cells within each block of the window image and extract the feature vector of the window image. S[n] denotes the normalizer of the histogram statistics of block n of the window image; n is the block label, counting from 1 from left to right and top to bottom; each row of the window image contains L − 1 blocks; N is the total number of blocks of the window image, a constant determined by the window size. For n = 1, the normalizer S[1] of the histogram statistics of block 1 of the window image is the sum of the histogram statistics of all channels of all cells of block 1:
$$S[1] = \sum_{p=1}^{9}\big(H[1][p] + H[2][p] + H[L+1][p] + H[L+2][p]\big)$$
The histogram statistic of each channel of the 1st cell of block 1, which is also cell 1 of the window image, is divided by the normalizer of block 1; the resulting 9 values become the 1st to 9th entries of the feature vector of the window image, namely H[1][1]/S[1], H[1][2]/S[1], ..., H[1][9]/S[1].
The histogram statistic of each channel of the 2nd cell of block 1, which is also cell 2 of the window image, is divided by the normalizer of block 1; the resulting 9 values become the 10th to 18th entries, namely H[2][1]/S[1], H[2][2]/S[1], ..., H[2][9]/S[1].
The histogram statistic of each channel of the 3rd cell of block 1, which is also cell L + 1 of the window image, is divided by the normalizer of block 1; the resulting 9 values become the 19th to 27th entries, namely H[L+1][1]/S[1], H[L+1][2]/S[1], ..., H[L+1][9]/S[1].
The histogram statistic of each channel of the 4th cell of block 1, which is also cell L + 2 of the window image, is divided by the normalizer of block 1; the resulting 9 values become the 28th to 36th entries, namely H[L+2][1]/S[1], H[L+2][2]/S[1], ..., H[L+2][9]/S[1].
n then increases by 1 at a time; from n = 2 to n = N the normalizer of the histogram statistics of every block is computed:
$$S[n] = \sum_{p=1}^{9}\big(H[t][p] + H[t+1][p] + H[t+L][p] + H[t+L+1][p]\big), \qquad t = n + \left\lfloor\frac{n-1}{L-1}\right\rfloor,$$
where t is the label of the top-left cell of block n and ⌊·⌋ again denotes the largest integer not greater than its argument (the top-left-cell expression is reconstructed from the stated block layout, the printed formula having been lost in extraction);
The histogram statistic of each channel of each cell of block n is divided by the normalizer of block n, giving the remaining 36 × (N − 1) entries of the feature vector of the window image; the feature vector of each window image therefore has 36 × N dimensions in total;
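A sketch of the block normalization; the zero-sum guard is an editorial addition, since the claim does not say what happens when S[n] = 0:

```python
import numpy as np

def block_normalized_features(hist, cols):
    """Normalize 2 x 2-cell blocks (50% overlap) by the block sum S[n] and
    concatenate into the 36 x N feature vector of the claim. `cols` is L,
    the number of cells per row; the row count is inferred from hist."""
    rows = hist.shape[0] // cols
    feats = []
    for r in range(rows - 1):                  # blocks step one cell at a time
        for c in range(cols - 1):
            t = r * cols + c                   # top-left cell of the block
            block = hist[[t, t + 1, t + cols, t + cols + 1]].ravel()
            s = block.sum()                    # the claim's normalizer S[n]
            feats.append(block / s if s > 0 else block)  # guard is editorial
    return np.concatenate(feats)               # length 36 * N
```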
C. The 36 × N-dimensional feature vector of each window image, together with the head classification model trained in advance, is fed into the support-vector-machine software library; the ONE_CLASS classification mode and the LINEAR kernel are selected for classification to judge whether the window image is a head image, and if so, the window image is taken as a head window image;
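A sketch of the classification call; scikit-learn's OneClassSVM stands in for the "support vector machine software library" of the claim (the ONE_CLASS and LINEAR names are LIBSVM's), so the API shown is an editorial substitute:

```python
from sklearn.svm import OneClassSVM

def is_head_window(model: OneClassSVM, feature_vector):
    """True when the trained head model accepts the 36*N-dim feature
    vector as a head window (+1 = inlier in scikit-learn's convention)."""
    return model.predict(feature_vector.reshape(1, -1))[0] == 1
```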
Likewise, steps (1) to (3) are repeated until all window images have been traversed, yielding all head window images in the current gray-level frame; head window images are labelled in traversal order, from left to right and top to bottom, as in the sketch below.
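Putting the pieces together, the traversal might look as follows; it reuses the helper sketches above and treats the window size as an input, which the claim fixes only implicitly through the training samples:

```python
def detect_heads(gray, model, win_w, win_h):
    """Pixel-by-pixel sliding-window traversal, left to right then top to
    bottom, collecting the windows the head model accepts."""
    heads = []
    h, w = gray.shape
    cols = win_w // 8                      # the claim's L: cells per row
    for y in range(h - win_h + 1):
        for x in range(w - win_w + 1):
            win = gray[y:y + win_h, x:x + win_w]
            gx, gy = sobel_gradients(win)
            g1, g2 = magnitude_and_direction(gx, gy)
            feat = block_normalized_features(cell_histograms(g1, g2), cols)
            if is_head_window(model, feat):
                heads.append((x, y))       # label order == traversal order
    return heads
```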
The flow of inter-frame matching for each head is as follows:
(1) Match head window image n of the current gray-level frame against all head window images of the previous gray-level frame by position and area:
Position-match parameter: $T(m) = \sqrt{(p_n - x_m)^2 + (q_n - y_m)^2}$
Area-match parameter: $A(m) = |Q_n - S_m|$
n and m are variables, the labels of head window images in the current and previous gray-level frames respectively; head window image n of the current frame has center position [pn, qn] and area Qn, and head window image m of the previous frame has center position [xm, ym] and area Sm; n = 1, 2, ..., N, m = 1, 2, ..., M, where N is the number of head window images in the current gray-level frame and M the number in the previous gray-level frame. For n = 1, all values of m are traversed and the value of m minimizing the position-match parameter T(m) is found and denoted J1: the best match of the first head window image of the current frame is head window image J1 of the previous frame. If T(J1) ≤ th1 and A(J1) ≤ th2, with th1 = 15 and th2 = 100, the first head window image of the current frame has found its match in the previous gray-level frame, namely head window image J1. If T(J1) > th1 or A(J1) > th2, J1 is set to zero, i.e. J1 = 0: the first head window image of the current frame matches no head window image of the previous frame and is a newly appearing head window image;
(2) n increases by 1 at a time; from n = 2 to n = N step (1) is repeated, finding all matches J2, J3, ..., JN. If J1, J2, ..., JN omit some value K between 1 and M, head window image K of the previous frame has been lost;
(3) head window image n of the current gray-level frame is likewise matched by position and area against all head window images of each earlier frame, from the 2nd previous up to the H-th previous gray-level frame, repeating steps (1) and (2); if no head window image is lost, head window image n of the current frame becomes an analysis window image; H is an arbitrary integer between 5 and 10. A sketch of the matching step follows.
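A sketch of the per-head matching step with the claimed thresholds th1 = 15 and th2 = 100; the data layout (lists of center/area pairs) is an editorial choice:

```python
import math

TH_POS, TH_AREA = 15.0, 100.0   # th1, th2 from the claim

def match_head(curr, prev_heads):
    """Match one current head window (center (p, q), area Q) against the
    previous frame's windows; returns the 1-based index J of the best
    match, or 0 for a newly appearing head, as in the claim."""
    (p, q), area_q = curr
    best_m, best_t = 0, float("inf")
    for m, ((x, y), _area) in enumerate(prev_heads, start=1):
        t = math.hypot(p - x, q - y)          # position-match parameter T(m)
        if t < best_t:
            best_m, best_t = m, t
    if best_m and best_t <= TH_POS \
            and abs(area_q - prev_heads[best_m - 1][1]) <= TH_AREA:
        return best_m
    return 0
```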
The flow of face detection is as follows:
On each analysis window image, the cascade-AdaBoost method based on haar features is applied with the trained face classification model, splitting the analysis windows into head window images in which a face is detected and head window images in which no face is detected, as sketched below.
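A sketch of this split using OpenCV's cascade detector; the model file name is a placeholder for the face model of claim 5:

```python
import cv2

# cv2.CascadeClassifier is OpenCV's cascade-AdaBoost detector; the file
# name below is a hypothetical placeholder for the trained face model.
face_model = cv2.CascadeClassifier("trained_face_model.xml")

def has_face(head_window_gray):
    """Split head windows into 'face detected' / 'no face detected'."""
    faces = face_model.detectMultiScale(head_window_gray)
    return len(faces) > 0
```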
On a head window image in which a face is detected, the window is divided into six equal parts in the vertical direction; in the horizontal direction a left-hand and a right-hand fraction of the width [fractions illegible in the source] are not analyzed, and only the middle region is analyzed. Within it, regions B1 and B2 are taken as the analysis region images, B1 being the 2nd region counted from the top and B2 the 5th region counted from the top. B1[i, j] and B2[i, j] denote the gray values of the pixel at horizontal position i and vertical position j of region images B1 and B2 respectively, and the difference value D1[i, j] of that pixel is computed:
$$D_1[i,j] = |B_1[i,j] - B_2[i,j]|$$
[i, j] traverses all pixels of the region image, i = 1, 2, ..., W1, j = 1, 2, ..., H1, where W1 and H1 are the width and height of region images B1 and B2. The number of pixels with D1[i, j] > th1 is counted and denoted C1; when C1 satisfies the threshold condition [inequality illegible in the source], a masked face is judged to exist in the gray-level image, otherwise the face is a normal face; th1 = 18.
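A sketch of this judgment; the margin fraction and the final threshold on C1 are hypothetical placeholders, since both expressions were lost in extraction, while th1 = 18 and the band choices follow the claim:

```python
import numpy as np

def mask_judgment_detected_face(head, margin_frac=0.25, th_diff=18,
                                th_count_frac=1 / 3):
    """Claim-1 style judgment on a window where a face WAS detected.
    margin_frac and th_count_frac are editorial placeholders; th_diff is
    the claim's th1 = 18, and the test direction is assumed."""
    h, w = head.shape
    x0, x1 = int(w * margin_frac), int(w * (1 - margin_frac))
    band = h // 6
    b1 = head[1 * band:2 * band, x0:x1].astype(int)  # 2nd of six bands (eyes)
    b2 = head[4 * band:5 * band, x0:x1].astype(int)  # 5th of six bands (mouth)
    c1 = int((np.abs(b1 - b2) > th_diff).sum())
    return c1 > th_count_frac * b1.size
```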
2. The method according to claim 1, characterized in that, on a head window image in which no face is detected, the window is divided into four equal parts in the horizontal direction and the two middle regions B3 and B4 are taken as the analysis regions. B3[i, j] and B4[i, j] denote the gray values of the pixel at horizontal position i and vertical position j of regions B3 and B4 respectively, and the means of the gray values are E1 and E2:
$$E_1 = \frac{1}{W_2 H_2}\sum_{i=1}^{W_2}\sum_{j=1}^{H_2} B_3[i,j],$$
$$E_2 = \frac{1}{W_2 H_2}\sum_{i=1}^{W_2}\sum_{j=1}^{H_2} B_4[i,j],$$
where W2 and H2 are the width and height of region images B3 and B4. The mean difference of the two regions is ΔE = |E1 − E2|; when ΔE > th2, a side face or a bowed head is indicated rather than a masked face; th2 = 25.
3. The method according to claim 1, characterized in that, when ΔE ≤ th2 on a head window image in which no face is detected, the window is divided into six equal parts in the vertical direction; in the horizontal direction a left-hand and a right-hand fraction of the width [fractions illegible in the source] are not analyzed, and only the middle region is analyzed. Within it, regions B5 and B2 are taken as the analysis regions, B5 being the 3rd region counted from the top and B2 the 5th region counted from the top. B5[i, j] and B2[i, j] denote the gray values of the pixel at horizontal position i and vertical position j of regions B5 and B2 respectively, and the difference value D2[i, j] is computed:
$$D_2[i,j] = |B_5[i,j] - B_2[i,j]|$$
i = 1, 2, ..., W3, j = 1, 2, ..., H3, where W3 and H3 are the width and height of region images B5 and B2. The number of pixels with D2[i, j] > th3 is counted and denoted C2; when C2 satisfies the threshold condition [inequality illegible in the source], a masked face is judged to exist in the gray-level image, otherwise the face is a normal face; th3 = 18.
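A sketch of the claim-2 test; claim 3's follow-up check mirrors the claim-1 sketch above with B5, the 3rd band, in place of B1:

```python
def side_or_bowed(head, th_mean=25):
    """Claim-2 style check on a window where NO face was detected: compare
    the gray-value means of the two middle quarters (B3, B4); a large gap
    suggests a side face or a bowed head rather than a masked face.
    th_mean is the claim's th2 = 25."""
    h, w = head.shape
    q = w // 4
    e1 = head[:, q:2 * q].mean()       # B3: 2nd horizontal quarter
    e2 = head[:, 2 * q:3 * q].mean()   # B4: 3rd horizontal quarter
    return abs(e1 - e2) > th_mean
```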
4. The method according to claim 1, characterized in that the head classification model used for head detection is produced as follows:
(1) positive and negative head samples are collected: 5000 gray-level pictures containing a head and shoulders are taken as positive samples, and 10000 gray-level pictures containing no head as negative samples; all samples have the same size;
(2) the gradient-orientation-histogram statistics of the positive and negative samples are extracted and normalized, each normalized value becoming one entry of the feature vector; the extraction method is identical to the one applied by the sliding-window method to the gray-level image converted from the monitoring-site video image;
(3) the feature vectors of the 5000 positive samples and 10000 negative samples are input into the support-vector-machine software library, and training with the ONE_CLASS classification mode and the LINEAR kernel yields an optimal head classification model, as sketched below.
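A training sketch, with scikit-learn again standing in for the named SVM library; note that the claim feeds both sample sets to a one-class trainer, while a one-class SVM learns from a single class, so this sketch fits the positives and uses the negatives only as a sanity check, an editorial reading:

```python
import numpy as np
from sklearn.svm import OneClassSVM

def train_head_model(pos_feats, neg_feats):
    """Fit the ONE_CLASS / LINEAR model on the 5000 positive 36*N-dim
    vectors, then report how many negatives it wrongly accepts."""
    model = OneClassSVM(kernel="linear")
    model.fit(np.asarray(pos_feats))
    fp_rate = (model.predict(np.asarray(neg_feats)) == 1).mean()
    print(f"false-positive rate on negatives: {fp_rate:.3f}")
    return model
```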
5. The method according to claim 4, characterized in that the face classification model used for face detection is produced as follows:
(1) 5000 gray-level face pictures are collected as positive samples and uniformly scaled to 20 × 20 pixels; 10000 gray-level pictures of arbitrary size containing no face are collected as negative samples;
(2) six haar feature classifiers are added; the shape of each classifier, the position of its region of interest within the feature, and its scale factors are defined successively as follows:
A. Feature classifier 1: the whole region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black region is the 2 × 2 square in the upper-left corner; the response is the pixel sum of the whole region multiplied by 4 minus the pixel sum of the black region multiplied by 15;
B. Feature classifier 2: the whole region is a 5 × 3 rectangle; the black region is the 2 × 2 square in the lower-left corner; the response is the pixel sum of the whole region multiplied by 4 minus the pixel sum of the black region multiplied by 15;
C. Feature classifier 3: the whole region is a 5 × 3 rectangle; the black region is the 2 × 2 square in the upper-right corner; the response is the pixel sum of the whole region multiplied by 4 minus the pixel sum of the black region multiplied by 15;
D. Feature classifier 4: the whole region is a 5 × 3 rectangle; the black region is the 2 × 2 square in the lower-right corner; the response is the pixel sum of the whole region multiplied by 4 minus the pixel sum of the black region multiplied by 15;
E. Feature classifier 5: the whole region is a 7 × 1 rectangle, with 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black region covers the 5th and 6th horizontal pixels; the response is the pixel sum of the whole region multiplied by 2 minus the pixel sum of the black region multiplied by 7;
F. Feature classifier 6: the whole region is a 7 × 1 rectangle; the black region covers the 2nd and 3rd horizontal pixels; the response is the pixel sum of the whole region multiplied by 2 minus the pixel sum of the black region multiplied by 7; the six classifiers are summarized in the sketch following this claim;
(3) the face classification model is obtained by training with the haartraining library in OpenCV.
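The six classifiers of claim 5 can be summarized as data; the tuple encoding and the generic evaluator are editorial (rect_sum is the helper from the earlier integral-image sketch):

```python
# (w, h, black_x, black_y, black_w, black_h, weight_whole, weight_black);
# response = weight_whole * sum(whole region) - weight_black * sum(black).
HAAR_FEATURES = [
    (5, 3, 0, 0, 2, 2, 4, 15),  # 1: 2x2 black square, upper-left corner
    (5, 3, 0, 1, 2, 2, 4, 15),  # 2: 2x2 black square, lower-left corner
    (5, 3, 3, 0, 2, 2, 4, 15),  # 3: 2x2 black square, upper-right corner
    (5, 3, 3, 1, 2, 2, 4, 15),  # 4: 2x2 black square, lower-right corner
    (7, 1, 4, 0, 2, 1, 2, 7),   # 5: black = 5th-6th pixels of a 7x1 row
    (7, 1, 1, 0, 2, 1, 2, 7),   # 6: black = 2nd-3rd pixels of a 7x1 row
]

def feature_response(ii, x, y, feat):
    """Evaluate one feature at (x, y) on an integral image ii."""
    w, h, bx, by, bw, bh, ww, wb = feat
    return (ww * rect_sum(ii, x, y, w, h)
            - wb * rect_sum(ii, x + bx, y + by, bw, bh))
```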
CN201210052716.7A 2012-03-02 2012-03-02 Method for detecting mask faces in video monitor Expired - Fee Related CN102622584B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210052716.7A CN102622584B (en) 2012-03-02 2012-03-02 Method for detecting mask faces in video monitor


Publications (2)

Publication Number Publication Date
CN102622584A CN102622584A (en) 2012-08-01
CN102622584B true CN102622584B (en) 2014-03-12

Family

ID=46562494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210052716.7A Expired - Fee Related CN102622584B (en) 2012-03-02 2012-03-02 Method for detecting mask faces in video monitor

Country Status (1)

Country Link
CN (1) CN102622584B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218600B (en) * 2013-03-29 2017-05-03 四川长虹电器股份有限公司 Real-time face detection algorithm
CN103544753B (en) * 2013-10-11 2015-12-09 深圳市捷顺科技实业股份有限公司 A kind of banister control method and system
CN103761516B (en) * 2014-02-14 2017-06-06 重庆科技学院 ATM abnormal human face detection based on video monitoring
CN104980692A (en) * 2014-04-04 2015-10-14 中国移动通信集团公司 Monitor method, monitor device, monitor system and server
CN104657712B (en) * 2015-02-09 2017-11-14 惠州学院 Masked man's detection method in a kind of monitor video
CN106022278A (en) * 2016-05-26 2016-10-12 天津艾思科尔科技有限公司 Method and system for detecting people wearing burka in video images
CN108734178A (en) * 2018-05-18 2018-11-02 电子科技大学 A kind of HOG feature extracting methods of rule-basedization template
CN110276309B (en) * 2019-06-25 2021-05-28 新华智云科技有限公司 Video processing method, video processing device, computer equipment and storage medium
CN110334670B (en) * 2019-07-10 2021-08-17 北京迈格威科技有限公司 Object monitoring method and device, electronic equipment and storage medium
CN110633648B (en) * 2019-08-21 2020-09-11 重庆特斯联智慧科技股份有限公司 Face recognition method and system in natural walking state
CN111899449A (en) * 2020-07-21 2020-11-06 深圳信息职业技术学院 Indoor security device and alarm control system based on image processing
CN111860456B (en) * 2020-08-04 2024-02-02 广州市微智联科技有限公司 Face recognition method
CN113283378B (en) * 2021-06-10 2022-09-27 合肥工业大学 Pig face detection method based on trapezoidal region normalized pixel difference characteristics

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101369310B (en) * 2008-09-27 2011-01-12 北京航空航天大学 Robust human face expression recognition method
CN102169544A (en) * 2011-04-18 2011-08-31 苏州市慧视通讯科技有限公司 Face-shielding detecting method based on multi-feature fusion



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Method for detecting mask faces in video monitor

Effective date of registration: 20150119

Granted publication date: 20140312

Pledgee: Agricultural Bank of China Limited, Chengdu Branch

Pledgor: CHENGDU SANTAI ELECTRONIC INDUSTRY Co.,Ltd.

Registration number: 2015510000005

PLDC Enforcement, change and cancellation of contracts on pledge of patent right or utility model
C56 Change in the name or address of the patentee

Owner name: CHENGDU SANTAI HOLDINGS GROUP CO., LTD.

Free format text: FORMER NAME: CHENGDU SANTAI ELECTRONIC INDUSTRY CO., LTD.

CP03 Change of name, title or address

Address after: 610041, No. 42 Shu West Road, High-tech Industrial Park, Jinniu District, Chengdu, Sichuan

Patentee after: CHENGDU SANTAI HOLDING GROUP CO.,LTD.

Address before: 610091, No. 42 Shu West Road, High-tech Industrial Park, Jinniu District, Chengdu, Sichuan

Patentee before: CHENGDU SANTAI ELECTRONIC INDUSTRY Co.,Ltd.

PLDC Enforcement, change and cancellation of contracts on pledge of patent right or utility model
PM01 Change of the registration of the contract for pledge of patent right

Change date: 20160205

Registration number: 2015510000005

Pledgor after: CHENGDU SANTAI HOLDING GROUP CO.,LTD.

Pledgor before: CHENGDU SANTAI ELECTRONIC INDUSTRY Co.,Ltd.

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20160229

Granted publication date: 20140312

Pledgee: Agricultural Bank of China Limited, Chengdu Branch

Pledgor: CHENGDU SANTAI HOLDING GROUP CO.,LTD.

Registration number: 2015510000005

PLDC Enforcement, change and cancellation of contracts on pledge of patent right or utility model
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Method for detecting mask faces in video monitor

Effective date of registration: 20160317

Granted publication date: 20140312

Pledgee: Agricultural Bank of China Limited, Chengdu Branch

Pledgor: CHENGDU SANTAI HOLDING GROUP CO.,LTD.

Registration number: 2016510000008

PLDC Enforcement, change and cancellation of contracts on pledge of patent right or utility model
PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20210129

Granted publication date: 20140312

Pledgee: Agricultural Bank of China Limited, Chengdu Branch

Pledgor: CHENGDU SANTAI HOLDING GROUP Co.,Ltd.

Registration number: 2016510000008

TR01 Transfer of patent right

Effective date of registration: 20210225

Address after: No. 1305, unit 1, building 1, No. 1700, North Tianfu Avenue, Chengdu hi tech Zone, China (Sichuan) pilot Free Trade Zone, Chengdu, Sichuan 610093

Patentee after: CHENGDU SANTAI INTELLIGENT TECHNOLOGY Co.,Ltd.

Address before: No.42 Shuxi Road, high tech Industrial Park, Jinniu District, Chengdu, Sichuan 610041

Patentee before: CHENGDU SANTAI HOLDING GROUP Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140312