Summary of the invention:
The object of this invention is to provide a masked-face detection method that can judge whether a masked human face is present in a video image, so as to prevent crimes committed under the disguise of a mask.
The present invention comprises the following steps:
1. Convert the color video image obtained from the monitoring site to a grayscale image;
2. Scale the grayscale image;
3. Perform human-head detection on the grayscale image; when a head is detected, proceed to the following steps, otherwise loop through steps 1 to 3;
4. Match each head between frames;
5. Perform face detection;
6. Perform the masked-face judgment; mark the original color video image in which a masked face is found, and raise an alarm.
In step 3, a sliding-window method is adopted: the window is moved from left to right and top to bottom, dividing the grayscale image into window images, one per window position, and head detection is performed on each window image. When the sliding window is at the first window image:
(1) Calculate the horizontal gradient G_x[i, j] and vertical gradient G_y[i, j] of each pixel of the window image:
A. Initialization of G_x[i, j] and G_y[i, j]:
In G_x[i, j] and G_y[i, j], the value of every pixel is initialized to 0; [i, j] traverses all pixels of the window image, where i is a variable giving the horizontal position of the pixel in the window image, with values i = 1, 2, ..., W_0, and j is a variable giving the vertical position of the pixel in the window image, with values j = 1, 2, ..., H_0; W_0 and H_0 are respectively the width and height of the window image;
B. Calculate the horizontal gradient G_x[i, j] and vertical gradient G_y[i, j] of each pixel on the window image:
Using the Sobel horizontal edge operator as the computation template, translate the template center to each pixel; multiply each pixel in the image region covered by the template by the corresponding template element, and take the sum of all the products as the horizontal gradient G_x[i, j] of that pixel. Using the Sobel vertical edge operator as the template in the same way gives the vertical gradient G_y[i, j]. To prevent crossing the image border, the pixels on the top, bottom, left-most and right-most edges are not processed; that is, when j = 1 or j = H_0 (i = 1, 2, ..., W_0), or i = 1 or i = W_0 (j = 1, 2, ..., H_0), G_x[i, j] and G_y[i, j] keep their initial value 0. Thus, for i = 2, 3, ..., W_0 − 1 and j = 2, 3, ..., H_0 − 1:
G_x[i, j] = Σ_{k=1..3} Σ_{l=1..3} S_x[k, l] · I[i + l − 2, j + k − 2]
G_y[i, j] = Σ_{k=1..3} Σ_{l=1..3} S_y[k, l] · I[i + l − 2, j + k − 2]
Wherein I[i, j] represents the gray value of each pixel of the window image, S_x[k, l] represents the value at row k, column l of the Sobel horizontal edge operator, and S_y[k, l] represents the value at row k, column l of the Sobel vertical edge operator;
(2) Calculate the gradient magnitude G_1[i, j] and gradient direction G_2[i, j] of each pixel of the window image:
G_1[i, j] = sqrt(G_x[i, j]^2 + G_y[i, j]^2)
G_2[i, j] = ⌊(arctan(G_y[i, j] / G_x[i, j]) + π/2) · 9/π⌋ + 1
Wherein i = 1, 2, ..., W_0, j = 1, 2, ..., H_0, arctan() is the arctangent function, and ⌊·⌋ is the downward rounding (floor) operator; ⌊x⌋ denotes the largest integer not greater than x;
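The gradient step above can be sketched in NumPy. The Sobel coefficients shown are the standard ones (assumed here), the border rows and columns keep gradient 0 as the text prescribes, and the direction quantization assumes the common mapping of arctan(G_y/G_x) from (−π/2, π/2) onto channels 1 to 9:

```python
import numpy as np

# Standard Sobel templates (assumed; the patent's exact templates are elided).
SX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SY = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)

def gradients(img):
    """Return (G1, G2): per-pixel gradient magnitude and direction channel 1..9.
    Border pixels are left at gradient 0, matching the no-border-processing rule."""
    img = img.astype(float)
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for j in range(1, h - 1):            # skip top/bottom edge rows
        for i in range(1, w - 1):        # skip left-most/right-most columns
            patch = img[j - 1:j + 2, i - 1:i + 2]
            gx[j, i] = (SX * patch).sum()
            gy[j, i] = (SY * patch).sum()
    g1 = np.sqrt(gx ** 2 + gy ** 2)
    # arctan(gy/gx) lies in (-pi/2, pi/2); map it onto channels 1..9
    theta = np.arctan(gy / np.where(gx == 0, 1e-12, gx))
    g2 = np.floor((theta + np.pi / 2) * 9 / np.pi).astype(int) + 1
    return g1, g2
```

A vertical step edge, for example, produces a purely horizontal gradient, which lands in the middle channel 5.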
(3) Use the gradient magnitude and direction of each pixel of the window image to compute the gradient orientation histogram statistics and obtain the feature vector of the window image:
The window image is divided into connected regions of identical size; each connected region consists of 8 × 8 pixels and is called a cell unit. Every 2 × 2 cell units form a square region called an interval; adjacent intervals overlap by 50%;
A. Calculate the gradient orientation histogram statistic of each cell unit of the window image:
In the window image, the gradient direction of each pixel takes a value from 1 to 9, and each cell unit consists of 9 channels. Using the gradient magnitude and gradient direction of each pixel of each cell unit, the gradient magnitudes falling in the direction range of each channel of the cell unit are accumulated to obtain the gradient orientation histogram statistic of the cell unit. H[m][p] denotes the histogram statistic of the p-th channel of the m-th cell unit of a window image; m is a variable, the cell-unit label, starting from 1 and increasing by 1 in left-to-right, top-to-bottom order; p is a variable, the channel label; L denotes the number of cell units per horizontal row of a window image, and M denotes the total number of cell units of a window image; L and M are constants that depend only on the size of the window image. Taking m = 1 and p = 1, H[1][1] denotes the histogram statistic of the 1st channel of the 1st cell unit, and the computing formula is:
H[1][1] = Σ_{i=1..8} Σ_{j=1..8} G_1[i, j] · δ(G_2[i, j], 1)
wherein δ(a, b) = 1 when a = b and δ(a, b) = 0 otherwise;
p increases by 1 successively; from p = 2 until p = 9, H[1][2], H[1][3], ..., H[1][9] are obtained in turn, with computing formula:
H[1][p] = Σ_{i=1..8} Σ_{j=1..8} G_1[i, j] · δ(G_2[i, j], p), p = 2, 3, ..., 9
m increases by 1 successively; from m = 2 to m = M, each increase yields 9 histogram statistics, with corresponding computing formula:
H[m][p] = Σ_{i=8·((m−1) mod 5)+1 .. 8·((m−1) mod 5)+8} Σ_{j=8·⌊(m−1)/5⌋+1 .. 8·⌊(m−1)/5⌋+8} G_1[i, j] · δ(G_2[i, j], p), p = 1, 2, ..., 9
wherein ⌊(m−1)/5⌋ denotes the largest integer not greater than (m−1)/5;
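The per-cell voting described above can be sketched as follows; the left-to-right, top-to-bottom cell numbering and the magnitude-weighted vote into the direction channel follow the text, while the helper name and signature are illustrative:

```python
import numpy as np

def cell_histograms(g1, g2, cell=8, channels=9):
    """H[m][p]: accumulate each pixel's gradient magnitude g1 into the
    channel given by its gradient direction g2 (values 1..channels).
    Cells are numbered left-to-right, top-to-bottom, one row per cell."""
    h, w = g1.shape
    rows, cols = h // cell, w // cell      # cols is L, rows*cols is M
    H = np.zeros((rows * cols, channels))
    for m in range(rows * cols):
        r, c = divmod(m, cols)             # r = floor(m/L), c = m mod L
        for j in range(r * cell, (r + 1) * cell):
            for i in range(c * cell, (c + 1) * cell):
                H[m, int(g2[j, i]) - 1] += g1[j, i]
    return H
```

Because every pixel votes its full magnitude into exactly one channel, the total histogram mass equals the total gradient magnitude of the image.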
B. Normalize the gradient orientation histogram statistics of the cell units of each interval of the window image, and extract the feature vector of the window image:
S[n] denotes the normalization factor of the gradient orientation histogram statistics of the n-th interval of a window image; n is a variable, the interval label, starting from 1 and increasing by 1 in left-to-right, top-to-bottom order; each horizontal row of a window image contains L − 1 intervals; N denotes the total number of intervals of a window image; N is a constant related to the window-image size. Taking n = 1, the normalization factor S[1] of the gradient orientation histogram statistics of the 1st interval of the window image is the sum of the histogram statistics of all channels of all cell units of the 1st interval;
Dividing the gradient orientation histogram statistic of each channel of the 1st cell unit of the 1st interval, which is also the 1st cell unit of the window image, by the normalization factor of the 1st interval gives 9 values, which serve in turn as the 1st to 9th values of the window-image feature vector;
Dividing the histogram statistic of each channel of the 2nd cell unit of the 1st interval, also the 2nd cell unit of the window image, by the normalization factor of the 1st interval gives 9 values, in turn the 10th to 18th values of the feature vector;
Dividing the histogram statistic of each channel of the 3rd cell unit of the 1st interval, also the (L+1)-th cell unit of the window image, by the normalization factor of the 1st interval gives 9 values, in turn the 19th to 27th values of the feature vector;
Dividing the histogram statistic of each channel of the 4th cell unit of the 1st interval, also the (L+2)-th cell unit of the window image, by the normalization factor of the 1st interval gives 9 values, in turn the 28th to 36th values of the feature vector;
n increases by 1 successively; from n = 2 until n = N, the normalization factors of the histogram statistics of all intervals are calculated: with c = (n−1) mod (L−1) and r = ⌊(n−1)/(L−1)⌋, where ⌊x⌋ denotes the largest integer not greater than x, the four cell units of the n-th interval are the cell units L·r + c + 1, L·r + c + 2, L·(r+1) + c + 1 and L·(r+1) + c + 2 of the window image, and S[n] is the sum of their histogram statistics over all channels;
Dividing the histogram statistics of each channel of each cell unit of the n-th interval by the normalization factor of the n-th interval further gives the other 36 × (N − 1) values of the window-image feature vector; the feature vector of each window image has 36 × N dimensions in total;
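The block-normalization step can be sketched as follows, assuming the sum-normalization stated in the text (each 36-value block divided by the sum of its entries) and 50% overlap between adjacent blocks:

```python
import numpy as np

def block_features(H, cols):
    """Concatenate sum-normalized 2x2-cell blocks (50% overlap) into one
    feature vector. H: (M, 9) per-cell histograms; cols: cells per row (L).
    There are (rows-1)*(cols-1) blocks, so the vector has 36*N entries."""
    rows = H.shape[0] // cols
    feats = []
    for br in range(rows - 1):
        for bc in range(cols - 1):
            m = br * cols + bc                       # top-left cell of block
            block = np.concatenate([H[m], H[m + 1],
                                    H[m + cols], H[m + cols + 1]])
            s = block.sum()                          # normalization factor S[n]
            feats.append(block / s if s > 0 else block)
    return np.concatenate(feats)
```

For a 5 × 5 grid of cells (L = 5) this yields 16 blocks and a 576-dimensional vector, matching the 36 × N count above.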
C. Send the 36 × N-dimensional feature vector of each window image, together with the head classification model trained in advance, into the support-vector-machine software library; classify with the ONE_CLASS classification mode and the LINEAR kernel function to judge whether the window image is a head image; if so, regard the window image as a head window image;
By analogy, repeat steps (1) to (3) until all window images have been traversed, obtaining all head window images in the current-frame grayscale image; the head window images are labelled in the same order as the traversal, from left to right and top to bottom.
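The scanning loop can be sketched as follows; the 40 × 40 window size and the horizontal/vertical step sizes 3 and 2 are taken from the embodiment described later, and `is_head` is a hypothetical stand-in for the SVM classification of each window's feature vector:

```python
def scan_windows(width, height, win=40, step_x=3, step_y=2):
    """Yield top-left corners of all sliding-window positions,
    left-to-right then top-to-bottom (steps assumed from the embodiment)."""
    for y in range(0, height - win + 1, step_y):
        for x in range(0, width - win + 1, step_x):
            yield x, y

def detect_heads(img_w, img_h, is_head):
    """Collect the windows accepted by the (stub) head classifier,
    labelled in traversal order as the text prescribes."""
    return [(x, y) for x, y in scan_windows(img_w, img_h) if is_head(x, y)]
```

On the 176 × 144 standard detection image this produces 46 × 53 window positions per frame.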
The flow of step 4 is as follows:
(1) Match the n-th head window image of the current-frame grayscale image against all head window images of the previous-frame grayscale image by position and area:
Position-matching parameter T(m) = sqrt((p_n − x_m)^2 + (q_n − y_m)^2)
Area-matching parameter A(m) = |Q_n − S_m|
n and m are variables, the labels of head window images in the current-frame and previous-frame grayscale images respectively; the center of the n-th head window image of the current frame is [p_n, q_n] and its area is Q_n; the center of the m-th head window image of the previous frame is [x_m, y_m] and its area is S_m; n = 1, 2, ..., N, m = 1, 2, ..., M, where N is the number of head window images in the current-frame grayscale image and M is the number in the previous-frame grayscale image. Take n = 1, traverse all values of m, and find the value of m that minimizes the position-matching parameter T(m), denoted J_1,
which means the best match of the first head window image of the current frame is the J_1-th head window image of the previous frame. If T(J_1) ≤ th_1 and A(J_1) ≤ th_2, with th_1 = 15 and th_2 = 100, the first head window image of the current frame has found a matching head window image, the J_1-th, in the previous-frame grayscale image; if T(J_1) > th_1 or A(J_1) > th_2, J_1 is set to zero, i.e. J_1 = 0, meaning that the first head window image of the current frame matches no head window image of the previous frame and is a newly appearing head window image;
(2) n increases by 1 successively; from n = 2 until n = N, repeat step (1) to find all matching head window images J_2, J_3, ..., J_N; if J_1, J_2, ..., J_N do not contain some value K from 1 to M, the K-th head window image of the previous frame has been lost;
(3) Match the n-th head window image of the current-frame grayscale image by position and area against all head window images of each of the 2nd through H-th preceding frame grayscale images, repeating steps (1) and (2) above; if no head-window-image loss occurs, the n-th head window image of the current-frame grayscale image is an analysis window image; H is an arbitrary integer between 5 and 10.
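The per-frame matching can be sketched as follows. The Euclidean center distance used for T(m) is an assumption (the source leaves the exact formula out); the thresholds th_1 = 15 and th_2 = 100 are taken from the text:

```python
import math

TH_POS, TH_AREA = 15, 100   # th_1, th_2 from the text

def match_heads(current, previous):
    """current/previous: lists of (center_x, center_y, area) per head window.
    For each current head, find the previous head minimizing the (assumed
    Euclidean) center distance T(m); accept only if T <= th_1 and the area
    difference A <= th_2, else label it 0 (newly appearing head).
    Returns (matches J_1..J_N, labels of lost previous-frame heads)."""
    matches = []
    for (p, q, area) in current:
        best, best_t, best_a = 0, float('inf'), 0
        for m, (x, y, s) in enumerate(previous, start=1):
            t = math.hypot(p - x, q - y)
            if t < best_t:
                best, best_t, best_a = m, t, abs(area - s)
        if previous and best_t <= TH_POS and best_a <= TH_AREA:
            matches.append(best)
        else:
            matches.append(0)
    lost = [k for k in range(1, len(previous) + 1) if k not in matches]
    return matches, lost
```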
The flow of step 5 is as follows:
On each analysis window image, use the cascade AdaBoost method based on Haar features, with the trained face classification model, to perform face detection, obtaining the head window images in which a face is detected and those in which no face is detected.
The flow of step 6 is as follows:
(1) On a head window image in which a face has been detected, divide the head window image into six equal parts in the vertical direction; in the horizontal direction, the left and right margin regions are not analyzed and only the middle region is analyzed. Take the regions B_1 and B_2 as the analysis region images, where B_1 is the 2nd region counted from top to bottom and B_2 is the 5th region counted from top to bottom. B_1[i, j] and B_2[i, j] denote the gray values of the pixel at horizontal position i and vertical position j of region images B_1 and B_2 respectively. Calculate the difference value D_1[i, j] of the pixel at horizontal position i and vertical position j of region images B_1 and B_2:
D_1[i, j] = |B_1[i, j] − B_2[i, j]|
[i, j] traverses all pixels of the region image, i = 1, 2, ..., W_1, j = 1, 2, ..., H_1, where W_1 is the width of region images B_1 and B_2 and H_1 is their height. Count the number of pixels with D_1[i, j] > th_1, denoted C_1; when C_1 exceeds a set proportion of the total number W_1 · H_1 of region pixels, judge that a masked face exists in the grayscale image, otherwise it is a normal face; th_1 = 18;
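This region-difference decision can be sketched as follows. The decision ratio of changed pixels is an assumption (the source does not state the exact proportion); th_1 = 18 is taken from the text:

```python
import numpy as np

TH_DIFF = 18   # th_1 from the text

def masked_by_region_difference(b1, b2, ratio=0.5):
    """Compare the eye band B_1 with the mouth band B_2 pixel by pixel.
    A masked face is declared when the count C_1 of pixels whose absolute
    gray difference exceeds th_1 is larger than `ratio` of the region's
    pixels (the exact ratio is an assumption)."""
    d = np.abs(b1.astype(int) - b2.astype(int))
    c1 = int((d > TH_DIFF).sum())
    return c1 > ratio * d.size
```

Identical bands (a fully visible face with similar texture) yield C_1 = 0, while a bright mask covering the mouth band drives almost every pixel past the threshold.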
(2) On a head window image in which no face has been detected, divide the head window image into four equal parts in the horizontal direction and take the two middle regions B_3 and B_4 as the analysis regions. B_3[i, j] and B_4[i, j] denote the gray values of the pixel at horizontal position i and vertical position j of regions B_3 and B_4 respectively, and the means of the gray values are E_1 and E_2:
E_1 = (Σ_{i=1..W_2} Σ_{j=1..H_2} B_3[i, j]) / (W_2 · H_2)
E_2 = (Σ_{i=1..W_2} Σ_{j=1..H_2} B_4[i, j]) / (W_2 · H_2)
W_2 is the width of region images B_3 and B_4 and H_2 is their height. The mean difference of the two regions is ΔE = |E_1 − E_2|; when ΔE > th_2, a side-face or head-lowering behavior is indicated, not a masked face; th_2 = 25;
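The mean-difference check above is a one-liner in practice; th_2 = 25 is taken from the text, and the helper name is illustrative:

```python
import numpy as np

TH_MEAN = 25   # th_2 from the text

def side_face_or_bowed(b3, b4):
    """Compare the mean gray levels E_1, E_2 of the two middle vertical
    strips B_3, B_4; a large difference indicates a side face or a
    lowered head rather than a mask."""
    e1 = float(np.mean(b3))
    e2 = float(np.mean(b4))
    return abs(e1 - e2) > TH_MEAN
```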
(3) On a head window image in which no face has been detected, when ΔE ≤ th_2, divide the head window image into six equal parts in the vertical direction; in the horizontal direction, the left and right margin regions are not analyzed and only the middle region is analyzed. Take the regions B_5 and B_2 as the analysis regions, where B_5 is the 3rd region counted from top to bottom and B_2 is the 5th region counted from top to bottom. B_5[i, j] and B_2[i, j] denote the gray values of the pixel at horizontal position i and vertical position j of regions B_5 and B_2 respectively. Calculate the difference value D_2[i, j] of the pixel at horizontal position i and vertical position j of regions B_5 and B_2:
D_2[i, j] = |B_5[i, j] − B_2[i, j]|
i = 1, 2, ..., W_3, j = 1, 2, ..., H_3, where W_3 is the width of region images B_5 and B_2 and H_3 is their height. Count the number of pixels with D_2[i, j] > th_3, denoted C_2; when C_2 exceeds a set proportion of the total number W_3 · H_3 of region pixels, judge that a masked face exists in the grayscale image, otherwise it is a normal face; th_3 = 18.
When a masked face is determined, locate the position of the corresponding head window image in the original color video image, mark the picture frame there, and upload it to the alarm center.
In step 3, the production process of the head classification model is as follows:
(1) Collect positive and negative head samples: 5000 grayscale pictures containing a head and shoulders serve as positive samples and 10000 grayscale pictures containing no head serve as negative samples; the sample sizes are kept consistent;
(2) Extract the gradient orientation histogram statistics of the positive and negative samples and normalize the histograms, using each normalized value as a value of the feature vector; the method is identical to the feature-vector extraction performed with the sliding-window method on the grayscale images converted from the monitoring-site video images;
(3) Input the feature vectors of the 5000 positive samples and 10000 negative samples into the support-vector-machine software library and train with the ONE_CLASS classification mode and the LINEAR kernel function to obtain an optimal head classification model.
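A minimal sketch of this training stage, with scikit-learn's `OneClassSVM` standing in for the "support vector machine software library" (libsvm's `-s 2 -t 0` would be the direct equivalent of ONE_CLASS with a LINEAR kernel). The synthetic feature vectors and the `is_head` helper are illustrative only; note also that a one-class SVM in fact learns its boundary from the positive class alone, so the negative samples would serve for validation rather than fitting:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
pos = rng.normal(0.5, 0.05, size=(200, 576))     # stand-in head feature vectors
model = OneClassSVM(kernel='linear', nu=0.1).fit(pos)

def is_head(feature):
    """+1 if the feature vector falls inside the learned head class, else -1."""
    return int(model.predict(feature.reshape(1, -1))[0])
```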
In step 5, the production process of the face classification model is as follows:
(1) Collect 5000 grayscale face pictures as positive samples, uniformly scaled to 20 × 20 pixels, and collect 10000 grayscale pictures of arbitrary size containing no face as negative samples;
(2) Add 6 Haar feature classifiers; for each feature classifier, the shape, the position of the feature within the region of interest and the scale factors are defined in turn as follows:
A. Feature classifier 1: the whole rectangular area is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangle is the 2 × 2 square area in the upper-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular area minus 15 times the pixel sum of the black rectangular area;
B. Feature classifier 2: the whole rectangular area is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangle is the 2 × 2 square area in the lower-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular area minus 15 times the pixel sum of the black rectangular area;
C. Feature classifier 3: the whole rectangular area is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangle is the 2 × 2 square area in the upper-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular area minus 15 times the pixel sum of the black rectangular area;
D. Feature classifier 4: the whole rectangular area is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangle is the 2 × 2 square area in the lower-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular area minus 15 times the pixel sum of the black rectangular area;
E. Feature classifier 5: the whole rectangular area is a 7 × 1 rectangle, with 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangle is the region of the 5th and 6th horizontal pixels of the rectangle; the response is 2 times the pixel sum of the whole rectangular area minus 7 times the pixel sum of the black rectangular area;
F. Feature classifier 6: the whole rectangular area is a 7 × 1 rectangle, with 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangle is the region of the 2nd and 3rd horizontal pixels of the rectangle; the response is 2 times the pixel sum of the whole rectangular area minus 7 times the pixel sum of the black rectangular area;
(3) Train with the haartraining library in OpenCV to obtain the face classification model.
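The custom responses can be sketched directly from the definitions above (classifiers 1 and 5 shown; the others differ only in the position of the black square). The coefficients 4/15 and 2/7 make each response zero on uniform regions, since 4 × 15 pixels = 15 × 4 pixels and 2 × 7 pixels = 7 × 2 pixels:

```python
import numpy as np

def region_sum(img, x, y, w, h):
    """Pixel sum of the w*h rectangle with top-left corner (x, y)."""
    return float(img[y:y + h, x:x + w].sum())

def haar_response_1(img, x, y):
    """Classifier 1: 5x3 whole rectangle, 2x2 black square in the upper-left
    corner; response = 4*(whole sum) - 15*(black sum)."""
    whole = region_sum(img, x, y, 5, 3)
    black = region_sum(img, x, y, 2, 2)
    return 4 * whole - 15 * black

def haar_response_5(img, x, y):
    """Classifier 5: 7x1 rectangle, black pixels at horizontal positions
    5 and 6; response = 2*(whole sum) - 7*(black sum)."""
    whole = region_sum(img, x, y, 7, 1)
    black = region_sum(img, x + 4, y, 2, 1)
    return 2 * whole - 7 * black
```

A production implementation would evaluate these over an integral image rather than re-summing rectangles per window; the direct sums here keep the sketch short.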
The present invention uses image-processing techniques and pattern-recognition methods to analyze the human bodies appearing in a video image, judging whether a face is present within the detected head and using this for the masked-face decision. It is of great significance to the research of masked-person identification in video surveillance and can effectively prevent crimes committed under masked disguise.
Embodiment:
The invention provides a masked-face detection method for video surveillance. The system architecture of the method, as shown in Figure 1, comprises a video acquisition unit, a masked-face detection unit and an alarm unit.
The main function of the video acquisition unit is to shoot the monitored scene with a general-purpose analog camera to obtain an analog video image, and then convert it to digital image data with a general-purpose video capture card. There are certain requirements on the mounting height and angle of the camera: the installation must keep the head-and-shoulder region of the human body fully inside the video picture, and the camera must be set up so that frontal face information is clearly shown in the picture, ideally facing the face directly, so a close-range frontal shot is required.
The main function of the masked-face detection unit is to convert the incoming color digital image data to a grayscale image and then detect on the grayscale image whether a masked face exists. To improve detection efficiency, the grayscale image is first reduced, before detection, to a standard detection grayscale map of 176 × 144 pixels. If a masked face exists, the position of the masked face is extracted and the masked face is marked at the corresponding position of the original color image.
If the masked-face detection unit detects that a masked face exists, an alarm is raised and the image carrying the mask mark is uploaded to the alarm unit.
The invention provides a masked-face detection method for video surveillance; the method, as shown in Figure 2, specifically comprises the following steps:
One, convert the color image to a grayscale image (s1)
Every step of the masked-face detection method mentioned in the present invention is carried out on the basis of a grayscale image, so the color image must first be converted to a grayscale image.
Two, scale the grayscale image (s2)
To improve detection efficiency, the grayscale image is reduced to 176 × 144 pixels. Image scaling needs to balance processing efficiency against the smoothness and sharpness of the result; current image-scaling methods are relatively mature and are not within the research scope of the present invention. Each step mentioned below is carried out on the basis of the reduced grayscale image.
Three, perform head detection on the grayscale image (s3)
On the image scaled in step 2, adopt the sliding-window method with a window size of 40 × 40, a horizontal scanning step of 3 and a vertical scanning step of 2; move the sliding window from left to right and top to bottom until the complete image has been scanned, and perform head detection on the window grayscale image under each window position, hereinafter referred to as the window image;
i denotes the horizontal coordinate of a point in the window image and j its vertical coordinate; I[i, j] denotes the gray value of the window image at pixel [i, j]; [i, j] traverses every pixel of the window image; the window-image width is W_0 = 40 and its height is H_0 = 40. When the sliding window is at the first window image:
1. Compute the gradient direction and magnitude of each pixel of the window image, specifically:
(1) Calculate the horizontal gradient and vertical gradient of each pixel of the window image
First, the horizontal gradient G_x[i, j] and vertical gradient G_y[i, j] of each pixel of the window image are all initialized to 0:
G_x[i, j] = 0, i = 1, 2, ..., W_0, j = 1, 2, ..., H_0 (1-a)
G_y[i, j] = 0, i = 1, 2, ..., W_0, j = 1, 2, ..., H_0 (1-b)
On the window image, using an edge operator as the computation template, translate the template to each pixel [i, j] in turn, from left to right and top to bottom; to prevent crossing the border, the pixels on the top, bottom, left-most and right-most edges are not processed. The template is applied as a weighted sum over the neighborhood gray values of pixel [i, j], each neighborhood value multiplied by the corresponding template element, giving the horizontal gradient G_x[i, j] and vertical gradient G_y[i, j]. The edge operator may be the Roberts, Sobel, Prewitt or Kirsch edge operator; the present invention is described taking the Sobel edge operators S_x and S_y (the standard templates S_x = [−1 0 1; −2 0 2; −1 0 1] and S_y = [−1 −2 −1; 0 0 0; 1 2 1]) as an example. The computing methods of the horizontal gradient G_x[i, j] and vertical gradient G_y[i, j] are:
G_x[i, j] = Σ_{k=1..3} Σ_{l=1..3} S_x[k, l] · I[i + l − 2, j + k − 2], i = 2, 3, ..., W_0 − 1, j = 2, 3, ..., H_0 − 1 (2-a)
G_y[i, j] = Σ_{k=1..3} Σ_{l=1..3} S_y[k, l] · I[i + l − 2, j + k − 2], i = 2, 3, ..., W_0 − 1, j = 2, 3, ..., H_0 − 1 (2-b)
Wherein I[i, j] denotes the gray value of each pixel of the window image, S_x[k, l] denotes the value at row k, column l of the Sobel horizontal edge operator, and S_y[k, l] denotes the value at row k, column l of the Sobel vertical edge operator;
(2) Calculate the gradient magnitude G_1[i, j] and gradient direction G_2[i, j] at each pixel of the window image:
G_1[i, j] = sqrt(G_x[i, j]^2 + G_y[i, j]^2), i = 1, 2, ..., W_0, j = 1, 2, ..., H_0 (3-a)
G_2[i, j] = ⌊(arctan(G_y[i, j] / G_x[i, j]) + π/2) · P/π⌋ + 1, i = 1, 2, ..., W_0, j = 1, 2, ..., H_0 (3-b)
Wherein arctan() is the arctangent function, ⌊·⌋ is the downward rounding (floor) operator, and ⌊x⌋ denotes the largest integer not greater than x; P denotes the number of channels and may be any integer between 2 and 180. In the embodiment of the present invention, P = 9 is taken as an example, so the gradient direction G_2[i, j] can be expressed as:
G_2[i, j] = ⌊(arctan(G_y[i, j] / G_x[i, j]) + π/2) · 9/π⌋ + 1
2. Use the obtained gradient magnitudes and directions to compute the gradient histogram statistics, using each normalized histogram statistic as a value of the feature vector:
First the window image is divided into connected regions of identical size; each connected region is a cell unit (cell); then the gradient orientation histogram of the pixels of each cell unit is accumulated. To better adapt to illumination changes and shadow effects, several cells are grouped into an interval (block), and the gradient orientation histogram statistics of the cells within each block are normalized.
The number of pixels per row of a cell may be any integer between 2 and 20, as may the number per column; the number of pixels per row of a block may be any integer between 4 and 40, as may the number per column. The embodiment of the present invention is described taking cells of 8 × 8 pixels, with each block consisting of 2 × 2 = 4 cells. Figure 3 is the schematic diagram of one block in this embodiment: for a 40 × 40 window image there are 25 cells; adjacent blocks overlap by 50%, so the image has 4 × 4 = 16 blocks in total. Each cell has 9 direction channels, each block therefore has 4 × 9 = 36 features, and there are 16 × 36 = 576 feature dimensions in total. The computation process is as follows:
(1) Calculate the gradient orientation histogram statistic of each cell unit of the window image
Each pixel of each cell unit in the window image votes for a histogram channel according to its gradient direction, with the gradient magnitude of the pixel as the voting weight. H[m][p] denotes the histogram statistic of the p-th channel of the m-th cell unit of a window image; m is a variable, the cell-unit label, starting from 1 and increasing by 1 in left-to-right, top-to-bottom order; p is a variable, the channel label; L denotes the number of cell units per horizontal row of a window image and M the total number of cell units of a window image; L and M are constants that depend only on the window-image size; in the embodiment of the present invention, L = 5 and M = 25;
Taking m = 1 and p = 1, H[1][1] denotes the histogram statistic of the 1st channel of the 1st cell unit, with computing formula:
H[1][1] = Σ_{i=1..8} Σ_{j=1..8} G_1[i, j] · δ(G_2[i, j], 1)
wherein δ(a, b) = 1 when a = b and δ(a, b) = 0 otherwise;
p increases by 1 successively; from p = 2 until p = 9, H[1][2], H[1][3], ..., H[1][9] are obtained in turn, with computing formula:
H[1][p] = Σ_{i=1..8} Σ_{j=1..8} G_1[i, j] · δ(G_2[i, j], p), p = 2, 3, ..., 9
m increases by 1 successively; from m = 2 to m = M, each increase yields 9 histogram statistics, with corresponding computing formula:
H[m][p] = Σ_{i=8·((m−1) mod 5)+1 .. 8·((m−1) mod 5)+8} Σ_{j=8·⌊(m−1)/5⌋+1 .. 8·⌊(m−1)/5⌋+8} G_1[i, j] · δ(G_2[i, j], p), p = 1, 2, ..., 9
wherein ⌊(m−1)/5⌋ denotes the largest integer not greater than (m−1)/5;
(2) Normalize the gradient orientation histogram statistics of the cell units of each interval of the window image, and extract the feature vector of the window image.
The gradient orientation histogram statistics of the cell units of each interval of the window image are summed to form the normalization factor. S[n] denotes the normalization factor of the histogram statistics of the n-th interval of a window image; n is a variable, the interval label, starting from 1 and increasing by 1 in left-to-right, top-to-bottom order; each horizontal row of a window image contains L − 1 intervals; N denotes the total number of intervals of a window image; N is a constant related to the window-image size; in the embodiment of the present invention, the total number of intervals is N = 16, with 4 intervals per horizontal row of a window image;
Taking n = 1, the normalization factor S[1] of the histogram statistics of the 1st interval of the window image is the sum of the histogram statistics of all channels of all cell units of the 1st interval:
S[1] = Σ_{p=1..9} (H[1][p] + H[2][p] + H[6][p] + H[7][p])
Dividing the histogram statistic of each channel of the 1st cell unit of the 1st interval, also the 1st cell unit of the window image, by the normalization factor of the 1st interval gives 9 values, in turn H[1][1]/S[1], ..., H[1][9]/S[1], which serve as the 1st to 9th values of the window-image feature vector;
Dividing the histogram statistic of each channel of the 2nd cell unit of the 1st interval, also the 2nd cell unit of the window image, by the normalization factor of the 1st interval gives 9 values, in turn H[2][1]/S[1], ..., H[2][9]/S[1], the 10th to 18th values of the feature vector;
Dividing the histogram statistic of each channel of the 3rd cell unit of the 1st interval, also the 6th cell unit of the window image, by the normalization factor of the 1st interval gives 9 values, in turn H[6][1]/S[1], ..., H[6][9]/S[1], the 19th to 27th values of the feature vector;
Dividing the histogram statistic of each channel of the 4th cell unit of the 1st interval, also the 7th cell unit of the window image, by the normalization factor of the 1st interval gives 9 values, in turn H[7][1]/S[1], ..., H[7][9]/S[1], the 28th to 36th values of the feature vector;
n increases by 1 successively; from n = 2 until n = N, the normalization factors of the histogram statistics of all intervals are calculated: with c = (n−1) mod 4 and r = ⌊(n−1)/4⌋, where ⌊x⌋ denotes the largest integer not greater than x, the n-th interval consists of the cell units 5r + c + 1, 5r + c + 2, 5r + c + 6 and 5r + c + 7 of the window image, and S[n] is the sum of their histogram statistics over all channels;
Dividing the histogram statistic of each channel of each cell unit of the n-th interval of the window image by the normalization factor of the n-th interval further gives the other 15 × 36 values of the window-image feature vector; the feature vector of each window image has 16 × 36 = 576 dimensions in total;
Send the 576-dimensional feature vector of each window image, together with the head classification model trained in advance, into the support-vector-machine software library; classify with the ONE_CLASS classification mode and the LINEAR kernel function to judge whether the window image is a head image; if so, regard the window image as a head window image;
By analogy, continue until all window images have been scanned, obtaining all head window images in the current-frame grayscale image; the head window images are labelled in the same order as the traversal, from left to right and top to bottom. The training of the head model is described in detail in part seven.
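The whole feature-extraction chain of this part can be sketched end to end for one 40 × 40 window, using the embodiment's parameters (L = 5 cells per row, 16 overlapping blocks, 9 channels; standard Sobel coefficients assumed):

```python
import numpy as np

def hog_576(win):
    """40x40 window -> 576-dim HOG-style vector following the embodiment:
    Sobel gradients, 9-channel direction voting in 8x8 cells, 2x2-cell
    blocks with 50% overlap, sum-normalized per block."""
    sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    sy = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], float)
    win = win.astype(float)
    gx = np.zeros_like(win); gy = np.zeros_like(win)
    for j in range(1, 39):                    # border pixels keep gradient 0
        for i in range(1, 39):
            patch = win[j - 1:j + 2, i - 1:i + 2]
            gx[j, i] = (sx * patch).sum()
            gy[j, i] = (sy * patch).sum()
    g1 = np.hypot(gx, gy)                     # magnitude G_1
    theta = np.arctan(gy / np.where(gx == 0, 1e-12, gx))
    g2 = np.floor((theta + np.pi / 2) * 9 / np.pi).astype(int) + 1  # G_2
    H = np.zeros((25, 9))                     # 5x5 cells of 8x8 pixels
    for m in range(25):
        r, c = divmod(m, 5)
        for j in range(8 * r, 8 * r + 8):
            for i in range(8 * c, 8 * c + 8):
                H[m, g2[j, i] - 1] += g1[j, i]
    feats = []
    for br in range(4):                       # 4x4 = 16 overlapping blocks
        for bc in range(4):
            m = br * 5 + bc
            block = np.concatenate([H[m], H[m + 1], H[m + 5], H[m + 6]])
            s = block.sum()
            feats.append(block / s if s > 0 else block)
    return np.concatenate(feats)
```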
Four, match each head between frames (s4)
A masked face that appears in a video cannot flash past in an instant. Therefore, on the basis of step 3, the detected heads are tracked to exclude the interference caused by momentarily appearing targets; the specific implementation process is shown in Figure 4.
Suppose M head window images were detected in the previous-frame grayscale image and N head window images are detected in the current-frame grayscale image; n and m are variables, the labels of the head window images in the current-frame and previous-frame grayscale images respectively, n = 1, 2, ..., N, m = 1, 2, ..., M; the center of the m-th head window image of the previous frame is [x_m, y_m] and its area is S_m; the center of the n-th head window image of the current frame is [p_n, q_n] and its area is Q_n.
(1) Match the n-th head window image of the current-frame grayscale image by position and area against all head window images of the previous-frame grayscale image. The center-position difference T(m) and area difference A(m) of the n-th head window image of the current frame and the m-th head window image of the previous frame are respectively:
T(m) = sqrt((p_n − x_m)^2 + (q_n − y_m)^2) (6-a)
A(m) = |Q_n − S_m| (6-b)
Take n = 1, traverse all values of m, and find the value of m that minimizes the position-matching parameter T(m); denote it J_1, meaning the best match of the first head window image of the current frame is the J_1-th head window image of the previous frame. If T(J_1) ≤ th_1 and A(J_1) ≤ th_2, where th_1 is the threshold on the inter-frame change of head-window centre position (th_1 = 15 in this embodiment of the invention) and th_2 is the threshold on the inter-frame change of head-window area (th_2 = 100 in this embodiment), then the first head window image of the current frame has found its match in the previous-frame gray-level image, namely the J_1-th head window image. If T(J_1) > th_1 or A(J_1) > th_2, set J_1 to zero, i.e. J_1 = 0, meaning the first head window image of the current frame matches no head window image of the previous frame and is a newly appearing head window image;
(2) Increase n by 1, from n = 2 up to n = N, repeating step (1) to find the matches J_2, J_3, ..., J_N of all head window images. If J_1, J_2, ..., J_N do not contain some value K between 1 and M, the K-th head window image of the previous frame has been lost;
(3) Match the n-th head window image of the current-frame gray-level image by position and area against all head window images of each of the previous frames, from the second previous frame back to the H-th previous frame, repeating steps (1) and (2). If no head window image is lost over these frames, the n-th head window image of the current-frame gray-level image is an analysis window image. H is any integer between 5 and 10; in this embodiment of the invention, H = 8.
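The matching of steps (1) and (2) can be sketched in Python as follows. Equation (6-a) for T(m) is not reproduced in the source, so the city-block distance between window centres is assumed here:

```python
TH1 = 15   # centre-shift threshold th_1
TH2 = 100  # area-change threshold th_2

def match_heads(prev, curr):
    """prev/curr: lists of (cx, cy, area) per head window.

    Returns J[n] for each current-frame window: the 1-based index of the
    matched previous-frame window, or 0 if the window is newly appearing.
    """
    J = []
    for (px, py, pq) in curr:
        best_m, best_t = 0, None
        for m, (sx, sy, sa) in enumerate(prev, start=1):
            t = abs(px - sx) + abs(py - sy)  # assumed form of T(m), eq. (6-a)
            if best_t is None or t < best_t:
                best_m, best_t = m, t
        if best_m and best_t <= TH1 and abs(pq - prev[best_m - 1][2]) <= TH2:
            J.append(best_m)     # matched: position and area both close
        else:
            J.append(0)          # no match: a newly appearing head window
    return J

def lost_heads(J, M):
    # Step (2): previous-frame windows whose index never appears in J are lost.
    return [k for k in range(1, M + 1) if k not in J]
```

Step (3) then simply repeats `match_heads` against each of the previous H frames and checks that `lost_heads` stays empty.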
Five, face detection s5
On the basis of step 4, when a head window image persists over consecutive frames, face detection is performed on each analysis window image using the Haar-based cascade AdaBoost method with the face classification model trained in advance, yielding the head window images in which a face is detected and those in which no face is detected. In this embodiment of the invention, the same face detection method is applied to every analysis window image.
Six, masked-face judgment s6
On the basis of head detection and face detection, the masked-face decision proceeds as follows:
1. On head window images in which a face has been detected, divide the head window image into six equal parts in the vertical direction according to the method shown in Fig. 5. In the horizontal direction, the left and right margin regions are not analysed; only the middle region is analysed. Take regions B_1 and B_2 as the analysis regions, where B_1 is the 2nd region counting from top to bottom and B_2 is the 5th region counting from top to bottom. Regions B_1 and B_2 have the same width, a fixed fraction of the head-window width, and the same height, one sixth of the head-window height.
B_1[i, j] and B_2[i, j] denote the gray values of the pixel at horizontal position i and vertical position j in regions B_1 and B_2 respectively. Compute the difference value D_1[i, j] for the pixel at horizontal position i, vertical position j by scanning left to right and top to bottom, taking the difference of the gray values at corresponding positions of regions B_1 and B_2 in the figure. D_1[i, j], the difference value at horizontal position i, vertical position j between regions B_1 and B_2, is computed as:

D_1[i, j] = |B_1[i, j] − B_2[i, j]|    (7)
[i, j] traverses all pixels of regions B_1 and B_2, i = 1, 2, ..., W_1, j = 1, 2, ..., H_1, where W_1 is the width and H_1 the height of regions B_1 and B_2. Count the number of pixels with D_1[i, j] > th_1 and denote it C_1. When C_1 reaches the set proportion of the region's pixels, judge that a masked face exists in the gray-level image; otherwise it is a normal face. In this embodiment of the invention, th_1 = 18;
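A minimal sketch of this counting test, assuming the elided condition is that C_1 must reach a set proportion of the region's pixels (the exact fraction is not given in the source, so it is a parameter here):

```python
def masked_by_region_diff(b1, b2, th=18, ratio=0.5):
    """Judgment step 1: count pixels where |B_1 - B_2| exceeds th_1.

    `b1` and `b2` are same-sized 2-D lists of gray values. Both the
    proportion `ratio` and the direction of the final test are assumptions,
    since the source elides the exact condition on C_1.
    """
    h, w = len(b1), len(b1[0])
    c = sum(1 for j in range(h) for i in range(w)
            if abs(b1[j][i] - b2[j][i]) > th)   # D_1[i, j] > th_1, eq. (7)
    return c >= ratio * w * h                   # masked if enough pixels differ
```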
2. On head window images in which no face has been detected, divide the head window image into four equal parts in the horizontal direction according to the method shown in Fig. 6 and take the two middle regions B_3 and B_4 as the analysis regions. Regions B_3 and B_4 have the same width, one quarter of the head-window width, and the same height, equal to the head-window height. B_3[i, j] and B_4[i, j] denote the gray values of the pixel at horizontal position i, vertical position j in regions B_3 and B_4 respectively. Scanning left to right and top to bottom, compute from the gray values of the pixels the mean values E_1 and E_2 of regions B_3 and B_4:

E_1 = (1 / (W_2 · H_2)) Σ_i Σ_j B_3[i, j],  E_2 = (1 / (W_2 · H_2)) Σ_i Σ_j B_4[i, j]    (8)

where W_2 is the width and H_2 the height of regions B_3 and B_4. The difference of the two region means is ΔE = |E_1 − E_2|. When ΔE > th_2, there is a side face or a bowed head, not a masked face; otherwise proceed to the next step. In this embodiment of the invention, th_2 = 25;
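The mean comparison of step 2 can be sketched as follows; `mean_gray` computes the region mean of equation (8) for a region given as a 2-D list of gray values:

```python
def mean_gray(region):
    # E = (1 / (W * H)) * sum of all gray values in the region, eq. (8)
    return sum(map(sum, region)) / (len(region) * len(region[0]))

def side_face_or_bowed(b3, b4, th=25):
    # A large gap between the two middle quarter-strips suggests a side
    # face or a bowed head rather than a masked face.
    return abs(mean_gray(b3) - mean_gray(b4)) > th   # ΔE > th_2
```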
3. On head window images in which no face has been detected, when ΔE ≤ th_2, divide the head window image into six equal parts in the vertical direction according to the method shown in Fig. 5. In the horizontal direction, the left and right margin regions are not analysed; only the middle region is analysed. Take regions B_5 and B_2 as the analysis regions, where B_5 is the 3rd region counting from top to bottom and B_2 is the 5th region counting from top to bottom. Regions B_5 and B_2 have the same width, a fixed fraction of the head-window width, and the same height, one sixth of the head-window height. B_5[i, j] and B_2[i, j] denote the gray values of the pixel at horizontal position i, vertical position j in regions B_5 and B_2 respectively. Scanning left to right and top to bottom, take the difference of the gray values at corresponding positions of regions B_5 and B_2 in the figure. D_2[i, j], the difference value at horizontal position i, vertical position j between regions B_5 and B_2, is computed as:

D_2[i, j] = |B_5[i, j] − B_2[i, j]|    (9)

[i, j] traverses all pixels of regions B_5 and B_2, i = 1, 2, ..., W_3, j = 1, 2, ..., H_3, where W_3 is the width and H_3 the height of regions B_5 and B_2. Count the number of pixels with D_2[i, j] > th_3 and denote it C_2. When C_2 reaches the set proportion of the region's pixels, judge that a masked face exists in the gray-level image; otherwise it is a normal face. In this embodiment of the invention, th_3 = 18.
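Putting steps 2 and 3 together, a minimal Python sketch of the decision on a no-face head window might look as follows. Region extraction is omitted (the four regions are passed in as 2-D lists of gray values), and two details elided in the source are made explicit assumptions here: the proportion of differing pixels C_2 must reach, and the direction of that final test.

```python
def judge_no_face_window(b3, b4, b5, b2, th2=25, th3=18, ratio=0.5):
    """Decision for a head window with no detected face (steps 2-3)."""
    # Step 2: compare the mean gray levels of B_3 and B_4 (eq. 8).
    e1 = sum(map(sum, b3)) / (len(b3) * len(b3[0]))
    e2 = sum(map(sum, b4)) / (len(b4) * len(b4[0]))
    if abs(e1 - e2) > th2:                       # ΔE > th_2
        return "side face or bowed head"         # not a masked face
    # Step 3: count large differences between B_5 and B_2 (eq. 9).
    h, w = len(b5), len(b5[0])
    c2 = sum(1 for j in range(h) for i in range(w)
             if abs(b5[j][i] - b2[j][i]) > th3)  # D_2[i, j] > th_3
    # Assumed condition: masked when C_2 reaches `ratio` of the pixels.
    return "masked face" if c2 >= ratio * w * h else "normal face"
```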
When a masked face is judged to exist on a head window image, the position of this head window image in the original colour image is located and marked, and an alarm is uploaded to the alarm centre.
Seven, head-model training s7
The head detection performed on the gray-level image in step 3 uses a head classification model trained in advance. The training process, unit s7 as shown in Figure 2, consists of collecting positive and negative head samples, extracting the histogram-of-oriented-gradients (HOG) features, and training the head model with the support vector machine software library svmlight.
1. Collect positive and negative head samples
Collect 5000 grayscale pictures containing a head and shoulders as positive samples and 10000 grayscale pictures containing no head as negative samples; uniformly scale all samples to 40 × 40 pixels.
2. Extract HOG feature vectors
Extract the gradient-orientation-histogram statistics of the positive and negative samples and normalize the histograms, using each normalized value as an element of the feature vector. The method is the same as that used in step 3 to extract feature vectors from window images by the sliding-window method.
3. Train the head model with the support vector machine software library svmlight
Input the feature vectors of the 5000 positive samples and 10000 negative samples into the support vector machine software library, and train with the ONE_CLASS classification mode and the LINEAR kernel function to obtain an optimal head classification model.
Eight, face-model training s8
The face detection performed on the extracted head images in step 5 uses a face model trained in advance. The training process, unit s8 as shown in Figure 2, consists of collecting positive and negative face samples, adding Haar feature classifiers, and training the face model.
1. Collect positive and negative face samples
Collect 5000 grayscale face pictures, uniformly scaled to 20 × 20 pixels, as positive samples. Collect 10000 grayscale pictures of arbitrary size containing no face as negative samples.
2. Add 6 Haar feature classifiers
To better detect side faces, 6 Haar feature classifiers are added in this embodiment of the invention. For each feature classifier, the shape, the position of the feature within the region of interest, and the scale factor are defined; they are, in order:
(1) Feature classifier 1, as shown in Figure 7: the whole rectangular area is a 5 × 3 rectangle, 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square at the upper-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular area minus 15 times the pixel sum of the black rectangular region;
(2) Feature classifier 2, as shown in Figure 8: the whole rectangular area is a 5 × 3 rectangle, 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square at the lower-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular area minus 15 times the pixel sum of the black rectangular region;
(3) Feature classifier 3, as shown in Figure 9: the whole rectangular area is a 5 × 3 rectangle, 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square at the upper-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular area minus 15 times the pixel sum of the black rectangular region;
(4) Feature classifier 4, as shown in Figure 10: the whole rectangular area is a 5 × 3 rectangle, 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square at the lower-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular area minus 15 times the pixel sum of the black rectangular region;
(5) Feature classifier 5, as shown in Figure 11: the whole rectangular area is a 7 × 1 rectangle, 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangular region covers the 5th and 6th horizontal pixels of the rectangle; the response is 2 times the pixel sum of the whole rectangular area minus 7 times the pixel sum of the black rectangular region;
(6) Feature classifier 6, as shown in Figure 12: the whole rectangular area is a 7 × 1 rectangle, 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangular region covers the 2nd and 3rd horizontal pixels of the rectangle; the response is 2 times the pixel sum of the whole rectangular area minus 7 times the pixel sum of the black rectangular region;
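The responses of the added classifiers can be sketched directly from the definitions above (no integral image is used in this sketch; `haar1_response` and `haar5_response` cover classifiers 1 and 5, the others differing only in the position of the black region). The weights 4/15 and 2/7 balance the 15- and 4-pixel (or 7- and 2-pixel) areas, so a uniform patch gives a response of zero:

```python
def rect_sum(img, x, y, w, h):
    # Pixel sum of the w x h rectangle whose top-left corner is (x, y);
    # `img` is a 2-D list of gray values indexed as img[row][col].
    return sum(img[y + j][x + i] for j in range(h) for i in range(w))

def haar1_response(img, x, y):
    """Classifier 1: 5x3 window, black 2x2 square at the upper-left corner."""
    whole = rect_sum(img, x, y, 5, 3)
    black = rect_sum(img, x, y, 2, 2)
    return 4 * whole - 15 * black   # 4*15 = 15*4 pixels: areas balance

def haar5_response(img, x, y):
    """Classifier 5: 7x1 window, black region at the 5th and 6th pixels."""
    whole = rect_sum(img, x, y, 7, 1)
    black = img[y][x + 4] + img[y][x + 5]
    return 2 * whole - 7 * black    # 2*7 = 7*2 pixels: areas balance
```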
3. Train the face model
Train the face model using the relatively mature haartraining library in OpenCV.