Summary of the invention:
The purpose of this invention is to provide a masked-face detection method that can judge whether a masked face is present in a video image, so as to prevent crimes committed under a masked disguise.
The present invention comprises the following steps:
1. Convert the color video image obtained from the monitored scene into a grayscale image;
2. Scale the grayscale image;
3. Perform head detection on the grayscale image; when a head has been detected, enter the steps below; when no head is detected, repeat steps 1 to 3;
4. Match each head between frames;
5. Perform face detection;
6. Perform the masked-face judgment, mark any masked face found in the original color video image, and raise an alarm.
In step 3 a sliding-window method is adopted: the sliding window is moved over the pixels from left to right and from top to bottom, dividing the grayscale image into window images, one for each window position, and head detection is performed on each window image. When the sliding window is at the first window image:
(1) Compute the horizontal gradient G_x[i,j] and vertical gradient G_y[i,j] of each pixel of the window image:
A. Initialization of G_x[i,j] and G_y[i,j]:
The value of every pixel of G_x[i,j] and G_y[i,j] is initialized to 0, with [i,j] traversing all pixels of the window image; i is a variable denoting the horizontal position of a pixel in the window image, taking values i = 1, 2, ..., W_0; j is a variable denoting the vertical position of a pixel in the window image, taking values j = 1, 2, ..., H_0; W_0 and H_0 are the width and height of the window image, respectively;
B. Compute the horizontal gradient G_x[i,j] and vertical gradient G_y[i,j] of each pixel on the window image:
With the Sobel horizontal edge operator as the computing template, the template center is translated to each pixel; each pixel of the image region covered by the template is multiplied by the corresponding template element, and the sum of all products is taken as the horizontal gradient G_x[i,j]; with the Sobel vertical edge operator as the template, the vertical gradient G_y[i,j] is obtained in the same way:

G_x[i,j] = Σ_{k=1..3} Σ_{l=1..3} I[i+k-2, j+l-2] · S_x[k,l]
G_y[i,j] = Σ_{k=1..3} Σ_{l=1..3} I[i+k-2, j+l-2] · S_y[k,l]

where i = 2, 3, ..., W_0-1, j = 2, 3, ..., H_0-1; I[i,j] denotes the gray value of each pixel of the window image; S_x[k,l] denotes the value at row k, column l of the Sobel horizontal edge operator, and S_y[k,l] the value at row k, column l of the Sobel vertical edge operator;
(2) Compute the gradient magnitude G_1[i,j] and gradient direction G_2[i,j] of each pixel of the window image:

G_1[i,j] = sqrt( G_x[i,j]^2 + G_y[i,j]^2 )
G_2[i,j] = ⌊ 9 · (arctan(G_y[i,j]/G_x[i,j]) + π/2) / π ⌋ + 1

where i = 1, 2, ..., W_0, j = 1, 2, ..., H_0; arctan(·) is the arctangent function; ⌊·⌋ is the round-down (floor) operator, ⌊x⌋ denoting the largest integer not greater than x;
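For illustration, a minimal Python/NumPy sketch of steps (1) and (2) follows. It is a sketch under stated assumptions, not part of the claimed method: the standard 3 × 3 Sobel kernels are assumed, the direction is folded to an unsigned orientation in [0, π) before binning into 9 channels, and all names are illustrative.

```python
import numpy as np

# Assumed standard 3x3 Sobel templates (the operator is named in the text:
# S_x for horizontal and S_y for vertical gradients).
SX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SY = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float64)

def gradients(win, P=9):
    """win: 2-D grayscale window image (arrays indexed [row, col]).
    Returns gradient magnitude G1 and direction channel G2 (values 1..P);
    the four border rows/columns keep their initial value 0."""
    H0, W0 = win.shape
    win = win.astype(np.float64)
    Gx = np.zeros_like(win)               # step (1)A: initialize to 0
    Gy = np.zeros_like(win)
    for r in range(1, H0 - 1):            # step (1)B: template weighting
        for c in range(1, W0 - 1):
            patch = win[r - 1:r + 2, c - 1:c + 2]
            Gx[r, c] = np.sum(patch * SX)
            Gy[r, c] = np.sum(patch * SY)
    G1 = np.sqrt(Gx ** 2 + Gy ** 2)                    # gradient magnitude
    theta = np.mod(np.arctan2(Gy, Gx), np.pi)          # orientation in [0, pi)
    G2 = np.floor(P * theta / np.pi).astype(int) + 1   # channels 1..P
    return G1, np.clip(G2, 1, P)
```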
(3) Use the gradient magnitude and direction of each pixel of the window image to compute the gradient-orientation-histogram statistics and obtain the feature vector of the window image:
The window image is divided into connected regions of identical size; each connected region consists of 8 × 8 pixels and is called a cell. Square blocks are formed from 2 × 2 cells, with a 50% overlap between adjacent blocks;
A. Compute the gradient-orientation-histogram statistic of each cell of the window image:
The gradient direction of each pixel of the window image takes values 1 to 9, so each cell has 9 channels. Using the gradient magnitude and direction of each pixel of a cell, the gradient magnitudes of the pixels whose direction falls within each channel's range are accumulated, giving the histogram statistics of that cell. H[m][p] denotes the histogram statistic of channel p of cell m of the window image; m is a variable, the cell label, increasing from 1 in left-to-right, top-to-bottom order; p is a variable, the channel label; L denotes the number of cells per row of the window image and M the total number of cells of the window image; L and M are constants that depend only on the window-image size. Taking m = 1 and p = 1, H[1][1], the histogram statistic of channel 1 of cell 1, is computed as:

H[1][1] = Σ_{[i,j] ∈ cell 1, G_2[i,j] = 1} G_1[i,j]

p then increases by 1, from p = 2 up to p = 9, giving H[1][2], H[1][3], ..., H[1][9]:

H[1][p] = Σ_{[i,j] ∈ cell 1, G_2[i,j] = p} G_1[i,j]

m then increases by 1, from m = 2 up to m = M, each cell again having 9 histogram statistics:

H[m][p] = Σ_{[i,j] ∈ cell m, G_2[i,j] = p} G_1[i,j]

where cell m is the 8 × 8 pixel region in cell row ⌊(m-1)/L⌋ and cell column (m-1) mod L, ⌊(m-1)/L⌋ (for L = 5, ⌊(m-1)/5⌋) denoting the largest integer not greater than (m-1)/L;
B. Normalize the gradient-orientation-histogram statistics of the cells within each block of the window image and extract the feature vector of the window image:
S[n] denotes the normalization factor of the histogram statistics of block n of the window image; n is a variable, the block label, increasing from 1 in left-to-right, top-to-bottom order; each row of the window image contains L-1 blocks; N denotes the total number of blocks of the window image; N is a constant that depends on the window-image size. Taking n = 1, the normalization factor S[1] of the histogram statistics of block 1 of the window image is the sum of the histogram statistics of all channels of all cells of block 1:

S[1] = Σ_{p=1..9} ( H[1][p] + H[2][p] + H[L+1][p] + H[L+2][p] )

Dividing the histogram statistic of each channel of the 1st cell of block 1, which is also the 1st cell of the window image, by the normalization factor of block 1 gives 9 values, H[1][1]/S[1], ..., H[1][9]/S[1], taken in order as the 1st to 9th values of the window-image feature vector.
Dividing the histogram statistic of each channel of the 2nd cell of block 1, which is also the 2nd cell of the window image, by the normalization factor of block 1 gives 9 values, taken in order as the 10th to 18th values of the window-image feature vector.
Dividing the histogram statistic of each channel of the 3rd cell of block 1, which is also cell L+1 of the window image, by the normalization factor of block 1 gives 9 values, taken in order as the 19th to 27th values of the window-image feature vector.
Dividing the histogram statistic of each channel of the 4th cell of block 1, which is also cell L+2 of the window image, by the normalization factor of block 1 gives 9 values, taken in order as the 28th to 36th values of the window-image feature vector.
n then increases by 1, from n = 2 up to n = N, and the normalization factor of every block is computed in the same way:

S[n] = Σ_{p=1..9} ( H[m_n][p] + H[m_n+1][p] + H[m_n+L][p] + H[m_n+L+1][p] ),  m_n = L·⌊(n-1)/(L-1)⌋ + ((n-1) mod (L-1)) + 1

where m_n is the label of the top-left cell of block n and ⌊·⌋ again denotes the largest integer not greater than its argument.
Dividing the histogram statistic of each channel of each cell of block n of the window image by the normalization factor of block n gives the remaining 36 × (N-1) values of the window-image feature vector; the feature vector of each window image thus has 36 × N dimensions in total;
C. Feed the 36 × N-dimensional feature vector of each window image, together with the head classification model trained in advance, into the support-vector-machine software library, classify using the ONE_CLASS classification mode and the LINEAR kernel function, and judge whether the window image is a head image; if it is, take this window image as a head window image.
Steps (1) to (3) are repeated in the same way until all window images have been traversed, giving all head window images in the current-frame grayscale image; the head window images are labeled in traversal order, following the left-to-right, top-to-bottom principle.
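As illustration of step (3), a sketch of the feature extraction follows, assuming 8 × 8-pixel cells, 2 × 2-cell blocks stepping one cell (50% overlap), and sum normalization as described above; gradients() is the sketch given earlier, and the classifier handle is an illustrative stand-in for the SVM library call.

```python
import numpy as np

def hog_feature(G1, G2, cell=8, P=9):
    """Build the 36*N-dimensional feature vector of one window image from
    its per-pixel gradient magnitude G1 and direction channel G2 (1..P)."""
    H0, W0 = G1.shape
    rows, cols = H0 // cell, W0 // cell      # cols = L cells per row
    hist = np.zeros((rows, cols, P))         # H[m][p] laid out on a grid
    for r in range(rows):
        for c in range(cols):
            g1 = G1[r*cell:(r+1)*cell, c*cell:(c+1)*cell]
            g2 = G2[r*cell:(r+1)*cell, c*cell:(c+1)*cell]
            for p in range(1, P + 1):        # accumulate magnitudes per channel
                hist[r, c, p - 1] = g1[g2 == p].sum()
    feat = []
    for r in range(rows - 1):                # blocks step one cell: 50% overlap
        for c in range(cols - 1):
            block = hist[r:r+2, c:c+2].reshape(-1)   # 4 cells x P channels
            s = block.sum()                  # normalization factor S[n]
            feat.extend(block / s if s > 0 else block)
    return np.asarray(feat)

# Illustrative use with a pre-trained head classifier `head_model`:
#   is_head = head_model.predict([hog_feature(G1, G2)])[0] == 1
```

For a 40 × 40 window this gives 5 × 5 cells, 16 blocks, and a 576-dimensional vector, consistent with the embodiment below.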
The flow of step 4 is as follows:
(1) Match head window image n of the current-frame grayscale image against all head window images of the previous-frame grayscale image by position and area:
Position-matching parameter: T(m) = sqrt( (p_n - x_m)^2 + (q_n - y_m)^2 )
Area-matching parameter: A(m) = |Q_n - S_m|
n and m are variables, the labels of the head window images in the current-frame and previous-frame grayscale images respectively; the center of head window image n of the current frame is [p_n, q_n] and its area Q_n; the center of head window image m of the previous frame is [x_m, y_m] and its area S_m; n = 1, 2, ..., N, m = 1, 2, ..., M; N denotes the number of head window images in the current-frame grayscale image and M the number in the previous-frame grayscale image. Take n = 1, traverse all values of m, and compute the value of m that minimizes the position-matching parameter T(m); denote it J_1, meaning the best match of the first head window image of the current frame is head window image J_1 of the previous frame. If T(J_1) ≤ th_1 and A(J_1) ≤ th_2, with th_1 = 15 and th_2 = 100, then the first head window image of the current frame has found its matching head window image in the previous-frame grayscale image, namely number J_1; if T(J_1) > th_1 or A(J_1) > th_2, J_1 is set to 0, i.e. J_1 = 0, meaning the first head window image of the current frame matches no head window image of the previous frame and is a newly appearing head window image;
(2) n increases by 1, from n = 2 up to n = N, repeating step (1) to find all matching head window images J_2, J_3, ..., J_N. If J_1, J_2, ..., J_N do not include some value K from 1 to M, head window image K of the previous frame has been lost;
(3) Head window image n of the current-frame grayscale image is likewise matched by position and area against all head window images of each of the 2nd through H-th preceding grayscale frames, repeating steps (1) and (2) above. If no loss of the head window image occurs, head window image n of the current-frame grayscale image is an analysis window image; H is an arbitrary integer between 5 and 10.
The flow of step 5 is as follows:
On each analysis window image, the cascade AdaBoost method based on Haar features is applied, using the face classification model trained in advance to perform face detection, yielding the head window images in which a face is detected and the head window images in which no face is detected.
The flow of step 6 is as follows:
(1) On a head window image in which a face was detected, the head window image is divided into six equal parts in the vertical direction; in the horizontal direction the left-side and right-side portions of the window are not analyzed, and only the middle region is analyzed. Of the six parts, regions B_1 and B_2 are taken as the analyzed region images, B_1 being the 2nd region counted from top to bottom and B_2 the 5th region counted from top to bottom. B_1[i,j] and B_2[i,j] denote the gray values of the pixel at horizontal position i, vertical position j of region images B_1 and B_2 respectively. The difference value D_1[i,j] of the pixel at horizontal position i, vertical position j of B_1 and B_2 is computed as:
D_1[i,j] = |B_1[i,j] - B_2[i,j]|
[i,j] traverses all pixels of the region images, i = 1, 2, ..., W_1, j = 1, 2, ..., H_1, where W_1 denotes the width of region images B_1 and B_2 and H_1 their height. The number of pixels with D_1[i,j] > th_1 is counted and denoted C_1; when C_1 exceeds a set proportion of the total number W_1 × H_1 of region pixels, it is judged that a masked face is present in the grayscale image; otherwise it is a normal face; th_1 = 18;
(2) On a head window image in which no face was detected, the head window image is divided into four equal parts in the horizontal direction, and the middle two regions B_3 and B_4 are taken as the analyzed regions. B_3[i,j] and B_4[i,j] denote the gray values of the pixel at horizontal position i, vertical position j of regions B_3 and B_4 respectively, and the means of the gray values are E_1 and E_2 respectively:

E_1 = (1/(W_2·H_2)) · Σ_{i=1..W_2} Σ_{j=1..H_2} B_3[i,j]
E_2 = (1/(W_2·H_2)) · Σ_{i=1..W_2} Σ_{j=1..H_2} B_4[i,j]

W_2 denotes the width of region images B_3 and B_4 and H_2 their height. The difference of the two region means is ΔE = |E_1 - E_2|; when ΔE > th_2, a side face or head-bowing behavior is indicated, not a masked face; th_2 = 25;
(3) On a head window image in which no face was detected, when ΔE ≤ th_2, the head window image is divided into six equal parts in the vertical direction; in the horizontal direction the left-side and right-side portions are again not analyzed, and only the middle region is analyzed. Of the six parts, regions B_5 and B_2 are taken as the analyzed regions, B_5 being the 3rd region counted from top to bottom and B_2 the 5th region counted from top to bottom. B_5[i,j] and B_2[i,j] denote the gray values of the pixel at horizontal position i, vertical position j of regions B_5 and B_2 respectively. The difference value D_2[i,j] of the pixel at horizontal position i, vertical position j of B_5 and B_2 is computed as:
D_2[i,j] = |B_5[i,j] - B_2[i,j]|
i = 1, 2, ..., W_3, j = 1, 2, ..., H_3, where W_3 denotes the width of regions B_5 and B_2 and H_3 their height. The number of pixels with D_2[i,j] > th_3 is counted and denoted C_2; when C_2 exceeds a set proportion of the total number W_3 × H_3 of region pixels, it is judged that a masked face is present in the grayscale image; otherwise it is a normal face; th_3 = 18.
When a masked face is judged to be present, the position of this head window image in the original color video image is located and marked with a picture frame, and the result is uploaded to the alarm-receiving center.
In step 3, the production process of the head classification model is as follows:
(1) Collect positive and negative head samples: 5000 grayscale pictures containing the head and shoulders are taken as positive samples, and 10000 grayscale pictures containing no head as negative samples; the sample sizes are kept uniform;
(2) Extract the gradient-orientation-histogram statistics of the positive and negative samples, normalize the histograms, and take the normalized values as the values of the feature vector; the method is identical to the method by which the sliding-window approach extracts feature vectors from the grayscale image converted from the color video image of the monitored scene;
(3) Input the feature vectors of the 5000 positive samples and 10000 negative samples into the support-vector-machine software library and train using the ONE_CLASS classification mode and the LINEAR kernel function, obtaining an optimal head classification model.
In step 5, the production process of the face classification model is as follows:
(1) Collect 5000 grayscale face pictures, uniformly scaled to 20 × 20 pixels, as positive samples, and 10000 grayscale pictures of arbitrary size containing no face as negative samples;
(2) Add 6 Haar feature classifiers; the feature used by each feature classifier is defined by its shape, its position in the region of interest, and a scale factor, as follows:
A. Feature classifier 1: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square region at the upper-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
B. Feature classifier 2: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square region at the lower-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
C. Feature classifier 3: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square region at the upper-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
D. Feature classifier 4: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square region at the lower-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
E. Feature classifier 5: the whole rectangular region is a 7 × 1 rectangle, with 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangular region is the region at the 5th and 6th horizontal pixels of the rectangle; the response is 2 times the pixel sum of the whole rectangular region minus 7 times the pixel sum of the black rectangular region;
F. Feature classifier 6: the whole rectangular region is a 7 × 1 rectangle, with 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangular region is the region at the 2nd and 3rd horizontal pixels of the rectangle; the response is 2 times the pixel sum of the whole rectangular region minus 7 times the pixel sum of the black rectangular region;
(3) Train the model using the haartraining library in OpenCV to obtain the face classification model.
The present invention uses image-processing and pattern-recognition techniques to analyze the human bodies appearing in a video image and to judge whether a face is present within each detected head, which serves as the basis of the masked-face decision. It is of great significance for research on masked-face recognition in video surveillance and can effectively prevent crimes committed under a masked disguise.
Embodiment:
The invention provides a masked-face detection method for video surveillance. The system architecture of the method is shown in Figure 1 and comprises a video acquisition unit, a masked-face detection unit, and an alarm unit.
The main function of the video acquisition unit is to film the monitored scene with a general-purpose analog camera to obtain an analog video image, and then convert it into digital image data through a general-purpose video capture card. Certain requirements are placed on the mounting height and angle of the camera: it must be installed so that the head-and-shoulder region of the human body appears fully in the video picture, and set up so that frontal face information is shown clearly in the picture; for best results, the camera should therefore film the face frontally at close range.
The main function of the masked-face detection unit is to convert the incoming color digital image data into a grayscale image and then detect on the grayscale image whether a masked face is present. To improve detection efficiency, the grayscale image is first reduced before detection to a standard detection grayscale image of 176 × 144 pixels. If a masked face is present, the position of the masked face is extracted, and the masked face is marked at the corresponding position of the original color image.
If the masked-face detection finds that a masked face is present, an alarm is raised and the image bearing the masked-face mark is uploaded to the alarm unit.
The invention provides a masked-face detection method for video surveillance; the method is shown in Figure 2 and specifically comprises the following steps:
One. Convert the color image into a grayscale image (s1)
Every step of the masked-face detection method described in the present invention is carried out on the basis of a grayscale image, so the color image must first be converted into a grayscale image.
Two. Scale the grayscale image (s2)
To improve detection efficiency, the grayscale image is reduced to 176 × 144 pixels. Image scaling must strike a balance between processing efficiency and the smoothness and sharpness of the result; image-scaling methods are by now fairly mature and are not within the scope of the present invention. Every step described below is carried out on the basis of the reduced grayscale image.
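For illustration, steps one and two can be sketched with OpenCV as follows (the file name is illustrative, and the interpolation choice is an assumption; the invention does not prescribe a particular scaling method):

```python
import cv2

frame = cv2.imread("frame.png")                    # one color frame (illustrative source)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)     # step one: color -> grayscale
small = cv2.resize(gray, (176, 144),               # step two: standard detection size
                   interpolation=cv2.INTER_AREA)
```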
Three. Perform head detection on the grayscale image (s3)
On the image scaled in step two, the sliding-window method is adopted with a sliding-window size of 40 × 40, a horizontal scan step of 3 pixels, and a vertical scan step of 2 pixels; the window is moved from left to right and top to bottom until the whole image has been scanned, and head detection is performed separately on the grayscale image under each window position, hereinafter referred to as the window image.
i denotes the horizontal coordinate of a point in the window image and j its vertical coordinate; I[i,j] denotes the gray value of the window image at pixel [i,j]; [i,j] traverses every pixel of the window image; the window-image width is W_0 = 40 and its height H_0 = 40. When the sliding window is at the first window image:
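A sketch of this scan, under the parameters just stated (names are illustrative):

```python
def windows(img, size=40, dx=3, dy=2):
    """Yield the top-left corner and pixels of each 40x40 sliding window,
    scanning left to right, top to bottom, with steps of 3 and 2 pixels."""
    H, W = img.shape
    for y in range(0, H - size + 1, dy):
        for x in range(0, W - size + 1, dx):
            yield x, y, img[y:y + size, x:x + size]

# Illustrative use on the 176x144 detection image `small`:
#   for x, y, win in windows(small): ...
```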
1. Compute the gradient direction and magnitude of each pixel of the window image, specifically:
(1) Compute the horizontal and vertical gradients of each pixel of the window image
First, the horizontal gradient G_x[i,j] and vertical gradient G_y[i,j] of every pixel of the window image are initialized to 0:
G_x[i,j] = 0, i = 1, 2, ..., W_0, j = 1, 2, ..., H_0   (1-a)
G_y[i,j] = 0, i = 1, 2, ..., W_0, j = 1, 2, ..., H_0   (1-b)
On the window image, with an edge operator as the computing template, the template is translated from left to right and top to bottom to each pixel [i,j]; to avoid crossing the image boundary, the pixels on the four outermost edges (top, bottom, left, and right) are not processed. The template is applied as a weighted sum of the gray values in the neighborhood of pixel [i,j], each weight multiplying its corresponding value, which gives the horizontal gradient G_x[i,j] and the vertical gradient G_y[i,j]. The edge operator may be the Roberts edge operator, the Sobel edge operator, the Prewitt edge operator, or the Kirsch edge operator; the present invention is described taking the Sobel edge operators

S_x = [ -1 0 1; -2 0 2; -1 0 1 ] and S_y = [ -1 -2 -1; 0 0 0; 1 2 1 ]

as the example. The horizontal gradient G_x[i,j] and vertical gradient G_y[i,j] are computed as:

G_x[i,j] = Σ_{k=1..3} Σ_{l=1..3} I[i+k-2, j+l-2] · S_x[k,l],  i = 2, 3, ..., W_0-1, j = 2, 3, ..., H_0-1   (2-a)
G_y[i,j] = Σ_{k=1..3} Σ_{l=1..3} I[i+k-2, j+l-2] · S_y[k,l],  i = 2, 3, ..., W_0-1, j = 2, 3, ..., H_0-1   (2-b)

where I[i,j] denotes the gray value of each pixel of the window image; S_x[k,l] denotes the value at row k, column l of the Sobel horizontal edge operator, and S_y[k,l] the value at row k, column l of the Sobel vertical edge operator;
(2) Compute the gradient magnitude G_1[i,j] and gradient direction G_2[i,j] at each pixel of the window image:

G_1[i,j] = sqrt( G_x[i,j]^2 + G_y[i,j]^2 ),  i = 1, 2, ..., W_0, j = 1, 2, ..., H_0   (3-a)
G_2[i,j] = ⌊ P · (arctan(G_y[i,j]/G_x[i,j]) + π/2) / π ⌋ + 1,  i = 1, 2, ..., W_0, j = 1, 2, ..., H_0   (3-b)

where arctan(·) is the arctangent function; ⌊·⌋ is the round-down (floor) operator, ⌊x⌋ denoting the largest integer not greater than x; and P denotes the number of channels, which may be an arbitrary integer between 2 and 180. The embodiment of the present invention is described with P = 9 as the example, so that the gradient direction G_2[i,j] takes the values 1, 2, ..., 9.
2. Use the obtained gradient magnitudes and directions to compute the gradient-histogram statistics, and take the normalized histogram statistics as the values of the feature vector:
The window image is first divided into connected regions of identical size; each connected region is a cell, and the gradient-orientation histogram of the pixels of each cell is then accumulated. To better withstand the influence of illumination changes and shadows, several cells are grouped into a block, and the histogram statistics of the cells within each block are normalized.
The number of pixels per cell row may be an arbitrary integer between 2 and 20, as may the number of pixels per cell column; the number of pixels per block row may be an arbitrary integer between 4 and 40, as may the number of pixels per block column. The embodiment of the present invention is described with each cell having 8 × 8 pixels and each block consisting of 2 × 2 = 4 cells. Figure 3 is a schematic diagram of one block of this embodiment. A 40 × 40 window image has 25 cells, and with 50% overlap between blocks the image has 4 × 4 = 16 blocks in total. Each cell has 9 direction channels and each block therefore has 4 × 9 = 36 features, giving 16 × 36 = 576 feature dimensions in total. The computation proceeds as follows:
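This bookkeeping can be checked in a few lines (illustrative only):

```python
W0 = H0 = 40; cell = 8; P = 9
L = W0 // cell                 # cells per row: 5
M = L * (H0 // cell)           # total cells: 25
N = (L - 1) ** 2               # blocks at one-cell steps (50% overlap): 16
dims = N * 4 * P               # 16 blocks x 4 cells x 9 channels = 576
print(L, M, N, dims)           # -> 5 25 16 576
```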
(1) Compute the gradient-orientation-histogram statistic of each cell of the window image
Every pixel of every cell of the window image votes for one histogram channel according to its gradient direction, with the gradient magnitude of the pixel as the voting weight. H[m][p] denotes the histogram statistic of channel p of cell m of the window image; m is a variable, the cell label, increasing from 1 in left-to-right, top-to-bottom order; p is a variable, the channel label; L denotes the number of cells per row of the window image and M the total number of cells of the window image; L and M are constants that depend only on the window-image size; in the embodiment of the present invention, L = 5 and M = 25;
Taking m = 1 and p = 1, H[1][1], the histogram statistic of channel 1 of cell 1, is computed as:

H[1][1] = Σ_{[i,j] ∈ cell 1, G_2[i,j] = 1} G_1[i,j]

p then increases by 1, from p = 2 up to p = 9, giving H[1][2], H[1][3], ..., H[1][9]:

H[1][p] = Σ_{[i,j] ∈ cell 1, G_2[i,j] = p} G_1[i,j]

m then increases by 1, from m = 2 up to m = M, each cell again having 9 histogram statistics:

H[m][p] = Σ_{[i,j] ∈ cell m, G_2[i,j] = p} G_1[i,j]

where cell m is the 8 × 8 pixel region in cell row ⌊(m-1)/5⌋ and cell column (m-1) mod 5, ⌊(m-1)/5⌋ denoting the largest integer not greater than (m-1)/5;
(2) Normalize the gradient-orientation-histogram statistics of the cells within each block of the window image and extract the feature vector of the window image.
The histogram statistics of the cells within each block of the window image are summed as the normalization factor. S[n] denotes the normalization factor of the histogram statistics of block n of the window image; n is a variable, the block label, increasing from 1 in left-to-right, top-to-bottom order; each row of the window image contains L-1 blocks; N denotes the total number of blocks of the window image; N is a constant that depends on the window-image size; in the embodiment of the present invention, the total number of blocks per window image is N = 16, with 4 blocks per row;
Taking n = 1, the normalization factor S[1] of the histogram statistics of block 1 of the window image is the sum of the histogram statistics of all channels of all cells of block 1 (block 1 consists of cells 1, 2, 6, and 7 of the window image):

S[1] = Σ_{p=1..9} ( H[1][p] + H[2][p] + H[6][p] + H[7][p] )

Dividing the histogram statistic of each channel of the 1st cell of block 1, which is also the 1st cell of the window image, by the normalization factor of block 1 gives 9 values, H[1][1]/S[1], ..., H[1][9]/S[1], taken in order as the 1st to 9th values of the window-image feature vector.
Dividing the histogram statistic of each channel of the 2nd cell of block 1, which is also the 2nd cell of the window image, by the normalization factor of block 1 gives 9 values, taken in order as the 10th to 18th values of the window-image feature vector.
Dividing the histogram statistic of each channel of the 3rd cell of block 1, which is also the 6th cell of the window image, by the normalization factor of block 1 gives 9 values, taken in order as the 19th to 27th values of the window-image feature vector.
Dividing the histogram statistic of each channel of the 4th cell of block 1, which is also the 7th cell of the window image, by the normalization factor of block 1 gives 9 values, taken in order as the 28th to 36th values of the window-image feature vector.
n then increases by 1, from n = 2 up to n = N, and the normalization factor of every block is computed in the same way:

S[n] = Σ_{p=1..9} ( H[m_n][p] + H[m_n+1][p] + H[m_n+5][p] + H[m_n+6][p] ),  m_n = 5·⌊(n-1)/4⌋ + ((n-1) mod 4) + 1

where m_n is the label of the top-left cell of block n and ⌊·⌋ denotes the round-down operator, ⌊x⌋ being the largest integer not greater than x.
Dividing the histogram statistic of each channel of each cell of block n of the window image by the normalization factor of block n gives the remaining 15 × 36 values of the window-image feature vector; the feature vector of each window image thus has 16 × 36 = 576 dimensions in total;
The 576-dimensional feature vector of each window image, together with the head classification model trained in advance, is fed into the support-vector-machine software library; classification uses the ONE_CLASS classification mode and the LINEAR kernel function to judge whether the window image is a head image; if it is, this window image is taken as a head window image.
Proceeding in the same way until all window images have been scanned gives all head window images in the current-frame grayscale image; the head window images are labeled in traversal order, following the left-to-right, top-to-bottom principle. The training of the head model is described in detail in part seven.
Four. Match each head between frames (s4)
A masked face that appears in a video cannot merely flash by. Therefore, on the basis of step three, the detected heads are tracked in order to exclude the interference caused by momentarily appearing objects; the concrete procedure is shown in Figure 4.
Suppose M head window images were detected in the previous-frame grayscale image and N head window images are detected in the current-frame grayscale image; n and m are variables, the labels of the head window images in the current-frame and previous-frame grayscale images respectively, n = 1, 2, ..., N, m = 1, 2, ..., M; the center of head window image m of the previous frame is [x_m, y_m] and its area S_m; the center of head window image n of the current frame is [p_n, q_n] and its area Q_n.
(1) Match head window image n of the current-frame grayscale image against all head window images of the previous-frame grayscale image by position and area. The center-position difference T(m) and area difference A(m) between head window image n of the current frame and head window image m of the previous frame are:

T(m) = sqrt( (p_n - x_m)^2 + (q_n - y_m)^2 )   (6-a)
A(m) = |Q_n - S_m|   (6-b)
Take n = 1, traverse all values of m, and compute the value of m that minimizes the position-matching parameter T(m); denote it J_1, meaning the best match of the first head window image of the current frame is head window image J_1 of the previous frame. If T(J_1) ≤ th_1 and A(J_1) ≤ th_2, where th_1 is the threshold on the inter-frame change of the head-window center (in the embodiment of the present invention, th_1 = 15) and th_2 is the threshold on the inter-frame change of the head-window area (in the embodiment of the present invention, th_2 = 100), then the first head window image of the current frame has found its matching head window image in the previous-frame grayscale image, namely number J_1; if T(J_1) > th_1 or A(J_1) > th_2, J_1 is set to 0, i.e. J_1 = 0, meaning the first head window image of the current frame matches no head window image of the previous frame and is a newly appearing head window image;
(2) n increases by 1, from n = 2 up to n = N, repeating step (1) to find all matching head window images J_2, J_3, ..., J_N. If J_1, J_2, ..., J_N do not include some value K from 1 to M, head window image K of the previous frame has been lost;
(3) Head window image n of the current-frame grayscale image is likewise matched by position and area against all head window images of each of the 2nd through H-th preceding grayscale frames, repeating steps (1) and (2) above. If no loss of the head window image occurs, head window image n of the current-frame grayscale image is an analysis window image; H is an arbitrary integer between 5 and 10; in the embodiment of the present invention, H = 8.
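A sketch of the per-frame matching follows; the Euclidean center distance is used for T(m), matching equation (6-a) above, and all names are illustrative:

```python
import numpy as np

def match_heads(cur, prev, th1=15.0, th2=100.0):
    """cur, prev: lists of (cx, cy, area) for the head window images of the
    current and previous frame. Returns J, where J[n-1] is the 1-based label
    of the matched previous-frame head window, or 0 if newly appeared."""
    J = []
    for (p, q, Q) in cur:
        T = [np.hypot(p - x, q - y) for (x, y, S) in prev]   # T(m), eq. (6-a)
        A = [abs(Q - S) for (x, y, S) in prev]               # A(m), eq. (6-b)
        m = int(np.argmin(T)) if T else -1
        if m >= 0 and T[m] <= th1 and A[m] <= th2:
            J.append(m + 1)
        else:
            J.append(0)                                      # newly appearing head
    return J
```

Previous-frame labels from 1 to M that never appear in J correspond to lost head windows; a current head window matched in each of the preceding H = 8 frames becomes an analysis window image.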
Five. Face detection (s5)
On the basis of step four, when a head window image persists across frames, the cascade AdaBoost method based on Haar features is applied on each analysis window image, using the face classification model trained in advance to perform face detection, yielding the head window images in which a face is detected and the head window images in which no face is detected; in the embodiment of the present invention, the face-detection method is identical on every analysis window image.
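For illustration, the detection call can be sketched with OpenCV's cascade classifier (the model file name and detection parameters are illustrative assumptions):

```python
import cv2

face_model = cv2.CascadeClassifier("face_model.xml")   # pre-trained cascade (illustrative path)

def has_face(analysis_win):
    """Cascade AdaBoost Haar face detection on one analysis window image;
    returns True when at least one face is found."""
    faces = face_model.detectMultiScale(analysis_win,
                                        scaleFactor=1.1, minNeighbors=3)
    return len(faces) > 0
```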
Six. Masked-face judgment (s6)
On the basis of the head detection and the face detection, the flow of the masked-face decision is as follows:
1. On a head window image in which a face was detected, according to the method shown in Figure 5, the head window image is divided into six equal parts in the vertical direction; in the horizontal direction the left-side and right-side portions are not analyzed and only the middle region is analyzed. Of the six parts, regions B_1 and B_2 are taken as the analyzed regions, B_1 being the 2nd region counted from top to bottom and B_2 the 5th region counted from top to bottom. Regions B_1 and B_2 have the same width, a fixed middle fraction of the head-window width, and the same height, one sixth of the head-window height. B_1[i,j] and B_2[i,j] denote the gray values of the pixel at horizontal position i, vertical position j of regions B_1 and B_2 respectively. Scanning from left to right and top to bottom, the gray values at corresponding positions of regions B_1 and B_2 are differenced; D_1[i,j] denotes the difference value at horizontal position i, vertical position j of regions B_1 and B_2, computed as:

D_1[i,j] = |B_1[i,j] - B_2[i,j]|   (7)

[i,j] traverses all pixels of regions B_1 and B_2, i = 1, 2, ..., W_1, j = 1, 2, ..., H_1, where W_1 denotes the width of regions B_1 and B_2 and H_1 their height. The number of pixels with D_1[i,j] > th_1 is counted and denoted C_1; when C_1 exceeds a set proportion of the total number W_1 × H_1 of region pixels, it is judged that a masked face is present in the grayscale image; otherwise it is a normal face; in the embodiment of the present invention, th_1 = 18;
2. On a head window image in which no face was detected, according to the method shown in Figure 6, the head window image is divided into four equal parts in the horizontal direction, and the middle two regions B_3 and B_4 are taken as the analyzed regions. Regions B_3 and B_4 have the same width, one quarter of the head-window width, and the same height, identical to the head-window height. B_3[i,j] and B_4[i,j] denote the gray values of the pixel at horizontal position i, vertical position j of regions B_3 and B_4 respectively. Scanning from left to right and top to bottom, the means E_1 and E_2 of the gray values of regions B_3 and B_4 are computed from the gray values of their pixels:

E_1 = (1/(W_2·H_2)) · Σ_{i=1..W_2} Σ_{j=1..H_2} B_3[i,j]   (8-a)
E_2 = (1/(W_2·H_2)) · Σ_{i=1..W_2} Σ_{j=1..H_2} B_4[i,j]   (8-b)

W_2 denotes the width of regions B_3 and B_4 and H_2 their height. The difference of the two region means is ΔE = |E_1 - E_2|; when ΔE > th_2, a side face or head-bowing behavior is indicated, not a masked face; otherwise, proceed to the next step; in the embodiment of the present invention, th_2 = 25;
3. On a head window image in which no face was detected, when ΔE ≤ th_2, according to the method shown in Figure 5, the head window image is divided into six equal parts in the vertical direction; in the horizontal direction the left-side and right-side portions are not analyzed and only the middle region is analyzed. Of the six parts, regions B_5 and B_2 are taken as the analyzed regions, B_5 being the 3rd region counted from top to bottom and B_2 the 5th region counted from top to bottom. Regions B_5 and B_2 have the same width, a fixed middle fraction of the head-window width, and the same height, one sixth of the head-window height. B_5[i,j] and B_2[i,j] denote the gray values of the pixel at horizontal position i, vertical position j of regions B_5 and B_2 respectively. Scanning from left to right and top to bottom, the gray values at corresponding positions of regions B_5 and B_2 are differenced; D_2[i,j] denotes the difference value at horizontal position i, vertical position j of regions B_5 and B_2, computed as:

D_2[i,j] = |B_5[i,j] - B_2[i,j]|   (9)

[i,j] traverses all pixels of regions B_5 and B_2, i = 1, 2, ..., W_3, j = 1, 2, ..., H_3, where W_3 denotes the width of regions B_5 and B_2 and H_3 their height. The number of pixels with D_2[i,j] > th_3 is counted and denoted C_2; when C_2 exceeds a set proportion of the total number W_3 × H_3 of region pixels, it is judged that a masked face is present in the grayscale image; otherwise it is a normal face; in the embodiment of the present invention, th_3 = 18.
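A sketch of the full decision of this step follows. Two quantities were not reproduced in the printed text and are assumptions here: the middle fraction of the window width kept for regions B_1, B_2, and B_5 (taken as the middle half) and the proportion of differing pixels beyond which C_1 or C_2 triggers the masked verdict (taken as one half):

```python
import numpy as np

def masked_judgement(win, face_found, th1=18, th2=25, th3=18, ratio=0.5):
    """Decision of step six on one 40x40 head window image `win`;
    `ratio` is an assumed count-proportion threshold for C1 and C2."""
    H, W = win.shape
    sixth = H // 6
    lo, hi = W // 4, 3 * W // 4               # assumed middle half of the width
    def band(k):                               # k-th of six vertical parts (1-based)
        return win[(k - 1) * sixth:k * sixth, lo:hi].astype(int)
    if face_found:                             # step (1): compare parts 2 and 5
        D1 = np.abs(band(2) - band(5))
        return bool((D1 > th1).sum() > ratio * D1.size)
    # Step (2): middle two of four horizontal quarters, full height.
    B3 = win[:, W // 4:W // 2].astype(int)
    B4 = win[:, W // 2:3 * W // 4].astype(int)
    if abs(B3.mean() - B4.mean()) > th2:
        return False                           # side face or bowed head, not masked
    # Step (3): compare parts 3 and 5.
    D2 = np.abs(band(3) - band(5))
    return bool((D2 > th3).sum() > ratio * D2.size)
```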
When a masked face is judged to be present on a head window image, the position of this head window image in the original color image is located and marked, and the result is uploaded to the alarm-receiving center.
Seven. Head-model training (s7)
The head detection on the grayscale image in step three uses a head classification model trained in advance. The training process, shown as unit s7 in Figure 2, comprises collecting positive and negative head samples, extracting histogram-of-oriented-gradients (HOG) features, and training the head model with the support-vector-machine software library svmlight.
1. Collect positive and negative head samples
5000 grayscale pictures containing the head and shoulders are collected as positive samples, and 10000 grayscale pictures containing no head are collected as negative samples; the samples are uniformly scaled to 40 × 40.
2. Extract the HOG feature vectors
The gradient-orientation-histogram statistics of the positive and negative samples are extracted and normalized, and the normalized values are taken as the values of the feature vector; the method is identical to the way the sliding-window method of step three extracts feature vectors on the window images.
3. Train the head model with the support-vector-machine software library svmlight
The feature vectors of the 5000 positive samples and 10000 negative samples are input into the support-vector-machine software library and trained using the ONE_CLASS classification mode and the LINEAR kernel function, obtaining an optimal head classification model.
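The text names svmlight's ONE_CLASS mode with a LINEAR kernel; since both positive and negative sample sets are fed in, the sketch below substitutes an ordinary two-class linear SVM from scikit-learn as a stand-in, with illustrative file names:

```python
import numpy as np
from sklearn.svm import LinearSVC   # stand-in for the svmlight library

# 576-dim HOG vectors of the 5000 head-and-shoulder samples and the
# 10000 non-head samples (illustrative file names).
X_pos = np.load("hog_pos.npy")
X_neg = np.load("hog_neg.npy")
X = np.vstack([X_pos, X_neg])
y = np.hstack([np.ones(len(X_pos)), np.zeros(len(X_neg))])

head_model = LinearSVC()            # linear kernel, as in the text
head_model.fit(X, y)
```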
Eight. Face-model training (s8)
The face detection performed in step five on the extracted head images uses a face model trained in advance. The training process, shown as unit s8 in Figure 2, comprises collecting positive and negative face samples, adding Haar features, and training the face model.
1. Collect positive and negative face samples
5000 grayscale face pictures are collected and uniformly scaled to 20 × 20 pixels as positive samples; 10000 grayscale pictures of arbitrary size containing no face are collected as negative samples.
2. Add 6 Haar feature classifiers
To better detect side faces, the embodiment of the present invention adds 6 Haar feature classifiers; the feature used by each feature classifier is defined by its shape, its position in the region of interest, and a scale factor, as follows:
(1) Feature classifier 1, shown in Figure 7: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square region at the upper-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
(2) Feature classifier 2, shown in Figure 8: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square region at the lower-left corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
(3) Feature classifier 3, shown in Figure 9: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square region at the upper-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
(4) Feature classifier 4, shown in Figure 10: the whole rectangular region is a 5 × 3 rectangle, with 5 pixels in the horizontal direction and 3 pixels in the vertical direction; the black rectangular region is the 2 × 2 square region at the lower-right corner of the rectangle; the response is 4 times the pixel sum of the whole rectangular region minus 15 times the pixel sum of the black rectangular region;
(5) Feature classifier 5, shown in Figure 11: the whole rectangular region is a 7 × 1 rectangle, with 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangular region is the region at the 5th and 6th horizontal pixels of the rectangle; the response is 2 times the pixel sum of the whole rectangular region minus 7 times the pixel sum of the black rectangular region;
(6) Feature classifier 6, shown in Figure 12: the whole rectangular region is a 7 × 1 rectangle, with 7 pixels in the horizontal direction and 1 pixel in the vertical direction; the black rectangular region is the region at the 2nd and 3rd horizontal pixels of the rectangle; the response is 2 times the pixel sum of the whole rectangular region minus 7 times the pixel sum of the black rectangular region;
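The responses of the six features can be computed from an integral image as sketched below; how the scale factor combines with the base 5 × 3 and 7 × 1 shapes is an assumption here (taken as independent horizontal and vertical multipliers), while the 4x/15x and 2x/7x weightings are those stated above:

```python
import numpy as np

def rect_sum(ii, x, y, w, h):
    """Pixel sum of the w x h rectangle with top-left corner (x, y),
    taken from the zero-padded integral image ii."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def feature_responses(patch, x, y, sx=1, sy=1):
    """Responses of the six added feature classifiers at (x, y) of `patch`,
    with assumed horizontal/vertical scale factors sx, sy."""
    p = np.pad(patch.astype(np.int64), ((1, 0), (1, 0)))
    ii = p.cumsum(axis=0).cumsum(axis=1)            # integral image
    w5, h3 = 5 * sx, 3 * sy
    whole53 = rect_sum(ii, x, y, w5, h3)
    r = []
    # Classifiers 1-4: 2x2 black square at each corner of the 5x3 rectangle.
    for bx, by in [(0, 0), (0, h3 - 2 * sy), (w5 - 2 * sx, 0),
                   (w5 - 2 * sx, h3 - 2 * sy)]:
        black = rect_sum(ii, x + bx, y + by, 2 * sx, 2 * sy)
        r.append(4 * whole53 - 15 * black)
    # Classifiers 5-6: 2-pixel black run at pixels 5-6 and 2-3 of the 7x1 row.
    whole71 = rect_sum(ii, x, y, 7 * sx, 1 * sy)
    for bx in (4 * sx, 1 * sx):
        black = rect_sum(ii, x + bx, y, 2 * sx, 1 * sy)
        r.append(2 * whole71 - 7 * black)
    return r
```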
3. Train the face model
The face model is trained using the relatively mature haartraining library in OpenCV.