CN104751136A - Face recognition based multi-camera video event retrospective trace method - Google Patents

Face recognition based multi-camera video event retrospective trace method

Info

Publication number
CN104751136A
CN104751136A (application CN201510106388.8A; granted as CN104751136B)
Authority
CN
China
Prior art keywords
face
frame
sample
video
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510106388.8A
Other languages
Chinese (zh)
Other versions
CN104751136B (en)
Inventor
张二虎
白晓楠
张卓敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Hua Chong Chong Nanjing Information Technology Co., Ltd.
Original Assignee
Xi'an University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an University of Technology
Priority to CN201510106388.8A priority Critical patent/CN104751136B/en
Publication of CN104751136A publication Critical patent/CN104751136A/en
Application granted granted Critical
Publication of CN104751136B publication Critical patent/CN104751136B/en
Expired - Fee Related

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a multi-camera video event retrospective tracing method based on face recognition. The method first trains a face cascade classifier with the AdaBoost method based on Haar-like features; the cascade classifier is then used to detect and track the faces appearing in each camera's video to build a face database. The face region of the target pedestrian is framed in the video, LBP features of the face regions are extracted, and the target face is matched against the face database. Finally, according to the face recognition results, the times at which the target appears in each camera are extracted and the walking route is derived, yielding the retrospective tracing result for the target. The method applies to multi-camera, multi-scene video systems for tracing video events; it reduces the workload of manual video review and improves retrieval efficiency.

Description

Multi-camera video event retrospective tracing method based on face recognition
Technical field
The invention belongs to the technical field of intelligent video analysis systems, and specifically relates to a multi-camera video event retrospective tracing method based on face recognition.
Background technology
Video surveillance systems are deployed in every setting of daily life. When an abnormal event occurs, suspect targets usually have to be searched for in the surveillance video, and the traditional approach is manual review. Because the volume of surveillance data is very large, manual search is time-consuming, labor-intensive, and inefficient, and much useful information is easily missed. Processing surveillance data by machine to achieve intelligent retrospective tracing therefore has significant practical value.
A pedestrian target in video has many features, but the face is a biometric feature that is more reliable than the others. With the continuous progress of science and technology in recent years, face detection and recognition technology has developed accordingly, and the face, as a human biometric, has found more and more applications: face recognition entrance management systems, face recognition access control and attendance systems, face recognition surveillance management, face recognition computer security, face recognition photo search, face recognition visitor registration, face recognition ATM intelligent video alarm systems, and so on. At the 2008 Olympic Games and the 2010 Shanghai World Expo, face recognition was used for identity verification in security work. These applications, however, mostly target single images or single-scene video from a single camera; systems for multi-camera, multi-scene video are rarer. With multiple cameras covering multiple scenes, reviewing a video event requires manually consulting each camera's video in turn, which is time-consuming, labor-intensive, and low in retrieval efficiency.
Summary of the invention
The object of the invention is to provide a multi-camera video event retrospective tracing method based on face recognition, to solve the problems that manual review of video events is time-consuming and labor-intensive and that its retrieval efficiency is low.
The technical scheme of the invention is as follows: a multi-camera video event retrospective tracing method based on face recognition, implemented according to the following steps:
Step 1: train a face cascade classifier with the AdaBoost method based on Haar-like features;
Step 2: collect the video of all cameras in the surveillance system;
Step 3: track the faces in all collected camera videos, and write the faces obtained after tracking into a database as the face database, each face in the database being named by the rule: camera number_frame number;
Step 4: select the target pedestrian to be traced, frame the face region of the target pedestrian in the video, extract the LBP features of the target pedestrian's face and of the faces in the face database, and match the target pedestrian against the face database;
Step 5: according to the face recognition results of step 4, extract the times at which the target pedestrian appears in each camera and derive the walking path, thereby obtaining the retrospective tracing result for the target pedestrian.
The features of the invention further include the following.
The detailed process of training the face cascade classifier in step 1 is as follows:
1.1. Provide training samples: a training sample set (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) containing positive (face) and negative samples, where x_i denotes a sample and y_i its label: y_i = 1 indicates a positive face sample, y_i = 0 a negative (non-face) sample, and n is the total number of training samples;
1.2. Initialize the weights:

$$w_{1,i} = \begin{cases} \frac{1}{2l}, & y_i = 1 \\ \frac{1}{2m}, & y_i = 0 \end{cases} \qquad (1)$$

where w_{1,i} denotes the initial weight of the i-th sample in the first iteration, l and m are the numbers of positive and negative samples respectively, and i = 1, 2, ..., n;
1.3. Normalize the weights: all sample weights are normalized as in formula (2), where q_{t,i} is the normalized weight:

$$q_{t,i} = \frac{w_{t,i}}{\sum_{i=1}^{n} w_{t,i}} \qquad (2)$$

where w_{t,i} denotes the weight of the i-th sample in the t-th iteration;
1.4. Train the optimal weak classifier for each Haar-like feature and compute its weighted error rate. For each Haar-like feature, the optimal weak classifier h(x, f, p, θ) trained on the samples is:

$$h(x, f, p, \theta) = \begin{cases} 1, & p\,f(x) < p\,\theta \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

where f denotes the feature, f(x) the feature value, p the direction of the inequality sign (p takes 1 or -1, chosen so that the inequality sign is always <), and θ the threshold;
Compute the weighted error rate ε_f of the optimal weak classifier h(x, f, p, θ) over all samples:

$$\varepsilon_f = \sum_{i=1}^{n} q_{t,i}\,\lvert h(x_i, f, p, \theta) - y_i \rvert \qquad (4)$$

1.5. Select the optimal weak classifier with the minimum weighted error rate over all samples as the weak classifier obtained in this iteration;
1.6. Adjust the weights. The weights are adjusted according to whether the weak classifier obtained in step 1.5 classifies each sample correctly, as in formulas (5) and (6): the weight of a correctly classified sample decreases, while that of a misclassified sample increases, so that subsequent weak classifier selection pays more attention to the misclassified samples:

$$w_{t+1,i} = w_{t,i}\,\beta_t^{\,1 - e_i} \qquad (5)$$

$$\beta_t = \frac{\varepsilon_t}{1 - \varepsilon_t} \qquad (6)$$

where e_i takes the value 0 or 1: 0 if the sample is classified correctly, 1 if it is misclassified; ε_t denotes the weighted error rate over all samples of the weak classifier obtained in the t-th iteration;
Repeat steps 1.3–1.6 until the prescribed number of iterations T is reached, yielding T weak classifiers and ending the iterative process;
1.7. Compose a strong classifier from the T weak classifiers:

$$C(x) = \begin{cases} 1, & \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0, & \text{otherwise} \end{cases} \qquad (7)$$

$$\alpha_t = \log \frac{1}{\beta_t} \qquad (8)$$

1.8. Vary the number of iterations T to obtain different strong classifiers, and cascade the strong classifiers in ascending order of T to obtain the final cascade classifier; the number of strong classifiers is denoted M, with M greater than 20.
The detailed process of face tracking in step 3 is as follows:
3.1. First convert the camera video frame to be detected to grayscale, then shrink the image by bilinear interpolation to improve detection speed, and finally apply histogram equalization to improve the detection result;
3.2. Perform face detection and tracking on the video. The tracking method is: for each face detected in frame N, determine the range in which it may appear in frame N+1 according to an inter-frame constraint. Specifically, given the position and size of the face detected in frame N, the region of frame N+1 centered on the center of that face, with twice the width and height of the face region detected in frame N, is taken as the range in which the face may appear in frame N+1, and the face is detected within this range in frame N+1:
If exactly one face is detected within this range, it is the same person;
If more than one face is detected within this range, compute the Euclidean distance between the center of the face detected in frame N and the center of each face detected in frame N+1; the face closest to the frame-N face is taken as this person's face in frame N+1;
If no face is detected within this range, the person is considered to have walked out of this camera's field of view.
The face detection method in step 3 is as follows:
A. If the frame is the first frame of the video, detect directly with the cascade classifier obtained in step 1;
If the frame is a subsequent frame of the video, determine the region in which a face may appear in this frame according to the inter-frame constraint, and detect within this region using only the first N strong classifiers of the cascade classifier obtained in step 1, with N < M; in the other regions outside this region, detect with the full cascade classifier obtained in step 1;
B. Perform skin-color detection on the detected faces to exclude non-skin-color false detections, specifically:
First transform the image from the RGB color space to the YCbCr color space, as in formulas (9), (10), (11):
Y = 0.257R + 0.504G + 0.098B + 16 (9);
Cb = -0.148R - 0.291G + 0.439B + 128 (10);
Cr = 0.439R - 0.368G - 0.071B + 128 (11);
Then evaluate the Cb and Cr values of the pixels in the detected face region: if 85 < Cb < 130 and 132 < Cr < 180, the pixel is a skin-color point; otherwise it is a non-skin-color point. If skin-color points account for more than 60% of the pixels in the detection result, the region is considered a face region; otherwise it is a non-face region and is excluded.
The detailed process of step 4 is as follows:
4.1. Frame the face region of the target pedestrian in the video, and detect the target face within that region with the cascade classifier trained in step 1, as the identifier of the target pedestrian;
4.2. Extract the uniform-pattern LBP features of the target pedestrian's face and of the faces in the face database;
4.3. Compute the distances between the LBP feature values of the target pedestrian's face and of the faces in the face database, sort the faces in the database in ascending order of the computed distance, and select the single face image closest to the target pedestrian's face; it is the face image of the target pedestrian.
The detailed process of step 4.2 is as follows:
4.2.1. First divide each face image (the target pedestrian's face and each face in the face database) evenly into m × m image blocks, i.e. m² image blocks, then determine the LBP binary-code pattern of each pixel of each block:
Take each pixel of an image block in turn as the center, with its gray value as the threshold; the gray values of the surrounding pixels are compared with it and binarized to obtain the LBP binary code. The binarization of a surrounding pixel against the central pixel is:

$$s(p) = \begin{cases} 1, & g_p - g_c \ge 0 \\ 0, & g_p - g_c < 0 \end{cases} \qquad (12)$$

where s(p) is the binary value of a neighborhood pixel of the current center pixel, g_c is the gray value of the central pixel, and g_p is the gray value of a pixel in the neighborhood of g_c;
The number of 0/1 transitions in the LBP binary-code sequence is computed as:

$$U(\mathrm{LBP}) = \sum_{i=1}^{P} \lvert s(i) - s(i-1) \rvert \qquad (13)$$

where s(i) denotes the binary value of the i-th of the P neighborhood pixels;
A binary-code pattern with U(LBP) ≤ 2 is a uniform pattern; all other binary-code patterns are non-uniform and are grouped together into a single pattern class;
4.2.2. Compute the histogram of the LBP uniform patterns of each image block and concatenate the m² histograms as the LBP feature of the face (for the target pedestrian and for each face in the face database).
The detailed process of step 5 is as follows:
For the face image selected in step 4.3, its frame number can be obtained from its naming rule in step 3; the time at which the face appears in the video can then be obtained from the start time and frame rate of the camera video. Finally, the selected face video frames in each camera are marked in chronological order of appearance, giving the retrospective tracing result for the target pedestrian.
The beneficial effects of the invention are as follows: through face recognition, the invention automatically obtains the times at which the target pedestrian appears in each camera's video, so that the target can be traced retrospectively and efficiently, and its movement track can be described according to the installation positions of the cameras. When an abnormal event occurs, the target's whereabouts can be found quickly without continuous manual video review, reducing the workload of manual search.
Brief description of the drawings
Fig. 1 is the flow chart of the multi-camera video event retrospective tracing method based on face recognition of the invention;
Fig. 2 shows the process of training the cascade classifier in the invention;
Fig. 3 shows some of the face images of one person in the face database obtained by tracking in the invention;
Fig. 4 shows the framed face region of the target pedestrian in the invention;
Fig. 5 shows the detected target face in the invention;
Fig. 6 shows the images of the target pedestrian in each camera obtained with the retrospective tracing method of the invention;
Fig. 7 is a schematic diagram of the final multi-camera retrospective tracing result for the target pedestrian obtained with the retrospective tracing method of the invention.
Detailed description of the embodiments
The invention is described in detail below with reference to the drawings and a specific embodiment.
Referring to Fig. 1, the multi-camera video event retrospective tracing method based on face recognition of the invention is implemented according to the following steps:
Step 1: train a face cascade classifier with the AdaBoost method based on Haar-like features. The detailed process is as follows:
In face detection, each weak classifier of the strong classifier trained by the AdaBoost algorithm corresponds to one Haar-like feature; it is the optimal weak classifier based on that feature and describes the gray-level distribution of part of the face. Since not every Haar-like feature describes an important face characteristic that can distinguish faces from non-faces, the AdaBoost algorithm iteratively selects, from this very large set of features, those suited to classifying face versus non-face as the selected weak classifiers, and assigns each classifier a weight according to its classification ability, thereby forming the final strong classifier.
Referring to Fig. 2, the detailed process of training the strong classifier is as follows (a code sketch is given after step 1.8 below):
1.1. Provide training samples: a training sample set (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) containing positive (face) and negative samples, where x_i denotes a sample and y_i its label: y_i = 1 indicates a positive face sample, y_i = 0 a negative (non-face) sample, and n is the total number of training samples;
1.2. Initialize the weights:

$$w_{1,i} = \begin{cases} \frac{1}{2l}, & y_i = 1 \\ \frac{1}{2m}, & y_i = 0 \end{cases} \qquad (1)$$

In formula (1), w_{1,i} denotes the initial weight of the i-th sample in the first iteration, and l and m are the numbers of positive and negative samples respectively, with i = 1, 2, ..., n;
1.3. Normalize the weights: all sample weights are normalized as in formula (2), where q_{t,i} is the normalized weight:

$$q_{t,i} = \frac{w_{t,i}}{\sum_{i=1}^{n} w_{t,i}} \qquad (2)$$

where w_{t,i} denotes the weight of the i-th sample in the t-th iteration;
1.4. Train the optimal weak classifier for each Haar-like feature and compute its weighted error rate. For each Haar-like feature, the optimal weak classifier h(x, f, p, θ) trained on the samples is:

$$h(x, f, p, \theta) = \begin{cases} 1, & p\,f(x) < p\,\theta \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

where f denotes the feature, f(x) the feature value, θ the threshold, and p the direction of the inequality sign (p takes 1 or -1, chosen so that the inequality sign is always <);
Compute the weighted error rate ε_f of the optimal weak classifier h(x, f, p, θ) over all samples:

$$\varepsilon_f = \sum_{i=1}^{n} q_{t,i}\,\lvert h(x_i, f, p, \theta) - y_i \rvert \qquad (4)$$

1.5. Select the optimal weak classifier with the minimum weighted error rate over all samples as the weak classifier obtained in this iteration;
1.6. Adjust the weights. The weights are adjusted according to whether the weak classifier obtained in step 1.5 classifies each sample correctly, as in formulas (5) and (6): the weight of a correctly classified sample decreases, while that of a misclassified sample increases, so that subsequent weak classifier selection pays more attention to the misclassified samples:

$$w_{t+1,i} = w_{t,i}\,\beta_t^{\,1 - e_i} \qquad (5)$$

$$\beta_t = \frac{\varepsilon_t}{1 - \varepsilon_t} \qquad (6)$$

where e_i takes the value 0 or 1: 0 if the sample is classified correctly, 1 if it is misclassified; ε_t denotes the weighted error rate over all samples of the weak classifier obtained in the t-th iteration;
Repeat steps 1.3–1.6 until the prescribed number of iterations T is reached, yielding T weak classifiers and ending the iterative process;
1.7. Compose a strong classifier from the T weak classifiers. The strong classifier weights the weak classifiers according to their classification ability: each weak classifier casts a vote on the decision, but the voting weights of the weak classifiers differ. The resulting strong classifier is:

$$C(x) = \begin{cases} 1, & \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0, & \text{otherwise} \end{cases} \qquad (7)$$

$$\alpha_t = \log \frac{1}{\beta_t} \qquad (8)$$

This is equivalent to letting all the weak classifiers vote, weighting the votes according to each weak classifier's error rate, and comparing the weighted vote sum with the average vote to reach the final decision;
1.8. Vary the number of iterations T to obtain different strong classifiers, and cascade the strong classifiers in ascending order of T to obtain the final cascade classifier; the number of strong classifiers is denoted M, with M greater than 20.
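As a concrete illustration of steps 1.1–1.8, the following Python sketch implements the weight initialization, normalization, optimal-stump search and weight update of formulas (1)–(6), and the strong-classifier vote of formulas (7)–(8). It is a minimal sketch under stated assumptions, not the patent's reference implementation: the Haar-like feature values are assumed to be precomputed into an n × d matrix, the threshold search simply tries every observed feature value, and the function names are illustrative.

```python
import numpy as np

def train_strong_classifier(F, y, T):
    """AdaBoost over decision stumps following formulas (1)-(8).

    F: (n, d) array of precomputed Haar-like feature values.
    y: (n,) labels, 1 for face samples and 0 for non-face samples.
    """
    n, d = F.shape
    l, m = int(y.sum()), n - int(y.sum())
    w = np.where(y == 1, 1.0 / (2 * l), 1.0 / (2 * m))        # formula (1)
    stumps = []
    for _ in range(T):
        q = w / w.sum()                                        # formula (2)
        best = None
        for j in range(d):                                     # optimal stump per feature
            for theta in np.unique(F[:, j]):
                for p in (1, -1):
                    h = (p * F[:, j] < p * theta).astype(int)  # formula (3)
                    eps = float(np.sum(q * np.abs(h - y)))     # formula (4)
                    if best is None or eps < best[0]:
                        best = (eps, j, p, theta, h)
        eps, j, p, theta, h = best                             # step 1.5: minimum error
        beta = max(eps, 1e-12) / (1.0 - eps)                   # formula (6); clamped to avoid log(0)
        e = np.abs(h - y)                                      # 0 if correct, 1 if wrong
        w = q * beta ** (1 - e)                                # formula (5)
        stumps.append((j, p, theta, np.log(1.0 / beta)))       # alpha_t, formula (8)
    return stumps

def strong_classify(stumps, x):
    """Formula (7): weighted vote of the T weak classifiers on feature vector x."""
    total = sum(a for _, _, _, a in stumps)
    votes = sum(a * (p * x[j] < p * th) for j, p, th, a in stumps)
    return int(votes >= 0.5 * total)
```

Note that β_t < 1 whenever the selected stump does better than chance, so α_t = log(1/β_t) is positive and grows as the error shrinks, which is exactly the voting weight described above; cascading then amounts to running this routine for increasing T and chaining the resulting strong classifiers.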
Step 2: collect the video of all cameras in the surveillance system;
Step 3: track the faces in all collected camera videos to obtain multiple faces of each person, which together form the face database, as shown in Fig. 3. The detailed process is as follows:
3.1. First convert the camera video frame to be detected to grayscale, then shrink the image by bilinear interpolation to improve detection speed, and finally apply histogram equalization to improve the detection result;
3.2. Perform face detection and tracking on the video. The tracking method is: for each face detected in frame N, determine the range in which it may appear in frame N+1 according to an inter-frame constraint. Specifically, given the position and size of the face detected in frame N, the region of frame N+1 centered on the center of that face, with twice the width and height of the face region detected in frame N, is taken as the range in which the face may appear in frame N+1, and the face is detected within this range in frame N+1 (a code sketch follows the three cases below):
If exactly one face is detected within this range, it is the same person;
If more than one face is detected within this range, compute the Euclidean distance between the center of the face detected in frame N and the center of each face detected in frame N+1; the face closest to the frame-N face is taken as this person's face in frame N+1;
If no face is detected within this range, the person is considered to have walked out of this camera's field of view;
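A minimal sketch of the preprocessing of step 3.1 and the inter-frame search region and nearest-center matching of step 3.2, written against OpenCV's Python API; the 0.5 scale factor, the generic cv2.CascadeClassifier-style detector standing in for the cascade trained in step 1, and the helper names are assumptions for illustration. Faces are (x, y, w, h) tuples in the coordinates of the preprocessed frame.

```python
import cv2
import numpy as np

def preprocess(frame, scale=0.5):
    """Step 3.1: grayscale, bilinear shrink, histogram equalization."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, None, fx=scale, fy=scale,
                       interpolation=cv2.INTER_LINEAR)
    return cv2.equalizeHist(small)

def search_region(face, frame_shape):
    """Step 3.2: region of frame N+1 centered on the frame-N face, with
    twice its width and height, clipped to the image bounds."""
    x, y, w, h = face
    cx, cy = x + w // 2, y + h // 2
    x0, y0 = max(cx - w, 0), max(cy - h, 0)
    x1, y1 = min(cx + w, frame_shape[1]), min(cy + h, frame_shape[0])
    return x0, y0, x1, y1

def track_face(detector, prev_face, next_frame):
    """Detect the frame-N face inside its search region in frame N+1 and
    resolve the three cases listed above."""
    x0, y0, x1, y1 = search_region(prev_face, next_frame.shape)
    found = detector.detectMultiScale(next_frame[y0:y1, x0:x1])
    if len(found) == 0:
        return None                       # case 3: left this camera's view
    px, py, pw, ph = prev_face
    prev_center = np.array([px + pw / 2, py + ph / 2])
    def center(f):                        # face center in full-frame coordinates
        fx, fy, fw, fh = f
        return np.array([x0 + fx + fw / 2, y0 + fy + fh / 2])
    # case 1 falls out of case 2: a single detected face is trivially the nearest
    return min(found, key=lambda f: np.linalg.norm(center(f) - prev_center))
```

In use, track_face would be called once per previously detected face on each new frame, with preprocess applied to the frame first so that detection coordinates stay in the downscaled image.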
The face detection method is as follows:
A. If the frame is the first frame of the video, perform face detection directly with the cascade classifier obtained in step 1;
If the frame is a subsequent frame of the video, determine the region in which a tracked face may appear in this frame according to the inter-frame constraint, and detect within this region using only the first N strong classifiers of the cascade classifier obtained in step 1, with N < M; in the other regions outside this region, detect with the full cascade classifier obtained in step 1;
B. Perform skin-color detection on the detected faces to exclude non-skin-color false detections (see the sketch after this step), specifically:
First transform the image from the RGB color space to the YCbCr color space, as in formulas (9), (10), (11):
Y = 0.257R + 0.504G + 0.098B + 16 (9);
Cb = -0.148R - 0.291G + 0.439B + 128 (10);
Cr = 0.439R - 0.368G - 0.071B + 128 (11);
Then evaluate the Cb and Cr values of the pixels in the detected face region: if 85 < Cb < 130 and 132 < Cr < 180, the pixel is a skin-color point; otherwise it is a non-skin-color point. If skin-color points account for more than 60% of the pixels in the detection result, the region is considered a face region; otherwise it is a non-face region and is excluded.
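The skin-color check of part B maps directly onto formulas (10)–(11) and the stated thresholds. A vectorized sketch, assuming an 8-bit BGR face crop as produced by OpenCV (the 60% ratio is the figure given above):

```python
import numpy as np

def is_skin_region(face_bgr, ratio=0.60):
    """Part B: compute Cb and Cr via formulas (10)-(11) and keep the region
    only if more than 60% of pixels satisfy 85 < Cb < 130 and 132 < Cr < 180."""
    B, G, R = [face_bgr[:, :, i].astype(np.float64) for i in range(3)]
    Cb = -0.148 * R - 0.291 * G + 0.439 * B + 128   # formula (10)
    Cr = 0.439 * R - 0.368 * G - 0.071 * B + 128    # formula (11)
    skin = (Cb > 85) & (Cb < 130) & (Cr > 132) & (Cr < 180)
    return skin.mean() > ratio
```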
3.3. Write the faces obtained after tracking into a database as the face database. Each face in the database is named by the rule: camera number_frame number, where the camera number indicates in which camera's video the face was detected, and the frame number indicates to which frame of that camera's video the face image belongs.
Step 4: frame the face region of the target pedestrian in the video, extract LBP features, and match the target face against the face database. The detailed process is as follows (a code sketch for steps 4.2–4.3 is given after step 4.3):
4.1. Frame the face region of the target pedestrian in the video, as shown in Fig. 4, and detect the target face within that region with the cascade classifier trained in step 1, as the identifier of the target pedestrian, as shown in Fig. 5;
4.2. Extract the uniform-pattern LBP features of the target pedestrian's face and of the faces in the face database:
4.2.1. First divide each face image evenly into 4 × 4 image blocks, i.e. 16 image blocks, then determine the LBP binary-code pattern of each pixel of each block:
LBP is an operator that describes local texture features. Taking the current pixel as the center, with its gray value as the threshold, the gray values of the surrounding pixels are compared with it and binarized to obtain the LBP binary code. The binarization of a surrounding pixel against the central pixel is:

$$s(p) = \begin{cases} 1, & g_p - g_c \ge 0 \\ 0, & g_p - g_c < 0 \end{cases} \qquad (12)$$

where s(p) is the binary value of a neighborhood pixel of the current center pixel, g_c is the gray value of the central pixel, and g_p is the gray value of a pixel in the neighborhood of g_c;
The LBP binary code can take many values, that is, there are many different binary-code patterns. Research has found, however, that a few patterns describe most texture patterns and occur in images with a probability above 90%, while the other patterns occur very rarely; those frequent patterns can therefore be regarded as the basic attributes of image texture. They are the uniform patterns: the patterns whose LBP binary-code sequence contains no more than two 0/1 transitions. The number of transitions is computed as:

$$U(\mathrm{LBP}) = \sum_{i=1}^{P} \lvert s(i) - s(i-1) \rvert \qquad (13)$$

where s(i) denotes the binary value of the i-th of the P neighborhood pixels;
A binary-code pattern with U(LBP) ≤ 2 is a uniform pattern; all other binary-code patterns are non-uniform and are grouped together into a single pattern class;
4.2.2. Compute the histogram of the LBP uniform patterns of each image block and concatenate the 16 histograms as the LBP feature of the face;
4.3. Compute the Euclidean distance between the LBP feature values of the target pedestrian's face and of each face in the face database, sort the faces in the database in ascending order of the computed distance, and select the single face image closest to the target pedestrian's face; it is the face image of the target pedestrian.
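Steps 4.2–4.3 can be sketched as follows: an 8-neighbor uniform-pattern LBP histogram per 4 × 4 block, concatenated into one feature vector and compared by Euclidean distance. The 59-bin layout (58 uniform 8-bit codes plus one shared bin for all non-uniform codes) is the standard convention for P = 8 and, like the fixed 3 × 3 neighborhood and the skipped border pixels, an assumption made here for simplicity:

```python
import numpy as np

# 8-neighbor offsets in circular order around the center pixel
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def _uniform_lookup(P=8):
    """Bin assignment per formula (13): codes with at most two 0/1 transitions
    around the circle get their own bin, all others share one final bin."""
    table, nxt = {}, 0
    for code in range(2 ** P):
        bits = [(code >> i) & 1 for i in range(P)]
        u = sum(bits[i] != bits[(i + 1) % P] for i in range(P))
        if u <= 2:
            table[code] = nxt
            nxt += 1
    return {c: table.get(c, nxt) for c in range(2 ** P)}, nxt + 1

LOOKUP, NBINS = _uniform_lookup()   # 58 uniform bins + 1 non-uniform bin = 59

def lbp_feature(gray, blocks=4):
    """Steps 4.2.1-4.2.2: LBP code per pixel (formula (12)), uniform-pattern
    histogram per block, all block histograms concatenated."""
    gray = np.asarray(gray, dtype=np.int32)
    h, w = gray.shape
    center = gray[1:h - 1, 1:w - 1]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbor = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbor >= center).astype(np.int32) << bit   # s(p), formula (12)
    feat = []
    bh, bw = codes.shape[0] // blocks, codes.shape[1] // blocks
    for by in range(blocks):
        for bx in range(blocks):
            block = codes[by * bh:(by + 1) * bh, bx * bw:(bx + 1) * bw]
            hist = np.zeros(NBINS)
            for c in block.ravel():
                hist[LOOKUP[int(c)]] += 1
            feat.append(hist / max(block.size, 1))              # normalized histogram
    return np.concatenate(feat)

def best_match(target_gray, face_db):
    """Step 4.3: return the database key ("cameraNo_frameNo") whose LBP
    feature has the smallest Euclidean distance to the target face's feature."""
    tf = lbp_feature(target_gray)
    return min(face_db, key=lambda k: np.linalg.norm(lbp_feature(face_db[k]) - tf))
```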
Step 5: according to the face recognition results, extract the times at which the target pedestrian appears in each camera and derive the walking path, obtaining the retrospective tracing result for the target. The detailed process is as follows (a code sketch follows):
For the face image selected in step 4.3, its frame number can be obtained from its naming rule in step 3; the time at which the face appears in the video can then be obtained from the start time and frame rate of the camera video. Finally, the selected face video frames in each camera are marked in chronological order of appearance, which yields the retrospective tracing result for the target pedestrian.
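The frame-to-time conversion of step 5 follows directly from the naming rule of step 3.3: parse "cameraNo_frameNo", then time = start time + frame number / frame rate. A small sketch, with the name format and the per-camera metadata dictionary assumed for illustration:

```python
from datetime import datetime, timedelta

def appearance_time(face_name, camera_meta):
    """Step 5: recover when a stored face appeared from its 'cameraNo_frameNo'
    name and the start time and frame rate of that camera's video."""
    cam_no, frame_no = face_name.split("_")
    start, fps = camera_meta[cam_no]          # e.g. (datetime(...), 25.0)
    return cam_no, start + timedelta(seconds=int(frame_no) / fps)

# usage: sorting the matched faces by this time yields the walking path across cameras
meta = {"5": (datetime(2015, 3, 11, 9, 0, 0), 25.0)}
print(appearance_time("5_1350", meta))   # -> ('5', datetime.datetime(2015, 3, 11, 9, 0, 54))
```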
Fig. 7 shows the final retrospective tracing result. From Fig. 7, the successive positions at which the target pedestrian appears are those corresponding to camera 5, camera 4, camera 3, camera 1, and camera 2. The method of the invention can therefore trace a target back quickly and accurately, greatly improving the retrieval efficiency for the target pedestrian in video events.

Claims (7)

1. A multi-camera video event retrospective tracing method based on face recognition, characterized in that it is implemented according to the following steps:
Step 1: train a face cascade classifier with the AdaBoost method based on Haar-like features;
Step 2: collect the video of all cameras in the surveillance system;
Step 3: track the faces in all collected camera videos, and write the faces obtained after tracking into a database as the face database, each face in the database being named by the rule: camera number_frame number;
Step 4: select the target pedestrian to be traced, frame the face region of the target pedestrian in the video, extract the LBP features of the target pedestrian's face and of the faces in the face database, and match the target pedestrian against the face database;
Step 5: according to the face recognition results of step 4, extract the times at which the target pedestrian appears in each camera and derive the walking path, thereby obtaining the retrospective tracing result for the target pedestrian.
2. The multi-camera video event retrospective tracing method based on face recognition according to claim 1, characterized in that the detailed process of training the face cascade classifier in step 1 is as follows:
1.1. Provide training samples: a training sample set (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) containing positive (face) and negative samples, where x_i denotes a sample and y_i its label: y_i = 1 indicates a positive face sample, y_i = 0 a negative (non-face) sample, and n is the total number of training samples;
1.2. Initialize the weights:

$$w_{1,i} = \begin{cases} \frac{1}{2l}, & y_i = 1 \\ \frac{1}{2m}, & y_i = 0 \end{cases} \qquad (1)$$

where w_{1,i} denotes the initial weight of the i-th sample in the first iteration, l and m are the numbers of positive and negative samples respectively, and i = 1, 2, ..., n;
1.3. Normalize the weights: all sample weights are normalized as in formula (2), where q_{t,i} is the normalized weight:

$$q_{t,i} = \frac{w_{t,i}}{\sum_{i=1}^{n} w_{t,i}} \qquad (2)$$

where w_{t,i} denotes the weight of the i-th sample in the t-th iteration;
1.4. Train the optimal weak classifier for each Haar-like feature and compute its weighted error rate. For each Haar-like feature, the optimal weak classifier h(x, f, p, θ) trained on the samples is:

$$h(x, f, p, \theta) = \begin{cases} 1, & p\,f(x) < p\,\theta \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

where f denotes the feature, f(x) the feature value, p the direction of the inequality sign (p takes 1 or -1, chosen so that the inequality sign is always <), and θ the threshold;
Compute the weighted error rate ε_f of the optimal weak classifier h(x, f, p, θ) over all samples:

$$\varepsilon_f = \sum_{i=1}^{n} q_{t,i}\,\lvert h(x_i, f, p, \theta) - y_i \rvert \qquad (4)$$

1.5. Select the optimal weak classifier with the minimum weighted error rate over all samples as the weak classifier obtained in this iteration;
1.6. Adjust the weights. The weights are adjusted according to whether the weak classifier obtained in step 1.5 classifies each sample correctly, as in formulas (5) and (6): the weight of a correctly classified sample decreases, while that of a misclassified sample increases, so that subsequent weak classifier selection pays more attention to the misclassified samples:

$$w_{t+1,i} = w_{t,i}\,\beta_t^{\,1 - e_i} \qquad (5)$$

$$\beta_t = \frac{\varepsilon_t}{1 - \varepsilon_t} \qquad (6)$$

where e_i takes the value 0 or 1: 0 if the sample is classified correctly, 1 if it is misclassified; ε_t denotes the weighted error rate over all samples of the weak classifier obtained in the t-th iteration;
Repeat steps 1.3–1.6 until the prescribed number of iterations T is reached, yielding T weak classifiers and ending the iterative process;
1.7. Compose a strong classifier from the T weak classifiers:

$$C(x) = \begin{cases} 1, & \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0, & \text{otherwise} \end{cases} \qquad (7)$$

$$\alpha_t = \log \frac{1}{\beta_t} \qquad (8)$$

1.8. Vary the number of iterations T to obtain different strong classifiers, and cascade the strong classifiers in ascending order of T to obtain the final cascade classifier; the number of strong classifiers is denoted M, with M greater than 20.
3. The multi-camera video event retrospective tracing method based on face recognition according to claim 1, characterized in that the detailed process of face tracking in step 3 is as follows:
3.1. First convert the camera video frame to be detected to grayscale, then shrink the image by bilinear interpolation to improve detection speed, and finally apply histogram equalization to improve the detection result;
3.2. Perform face detection and tracking on the video. The tracking method is: for each face detected in frame N, determine the range in which it may appear in frame N+1 according to an inter-frame constraint. Specifically, given the position and size of the face detected in frame N, the region of frame N+1 centered on the center of that face, with twice the width and height of the face region detected in frame N, is taken as the range in which the face may appear in frame N+1, and the face is detected within this range in frame N+1:
If exactly one face is detected within this range, it is the same person;
If more than one face is detected within this range, compute the Euclidean distance between the center of the face detected in frame N and the center of each face detected in frame N+1; the face closest to the frame-N face is taken as this person's face in frame N+1;
If no face is detected within this range, the person is considered to have walked out of this camera's field of view.
4. The multi-camera video event retrospective tracing method based on face recognition according to claim 3, characterized in that the face detection method in step 3 is as follows:
A. If the frame is the first frame of the video, detect directly with the cascade classifier obtained in step 1;
If the frame is a subsequent frame of the video, determine the region in which a face may appear in this frame according to the inter-frame constraint, and detect within this region using only the first N strong classifiers of the cascade classifier obtained in step 1, with N < M; in the other regions outside this region, detect with the full cascade classifier obtained in step 1;
B. Perform skin-color detection on the detected faces to exclude non-skin-color false detections, specifically:
First transform the image from the RGB color space to the YCbCr color space, as in formulas (9), (10), (11):
Y = 0.257R + 0.504G + 0.098B + 16 (9);
Cb = -0.148R - 0.291G + 0.439B + 128 (10);
Cr = 0.439R - 0.368G - 0.071B + 128 (11);
Then evaluate the Cb and Cr values of the pixels in the detected face region: if 85 < Cb < 130 and 132 < Cr < 180, the pixel is a skin-color point; otherwise it is a non-skin-color point. If skin-color points account for more than 60% of the pixels in the detection result, the region is considered a face region; otherwise it is a non-face region and is excluded.
5. The multi-camera video event retrospective tracing method based on face recognition according to claim 1, characterized in that the detailed process of step 4 is as follows:
4.1. Frame the face region of the target pedestrian in the video, and detect the target face within that region with the cascade classifier trained in step 1, as the identifier of the target pedestrian;
4.2. Extract the uniform-pattern LBP features of the target pedestrian's face and of the faces in the face database;
4.3. Compute the distances between the LBP feature values of the target pedestrian's face and of the faces in the face database, sort the faces in the database in ascending order of the computed distance, and select the single face image closest to the target pedestrian's face; it is the face image of the target pedestrian.
6. The multi-camera video event retrospective tracing method based on face recognition according to claim 5, characterized in that the detailed process of step 4.2 is as follows:
4.2.1. First divide each face image (the target pedestrian's face and each face in the face database) evenly into m × m image blocks, i.e. m² image blocks, then determine the LBP binary-code pattern of each pixel of each block:
Take each pixel of an image block in turn as the center, with its gray value as the threshold; the gray values of the surrounding pixels are compared with it and binarized to obtain the LBP binary code. The binarization of a surrounding pixel against the central pixel is:

$$s(p) = \begin{cases} 1, & g_p - g_c \ge 0 \\ 0, & g_p - g_c < 0 \end{cases} \qquad (12)$$

where s(p) is the binary value of a neighborhood pixel of the current center pixel, g_c is the gray value of the central pixel, and g_p is the gray value of a pixel in the neighborhood of g_c;
The number of 0/1 transitions in the LBP binary-code sequence is computed as:

$$U(\mathrm{LBP}) = \sum_{i=1}^{P} \lvert s(i) - s(i-1) \rvert \qquad (13)$$

where s(i) denotes the binary value of the i-th of the P neighborhood pixels;
A binary-code pattern with U(LBP) ≤ 2 is a uniform pattern; all other binary-code patterns are non-uniform and are grouped together into a single pattern class;
4.2.2. Compute the histogram of the LBP uniform patterns of each image block and concatenate the m² histograms as the LBP feature of the face (for the target pedestrian and for each face in the face database).
7. The multi-camera video event retrospective tracing method based on face recognition according to claim 5, characterized in that the detailed process of step 5 is as follows:
For the face image selected in step 4.3, its frame number can be obtained from its naming rule in step 3; the time at which the face appears in the video can then be obtained from the start time and frame rate of the camera video. Finally, the selected face video frames in each camera are marked in chronological order of appearance, giving the retrospective tracing result for the target pedestrian.
CN201510106388.8A 2015-03-11 2015-03-11 Multi-camera video event retrospective tracing method based on face recognition Expired - Fee Related CN104751136B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510106388.8A CN104751136B (en) 2015-03-11 2015-03-11 Multi-camera video event retrospective tracing method based on face recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510106388.8A CN104751136B (en) 2015-03-11 2015-03-11 Multi-camera video event retrospective tracing method based on face recognition

Publications (2)

Publication Number Publication Date
CN104751136A (en) 2015-07-01
CN104751136B (en) 2018-04-27

Family

ID=53590793

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510106388.8A Expired - Fee Related CN104751136B (en) 2015-03-11 2015-03-11 Multi-camera video event retrospective tracing method based on face recognition

Country Status (1)

Country Link
CN (1) CN104751136B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080279425A1 (en) * 2007-04-13 2008-11-13 Mira Electronics Co., Ltd. Human face recognition and user interface system for digital camera and video camera
CN202443476U (en) * 2012-02-20 2012-09-19 华焦宝 Multiple camera human face recognition system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
余珍珠: "Research on Face Tracking Algorithms Based on Video Surveillance Networks", China Master's Theses Full-text Database, Information Science and Technology *
毕远: "Face Tracking by Combining Intelligent Pixel Clustering with the AdaBoost Method", China Master's Theses Full-text Database, Information Science and Technology *
王涛: "Research on Target Matching Algorithms for Multiple Cameras with Non-overlapping Fields of View", China Master's Theses Full-text Database, Information Science and Technology *
翟晓波: "Recaptured Image Forensics", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105072399A (en) * 2015-08-07 2015-11-18 广西南宁派腾科技有限公司 Solar train compartment video monitoring system
CN105530433A (en) * 2016-01-28 2016-04-27 北京金久瑞科技有限公司 Image capturing synchronization control device of multi-camera system
CN106127751A (en) * 2016-06-20 2016-11-16 北京小米移动软件有限公司 image detecting method, device and system
CN106778819A (en) * 2016-11-24 2017-05-31 深圳智达机械技术有限公司 A kind of device for image of anti-erroneous judgement
CN106778637B (en) * 2016-12-19 2020-01-07 江苏慧眼数据科技股份有限公司 Statistical method for man and woman passenger flow
CN106778637A (en) * 2016-12-19 2017-05-31 江苏慧眼数据科技股份有限公司 A kind of statistical method to men and women's passenger flow
CN106682691B (en) * 2016-12-21 2019-08-02 厦门中控智慧信息技术有限公司 Object detection method and device based on image
CN106682691A (en) * 2016-12-21 2017-05-17 厦门中控生物识别信息技术有限公司 Image-based target detection method and apparatus
CN106851199A (en) * 2017-02-07 2017-06-13 深圳云天励飞技术有限公司 A kind of method for processing video frequency and device
CN107316036A (en) * 2017-06-09 2017-11-03 广州大学 A kind of insect recognition methods based on cascade classifier
CN107316036B (en) * 2017-06-09 2020-10-27 广州大学 Insect pest identification method based on cascade classifier
CN107818785A (en) * 2017-09-26 2018-03-20 平安普惠企业管理有限公司 A kind of method and terminal device that information is extracted from multimedia file
CN109325964A (en) * 2018-08-17 2019-02-12 深圳市中电数通智慧安全科技股份有限公司 A kind of face tracking methods, device and terminal
CN109325964B (en) * 2018-08-17 2020-08-28 深圳市中电数通智慧安全科技股份有限公司 Face tracking method and device and terminal
CN109543506A (en) * 2018-09-29 2019-03-29 广东工业大学 A kind of passerby's advertisement experience feedback system and method based on Expression Recognition
CN109815861A (en) * 2019-01-11 2019-05-28 佰路得信息技术(上海)有限公司 A kind of user behavior information statistical method based on recognition of face
CN110163111A (en) * 2019-04-24 2019-08-23 平安科技(深圳)有限公司 Method, apparatus of calling out the numbers, electronic equipment and storage medium based on recognition of face
CN110163111B (en) * 2019-04-24 2024-06-28 平安科技(深圳)有限公司 Face recognition-based number calling method and device, electronic equipment and storage medium
CN110188691A (en) * 2019-05-30 2019-08-30 银河水滴科技(北京)有限公司 A kind of motion track determines method and device
WO2020259603A1 (en) * 2019-06-27 2020-12-30 海信视像科技股份有限公司 Image processing apparatus and method
JP2021005320A (en) * 2019-06-27 2021-01-14 東芝映像ソリューション株式会社 Image processing system and image processing method
CN112470165A (en) * 2019-06-27 2021-03-09 海信视像科技股份有限公司 Image processing apparatus and image processing method
CN112470165B (en) * 2019-06-27 2023-04-04 海信视像科技股份有限公司 Image processing apparatus and image processing method
CN111027476A (en) * 2019-12-10 2020-04-17 电子科技大学 Face recognition tracker based on incremental learning algorithm
CN112215898A (en) * 2020-09-18 2021-01-12 深圳市瑞立视多媒体科技有限公司 Multi-camera frame data balance control method and device and computer equipment
CN112215898B (en) * 2020-09-18 2024-01-30 深圳市瑞立视多媒体科技有限公司 Multi-camera frame data balance control method and device and computer equipment
CN113012194A (en) * 2020-12-25 2021-06-22 深圳市铂岩科技有限公司 Target tracking method, device, medium and equipment
CN113012194B (en) * 2020-12-25 2024-04-09 深圳市铂岩科技有限公司 Target tracking method, device, medium and equipment
CN113205079A (en) * 2021-06-04 2021-08-03 北京奇艺世纪科技有限公司 Face detection method and device, electronic equipment and storage medium
CN113205079B (en) * 2021-06-04 2023-09-05 北京奇艺世纪科技有限公司 Face detection method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN104751136B (en) 2018-04-27

Similar Documents

Publication Publication Date Title
CN104751136A (en) Face recognition based multi-camera video event retrospective trace method
Gajjar et al. Human detection and tracking for video surveillance: A cognitive science approach
John et al. Pedestrian detection in thermal images using adaptive fuzzy C-means clustering and convolutional neural networks
CN109948582B (en) Intelligent vehicle reverse running detection method based on tracking trajectory analysis
CN109902590B (en) Pedestrian re-identification method for deep multi-view characteristic distance learning
Kuo et al. How does person identity recognition help multi-person tracking?
Li et al. Robust people counting in video surveillance: Dataset and system
Amato et al. A comparison of face verification with facial landmarks and deep features
CN103246896B (en) A kind of real-time detection and tracking method of robustness vehicle
CN103914702A (en) System and method for boosting object detection performance in videos
CN102521565A (en) Garment identification method and system for low-resolution video
CN106384345B (en) A kind of image detection and flow statistical method based on RCNN
Wang et al. Traffic sign detection using a cascade method with fast feature extraction and saliency test
CN103020986A (en) Method for tracking moving object
CN107230267A (en) Intelligence In Baogang Kindergarten based on face recognition algorithms is registered method
CN105825233B (en) A kind of pedestrian detection method based on on-line study random fern classifier
CN104978567A (en) Vehicle detection method based on scenario classification
Park et al. Detection of construction workers in video frames for automatic initialization of vision trackers
CN102880864A (en) Method for snap-shooting human face from streaming media file
Xie et al. Video based head detection and tracking surveillance system
Symeonidis et al. Neural attention-driven non-maximum suppression for person detection
Zhao et al. Pedestrian motion tracking and crowd abnormal behavior detection based on intelligent video surveillance
Dong et al. Crowd Density Estimation Using Sparse Texture Features.
Li et al. An efficient self-learning people counting system
Feris et al. Boosting object detection performance in crowded surveillance videos

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20190115

Address after: Room R203, east of Building 2, No. 1 Garden Road, Jiangpu Street, Pukou District, Nanjing, Jiangsu 210000

Patentee after: Beijing Hua Chong Chong Nanjing Information Technology Co., Ltd.

Address before: No. 5 Jinhua South Road, Xi'an, Shaanxi 710048

Patentee before: Xi'an University of Technology

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180427

Termination date: 20200311