CN109558815A

CN109558815A - A kind of detection of real time multi-human face and tracking

Info

Publication number: CN109558815A
Application number: CN201811365995.6A
Authority: CN
Inventors: 张宁; 李玉惠; 金红; 杨满智; 刘长永; 陈晓光; 蔡琳
Original assignee: NATIONAL COMPUTER VIRUS EMERGENCY RESPONSE CENTER; Heng Jia Jia (beijing) Technology Co Ltd
Current assignee: NATIONAL COMPUTER VIRUS EMERGENCY RESPONSE CENTER; Heng Jia Jia (beijing) Technology Co Ltd
Priority date: 2018-11-16
Filing date: 2018-11-16
Publication date: 2019-04-02

Abstract

The invention discloses a kind of detection of real time multi-human face and trackings, which comprises the image of each video frame is obtained from the video flowing of input；The detection of face location coordinate is carried out to the video frame of acquisition by Face datection model, and face location coordinate is stored to face location coordinate container；Face tracking initialization operation extracts the position coordinates of target face up to taking from face location coordinate container, and the characteristic point of extraction target face carries out the update of subsequent face tracking from characteristic point container；Image pyramid model is established, according to the position of the model prediction current video frame human face target；Track human faces are simultaneously shown.The present invention solves the problems, such as that the accuracy of existing recognition of face and tracking is not high and cannot reach real-time tracking.

Description

A kind of detection of real time multi-human face and tracking

Technical field

The invention belongs to Face datection and tracking fields, and in particular to a kind of detection of real time multi-human face and tracking.

Background technique

With the fast development of science and technology, the relevant technologies based on computer vision are widely applied, wherein face Tracking technique is widely applied under the scenes such as video security protection, automatic gate inhibition, market shopping.

Face tracking technology mainly includes Face datection and face tracking technology.Human face detection tech refers to be looked in picture To face location.Face tracking technology refers to given Initial Face position, and it is pre- that lasting face location is carried out in successive video frames It surveys.The face tracking technology of current main-stream is roughly divided into three kinds by principle, first is that correlation filtering tracking is based on, second is that being based on Deep learning tracking, third is that being based on optical flow tracking method.

Based on correlation filtering tracking it is representative be KCF (Kernelized Correlation Filter) With SRDCF (Spatially Regularized Discriminative Correlation Filter).KCF method uses Circular matrix obtains positive negative sample, and one object detector of training during tracking, is gone under detection using object detector Whether the tracking target of one frame is real goal, then reuses new testing result and goes to update training set and then update target detection Device.The defect of this method is, when object movement speed quickly, when there is boundary effect or dynamic fuzzy phenomenon, tracking can be lost Target.SRDCF method proposes multiple dimensioned, the bigger detection zone solution boundary effect of use, but the method speed of service is very Slowly, it is unable to satisfy requirement of real-time.

Tracking based on deep learning it is representative be MDNet (Multi-Domain Convolutional Neural Network), network is made of the layer multiple-limb of inclusion layer and special domain, and wherein domain corresponds to independent training etc. Grade, and each branch is responsible for one two classification and goes to identify the target in each domain, goes to instruct using iterative manner for each domain Practice network, obtains the general target feature extraction in inclusion layer.When tracking the target in video sequence, in conjunction with pre-training CNN In (Convolution Neural Network) inclusion layer and two classification layers constitute a new networks, while propose online with Track (online tracking), online tracking are real by candidate window obtained through stochastical sampling around assessment former frame target Existing, tracking target's feature-extraction accuracy is higher, but because network parameter is more, using CPU be extremely difficult to real-time target with Track.

Method based on optical flow tracking it is representative be LK (Lucas-Kanade) light stream estimation difference method, the party Method is a kind of infinitesimal optical flow computation method based on gradient, and this optical flow method first has to meet three hypothesis, it is assumed that 1, it is bright It spends constant, is exactly variation of the same point with the time, brightness will not change.Assuming that 2, small movement is exactly the change of time Change the acute variation that will not cause position.Assuming that 3, space is consistent, and it is also neighbour on image that neighbouring point, which projects to, in a scene Near point, and neighbouring spot speed is consistent.Optical flow method can carry out target following under the scene of any complexity.This method can be quasi- Really, target following is rapidly completed, compares and is suitably applied in the smaller terminal of calculating power.

It to be realized on calculating the smaller CPU (such as i5-6200) of power, it requires that the calculation amount of implementation method is wanted Small, algorithm design cannot be too complicated.Compared to correlation filtering tracking and deep learning tracking, it is based on optical flow tracking side The advantage of method is can more rapidly to realize target following, and face is blocked, human face posture expression complexity, the mobile speed of face Degree is fast, tracking environmental background complexity has better robustness, is suitble to realize on calculating the smaller CPU processor of power.

But it is current based on the method for optical flow tracking during handling Face datection and tracking, occur asking as follows Topic:

1. occlusion issue is blocked including person to person, blocking between people and object will cause the missing of face information, this will It directly results in tracking target to lose, tracking accuracy rate decline.

2. dynamic fuzzy and boundary effect, this will cause face information to obscure, and feature extraction inaccuracy directly results in tracking The loss of target.

3. background environment is complicated, background environment includes that variation, color and the object of illumination condition are varied, is tracked sometimes Color of object can be with background environment solid colour, these will all bring huge challenge to face tracking task.

4. efficiency, existing plurality of human faces tracking technique is relatively difficult to meet requirement of real-time, especially compares in calculating power On low CPU device.

There are the feelings such as block, external environment background complexity and illumination condition are changeable in human face posture expression complexity, face Under condition, it is easy to cause the accuracy of tracking to reduce and cannot accomplishes the effect of real-time tracking.

Summary of the invention

Drawbacks described above based on the prior art, the object of the present invention is to provide a kind of detection of real time multi-human face and track sides Method, the accuracy to solve the problems, such as existing recognition of face and tracking is not high and cannot reach real-time tracking.

The technical solution adopted by the invention is as follows:

A kind of detection of real time multi-human face and tracking, which comprises

The image of each video frame is obtained from the video flowing of input；

Carry out the detection of face location coordinate to the video frame of acquisition by Face datection model, and by face location coordinate Store face location coordinate container；

Face tracking initialization operation, the position coordinates that the target face of tracking is extracted from face location coordinate container are straight To taking, use spatial gradient matrix extract target feature point and store to characteristic point container with for subsequent face tracking more Newly；

Image pyramid model is established, according to the position of the model prediction current video frame human face target；

Statistical trace frame number just re-starts a Face datection when the tracking frame number threshold value of tracking frame number satisfaction setting, When being unsatisfactory for, then calculates the face location coordinate frame central point detected and face tracking updates the face location predicted and sits The distance between frame central point is marked, does not need then to carry out face tracking initialization when calculating distance threshold of the distance less than setting, It then needs to carry out face tracking initialization when calculating distance threshold of the distance greater than setting, final result is subjected to display output.

Further, include: to the position coordinates extraction of target face

According to formula:It is each in the given tracking target area A of calculating input image I Spatial gradient the matrix G, A of pixel P_xGradient for target area A in x-axis direction, A_yLadder for target area A in y-axis direction Degree；

Calculate the minimal eigenvalue λ of each G_mAnd storage λ_mGreater than given eigenvalue threshold λ_thCorresponding pixel P, then Judge whether pixel P is greater than other pixels in 3 × 3 neighborhood of surrounding, if it is greater, then retaining pixel P and from all The minimal eigenvalue λ of storage_mMiddle maximizing λ_max, if it is less, no longer retaining, execute operations described below；

Calculate the distance between the pixel that remains distance and with distance threshold distance_thCompare, retains Distance is greater than distance threshold distance_thPixel, these pixels of reservation are the characteristic point extracted, after being used for Continuous face tracking and update.

Further, face tracking is specifically included according to image pyramid model:

Pyramid is established, I is defined₀It is the pyramid bottom, that is, the 0th layer of image, resolution ratio highest, L expression The pyramid number of plies, L take the natural number greater than 1, I_LIndicate L tomographic image；

By the optical flow computation result feedback of pyramid top layer to secondary top layer, g^LEstimate as light stream value of top layer when initial, It is set as 0, the light stream value of secondary top layer is estimated as g^L-1, pyramid top layer namely L-1 layers of light stream value d^L-1,g^L-2=2 (g^L-1+d^L ^-1(the 0+d of)=2^L-1)=2d^L-1, continue on pyramid and feed back downwards, iteration obtains until reaching the pyramid bottom Final original image light stream value d are as follows: d=g⁰+d⁰, final light flow valuve is exactly the superposition of all layers of segmentation light stream value d, it may be assumed that

The target feature point position A (x, y) that target area A is extracted is given by previous frame image and calculates present frame target area Domain B characteristic point position B (x+v_x,y+v_y), v_x,v_yIt is light stream value d in x-axis, the displacement component of y-axis；

The position of tracking target face is shown in current frame image.

Further, behind the position that current frame image draws tracking target face, judge whether all characteristic points It is taken out from face location coordinate container, if container is not sky, continues to take out, if it is empty, then execute operations described below:

If the tracking frame number of statistics is equal to setting tracking frame number threshold value, face is carried out to the first frame image got Detection, if conditions are not met, then carrying out operations described below:

The face location central point f arrived according to tracking prediction_track(x_{t_center},y_{t_center}) and the obtained people of Face datection Face place-centric point f_detection(x_{d_center},y_{d_center}), the distance l of two o'clock is calculated,

Set distance threshold value l_th=15, when distance l is greater than distance threshold l_thWhen, show the face location detected and with The face location difference of track is larger, should re-start face tracking initialization operation according to the face location detected at this time, when Distance l is less than or equal to distance threshold l_thWhen, show that the face location of the face location detected and tracking is not much different, does not have to weight It is new to carry out face tracking initialization, and execute operations described below；

Carry out Face datection and plurality of human faces tracking display；

Judge whether video flowing terminates, is exited if terminating.

Further, the Face datection model includes first network module and the second network module, the first network Module uses three by 2 convolutional layers, 2 active coatings, 2 normalization layers, 2 pond layer compositions, second network module Inception structure composition.

Further, also the face detection model is trained and is tested before using the face detection model.

Further, include: to the training of the face detection model

The face samples pictures under a large amount of natural scenes are obtained, face location mark are carried out to obtained picture, and generate The mark document of xml format；

The human face data completed to mark is cleaned, face resolution ratio directly removing less than 20 × 20；

The data that cleaning is completed directly generate lmdb formatted file, for carrying out data in deep learning frame caffe It reads；

The network model for completing lightweight is built；

Starting model training, face predicts that loss function uses softmax loss function,Wherein, y_iIndicate i-th group of data and corresponding mark classification, if Actually this group of data are faces, then y=1, if actually this group of data are not face, y=0, f (x_i, θ) and it indicates to be predicted as The probability value of face, x_iIndicate the input of i-th group of data, θ indicate can learning parameter, m indicates sample number；

Backpropagation, using stochastic gradient descent algorithm, continuous iteration, the value for enabling loss function obtain is as close possible to 0；

If reaching the number of iterations of setting, terminate, such as not up to continues to train.

Compared with prior art, a kind of detection of real time multi-human face disclosed in this invention and tracking, reached as Lower technical effect:

1, the characteristic point that the present invention extracts can represent the main feature of target to be tracked, even if in extraneous illumination condition It is complicated, background is complicated, there are dynamic fuzzy and boundary effect, exist block etc. under the conditions of also there is good generalization ability.? That is the characteristic point extracted still can characterize target signature even if environment is more complicated, there are small areas to block for target , complete tracking.

2, the present invention sets correction condition to the tracking of target, when encountering extreme condition, such as the very strong video of illumination It is a piece of black that video presentation under a piece of white or no light condition is presented, or the characteristic point extracted has been blocked (note completely just Meaning characteristic point is dispersed in target area, even if fraction characteristic point is blocked or lacks or can continue tracking) this The case where Shi Huiyou BREAK TRACK, it is necessary to re-start Face datection, face tracking initialization, face tracking update.

3, by the accurate detection of Face datection model realization face, the real-time of dynamic human face is realized by tracking Tracking.

Detailed description of the invention

In order to more clearly explain the embodiment of the invention or the technical proposal in the existing technology, to embodiment or will show below There is attached drawing needed in technical description to be briefly described, it should be apparent that, the accompanying drawings in the following description is this hair Bright some embodiments for those of ordinary skill in the art without creative efforts, can be with root Other attached drawings are obtained according to these attached drawings.

Fig. 1 is the model schematic of the Inception structure in Face datection model described in the embodiment of the present invention.

Fig. 2 is the network structure of Face datection model described in the embodiment of the present invention.

Fig. 3 is the schematic diagram of Face datection model described in the embodiment of the present invention being trained.

Fig. 4 is the schematic diagram that Face datection model described in the embodiment of the present invention is tested.

Fig. 5 is the detection of real time multi-human face and the complete flow diagram of tracking described in the embodiment of the present invention.

Specific embodiment

Technical solution in order to enable those skilled in the art to better understand the present invention, with reference to the accompanying drawing and specific embodiment party Present invention is further described in detail for formula.

The present invention will handle all faces detected in input video frame, and track to it, both include to individual human face Tracking, also include the tracking to multiple faces.The present invention is based on optical flow tracking methods, can faster realize target following, Face is blocked, human face posture expression is complicated, face movement speed is fast, tracking environmental background complexity has better robustness, It is suitble to realize on calculating the smaller CPU processor of power.

A kind of detection of plurality of human faces disclosed in the embodiment of the present invention and tracking are based on Face datection model and face Trace model realizes that Face datection is realized using depth learning technology, and face tracking is real using the tracking based on light stream It is existing.

Shown in reference picture 1, Fig. 2, in the present embodiment, Face datection model is to be divided into two sons using structure end to end Module, first network module and the second network module.First network module is by 2 convolutional layers, 2 active coatings, 2 normalization Layer, 2 pond layer compositions, main function is the characteristic information of rapidly extracting input picture, by first time convolution algorithm, activation Processing, normalized and Chi Huahou, using being exported behind second of convolution algorithm, activation processing, normalized and pond To the second network module, since first network module network model is simple, model parameter is less, and calculation amount is also small, can compare Quickly extract face characteristic information.For second network module by using 3 Inception structures, Fig. 1 is Inception knot The network structure of structure, Inception structure include a variety of different convolution branches, can get a variety of receptive fields, can be to each Kind scale in other words preferably detects multiple dimensioned face.Picture after three Inception pattern handlings, can be more Add the face of accurate detection various scales into input picture.

By Face datection model, the position coordinates and picture of face frame are exported, and face tracking module is then according to coordinate The region for the human face target to be tracked is found from input picture.

In order to realize the accurate detection of Face datection model, before testing, to model training and test.

Training process to Face datection model is as shown in figure 3, specifically comprise the following steps:

1) face picture under a large amount of natural scenes is obtained, from web crawlers, client provides or public data collection, Face location mark is carried out to obtained picture, and generates xml format mark document, is executed 2).

2) human face data completed to mark is cleaned, and face resolution ratio directly removing less than 20 × 20 is not used in Training, prevents network model from not restraining, and executes 3).

3) data that cleaning is completed directly generate lmdb formatted file, for being counted in deep learning frame caffe According to reading, execute 4).

4) network model for completing lightweight is built, and is executed 5).

5) start model training, face predicts that loss function uses softmax loss, y_iIndicate i-th group of data and correspondence Mark classification, if actually this group of data are face, y=1, if actually this group of data are not face, y=0.f(x_i,θ) Indicate the probability value for being predicted as face, x_iIndicate the input of i-th group of data, θ indicate can learning parameter, m indicates sample number.Formula is such as Under:

Backpropagation, using stochastic gradient descent algorithm, continuous iteration, the value for enabling loss function obtain as close possible to 0, It executes 6).

If 6) reach the number of iterations of setting, terminate, if 5) the number of iterations of not up to setting, executes.Instruction Data format after the completion of white silk is caffemodel, is to need designated model store path when calling.

By being trained to model, can it is more accurate it is quick detection recognize face.

After being finished to Face datection model training, model is tested.Test process is as shown in Figure 4.

1) 2) input video stream executes；

2) video frame is obtained, is executed 3)；

3) picture format is converted, is executed 4), the channel hwc of picture is converted into the channel cwh；

4) it enters data into and gives Face datection model, execute 5)

5) result is exported.Export face location coordinate and face probability value in current video frame.

After rule are closed in the test of Face datection model, face tracking process is entered, the input of face tracking port is people The face frame coordinate (upper left corner starting point (x, y) width (width) is high (height)) and picture of face detection model output, face tracking Mesh target area to be tracked can be found from input picture according to coordinate.Picture is obtained from video flowing, is first carried out after getting Then Face datection judges whether to face tracking initialization, then judge whether to face tracking update, under finally obtaining One or end.

Before formally using model, trained model is tested, to improve the accuracy for using model.

Referring to Figure 5, the process of a complete face tracking is as follows:

1) detection model initializes, and executes 2).

2) 3) input video stream executes.

3) video frame images I is obtained from input video stream, is executed 4).

4) Face datection is carried out to the first frame image got, executed 5).

If 5) detect face, execute 6), if not detecting face, executes 22).

6) face location coordinate is stored in face location coordinate container, is executed 7).

If 7) carry out face tracking initialization, execute 8), if initialized without face tracking, executes 12).

8) 1 face position coordinates is obtained from face location coordinate container, then is executed 9).

9) target human face characteristic point, given tracking target area A (being exactly face location coordinate) of calculating input image I are extracted In each pixel P spatial gradient matrix G, A_xGradient for target area A in x-axis direction, A_yIt is target area A in y-axis side To gradient,

Calculate the minimal eigenvalue λ of each G_mAnd storage λ_mGreater than given eigenvalue threshold λ_thCorresponding pixel P, then Judge whether pixel P is greater than other pixels in 3 × 3 neighborhood of surrounding, if it is greater, then retaining pixel P and from all The minimal eigenvalue λ of storage_mMiddle maximizing λ_max, if it is less, no longer retaining, execute 10).The feature of target face The extraction of point is the method using spatial gradient matrix, and characteristic point container is arrived in storage after extraction, for the updated of face tracking Journey.

10) finally calculate the distance between pixel remained dis tan ce and with distance threshold dis tan ce_thCompare, retains dis tan ce and be greater than distance threshold dis tan ce_thPixel, these pixels of reservation are to mention 11) characteristic point taken is executed for tracking.By step 9), 10), realize the extraction of characteristic point, the characteristic point of extraction can Characterize target face characteristic, though extraneous illumination condition is complicated, background is complicated, there are dynamic fuzzy and boundary effect, exist Small area also has good generalization ability under the conditions of blocking etc., can complete subsequent tracking.

11) if face location coordinate container is sky, execute 12), if face location coordinate container is not sky, holds Row 8).

If 12) carry out face tracking update, execute 13), if updated without face tracking, executes 22) straight Row video is tapped into show.

13) characteristic point of the target face of initialization is taken out from characteristic point container, is executed 14).

14) 15) tracking frame counting number is executed for statistical trace how many frame.

15) image pyramid processing establishes pyramid, defines I₀It is the pyramid bottom, that is, the 0th layer of image, Its resolution ratio highest.L indicates the pyramid number of plies, and L usually takes 2,3,4.I_LIndicate L tomographic image.

Optical flow computation result (misalignment) feedback of pyramid top layer (L-1 layers) arrives time top layer (L-2 layers), g^LMake The light stream value for being top layer when initial estimation, is set as 0, the light stream value of secondary top layer is estimated as g^L-1, pyramid top layer (L-1 layers) Light stream value d^L-1,

g^L-2=2 (g^L-1+d^L-1(the 0+d of)=2^L-1)=2d^L-1

It continues on pyramid to feed back downwards, iteration, until reaching the pyramid bottom.

Final original image light stream value d are as follows:

D=g⁰+d⁰

Final light flow valuve is exactly the superposition of all layers of segmentation light stream value d,

The target feature point position A (x, y) that target area A is extracted is given by previous frame image and calculates present frame target area Domain B characteristic point position B (x+v_x,y+v_y), v_x,v_y16) as light stream value d is executed in x-axis, y-axis displacement component.

16) tracking position of object is drawn in current frame image, executed 17).

17) judge whether from container to take out all characteristic points, if feature container is sky, execute 18), if not 13) sky then executes.

18) if the tracking frame number counted is equal to tracking frame number threshold value Δ_frame=10, then execute 4), 19), if discontented It is sufficient then execute 22).

19) 20) position coordinates for obtaining Face datection, execute.

20) the face location central point f arrived according to tracking prediction_track(x_{t_center},y_{t_center}) and Face datection obtain Face location central point f_detection(x_{d_center},y_{d_center}), the distance l of two o'clock is calculated,

21) set distance threshold value l_th=15, when distance l is greater than distance threshold l_thWhen, show the face location detected and The face location difference of tracking is larger, should re-start face tracking initialization according to the face location detected at this time, execute 7).When distance l is less than or equal to distance threshold l_thWhen, show that the face location of the face location detected and tracking is not much different, Without re-starting face tracking initialization, execute 22).Step 20) 21) sets correction condition, when encountering extreme condition, Characteristic point that is a piece of black, or extracting is presented as video under a piece of white or no light condition is presented in the very strong video of illumination Just blocked completely (attention characteristics point is dispersed in target area, though fraction characteristic point be blocked or lack also It is that can continue tracking), it the case where at this moment having BREAK TRACK, just needs to re-start Face datection, face at this time Tracking initialization, face tracking update.

22) Face datection and plurality of human faces tracking display are carried out, the position of face frame is shown in video frame, no matter face is examined Survey or the result of plurality of human faces trace back be all face frame position coordinates, finally will in video frame by face location with square The form of shape is drawn, and is executed 23).

23) judge whether video flowing terminates, execute 24) if terminating, executed 3) if being not finished

24) entire program is exited.

The present invention through the above steps, human face posture expression is complicated, face exist block, external environment background complexity with And when illumination condition is changeable, it also can be realized the accuracy of tracking and accomplish the effect of real-time tracking.

Finally, it should be noted that the above embodiments are merely illustrative of the technical solutions of the present invention, rather than its limitations；Although Present invention has been described in detail with reference to the aforementioned embodiments, those skilled in the art should understand that: it still may be used To modify the technical solutions described in the foregoing embodiments or equivalent replacement of some of the technical features； And these are modified or replaceed, technical solution of various embodiments of the present invention that it does not separate the essence of the corresponding technical solution spirit and Range.

Claims

1. detection and the tracking of a kind of real time multi-human face, which is characterized in that the described method includes:

The image of each video frame is obtained from the video flowing of input；

The detection of face location coordinate is carried out to the video frame of acquisition by Face datection model, and face location coordinate is stored To face location coordinate container；

Face tracking initialization operation extracts the position coordinates of target face up to taking from face location coordinate container, mentions The characteristic point of human face target is taken to store to characteristic point container to update for the tracking of succeeding target face；

Statistical trace frame number just re-starts a Face datection, when or not the tracking frame number threshold value of tracking frame number satisfaction setting When meeting, then calculates the face location coordinate frame central point detected and face tracking updates the face location coordinate frame predicted The distance between central point does not need then to carry out face tracking initialization, works as meter when calculating distance threshold of the distance less than setting The distance threshold that distance is calculated greater than setting then needs to carry out face tracking initialization, and final result is carried out display output.

2. detection and the tracking of real time multi-human face according to claim 1, which is characterized in that the position of target face Setting coordinate extraction includes:

According to formula:Each pixel in the given tracking target area A of calculating input image I Spatial gradient the matrix G, A of point P_xGradient for target area A in x-axis direction, A_yGradient for target area A in y-axis direction,

Calculate the minimal eigenvalue λ of each G_mAnd storage λ_mGreater than given eigenvalue threshold λ_thCorresponding pixel P, then judge Whether pixel P is greater than other pixels in 3 × 3 neighborhood of surrounding, if it is greater, then retaining pixel P and from all storages Minimal eigenvalue λ_mMiddle maximizing λ_max, if it is less, no longer retaining, execute operations described below；

3. detection and the tracking of real time multi-human face according to claim 2, which is characterized in that according to image pyramid Model specifically includes face tracking:

Pyramid is established, I is defined₀It is the pyramid bottom, that is, the 0th layer of image, resolution ratio highest, L expression pyramid The number of plies, L take the natural number greater than 1, I_LIndicate L tomographic image；

By the optical flow computation result feedback of pyramid top layer to secondary top layer, g^LAs light stream value estimation of top layer when initial, it is set as 0, The light stream value of secondary top layer is estimated as g^L-1, pyramid top layer namely L-1 layers of light stream value d^L-1,g^L-2=2 (g^L-1+d^L-1)=2 (0+d^L-1)=2d^L-1, continue on pyramid and feed back downwards, iteration obtains final until reaching the pyramid bottom Original image light stream value d are as follows: d=g⁰+d⁰, final light flow valuve is exactly the superposition of all layers of segmentation light stream value d, it may be assumed that

The target feature point position A (x, y) that target area A is extracted is given by previous frame image and calculates present frame target area B Characteristic point position B (x+v_x,y+v_y), v_x,v_yIt is light stream value d in x-axis, the displacement component of y-axis；

The position of tracking target face is shown in current frame image.

4. detection and the tracking of real time multi-human face according to claim 3, which is characterized in that drawn in current frame image Behind the position for tracking target face out, judge whether to take out all characteristic points from face location coordinate container, if container It is not sky, then continues to take out, if it is empty, then execute operations described below:

If the tracking frame number of statistics is equal to setting tracking frame number threshold value, face inspection is carried out to the first frame image got It surveys, if conditions are not met, then carrying out operations described below:

The face location central point f arrived according to tracking prediction_track(x_{t_center},y_{t_center}) and the obtained face position of Face datection Set central point f_detection(x_{d_center},y_{d_center}), the distance l of two o'clock is calculated,

Set distance threshold value l_th=15, when distance l is greater than distance threshold l_thWhen, show the people of the face location detected and tracking Face position difference is larger, face tracking initialization operation should be re-started according to the face location detected at this time, when distance l is small In equal to distance threshold l_thWhen, show that the face location of the face location detected and tracking is not much different, without re-starting Face tracking initialization, and execute operations described below；

Carry out Face datection and plurality of human faces tracking display；

Judge whether video flowing terminates, is exited if terminating.

5. detection and the tracking of real time multi-human face according to claim 1, which is characterized in that the Face datection mould Type includes first network module and the second network module, and the first network module is by 2 convolutional layers, and 2 active coatings, 2 are returned One changes layer, 2 pond layer compositions, and second network module uses three Inception structure compositions.

6. detection and the tracking of real time multi-human face according to claim 5, which is characterized in that examined using the face Also the face detection model is trained and is tested before surveying model.

7. detection and the tracking of real time multi-human face according to claim 6, which is characterized in that the Face datection mould The training of type includes:

The face samples pictures under a large amount of natural scenes are obtained, face location mark are carried out to obtained picture, and generate xml lattice The mark document of formula；

The data that cleaning is completed directly generate lmdb formatted file, for carrying out reading data in deep learning frame caffe；

The network model for completing lightweight is built；

8. detection and the tracking of real time multi-human face according to claim 7, which is characterized in that complete to model training And then carry out the test of Face datection, comprising:

Input video stream；

Obtain video frame；

It is converted into the picture format that model can identify；

It enters data into network model；

Export result.

9. detection and the tracking of real time multi-human face according to claim 7, which is characterized in that obtain face sample graph Piece includes: web crawlers mode, third party client provides or disclosed set of data samples.