CN109558815A - A kind of detection of real time multi-human face and tracking - Google Patents
- Publication number: CN109558815A
- Application number: CN201811365995.6A
- Authority
- CN
- China
- Prior art keywords
- face
- tracking
- detection
- distance
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a real-time multi-face detection and tracking method. The method comprises: obtaining the image of each video frame from an input video stream; detecting face location coordinates in the obtained video frame with a face detection model, and storing the face location coordinates in a face location coordinate container; performing a face tracking initialization operation, in which the position coordinates of each target face are extracted from the face location coordinate container until the container is empty, and the feature points of each target face are extracted from the feature point container for the subsequent face tracking update; building an image pyramid model and predicting the position of the face target in the current video frame from that model; and tracking and displaying the faces. The invention solves the problems that existing face recognition and tracking methods are insufficiently accurate and cannot achieve real-time tracking.
Description
Technical field
The invention belongs to the field of face detection and tracking, and in particular relates to a real-time multi-face detection and tracking method.
Background technique
With the rapid development of science and technology, computer vision technologies have been widely applied; among them, face tracking is widely used in scenarios such as video surveillance, automatic access control, and retail shopping.
Face tracking technology mainly comprises face detection and face tracking. Face detection refers to locating faces in a picture. Face tracking refers to continuously predicting the face location in subsequent video frames, given an initial face position. Mainstream face tracking methods fall roughly into three categories by principle: methods based on correlation filtering, methods based on deep learning, and methods based on optical flow.
Representative correlation-filtering trackers are KCF (Kernelized Correlation Filter) and SRDCF (Spatially Regularized Discriminative Correlation Filter). KCF uses a circulant matrix to obtain positive and negative samples and trains an object detector during tracking; the detector checks whether the tracked target in the next frame is the real target, and the new detection result is then used to update the training set and, in turn, the detector. The defect of this approach is that when the target moves quickly, or when boundary effects or motion blur occur, the tracker loses the target. SRDCF addresses boundary effects with multi-scale processing and a larger detection region, but it runs very slowly and cannot meet real-time requirements.
A representative deep-learning tracker is MDNet (Multi-Domain Convolutional Neural Network). The network consists of shared layers and multiple domain-specific branches, where each domain corresponds to an independent training sequence and each branch performs a binary classification to identify the target in its domain. The network is trained iteratively over the domains, so that the shared layers learn generic target feature extraction. When tracking a target in a video sequence, the shared layers of the pre-trained CNN (Convolutional Neural Network) are combined with a new binary classification layer to form a new network, and online tracking is performed by evaluating candidate windows randomly sampled around the target location of the previous frame. The target features extracted this way are highly accurate, but because the network has many parameters, real-time tracking is very difficult to achieve on a CPU.
A representative optical-flow method is the LK (Lucas-Kanade) differential optical flow estimation method, a gradient-based differential optical flow computation. This optical flow method relies on three assumptions. Assumption 1, brightness constancy: the brightness of a given point does not change over time. Assumption 2, small motion: changes over time do not cause drastic changes in position. Assumption 3, spatial coherence: neighboring points in a scene project to neighboring points in the image, and neighboring points have consistent velocities. Optical flow can track targets in scenes of arbitrary complexity, can complete target tracking accurately and quickly, and is comparatively well suited to terminals with limited computing power.
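Under the three assumptions above, brightness constancy I(x, y, t) = I(x + dx, y + dy, t + dt), expanded to first order, yields the optical flow constraint that LK solves by least squares over a small window. This standard derivation is supplied here for context; it does not appear verbatim in the patent:

```latex
% Brightness constancy + first-order Taylor expansion give the constraint
I_x v_x + I_y v_y + I_t = 0
% One equation, two unknowns; spatial coherence lets us stack the constraint
% over the pixels p_1..p_n of a window and solve in the least-squares sense:
\mathbf{v} = (A^{\top}A)^{-1} A^{\top} \mathbf{b}, \qquad
A = \begin{bmatrix} I_x(p_1) & I_y(p_1) \\ \vdots & \vdots \\ I_x(p_n) & I_y(p_n) \end{bmatrix}, \quad
\mathbf{b} = -\begin{bmatrix} I_t(p_1) \\ \vdots \\ I_t(p_n) \end{bmatrix}
```

The matrix A^T A is exactly the spatial gradient matrix G used later for feature-point selection: flow is reliable precisely where G is well conditioned.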
To run on a CPU with limited computing power (such as an Intel i5-6200), the method must have a small computational cost, and the algorithm design cannot be too complex. Compared with correlation-filtering and deep-learning trackers, the advantage of optical-flow-based methods is that target tracking can be realized faster, with better robustness to face occlusion, complex facial poses and expressions, fast face movement, and complex tracking backgrounds, making them suitable for CPU processors with limited computing power.
However, current optical-flow-based methods still face the following problems when handling face detection and tracking:
1. Occlusion, including person-to-person and person-to-object occlusion, causes loss of face information, which directly leads to lost targets and reduced tracking accuracy.
2. Motion blur and boundary effects blur the face information; inaccurate feature extraction then directly leads to the loss of the target.
3. Complex backgrounds: illumination conditions change, and colors and objects vary widely; sometimes the color of the tracked target matches the background. All of these pose great challenges to the face tracking task.
4. Efficiency: existing multi-face tracking techniques struggle to meet real-time requirements, especially on CPU devices with relatively low computing power.
When facial poses and expressions are complex, faces are occluded, the external background is complex, or illumination conditions vary, tracking accuracy easily degrades and real-time tracking cannot be achieved.
Summary of the invention
In view of the above drawbacks of the prior art, the object of the present invention is to provide a real-time multi-face detection and tracking method, to solve the problems that existing face recognition and tracking are insufficiently accurate and cannot achieve real-time tracking.
The technical solution adopted by the invention is as follows:
A real-time multi-face detection and tracking method, the method comprising:
obtaining the image of each video frame from an input video stream;
detecting face location coordinates in the obtained video frame with a face detection model, and storing the face location coordinates in a face location coordinate container;
performing a face tracking initialization operation: extracting the position coordinates of each tracked target face from the face location coordinate container until the container is empty, extracting the target's feature points using the spatial gradient matrix, and storing them in a feature point container for the subsequent face tracking update;
building an image pyramid model and predicting the position of the face target in the current video frame from that model;
counting the tracked frames: when the tracked frame count reaches the set frame-count threshold, face detection is re-run once; when it does not, the distance between the center of the detected face location coordinate frame and the center of the face location coordinate frame predicted by the face tracking update is computed; when the computed distance is less than the set distance threshold, face tracking initialization is not required, and when it is greater than the set distance threshold, face tracking initialization is required; finally, the result is displayed and output.
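As a minimal control-flow sketch of the counting and correction scheme above (the hypothetical `detect` and `track` stubs stand in for the face detection model and the optical-flow tracker; the thresholds of 10 frames and distance 15 follow the values used in the embodiment):

```python
import math

def center_distance(a, b):
    # Euclidean distance between two (x, y) face-box centers
    return math.hypot(a[0] - b[0], a[1] - b[1])

def run_pipeline(frames, detect, track, frame_thresh=10, dist_thresh=15.0):
    """Detect on the first frame, track every frame afterwards, re-detect every
    `frame_thresh` frames, and re-initialize tracking from the detection only
    when the tracked and detected centers have drifted apart."""
    stats = {"detections": 0, "reinits": 0}
    center = detect(frames[0])
    stats["detections"] += 1
    for i, frame in enumerate(frames[1:], start=1):
        center = track(frame, center)          # optical-flow style update
        if i % frame_thresh == 0:              # tracked-frame threshold hit
            found = detect(frame)
            stats["detections"] += 1
            if center_distance(found, center) > dist_thresh:
                center = found                 # correction: re-initialize
                stats["reinits"] += 1
    return stats
```

With a detector that always reports (0, 0) and a tracker that drifts 2 px per frame, 25 frames yield detections at frames 0, 10, and 20, with re-initialization at the latter two.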
Further, the extraction of the position coordinates of a target face comprises:
According to the formula
G = Σ_{P∈A} [[Ax², AxAy], [AxAy, Ay²]],
computing the spatial gradient matrix G of each pixel P in the given tracking target area A of the input image I, where Ax is the gradient of target area A in the x direction and Ay is the gradient of target area A in the y direction;
computing the minimal eigenvalue λm of each G and storing the pixels P whose λm exceeds the given eigenvalue threshold λth; then judging whether pixel P is greater than the other pixels in its surrounding 3 × 3 neighborhood: if it is greater, retaining pixel P and taking the maximum λmax over all stored minimal eigenvalues λm; if it is smaller, not retaining it, and executing the following operation;
computing the distance between the retained pixels and comparing it with the distance threshold distanceth, retaining the pixels whose distance is greater than the distance threshold distanceth; these retained pixels are the extracted feature points, used for the subsequent face tracking and update.
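A minimal NumPy sketch of this selection procedure (the window size, thresholds, and synthetic test image are illustrative assumptions, not values from the patent):

```python
import numpy as np

def box3(a):
    """3x3 box sum with zero padding (sums G's entries over a small window)."""
    p = np.pad(a, 1)
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def min_eig_map(img):
    """Per-pixel minimal eigenvalue of the spatial gradient matrix
    G = [[sum Ax^2, sum AxAy], [sum AxAy, sum Ay^2]]."""
    img = img.astype(np.float64)
    Ay, Ax = np.gradient(img)              # y and x gradients of the area
    gxx, gxy, gyy = box3(Ax * Ax), box3(Ax * Ay), box3(Ay * Ay)
    t = 0.5 * (gxx + gyy)
    d = np.sqrt((0.5 * (gxx - gyy)) ** 2 + gxy ** 2)
    return t - d                           # closed form for the smaller 2x2 eigenvalue

def select_features(img, eig_thresh=1.0, min_dist=5.0):
    lam = min_eig_map(img)
    # keep pixels above the eigenvalue threshold that are 3x3 local maxima
    p = np.pad(lam, 1, constant_values=-np.inf)
    h, w = lam.shape
    local_max = lam >= np.max(
        [p[i:i + h, j:j + w] for i in range(3) for j in range(3)], axis=0)
    ys, xs = np.nonzero((lam > eig_thresh) & local_max)
    order = np.argsort(-lam[ys, xs])       # strongest corners first
    kept = []
    for y, x in zip(ys[order], xs[order]):  # greedy minimum-spacing filter
        if all((y - ky) ** 2 + (x - kx) ** 2 >= min_dist ** 2 for ky, kx in kept):
            kept.append((y, x))
    return kept
```

On a synthetic image containing a bright square, the retained points cluster at the square's corners, where both gradient directions are strong, and respect the minimum spacing.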
Further, performing face tracking according to the image pyramid model specifically comprises:
building the pyramid: I0 is defined as the bottom, i.e. 0th, layer of the pyramid, with the highest resolution; L denotes the number of pyramid layers, L being a natural number greater than 1, and I^L denotes the L-th layer image;
feeding the optical flow result of the top pyramid layer back to the next layer down: the initial optical flow estimate of the top layer is set to 0, and with the top-layer, i.e. layer L−1, residual flow d^{L−1}, the estimate passed down is g^{L−2} = 2(g^{L−1} + d^{L−1}) = 2(0 + d^{L−1}) = 2d^{L−1}; the feedback continues down the pyramid iteratively until the bottom layer is reached, giving the final original-image optical flow value d = g^0 + d^0; the final optical flow value is thus the accumulation of the per-layer residual flows d;
given the target feature point position A(x, y) extracted from target area A in the previous frame image, computing the feature point position of the current frame target area B as B(x + vx, y + vy), where vx and vy are the displacement components of the optical flow d along the x and y axes;
displaying the position of the tracked target face in the current frame image.
Further, after the position of the tracked target face is drawn in the current frame image, it is judged whether all feature points have been taken out of the feature point container: if the container is not empty, taking out continues; if it is empty, the following operations are executed:
if the counted tracked frame number equals the set tracked-frame threshold, face detection is performed on the acquired frame image; if not, the following operations are performed:
according to the face location center point ftrack(xt_center, yt_center) predicted by tracking and the face location center point fdetection(xd_center, yd_center) obtained by face detection, the distance l between the two points is computed;
the distance threshold is set to lth = 15; when the distance l is greater than lth, the detected face location and the tracked face location differ considerably, and the face tracking initialization operation should be re-run from the detected face location; when l is less than or equal to lth, the detected and tracked face locations differ little, face tracking initialization need not be re-run, and the following operations are executed:
face detection and multi-face tracking display are performed;
whether the video stream has ended is judged, and the process exits if it has ended.
Further, the face detection model comprises a first network module and a second network module; the first network module consists of 2 convolutional layers, 2 activation layers, 2 normalization layers, and 2 pooling layers, and the second network module consists of three Inception structures.
Further, the face detection model is trained and tested before it is used.
Further, training the face detection model comprises:
obtaining a large number of face sample pictures under natural scenes, annotating the face locations in the obtained pictures, and generating annotation documents in xml format;
cleaning the annotated face data, directly removing faces with a resolution below 20 × 20;
generating lmdb-format files directly from the cleaned data, for data reading in the deep learning framework caffe;
building the lightweight network model;
starting model training, the face prediction loss function being the softmax loss function
L(θ) = −(1/m) Σ_{i=1..m} [ yi log f(xi, θ) + (1 − yi) log(1 − f(xi, θ)) ],
where yi denotes the i-th group of data and its annotated class: if this group of data is actually a face, y = 1, and if it is not a face, y = 0; f(xi, θ) denotes the predicted probability of being a face, xi denotes the input of the i-th group of data, θ denotes the learnable parameters, and m denotes the number of samples;
backpropagation, using the stochastic gradient descent algorithm with continuous iteration, so that the value of the loss function is as close as possible to 0;
if the set number of iterations is reached, training ends; if it is not reached, training continues.
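As a toy sketch of this loss and the iterative descent described above (a minimal logistic model on synthetic data, not the patent's caffe network; all shapes and hyperparameters are illustrative assumptions):

```python
import numpy as np

def face_prob(x, theta):
    # f(x_i, theta): predicted probability of "face" from a linear logistic model
    return 1.0 / (1.0 + np.exp(-x @ theta))

def softmax_loss(x, y, theta):
    # L(theta) = -(1/m) * sum( y*log f + (1-y)*log(1-f) )
    f = np.clip(face_prob(x, theta), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(f) + (1 - y) * np.log(1 - f))

def train(x, y, lr=0.5, iters=200):
    """Drive the loss toward 0 by repeated gradient steps
    (full-batch descent here for simplicity; the patent uses SGD in caffe)."""
    theta = np.zeros(x.shape[1])
    for _ in range(iters):
        grad = x.T @ (face_prob(x, theta) - y) / len(y)
        theta -= lr * grad
    return theta
```

On a linearly separable toy set the loss drops well below its initial value of ln 2 and the predictions separate the two classes.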
Compared with the prior art, the real-time multi-face detection and tracking method disclosed by the invention achieves the following technical effects:
1. The feature points extracted by the invention represent the main features of the target to be tracked, and retain good generalization ability even when external illumination is complex, the background is complex, motion blur and boundary effects are present, or occlusion exists. That is, even if the environment is complex or the target is occluded over a small area, the extracted feature points can still characterize the target features and complete the tracking.
2. The invention sets a correction condition for target tracking. Under extreme conditions, for example when strong illumination washes the video out white or the absence of light renders it black, or when the extracted feature points are completely occluded (note that the feature points are dispersed over the target area, so tracking can continue even if a small portion of them are occluded or missing), the tracked target may be lost; in that case face detection, face tracking initialization, and the face tracking update must be re-run.
3. Accurate face detection is realized by the face detection model, and real-time tracking of dynamic faces is realized by the tracking.
Detailed description of the drawings
In order to explain the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a model schematic of the Inception structure in the face detection model described in the embodiment of the invention.
Fig. 2 is the network structure of the face detection model described in the embodiment of the invention.
Fig. 3 is a schematic diagram of training the face detection model described in the embodiment of the invention.
Fig. 4 is a schematic diagram of testing the face detection model described in the embodiment of the invention.
Fig. 5 is the complete flow diagram of the real-time multi-face detection and tracking method described in the embodiment of the invention.
Specific embodiments
In order to enable those skilled in the art to better understand the technical solutions of the invention, the invention is further described in detail below with reference to the drawings and specific embodiments.
The invention processes all faces detected in the input video frames and tracks them, covering both the tracking of a single face and the tracking of multiple faces. The invention is based on an optical-flow tracking method, can realize target tracking faster, has better robustness to face occlusion, complex facial poses and expressions, fast face movement, and complex tracking backgrounds, and is suitable for CPU processors with limited computing power.
The multi-face detection and tracking method disclosed in the embodiment of the invention is realized based on a face detection model and a face tracking model; face detection is realized with deep learning technology, and face tracking is realized with an optical-flow-based tracking method.
Referring to Fig. 1 and Fig. 2, in this embodiment the face detection model is an end-to-end structure divided into two sub-modules, a first network module and a second network module. The first network module consists of 2 convolutional layers, 2 activation layers, 2 normalization layers, and 2 pooling layers; its main function is to rapidly extract the feature information of the input image. After a first round of convolution, activation, normalization, and pooling, and then a second round of convolution, activation, normalization, and pooling, the output is passed to the second network module. Since the first network module has a simple structure, few parameters, and a small computational cost, it can extract face feature information relatively quickly. The second network module uses 3 Inception structures; Fig. 1 shows the network structure of an Inception structure. An Inception structure contains several different convolution branches, providing multiple receptive fields, and can therefore better detect faces at various scales. After processing by the three Inception modules, faces of various scales in the input picture can be detected more accurately.
The face detection model outputs the position coordinates of the face frames together with the picture, and the face tracking module then locates the region of the face target to be tracked in the input image according to those coordinates.
In order to realize accurate detection with the face detection model, the model is trained and tested before use.
The training process of the face detection model is shown in Fig. 3 and specifically comprises the following steps:
1) Obtain a large number of face pictures under natural scenes, from web crawlers, customer-provided data, or public datasets; annotate the face locations in the obtained pictures and generate xml-format annotation documents; execute 2).
2) Clean the annotated face data: faces with a resolution below 20 × 20 are directly removed and not used for training, to prevent the network model from failing to converge; execute 3).
3) Generate lmdb-format files directly from the cleaned data, for data reading in the deep learning framework caffe; execute 4).
4) Build the lightweight network model; execute 5).
5) Start model training. The face prediction loss function is the softmax loss, where yi denotes the i-th group of data and its annotated class (if this group of data is actually a face, y = 1; if not, y = 0), f(xi, θ) denotes the predicted probability of being a face, xi denotes the input of the i-th group of data, θ denotes the learnable parameters, and m denotes the number of samples. The formula is:
L(θ) = −(1/m) Σ_{i=1..m} [ yi log f(xi, θ) + (1 − yi) log(1 − f(xi, θ)) ]
Backpropagation uses the stochastic gradient descent algorithm with continuous iteration, so that the value of the loss function is as close as possible to 0; execute 6).
6) If the set number of iterations is reached, training ends; if it is not reached, execute 5). The data format after training is caffemodel, and a model storage path needs to be designated when it is called.
By training the model, faces can be detected and recognized more accurately and quickly.
After the face detection model is trained, the model is tested. The test process is shown in Fig. 4:
1) Input the video stream; execute 2).
2) Obtain a video frame; execute 3).
3) Convert the picture format, converting the picture from HWC channel order to CHW; execute 4).
4) Feed the data into the face detection model; execute 5).
5) Output the result: the face location coordinates and face probability values in the current video frame.
After the face detection model passes the test, the face tracking process is entered. The input of the face tracking stage is the face frame coordinates output by the face detection model (top-left starting point (x, y), width, height) and the picture; face tracking locates the region of the target to be tracked in the input image according to those coordinates. A picture is obtained from the video stream; after it is obtained, face detection is performed first, then whether to perform face tracking initialization is judged, then whether to perform the face tracking update, and finally the next frame is obtained or the process ends.
Before the model is formally used, the trained model is tested, to improve the accuracy of the model in use.
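The channel-order conversion in test step 3) can be sketched with a NumPy transpose (the 224 × 224 frame size is only an illustrative assumption):

```python
import numpy as np

def hwc_to_chw(img):
    """Convert an image from HWC (height, width, channel) layout to CHW,
    the layout convolutional frameworks such as caffe expect."""
    return np.ascontiguousarray(img.transpose(2, 0, 1))

frame = np.zeros((224, 224, 3), dtype=np.float32)   # H, W, C
blob = hwc_to_chw(frame)                            # C, H, W
```

`np.ascontiguousarray` forces an actual memory reordering rather than a lazy view, which matters when the buffer is handed to a C/C++ inference runtime.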
Referring to Fig. 5, the process of one complete face tracking run is as follows:
1) The detection model is initialized; execute 2).
2) Input the video stream; execute 3).
3) Obtain a video frame image I from the input video stream; execute 4).
4) Perform face detection on the acquired frame image; execute 5).
5) If a face is detected, execute 6); if no face is detected, execute 22).
6) Store the face location coordinates in the face location coordinate container; execute 7).
7) If face tracking initialization is to be performed, execute 8); otherwise execute 12).
8) Obtain one face position coordinate from the face location coordinate container; then execute 9).
9) Extract the feature points of the target face: compute the spatial gradient matrix G of each pixel P in the given tracking target area A (namely the face location coordinates) of the input image I, where Ax is the gradient of target area A in the x direction and Ay is the gradient of target area A in the y direction.
Compute the minimal eigenvalue λm of each G and store the pixels P whose λm exceeds the given eigenvalue threshold λth; then judge whether pixel P is greater than the other pixels in its surrounding 3 × 3 neighborhood: if it is greater, retain pixel P and take the maximum λmax over all stored minimal eigenvalues λm; if it is smaller, do not retain it; execute 10). The feature points of the target face are thus extracted by the spatial gradient matrix method and stored in the feature point container for the face tracking update process.
10) Finally, compute the distance between the retained pixels and compare it with the distance threshold distanceth, retaining the pixels whose distance is greater than distanceth; these retained pixels are the feature points extracted for tracking; execute 11). Steps 9) and 10) realize the feature point extraction; the extracted feature points can characterize the target face features and retain good generalization ability even under complex external illumination, complex backgrounds, motion blur and boundary effects, or small-area occlusion, so the subsequent tracking can be completed.
11) If the face location coordinate container is empty, execute 12); if it is not empty, execute 8).
12) If the face tracking update is to be performed, execute 13); otherwise execute 22) directly for video display.
13) Take the feature points of the initialized target face out of the feature point container; execute 14).
14) Count the tracked frame number, to record how many frames have been tracked; execute 15).
15) Image pyramid processing: build the pyramid, defining I0 as the bottom, i.e. 0th, layer of the pyramid, with the highest resolution; L denotes the number of pyramid layers and usually takes 2, 3, or 4; I^L denotes the L-th layer image.
The optical flow result (displacement) of the top pyramid layer (layer L−1) is fed back to the next layer down (layer L−2). The initial optical flow estimate of the top layer is set to 0; with the top-layer residual flow d^{L−1}, the estimate passed down is
g^{L−2} = 2(g^{L−1} + d^{L−1}) = 2(0 + d^{L−1}) = 2d^{L−1}.
The feedback continues down the pyramid iteratively until the bottom layer is reached. The final original-image optical flow value d is
d = g^0 + d^0,
i.e., the final optical flow value is the accumulation of the per-layer residual flows d.
Given the target feature point position A(x, y) extracted from target area A in the previous frame image, compute the feature point position of the current frame target area B as B(x + vx, y + vy), where vx and vy are the displacement components of the optical flow d along the x and y axes; execute 16).
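The coarse-to-fine accumulation in step 15) can be sketched as follows (bookkeeping only: `d_levels[l]` stands for the residual flow computed at pyramid level l, with the last entry being the top layer; the per-level LK solve itself is omitted):

```python
def accumulate_flow(d_levels):
    """Fold per-level residual flows into the full-resolution flow:
    g starts at 0 on the top layer, each step down doubles (g + d),
    and the final flow is g0 + d0."""
    g = 0.0                              # initial top-layer estimate g = 0
    for d in reversed(d_levels[1:]):     # from the top layer down to level 1
        g = 2.0 * (g + d)                # g_{l-1} = 2 * (g_l + d_l)
    return g + d_levels[0]               # d = g0 + d0
```

For example, three levels of unit residual flow accumulate to 4 + 2 + 1 = 7 pixels at full resolution, showing how each coarse-level pixel of motion doubles per descent.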
16) Draw the tracked target position in the current frame image; execute 17).
17) Judge whether all feature points have been taken out of the container: if the feature container is empty, execute 18); if it is not empty, execute 13).
18) If the counted tracked frame number equals the tracked-frame threshold Δframe = 10, execute 4) and 19); if not, execute 22).
19) Obtain the face detection position coordinates; execute 20).
20) According to the face location center point ftrack(xt_center, yt_center) predicted by tracking and the face location center point fdetection(xd_center, yd_center) obtained by face detection, compute the distance l between the two points:
l = sqrt((xt_center − xd_center)² + (yt_center − yd_center)²).
21) Set the distance threshold lth = 15. When the distance l is greater than lth, the detected face location and the tracked face location differ considerably, and face tracking initialization should be re-run from the detected face location; execute 7). When l is less than or equal to lth, the detected and tracked face locations differ little, and face tracking initialization need not be re-run; execute 22). Steps 20) and 21) set the correction condition: under extreme conditions, for example when strong illumination washes the video out white or the absence of light renders it black, or when the extracted feature points are completely occluded (note that the feature points are dispersed over the target area, so tracking can continue even if a small portion of them are occluded or missing), the tracked target may be lost; at that moment face detection, face tracking initialization, and the face tracking update need to be re-run.
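A minimal sketch of the correction check in steps 20) and 21) (pure Python; the threshold of 15 follows the embodiment):

```python
import math

def needs_reinit(track_center, detect_center, l_th=15.0):
    """Compare tracked vs detected face centers; re-initialize tracking
    when they drift apart by more than the distance threshold l_th."""
    xt, yt = track_center
    xd, yd = detect_center
    l = math.sqrt((xt - xd) ** 2 + (yt - yd) ** 2)
    return l > l_th
```

A drift of exactly the threshold does not trigger re-initialization, matching the "less than or equal to" branch of step 21).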
22) Perform face detection and multi-face tracking display, showing the positions of the face frames in the video frame. Whether the result comes from face detection or from multi-face tracking, it is the position coordinates of face frames; finally the face locations are drawn as rectangles in the video frame; execute 23).
23) Judge whether the video stream has ended: execute 24) if it has ended, and execute 3) if it has not.
24) Exit the whole program.
Through the above steps, the invention achieves accurate tracking and the effect of real-time tracking even when facial poses and expressions are complex, faces are occluded, the external background is complex, and illumination conditions vary.
Finally, it should be noted that the above embodiments are merely illustrative of the technical solutions of the invention rather than limiting. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of the technical features may be replaced by equivalents, and these modifications or replacements do not depart the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the embodiments of the invention.
Claims (9)
1. A real-time multi-face detection and tracking method, characterized in that the method comprises:
acquiring the image of each video frame from an input video stream;
detecting face position coordinates in the acquired video frames with a face detection model, and storing the face position coordinates in a face position coordinate container;
performing a face tracking initialization operation, which extracts the position coordinates of target faces from the face position coordinate container until the container is exhausted, and extracting the feature points of each target face into a feature point container for subsequent target face tracking updates;
building an image pyramid model and predicting the position of the face target in the current video frame according to the model;
counting the tracked frames: when the tracked frame count meets a set tracking frame threshold, face detection is re-run; when it does not, the distance between the center point of the detected face coordinate box and the center point of the face coordinate box predicted by the tracking update is calculated; if the calculated distance is less than a set distance threshold, no face tracking initialization is needed; if the calculated distance is greater than the set distance threshold, face tracking initialization is performed; and the final result is output for display.
2. The real-time multi-face detection and tracking method according to claim 1, characterized in that extracting the position coordinates of the target face comprises:
computing, for each pixel point P in the given tracking target area A of the input image I, the 2 × 2 spatial gradient matrix G = Σ [Ax², Ax·Ay; Ax·Ay, Ay²], summed over a window around P, where Ax is the gradient of target area A in the x-axis direction and Ay is the gradient of target area A in the y-axis direction;
computing the minimum eigenvalue λm of each G and storing the pixels P whose λm is greater than a given eigenvalue threshold λth; then judging whether pixel P is greater than the other pixels in its surrounding 3 × 3 neighborhood: if it is, retaining pixel P and taking the maximum λmax over all stored minimum eigenvalues λm; if it is not, discarding it; and performing the following operation:
computing the distances between the retained pixels and comparing them with a distance threshold distanceth, retaining only the pixels whose mutual distance is greater than distanceth; the retained pixels are the extracted feature points used for subsequent face tracking and updating.
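The selection criterion of this claim is essentially the Shi-Tomasi minimum-eigenvalue ("good features to track") test. A minimal NumPy sketch, with the eigenvalue threshold expressed as an assumed fraction of the maximum and a 3 × 3 summation window:

```python
import numpy as np

def min_eig_features(img, eig_ratio=0.1, min_dist=5.0):
    """Feature-point extraction by the minimum-eigenvalue criterion: keep
    pixels whose G has a large smallest eigenvalue, that are 3x3 local
    maxima, and that are mutually separated by at least min_dist."""
    img = img.astype(float)
    Ay, Ax = np.gradient(img)                    # gradients in y and x

    def box3(a):                                 # 3x3 box sum around each pixel
        p = np.pad(a, 1)
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3))

    gxx, gxy, gyy = box3(Ax * Ax), box3(Ax * Ay), box3(Ay * Ay)
    # Smallest eigenvalue lambda_m of G = [[gxx, gxy], [gxy, gyy]]
    lam = 0.5 * ((gxx + gyy) - np.sqrt((gxx - gyy) ** 2 + 4 * gxy ** 2))
    lam_th = eig_ratio * lam.max()               # eigenvalue threshold lambda_th

    cand = []                                    # 3x3 non-maximum suppression
    h, w = lam.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if lam[y, x] > lam_th and lam[y, x] >= lam[y-1:y+2, x-1:x+2].max():
                cand.append((lam[y, x], x, y))

    cand.sort(reverse=True)                      # strongest corners first
    kept = []                                    # enforce the mutual distance
    for _, x, y in cand:
        if all((x - kx) ** 2 + (y - ky) ** 2 >= min_dist ** 2
               for kx, ky in kept):
            kept.append((x, y))
    return kept
```

OpenCV's `cv2.goodFeaturesToTrack` implements the same criterion (its `qualityLevel` and `minDistance` parameters correspond to λth and distanceth here) in optimized form.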
3. The real-time multi-face detection and tracking method according to claim 2, characterized in that face tracking according to the image pyramid model specifically comprises:
building the pyramid, where I0 is defined as the bottom, i.e. 0th, layer image with the highest resolution, L denotes the number of pyramid layers, L is a natural number greater than 1, and IL denotes the Lth layer image;
feeding the optical flow result of the pyramid top layer back to the next layer down: the initial optical flow estimate at the top layer, gL, is set to 0; with the residual optical flow computed at the top layer denoted dL, the estimate for the next layer is gL-1 = 2(gL + dL) = 2(0 + dL) = 2dL; the feedback continues down the pyramid iteratively until the bottom layer is reached, giving the final optical flow of the original image d = g0 + d0, i.e. the final optical flow is the superposition of the residual flows d of all layers;
given the feature point position A(x, y) extracted from target area A in the previous frame image, computing the corresponding feature point position B(x + vx, y + vy) in the current frame target area B, where vx and vy are the displacement components of the optical flow d along the x-axis and y-axis;
displaying the position of the tracked target face in the current frame image.
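Once the per-layer residual flows dL … d0 have been computed (by Lucas-Kanade at each pyramid level), the coarse-to-fine propagation of this claim reduces to a simple recursion; a sketch of just that recursion:

```python
def propagate_flow(residuals):
    """residuals[0] is the residual flow d^L at the pyramid top,
    residuals[-1] is d^0 at the bottom; returns the final flow
    d = g^0 + d^0, using g^L = 0 and g^(l-1) = 2 * (g^l + d^l)."""
    g = (0.0, 0.0)                         # g^L = 0 at the top layer
    for dx, dy in residuals[:-1]:          # feed back layer by layer
        g = (2 * (g[0] + dx), 2 * (g[1] + dy))
    d0x, d0y = residuals[-1]
    return (g[0] + d0x, g[1] + d0y)
```

With a three-level pyramid, a residual of one pixel at the top contributes four pixels at the bottom; this doubling per layer is how the pyramid extends the small-motion assumption of Lucas-Kanade to larger displacements.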
4. The real-time multi-face detection and tracking method according to claim 3, characterized in that, after the position of the tracked target face has been drawn in the current frame image, judging whether all feature points have been taken out of the face position coordinate container: if the container is not empty, continuing to take them out; if it is empty, performing the following operations:
if the counted tracking frame number equals the set tracking frame threshold, performing face detection on the acquired frame image; if it does not, performing the following operation:
from the face position center point ftrack(xt_center, yt_center) predicted by tracking and the face position center point fdetection(xd_center, yd_center) obtained by face detection, calculating the distance l between the two points, l = √[(xt_center - xd_center)² + (yt_center - yd_center)²];
setting the distance threshold lth = 15: when the distance l is greater than lth, the detected face position and the tracked face position differ considerably, and the face tracking initialization operation should be re-run from the detected face position; when l is less than or equal to lth, the detected and tracked face positions differ little, face tracking initialization need not be re-run, and the following operations are performed:
performing face detection and multi-face tracking display;
judging whether the video stream has ended, and exiting if it has.
5. The real-time multi-face detection and tracking method according to claim 1, characterized in that the face detection model comprises a first network module and a second network module, the first network module consisting of 2 convolutional layers, 2 activation layers, 2 normalization layers, and 2 pooling layers, and the second network module consisting of three Inception structures.
6. The real-time multi-face detection and tracking method according to claim 5, characterized in that the face detection model is trained and tested before being used for detection.
7. The real-time multi-face detection and tracking method according to claim 6, characterized in that training the face detection model comprises:
obtaining a large number of face sample pictures under natural scenes, annotating the face positions in the obtained pictures, and generating annotation documents in xml format;
cleaning the annotated face data, directly removing faces with a resolution below 20 × 20;
directly generating lmdb-format files from the cleaned data, for data reading in the deep learning framework caffe;
building the lightweight network model;
starting model training, where the face prediction loss function is the softmax loss L(θ) = -(1/m) Σ_(i=1..m) [yi log f(xi, θ) + (1 - yi) log(1 - f(xi, θ))], where yi denotes the label of the ith group of data: if the group of data is actually a face, y = 1, and if not, y = 0; f(xi, θ) denotes the predicted probability of being a face; xi denotes the input of the ith group of data; θ denotes the learnable parameters; and m denotes the number of samples;
backpropagating with the stochastic gradient descent algorithm, iterating continuously so that the loss value approaches 0 as closely as possible;
terminating if the set number of iterations has been reached, and continuing training otherwise.
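With yi ∈ {0, 1} and f(xi, θ) a single face probability, the loss described here is the binary cross-entropy; a NumPy sketch (the ε guard against log(0) is an added numerical-safety assumption, not part of the claim):

```python
import numpy as np

def face_loss(y, p, eps=1e-12):
    """Mean binary cross-entropy over m samples: y_i is the label
    (1 = face, 0 = not a face), p_i = f(x_i, theta) the predicted
    face probability; eps guards against log(0)."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(-np.mean(y * np.log(p + eps)
                          + (1.0 - y) * np.log(1.0 - p + eps)))
```

Driving this loss toward 0 with stochastic gradient descent, as the claim describes, amounts to repeatedly updating θ ← θ − η∇L(θ) on mini-batches until the set iteration count is reached.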
8. The real-time multi-face detection and tracking method according to claim 7, characterized in that, after the model training is completed, a face detection test is carried out, comprising:
inputting a video stream;
acquiring video frames;
converting them into a picture format the model can recognize;
inputting the data into the network model;
outputting the result.
9. The real-time multi-face detection and tracking method according to claim 7, characterized in that obtaining face sample pictures comprises: web crawling, provision by third-party clients, or publicly available sample data sets.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811365995.6A CN109558815A (en) | 2018-11-16 | 2018-11-16 | A kind of detection of real time multi-human face and tracking |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109558815A true CN109558815A (en) | 2019-04-02 |
Family
ID=65866501
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811365995.6A Pending CN109558815A (en) | 2018-11-16 | 2018-11-16 | A kind of detection of real time multi-human face and tracking |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109558815A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110533000A (en) * | 2019-09-06 | 2019-12-03 | 厦门美图之家科技有限公司 | Facial image detection method, device, computer equipment and readable storage medium storing program for executing |
CN110991287A (en) * | 2019-11-23 | 2020-04-10 | 深圳市恩钛控股有限公司 | Real-time video stream face detection tracking method and detection tracking system |
CN110991250A (en) * | 2019-11-06 | 2020-04-10 | 江苏科技大学 | Face tracking method and system fusing color interference model and shielding model |
CN111046752A (en) * | 2019-11-26 | 2020-04-21 | 上海兴容信息技术有限公司 | Indoor positioning method and device, computer equipment and storage medium |
CN111047626A (en) * | 2019-12-26 | 2020-04-21 | 深圳云天励飞技术有限公司 | Target tracking method and device, electronic equipment and storage medium |
CN111160202A (en) * | 2019-12-20 | 2020-05-15 | 万翼科技有限公司 | AR equipment-based identity verification method, AR equipment-based identity verification device, AR equipment-based identity verification equipment and storage medium |
CN111563490A (en) * | 2020-07-14 | 2020-08-21 | 北京搜狐新媒体信息技术有限公司 | Face key point tracking method and device and electronic equipment |
CN112446922A (en) * | 2020-11-24 | 2021-03-05 | 厦门熵基科技有限公司 | Pedestrian reverse judgment method and device for channel gate |
CN112597901A (en) * | 2020-12-23 | 2021-04-02 | 艾体威尔电子技术(北京)有限公司 | Multi-face scene effective face recognition device and method based on three-dimensional distance measurement |
US20210319234A1 (en) * | 2018-12-29 | 2021-10-14 | Zhejiang Dahua Technology Co., Ltd. | Systems and methods for video surveillance |
CN113723375A (en) * | 2021-11-02 | 2021-11-30 | 杭州魔点科技有限公司 | Double-frame face tracking method and system based on feature extraction |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103093212A (en) * | 2013-01-28 | 2013-05-08 | 北京信息科技大学 | Method and device for clipping facial images based on face detection and face tracking |
US8873798B2 (en) * | 2010-02-05 | 2014-10-28 | Rochester Institute Of Technology | Methods for tracking objects using random projections, distance learning and a hybrid template library and apparatuses thereof |
CN106599836A (en) * | 2016-12-13 | 2017-04-26 | 北京智慧眼科技股份有限公司 | Multi-face tracking method and tracking system |
CN108564029A (en) * | 2018-04-12 | 2018-09-21 | 厦门大学 | Face character recognition methods based on cascade multi-task learning deep neural network |
CN108629299A (en) * | 2018-04-24 | 2018-10-09 | 武汉幻视智能科技有限公司 | A kind of matched long-time multi-object tracking method of combination face and system |
Non-Patent Citations (2)
Title |
---|
WANG WENJING: "Research on Facial Expression Recognition Algorithms", China Masters' Theses Full-text Database, Information Science and Technology * |
NIE XIAOYAN: "Moving Vehicle Detection and Tracking Based on Hierarchical Optical Flow Field", Experimental Technology and Management * |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210319234A1 (en) * | 2018-12-29 | 2021-10-14 | Zhejiang Dahua Technology Co., Ltd. | Systems and methods for video surveillance |
CN110533000A (en) * | 2019-09-06 | 2019-12-03 | 厦门美图之家科技有限公司 | Facial image detection method, device, computer equipment and readable storage medium storing program for executing |
CN110991250A (en) * | 2019-11-06 | 2020-04-10 | 江苏科技大学 | Face tracking method and system fusing color interference model and shielding model |
CN110991250B (en) * | 2019-11-06 | 2023-04-25 | 江苏科技大学 | Face tracking method and system integrating color interference model and shielding model |
CN110991287A (en) * | 2019-11-23 | 2020-04-10 | 深圳市恩钛控股有限公司 | Real-time video stream face detection tracking method and detection tracking system |
CN111046752A (en) * | 2019-11-26 | 2020-04-21 | 上海兴容信息技术有限公司 | Indoor positioning method and device, computer equipment and storage medium |
CN111160202A (en) * | 2019-12-20 | 2020-05-15 | 万翼科技有限公司 | AR equipment-based identity verification method, AR equipment-based identity verification device, AR equipment-based identity verification equipment and storage medium |
CN111160202B (en) * | 2019-12-20 | 2023-09-05 | 万翼科技有限公司 | Identity verification method, device, equipment and storage medium based on AR equipment |
CN111047626A (en) * | 2019-12-26 | 2020-04-21 | 深圳云天励飞技术有限公司 | Target tracking method and device, electronic equipment and storage medium |
CN111047626B (en) * | 2019-12-26 | 2024-03-22 | 深圳云天励飞技术有限公司 | Target tracking method, device, electronic equipment and storage medium |
CN111563490B (en) * | 2020-07-14 | 2020-11-03 | 北京搜狐新媒体信息技术有限公司 | Face key point tracking method and device and electronic equipment |
CN111563490A (en) * | 2020-07-14 | 2020-08-21 | 北京搜狐新媒体信息技术有限公司 | Face key point tracking method and device and electronic equipment |
CN112446922A (en) * | 2020-11-24 | 2021-03-05 | 厦门熵基科技有限公司 | Pedestrian reverse judgment method and device for channel gate |
CN112597901A (en) * | 2020-12-23 | 2021-04-02 | 艾体威尔电子技术(北京)有限公司 | Multi-face scene effective face recognition device and method based on three-dimensional distance measurement |
CN112597901B (en) * | 2020-12-23 | 2023-12-29 | 艾体威尔电子技术(北京)有限公司 | Device and method for effectively recognizing human face in multiple human face scenes based on three-dimensional ranging |
CN113723375A (en) * | 2021-11-02 | 2021-11-30 | 杭州魔点科技有限公司 | Double-frame face tracking method and system based on feature extraction |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109558815A (en) | A kind of detection of real time multi-human face and tracking | |
CN111259850B (en) | Pedestrian re-identification method integrating random batch mask and multi-scale representation learning | |
CN108596101B (en) | Remote sensing image multi-target detection method based on convolutional neural network | |
CN106127204B (en) | A kind of multi-direction meter reading Region detection algorithms of full convolutional neural networks | |
CN108492319B (en) | Moving target detection method based on deep full convolution neural network | |
CN105069472B (en) | A kind of vehicle checking method adaptive based on convolutional neural networks | |
CN103164706B (en) | Object counting method and device based on video signal analysis | |
CN109559302A (en) | Pipe video defect inspection method based on convolutional neural networks | |
CN107742099A (en) | A kind of crowd density estimation based on full convolutional network, the method for demographics | |
CN109993095B (en) | Frame level feature aggregation method for video target detection | |
CN108647591A (en) | Activity recognition method and system in a kind of video of view-based access control model-semantic feature | |
CN110287960A (en) | The detection recognition method of curve text in natural scene image | |
CN109584248A (en) | Infrared surface object instance dividing method based on Fusion Features and dense connection network | |
CN109670452A (en) | Method for detecting human face, device, electronic equipment and Face datection model | |
CN108334847A (en) | A kind of face identification method based on deep learning under real scene | |
CN109784386A (en) | A method of it is detected with semantic segmentation helpers | |
CN107016357A (en) | A kind of video pedestrian detection method based on time-domain convolutional neural networks | |
CN109712127B (en) | Power transmission line fault detection method for machine inspection video stream | |
CN108197604A (en) | Fast face positioning and tracing method based on embedded device | |
CN109389599A (en) | A kind of defect inspection method and device based on deep learning | |
CN110163041A (en) | Video pedestrian recognition methods, device and storage medium again | |
CN110472542A (en) | A kind of infrared image pedestrian detection method and detection system based on deep learning | |
CN109145836A (en) | Ship target video detection method based on deep learning network and Kalman filtering | |
CN111507248A (en) | Face forehead area detection and positioning method and system of low-resolution thermodynamic diagram | |
CN112712516B (en) | High-speed rail bottom rubber strip fault detection method and system based on YOLOv5 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190402 |