CN109558815A - Real-time multi-face detection and tracking method - Google Patents

Real-time multi-face detection and tracking method

Info

Publication number
CN109558815A
Authority
CN
China
Prior art keywords
face
tracking
detection
distance
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811365995.6A
Other languages
Chinese (zh)
Inventor
张宁
李玉惠
金红
杨满智
刘长永
陈晓光
蔡琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NATIONAL COMPUTER VIRUS EMERGENCY RESPONSE CENTER
Heng Jia Jia (beijing) Technology Co Ltd
Original Assignee
NATIONAL COMPUTER VIRUS EMERGENCY RESPONSE CENTER
Heng Jia Jia (beijing) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NATIONAL COMPUTER VIRUS EMERGENCY RESPONSE CENTER, Heng Jia Jia (beijing) Technology Co Ltd filed Critical NATIONAL COMPUTER VIRUS EMERGENCY RESPONSE CENTER
Priority to CN201811365995.6A priority Critical patent/CN109558815A/en
Publication of CN109558815A publication Critical patent/CN109558815A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 — Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 — Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 — Detection; Localisation; Normalisation
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 — Scenes; Scene-specific elements
    • G06V20/40 — Scenes; Scene-specific elements in video content
    • G06V20/41 — Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a real-time multi-face detection and tracking method. The method comprises: obtaining the image of each video frame from an input video stream; detecting face position coordinates in each acquired frame with a face detection model and storing the coordinates in a face position coordinate container; in a face tracking initialization step, extracting the position coordinates of each target face from the face position coordinate container until the container is empty, and extracting the feature points of each target face, which are retrieved from a feature point container for subsequent face tracking updates; building an image pyramid model and predicting the position of each face target in the current video frame from that model; and tracking the faces and displaying the result. The invention addresses the low accuracy of existing face recognition and tracking methods and their inability to track in real time.

Description

Real-time multi-face detection and tracking method
Technical field
The invention belongs to the field of face detection and tracking, and in particular relates to a real-time multi-face detection and tracking method.
Background technique
With the rapid development of science and technology, computer-vision technologies have been widely applied. Among them, face tracking is widely used in scenarios such as video surveillance, automatic access control and retail shopping.
Face tracking technology mainly comprises face detection and face tracking. Face detection refers to locating faces in a picture. Face tracking refers to continuously predicting face positions in successive video frames, given an initial face position. Mainstream face tracking methods fall roughly into three categories by principle: methods based on correlation filtering, methods based on deep learning, and methods based on optical flow.
Representative correlation-filtering trackers are KCF (Kernelized Correlation Filter) and SRDCF (Spatially Regularized Discriminative Correlation Filter). KCF uses a circulant matrix to obtain positive and negative samples and trains an object detector during tracking; the detector checks whether the tracked target in the next frame is the real target, and the new detection result is then used to update the training set and, in turn, the detector. The drawback of this method is that when the object moves quickly, or when boundary effects or motion blur occur, the tracker loses the target. SRDCF addresses boundary effects with multi-scale, larger detection regions, but it runs very slowly and cannot meet real-time requirements.
A representative deep-learning tracker is MDNet (Multi-Domain Convolutional Neural Network). The network consists of shared layers and multiple domain-specific branches, where each domain corresponds to an independent training sequence and each branch performs a binary classification to identify the target in its domain. The network is trained iteratively over the domains, yielding generic target feature extraction in the shared layers. When tracking a target in a video sequence, the pre-trained CNN (Convolutional Neural Network) shared layers are combined with a new binary classification layer to form a new network, and online tracking is performed by evaluating candidate windows randomly sampled around the previous frame's target. Its target feature extraction is highly accurate, but because the network has many parameters, real-time tracking is very hard to achieve on a CPU.
A representative optical-flow tracker is the LK (Lucas-Kanade) differential method of optical-flow estimation, a gradient-based method for computing infinitesimal optical flow. It rests on three assumptions. Assumption 1, brightness constancy: the brightness of a given point does not change over time. Assumption 2, small motion: changes over time do not cause drastic changes in position. Assumption 3, spatial coherence: neighboring points in a scene project to neighboring points in the image and move with consistent velocity. Optical-flow methods can track targets in arbitrarily complex scenes, complete tracking accurately and quickly, and are well suited to terminals with limited computing power.
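The gradient-based estimate these assumptions enable can be illustrated in one dimension: with Ix the spatial and It the temporal brightness gradient, brightness constancy and small motion give Ix·v + It ≈ 0, so v ≈ −It/Ix. The following pure-Python sketch is not from the patent; the function name and the synthetic ramp signal are purely illustrative.

```python
# Minimal 1-D illustration of a gradient-based (Lucas-Kanade style) flow
# estimate: under brightness constancy and small motion, Ix * v + It = 0,
# so v ~= -It / Ix.

def flow_1d(frame_a, frame_b, x):
    """Estimate the displacement at sample x between two 1-D 'frames'."""
    ix = (frame_a[x + 1] - frame_a[x - 1]) / 2.0  # spatial gradient (central diff.)
    it = frame_b[x] - frame_a[x]                  # temporal gradient
    return -it / ix

# A linear ramp shifted right by 2 samples between frames: I(x, t+1) = I(x-2, t).
ramp = [float(i) for i in range(20)]
shifted = [s - 2.0 for s in ramp]
v = flow_1d(ramp, shifted, 10)
print(v)  # exactly 2.0 for this ideal ramp
```

On real images the same idea is applied per feature point over a small window, which is what the pyramid scheme later in the document accelerates.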
To run on a CPU with limited computing power (such as an i5-6200), the method must have a small computational footprint, and the algorithm design cannot be too complex. Compared with correlation-filtering and deep-learning trackers, optical-flow-based methods can achieve target tracking faster, and are more robust when faces are occluded, facial poses and expressions are complex, faces move quickly, or the tracking background is cluttered, making them suitable for CPU processors with limited computing power.
However, current optical-flow-based methods face the following problems in face detection and tracking:
1. Occlusion, including person-to-person and person-to-object occlusion, causes loss of face information, which directly leads to losing the tracked target and a drop in tracking accuracy.
2. Motion blur and boundary effects blur the face information; inaccurate feature extraction directly leads to losing the target.
3. Complex background environments: lighting conditions vary, and colors and objects are diverse; the tracked target's color may even match the background. All of this poses a great challenge to the face tracking task.
4. Efficiency: existing multi-face tracking techniques struggle to meet real-time requirements, especially on CPU devices with limited computing power.
When facial poses and expressions are complex, faces are occluded, the external background is cluttered, or lighting conditions are changeable, tracking accuracy easily degrades and real-time tracking cannot be achieved.
Summary of the invention
In view of the above defects of the prior art, the object of the present invention is to provide a real-time multi-face detection and tracking method, so as to solve the problems that the accuracy of existing face recognition and tracking is low and real-time tracking cannot be achieved.
The technical solution adopted by the invention is as follows:
A real-time multi-face detection and tracking method, the method comprising:
obtaining the image of each video frame from an input video stream;
detecting face position coordinates in each acquired frame with a face detection model, and storing the face position coordinates in a face position coordinate container;
a face tracking initialization step: extracting the position coordinates of each tracked target face from the face position coordinate container until the container is empty, extracting the target's feature points with a spatial gradient matrix, and storing them in a feature point container for subsequent face tracking updates;
building an image pyramid model, and predicting the position of each face target in the current video frame from that model;
counting tracked frames: when the tracked-frame count reaches a set threshold, performing face detection anew; otherwise, computing the distance between the center of the detected face coordinate frame and the center of the face coordinate frame predicted by the tracking update. When this distance is less than a set distance threshold, face tracking initialization is not needed; when it is greater than the set threshold, face tracking initialization is performed. The final result is then displayed.
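The control flow just described can be sketched as follows. The detector, tracker and agreement test are stand-in stubs: the function names, the toy geometry and the drift model are illustrative, not taken from the patent (which details detection and flow-based tracking in later sections).

```python
# Schematic sketch of the claimed detect-track-correct loop with stubbed
# components. TRACK_FRAME_THRESHOLD follows the embodiment's value of 10.

TRACK_FRAME_THRESHOLD = 10  # re-detect every 10 tracked frames

def run(frames, detect, track, agrees):
    state = detect(frames[0])            # initial detection on the first frame
    for i, frame in enumerate(frames[1:], start=1):
        state = track(frame, state)      # optical-flow position prediction
        if i % TRACK_FRAME_THRESHOLD == 0:
            detection = detect(frame)    # periodic re-detection
            if not agrees(detection, state):
                state = detection        # correction: re-initialize tracking
    return state

# Stubs: detection always reports (100, 100); tracking drifts +1 px per frame;
# detection and prediction "agree" when their centers are within 15 px.
out = run(list(range(12)),
          detect=lambda f: (100, 100),
          track=lambda f, c: (c[0] + 1, c[1]),
          agrees=lambda d, t: abs(d[0] - t[0]) + abs(d[1] - t[1]) <= 15)
print(out)  # (111, 100): drift stayed within tolerance, tracking continued
```

With a faster drift of +3 px per frame, the check at frame 10 fails and the state snaps back to the detected position, which is the correction behavior the method claims.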
Further, extracting the position coordinates of the target face comprises:
computing, for each pixel P in the given tracking target area A of the input image I, the spatial gradient matrix
G = | Ax·Ax  Ax·Ay |
    | Ax·Ay  Ay·Ay |
where Ax is the gradient of the target area A in the x-axis direction and Ay is its gradient in the y-axis direction;
computing the minimum eigenvalue λm of each G, and storing the pixels P whose λm exceeds a given eigenvalue threshold λth; then judging whether pixel P is greater than the other pixels in its surrounding 3 × 3 neighborhood: if so, retaining P and taking the maximum λmax over all stored minimum eigenvalues λm; if not, no longer retaining it; then performing the operation below;
computing the distance between the retained pixels and comparing it with a distance threshold distanceth, retaining the pixels whose mutual distance exceeds distanceth. These retained pixels are the extracted feature points, used for subsequent face tracking and updates.
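The minimum-eigenvalue scoring described above is essentially Shi-Tomasi corner selection. A minimal pure-Python sketch shows why the score singles out corner-like pixels; note that summing gradients over a 3 × 3 window around P is an assumption for illustration (the text does not specify the window), and the synthetic square image is invented.

```python
import math

def grad(img, x, y):
    # Central-difference gradients; img is a 2-D list of floats.
    ax = (img[y][x + 1] - img[y][x - 1]) / 2.0
    ay = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return ax, ay

def min_eigenvalue(img, x, y):
    """Smaller eigenvalue of the 2x2 spatial-gradient matrix G accumulated
    over an assumed 3x3 window around (x, y)."""
    gxx = gyy = gxy = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ax, ay = grad(img, x + dx, y + dy)
            gxx += ax * ax; gyy += ay * ay; gxy += ax * ay
    # Closed-form smaller eigenvalue of [[gxx, gxy], [gxy, gyy]].
    return 0.5 * (gxx + gyy - math.sqrt((gxx - gyy) ** 2 + 4 * gxy ** 2))

# A white square on black: only its corners excite both gradient directions.
img = [[1.0 if 3 <= x <= 8 and 3 <= y <= 8 else 0.0 for x in range(12)]
       for y in range(12)]
corner = min_eigenvalue(img, 3, 3)  # corner of the square
edge   = min_eigenvalue(img, 3, 5)  # midpoint of a vertical edge
flat   = min_eigenvalue(img, 1, 1)  # flat background
print(corner, edge, flat)  # corner scores highest; edge and flat are ~0
```

An edge has strong gradients in only one direction, so its G is near-singular and λm collapses to 0; only genuine corners survive the λth threshold, which is what makes these points stable to track.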
Further, face tracking according to the image pyramid model specifically comprises:
building a pyramid: I0 is defined as the bottom (0th-layer) image of the pyramid, with the highest resolution; L denotes the number of pyramid layers, a natural number greater than 1; IL denotes the Lth-layer image;
feeding the optical-flow result of the top pyramid layer back to the next layer down: the initial flow estimate of the top layer, g^(L-1), is set to 0, and with d^(L-1) the flow increment computed at the top (L−1) layer, the estimate for the next layer is g^(L-2) = 2(g^(L-1) + d^(L-1)) = 2(0 + d^(L-1)) = 2d^(L-1). The feedback continues down the pyramid iteratively until the bottom layer is reached, giving the final optical flow on the original image d = g^0 + d^0; the final flow value is the superposition of all layers' flow increments d^l, that is, d = Σ_l 2^l · d^l;
given the feature point position A(x, y) extracted in the target area A of the previous frame, computing the feature point position in the current frame's target area B as B(x + vx, y + vy), where vx and vy are the x- and y-axis displacement components of the flow d;
displaying the position of the tracked target face in the current frame image.
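The coarse-to-fine accumulation above reduces to simple arithmetic once each layer's residual d^l is known. A sketch under the same recursion (the residual values below are made-up numbers, purely illustrative):

```python
# Coarse-to-fine accumulation of per-level flow residuals, following
# g^(l-1) = 2 * (g^l + d^l) with g at the top level initialized to 0.

def pyramid_flow(residuals):
    """residuals[l] = flow refinement d^l computed at pyramid level l
    (index 0 = original resolution). Returns the final flow g^0 + d^0."""
    g = 0.0  # initial guess at the top level
    for level in range(len(residuals) - 1, 0, -1):
        g = 2.0 * (g + residuals[level])  # propagate to the next finer level
    return g + residuals[0]

# The result equals sum(2**l * d^l): 0.5 + 2*0.25 + 4*0.125 = 1.5.
d = [0.5, 0.25, 0.125]   # levels 0, 1, 2
print(pyramid_flow(d))   # 1.5
```

The doubling reflects the halved resolution per level: a 1-pixel motion at the top layer corresponds to 2^l pixels at full resolution, which is why pyramids let a small-motion method like LK handle fast-moving faces.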
Further, after drawing the position of the tracked target face in the current frame image, judging whether all feature points have been taken from the face position coordinate container: if the container is not empty, continuing to take them out; if it is empty, performing the operation below:
if the counted tracked-frame number equals the set tracked-frame threshold, performing face detection on the acquired first frame image; if not, performing the operation below:
computing the distance l between the face position center point predicted by tracking, ftrack(xt_center, yt_center), and the face position center point obtained by face detection, fdetection(xd_center, yd_center):
l = sqrt((xt_center − xd_center)² + (yt_center − yd_center)²);
setting the distance threshold lth = 15: when l is greater than lth, the detected face position and the tracked face position differ considerably, and face tracking initialization should be redone from the detected face position; when l is less than or equal to lth, the detected and tracked face positions differ little, face tracking initialization need not be redone, and the operation below is performed;
performing face detection and multi-face tracking display;
judging whether the video stream has ended, and exiting if it has.
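The correction test in the steps above is a single Euclidean-distance comparison. A sketch with the patent's lth = 15; the (x, y, width, height) box convention and the helper names are assumptions for illustration:

```python
# Compare the tracked-center prediction with the detected center and
# re-initialize only when they disagree by more than l_th = 15 pixels.

def center(box):
    """Center of an assumed (x, y, width, height) face box."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def should_reinit(detected_box, tracked_box, l_th=15.0):
    (xd, yd), (xt, yt) = center(detected_box), center(tracked_box)
    l = ((xt - xd) ** 2 + (yt - yd) ** 2) ** 0.5
    return l > l_th

print(should_reinit((100, 100, 40, 40), (105, 100, 40, 40)))  # False: 5 px apart
print(should_reinit((100, 100, 40, 40), (140, 100, 40, 40)))  # True: 40 px apart
```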
Further, the face detection model comprises a first network module and a second network module; the first network module consists of 2 convolutional layers, 2 activation layers, 2 normalization layers and 2 pooling layers, and the second network module consists of three Inception structures.
Further, before the face detection model is used, it is also trained and tested.
Further, training the face detection model comprises:
obtaining a large number of face sample pictures under natural scenes, annotating face positions in the obtained pictures, and generating annotation documents in xml format;
cleaning the annotated face data, directly removing faces with a resolution below 20 × 20;
generating lmdb-format files directly from the cleaned data, for data reading in the deep learning framework caffe;
building the lightweight network model;
starting model training, where face prediction uses the softmax loss function
L(θ) = −(1/m) Σ_{i=1}^{m} [ y_i · log f(x_i, θ) + (1 − y_i) · log(1 − f(x_i, θ)) ],
where y_i denotes the label of the ith group of data: if the data are actually a face, y = 1, and if not, y = 0; f(x_i, θ) denotes the predicted face probability; x_i denotes the input of the ith group of data; θ denotes the learnable parameters; and m denotes the number of samples;
backpropagating with the stochastic gradient descent algorithm, iterating continuously so that the loss value approaches 0 as closely as possible;
ending if the set number of iterations is reached, and continuing to train otherwise.
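The two-class loss just described (the patent calls it softmax loss; for y ∈ {0, 1} it reduces to binary cross-entropy) can be sketched directly; the prediction values below are invented for illustration:

```python
import math

def face_loss(preds, labels):
    """Mean binary cross-entropy over m samples:
    L = -(1/m) * sum(y_i * log f_i + (1 - y_i) * log(1 - f_i)),
    with f_i the predicted face probability and y_i in {0, 1}."""
    m = len(preds)
    total = 0.0
    for f, y in zip(preds, labels):
        total += y * math.log(f) + (1 - y) * math.log(1 - f)
    return -total / m

# Confident, correct predictions drive the loss toward 0; wrong ones blow it up,
# which is what SGD exploits when pushing the loss value toward 0.
good = face_loss([0.99, 0.01], [1, 0])
bad  = face_loss([0.01, 0.99], [1, 0])
print(good, bad)  # good is near 0, bad is large
```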
Compared with the prior art, the real-time multi-face detection and tracking method disclosed by the invention achieves the following technical effects:
1. The feature points extracted by the invention represent the main features of the target to be tracked and generalize well even under complex external lighting, cluttered backgrounds, motion blur and boundary effects, and occlusion. That is, even if the environment is complex or the target is occluded over a small area, the extracted feature points can still characterize the target features and complete the tracking.
2. The invention sets a correction condition on target tracking. Under extreme conditions, such as video washed out entirely white by strong lighting, video entirely dark under no lighting, or the extracted feature points being completely occluded (note that the feature points are dispersed over the target area, so tracking can continue even if a small fraction of them are occluded or missing), the tracked target may be lost; face detection, face tracking initialization and face tracking update must then be redone.
3. Accurate face detection is achieved by the face detection model, and real-time tracking of moving faces is achieved by the tracking method.
Description of the drawings
In order to explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic model diagram of the Inception structure in the face detection model described in the embodiment of the present invention.
Fig. 2 is the network structure of the face detection model described in the embodiment of the present invention.
Fig. 3 is a schematic diagram of training the face detection model described in the embodiment of the present invention.
Fig. 4 is a schematic diagram of testing the face detection model described in the embodiment of the present invention.
Fig. 5 is the complete flow diagram of the real-time multi-face detection and tracking method described in the embodiment of the present invention.
Specific embodiment
In order to enable those skilled in the art to better understand the technical solution of the present invention, the present invention is further described in detail below with reference to the drawings and specific embodiments.
The present invention processes all faces detected in the input video frames and tracks them, covering both single-face tracking and multi-face tracking. The present invention is based on an optical-flow tracking method, which can achieve target tracking faster, is more robust to face occlusion, complex facial poses and expressions, fast face motion and cluttered tracking backgrounds, and is suitable for CPU processors with limited computing power.
The multi-face detection and tracking method disclosed in the embodiment of the present invention is realized on the basis of a face detection model and a face tracking model: face detection is realized with deep learning technology, and face tracking with an optical-flow-based tracking method.
Referring to Fig. 1 and Fig. 2, in the present embodiment the face detection model uses an end-to-end structure divided into two sub-modules, a first network module and a second network module. The first network module consists of 2 convolutional layers, 2 activation layers, 2 normalization layers and 2 pooling layers; its main function is to quickly extract feature information from the input image. After a first round of convolution, activation, normalization and pooling, followed by a second round of convolution, activation, normalization and pooling, the output is passed to the second network module. Because the first network module is simple, with few parameters and little computation, it can extract face feature information quite quickly. The second network module uses 3 Inception structures; Fig. 1 shows the network structure of an Inception structure, which comprises several different convolution branches and thus obtains multiple receptive fields, allowing faces of various scales to be detected well. After the three Inception modules, faces of various scales in the input image can be detected more accurately.
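The spatial bookkeeping of the first module can be sketched as follows. The patent does not give kernel sizes, strides or the input resolution, so the 3 × 3 convolutions (padding 1), 2 × 2 max-pooling (stride 2) and 224 × 224 input below are purely illustrative assumptions:

```python
# Feature-map size arithmetic through an assumed conv/pool stack,
# repeated twice to mimic the first module's two conv+pool stages.

def conv_out(size, kernel=3, stride=1, pad=1):
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    return (size - kernel) // stride + 1

size = 224                 # assumed input resolution
for _ in range(2):         # conv -> activation -> norm -> pool, twice
    size = pool_out(conv_out(size))
print(size)  # 56: under these assumptions each stage halves the resolution
```

Shrinking the map early is what keeps the module's parameter count and computation small, consistent with the lightweight design described above.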
Through the face detection model, the position coordinates of the face frames and the picture are output, and the face tracking module then finds the region of the face target to be tracked in the input image according to the coordinates.
In order to realize accurate detection with the face detection model, the model is trained and tested before use.
The training process of the face detection model, shown in Fig. 3, specifically comprises the following steps:
1) Obtain a large number of face pictures under natural scenes, from web crawling, customers or public datasets; annotate face positions in the obtained pictures and generate xml-format annotation documents; go to 2).
2) Clean the annotated face data: faces with a resolution below 20 × 20 are directly removed and not used for training, to prevent the network model from failing to converge; go to 3).
3) Generate lmdb-format files directly from the cleaned data, for data reading in the deep learning framework caffe; go to 4).
4) Build the lightweight network model; go to 5).
5) Start model training. Face prediction uses the softmax loss, where y_i denotes the label of the ith group of data (y = 1 if the data are actually a face, y = 0 if not), f(x_i, θ) denotes the predicted face probability, x_i the input of the ith group of data, θ the learnable parameters, and m the number of samples. The formula is:
L(θ) = −(1/m) Σ_{i=1}^{m} [ y_i · log f(x_i, θ) + (1 − y_i) · log(1 − f(x_i, θ)) ]
Backpropagate with the stochastic gradient descent algorithm, iterating continuously so that the loss value approaches 0 as closely as possible; go to 6).
6) If the set number of iterations is reached, finish; otherwise go back to 5). The trained model is saved in caffemodel format, and the model storage path must be specified when it is called.
By training the model, faces can be detected and recognized more accurately and quickly.
After the face detection model has been trained, the model is tested. The test process, shown in Fig. 4, is:
1) Input the video stream; go to 2).
2) Obtain a video frame; go to 3).
3) Convert the picture format, converting the picture's hwc channel layout to the cwh layout; go to 4).
4) Feed the data into the face detection model; go to 5).
5) Output the result: the face position coordinates and face probability values in the current video frame.
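Step 3's layout conversion is a pure index permutation. The sketch below is an assumption for illustration: the patent writes "hwc → cwh", while frameworks such as Caffe conventionally expect CHW, which is what this produces; nested Python lists stand in for an image.

```python
# Reorder an image from height x width x channels to channels x height x width.

def hwc_to_chw(img):
    h, w, c = len(img), len(img[0]), len(img[0][0])
    return [[[img[y][x][ch] for x in range(w)] for y in range(h)]
            for ch in range(c)]

# A 2x3 image with 3 channels; each pixel value encodes its own (y, x, channel).
img = [[[(y, x, ch) for ch in range(3)] for x in range(3)] for y in range(2)]
chw = hwc_to_chw(img)
print(len(chw), len(chw[0]), len(chw[0][0]))  # 3 2 3  (C, H, W)
print(chw[2][1][0])  # channel 2 of pixel (y=1, x=0) -> (1, 0, 2)
```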
After the face detection model passes the test, the face tracking process is entered. The input of the face tracking stage is the face frame coordinates output by the face detection model (top-left starting point (x, y), width and height) together with the picture; face tracking finds the target region to be tracked in the input image according to these coordinates. Pictures are obtained from the video stream; after each is obtained, face detection is performed first, then whether to perform face tracking initialization is judged, then whether to perform a face tracking update, and finally the next frame is obtained or the process ends.
Before the model is formally used, the trained model is tested, in order to improve the accuracy of its use.
Referring to Fig. 5, the flow of one complete face tracking run is as follows:
1) Initialize the detection model; go to 2).
2) Input the video stream; go to 3).
3) Obtain a video frame image I from the input video stream; go to 4).
4) Perform face detection on the acquired first frame image; go to 5).
5) If a face is detected, go to 6); if not, go to 22).
6) Store the face position coordinates in the face position coordinate container; go to 7).
7) If face tracking initialization is to be performed, go to 8); if not, go to 12).
8) Obtain one face position coordinate from the face position coordinate container, then go to 9).
9) Extract the target face feature points: for each pixel P in the given tracking target area A (i.e. the face position coordinates) of the input image I, compute the spatial gradient matrix
G = | Ax·Ax  Ax·Ay |
    | Ax·Ay  Ay·Ay |
where Ax is the gradient of the target area A in the x-axis direction and Ay is its gradient in the y-axis direction.
Compute the minimum eigenvalue λm of each G and store the pixels P whose λm exceeds the given eigenvalue threshold λth; then judge whether pixel P is greater than the other pixels in its surrounding 3 × 3 neighborhood: if so, retain P and take the maximum λmax over all stored minimum eigenvalues λm; if not, no longer retain it; go to 10). The target face's feature points are extracted with this spatial gradient matrix method and, once extracted, are stored in the feature point container for the face tracking update process.
10) Finally, compute the distance between the retained pixels and compare it with the distance threshold distanceth, retaining the pixels whose mutual distance exceeds distanceth; these retained pixels are the extracted feature points used for tracking; go to 11). Steps 9) and 10) realize feature point extraction. The extracted feature points can characterize the target face's features: even under complex external lighting, cluttered backgrounds, motion blur and boundary effects, and small-area occlusion, they generalize well and can complete the subsequent tracking.
11) If the face position coordinate container is empty, go to 12); if it is not empty, go to 8).
12) If a face tracking update is to be performed, go to 13); if not, go to 22) and proceed directly to video display.
13) Take the feature points of the initialized target face out of the feature point container; go to 14).
14) Count the tracked frame number, to record how many frames have been tracked; go to 15).
15) Image pyramid processing. Build the pyramid: I0 is defined as the bottom (0th-layer) image of the pyramid, with the highest resolution; L denotes the number of pyramid layers and usually takes 2, 3 or 4; IL denotes the Lth-layer image.
Feed the optical-flow result (displacement) of the top pyramid layer (layer L − 1) back to the next layer down (layer L − 2). The initial flow estimate of the top layer, g^(L-1), is set to 0; with d^(L-1) the flow increment computed at the top layer, the estimate for the next layer is
g^(L-2) = 2(g^(L-1) + d^(L-1)) = 2(0 + d^(L-1)) = 2d^(L-1)
Continue feeding back down the pyramid iteratively until the bottom layer is reached.
The final optical flow on the original image is:
d = g^0 + d^0
The final flow value is the superposition of all layers' flow increments d^l:
d = Σ_l 2^l · d^l
Given the feature point position A(x, y) extracted in the target area A of the previous frame, compute the feature point position in the current frame's target area B as B(x + vx, y + vy), where vx and vy are the x- and y-axis displacement components of the flow d; go to 16).
16) Draw the tracked target position in the current frame image; go to 17).
17) Judge whether all feature points have been taken out of the container: if the feature container is empty, go to 18); if not, go to 13).
18) If the counted tracked-frame number equals the tracked-frame threshold Δframe = 10, go to 4) and 19); otherwise go to 22).
19) Obtain the face detection position coordinates; go to 20).
20) From the face position center point predicted by tracking, ftrack(xt_center, yt_center), and the face position center point obtained by face detection, fdetection(xd_center, yd_center), compute the distance l between the two points:
l = sqrt((xt_center − xd_center)² + (yt_center − yd_center)²)
21) Set the distance threshold lth = 15. When the distance l is greater than lth, the detected face position and the tracked face position differ considerably, and face tracking initialization should be redone from the detected face position; go to 7). When l is less than or equal to lth, the detected and tracked face positions differ little, and face tracking initialization need not be redone; go to 22). Steps 20) and 21) set the correction condition: under extreme conditions, such as video washed out entirely white by strong lighting, video entirely dark under no lighting, or the extracted feature points being completely occluded (note that the feature points are dispersed over the target area, so tracking can continue even if a small fraction of them are occluded or missing), the tracked target may be lost; face detection, face tracking initialization and face tracking update must then be redone.
22) Perform face detection and multi-face tracking display, showing the face frame positions in the video frame. Whether from face detection or from multi-face tracking, the result is the position coordinates of face frames, which are finally drawn in the video frame as rectangles; go to 23).
23) Judge whether the video stream has ended: go to 24) if it has, go to 3) if it has not.
24) Exit the whole program.
Through the above steps, the present invention can achieve accurate tracking and the effect of real-time tracking even when facial poses and expressions are complex, faces are occluded, the external background is cluttered, and lighting conditions are changeable.
Finally, it should be noted that the above embodiments merely illustrate, rather than limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A real-time multi-face detection and tracking method, characterized in that the method comprises:
Obtaining the image of each video frame from the input video stream;
Detecting face-location coordinates in the acquired video frames through a face-detection model, and storing the face-location coordinates in a face-location coordinate container;
Performing the face-tracking initialization operation: extracting the position coordinates of target faces from the face-location coordinate container until the container is exhausted, and extracting the feature points of each face target and storing them in a feature-point container for subsequent target-face tracking and updating;
Establishing an image-pyramid model, and predicting the position of the face target in the current video frame according to the model;
Counting the tracked frames: when the tracked-frame count meets the set tracking-frame-number threshold, face detection is performed again, and the distance between the center point of the detected face-location coordinate frame and the center point of the face-location coordinate frame predicted by the face-tracking update is calculated; when the calculated distance is less than the set distance threshold, face-tracking initialization is not needed, and when the calculated distance is greater than the set distance threshold, face-tracking initialization is performed; the final result is displayed and output.
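The overall loop of claim 1 — detect, initialize, track, periodically re-detect and compare center distances — can be sketched as follows. This is a control-flow sketch with stand-in callables; all names and the frame-threshold value are assumptions, since the claim only speaks of a "set" threshold.

```python
# Control-flow sketch of the claim-1 pipeline. The detector, tracker
# initializer, tracker and distance function are passed in as stubs.
TRACK_FRAME_THRESHOLD = 10  # assumed value for the "set" tracking-frame threshold
DIST_THRESHOLD = 15         # distance threshold l_th from the description

def run(frames, detect_faces, init_tracker, track_faces, center_dist,
        frame_th=TRACK_FRAME_THRESHOLD, dist_th=DIST_THRESHOLD):
    boxes, tracked_frames = [], 0
    for frame in frames:
        if not boxes:                          # first frame: detect and initialize
            boxes = detect_faces(frame)
            init_tracker(frame, boxes)
            tracked_frames = 0
            continue
        predicted = track_faces(frame)         # pyramid-model position prediction
        tracked_frames += 1
        if tracked_frames >= frame_th:         # periodic re-detection
            detected = detect_faces(frame)
            tracked_frames = 0
            if center_dist(predicted, detected) > dist_th:
                init_tracker(frame, detected)  # drift detected: re-initialize
                predicted = detected
        boxes = predicted
    return boxes
```

The stubs make the branching testable without a real detector: any detector/tracker pair returning box lists, plus a center-distance function, plugs into the same loop.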
2. The real-time multi-face detection and tracking method according to claim 1, characterized in that extracting the position coordinates of a target face comprises:
Computing, for each pixel point P in the given tracking target region A of the input image I, the spatial-gradient matrix G according to the formula G = Σ_{P∈A} [[Ax·Ax, Ax·Ay], [Ax·Ay, Ay·Ay]], where Ax is the gradient of the target region A in the x-axis direction and Ay is the gradient of the target region A in the y-axis direction;
Computing the minimal eigenvalue λm of each G and storing the pixels P whose λm is greater than the given eigenvalue threshold λth; then judging whether pixel P is greater than the other pixels in its surrounding 3 × 3 neighborhood: if it is, retaining pixel P and taking the maximum λmax over all stored minimal eigenvalues λm; if it is not, discarding the pixel; then executing the operation below;
Computing the distances between the retained pixels and comparing them with the distance threshold distance_th, and retaining the pixels whose distance is greater than distance_th; these retained pixels are the extracted feature points, used for subsequent face tracking and updating.
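The feature selection of claim 2 (a Shi-Tomasi-style minimal-eigenvalue criterion) can be sketched as follows; the window size and the use of NumPy's `gradient` are illustrative choices, not from the patent.

```python
import numpy as np

# Sketch of the claim-2 criterion: for each pixel, build the 2x2
# spatial-gradient matrix G from the gradients Ax, Ay summed over a small
# window, and record its minimal eigenvalue. Pixels with a large minimal
# eigenvalue (corners) are good feature points to track.
def min_eigenvalues(image, win=1):
    img = image.astype(np.float64)
    Ay, Ax = np.gradient(img)            # gradients along y (rows) and x (cols)
    Axx, Axy, Ayy = Ax * Ax, Ax * Ay, Ay * Ay
    h, w = img.shape
    lam = np.zeros((h, w))
    for y in range(win, h - win):
        for x in range(win, w - win):
            # Sum the gradient products over the window to form G
            sl = (slice(y - win, y + win + 1), slice(x - win, x + win + 1))
            G = np.array([[Axx[sl].sum(), Axy[sl].sum()],
                          [Axy[sl].sum(), Ayy[sl].sum()]])
            lam[y, x] = np.linalg.eigvalsh(G)[0]  # minimal eigenvalue of G
    return lam
```

On a synthetic image containing a single bright block, the minimal eigenvalue is positive only at the block's corner (both gradient directions present) and zero in flat regions and along straight edges, which is exactly the corner-selectivity the claim relies on.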
3. The real-time multi-face detection and tracking method according to claim 2, characterized in that face tracking according to the image-pyramid model specifically comprises:
Building the pyramid: I0 is defined as the bottom, i.e. 0th, layer image with the highest resolution; L denotes the number of pyramid layers, L being a natural number greater than 1, and IL denotes the L-th layer image;
Feeding the optical-flow result of the pyramid top layer back to the layer below: the initial optical-flow estimate of the top layer, gL, is set to 0; with dL denoting the residual optical-flow value computed at the top layer, the estimate for the layer below is g_{L-1} = 2(g_L + d_L) = 2(0 + d_L) = 2·d_L; the feedback continues down the pyramid, iterating until the bottom layer is reached, giving the final optical-flow value of the original image d = g_0 + d_0; the final optical-flow value is thus the superposition of the per-layer residual flow values;
Given the feature-point positions A(x, y) extracted from the target region A of the previous frame image, computing the feature-point positions B(x + vx, y + vy) of the target region B in the current frame, where vx, vy are the displacement components of the optical-flow value d along the x-axis and the y-axis;
Drawing the position of the tracked target face in the current frame image.
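The coarse-to-fine flow propagation of claim 3 can be illustrated with scalar flow values (real optical flow is a 2-vector per point; scalars are used here only to show the recurrence g_{l-1} = 2(g_l + d_l) and d = g_0 + d_0):

```python
# Sketch of the claim-3 top-down flow combination. level_flows[l] is the
# residual flow d_l computed at pyramid level l, ordered from level 0
# (full resolution, the pyramid bottom) up to the top level.
def combine_pyramid_flows(level_flows):
    g = 0.0                              # top-level initial guess g = 0
    for d in reversed(level_flows[1:]):  # propagate from the top down to level 1
        g = 2.0 * (g + d)                # g_{l-1} = 2 * (g_l + d_l)
    return g + level_flows[0]            # final flow d = g_0 + d_0
```

Because each level's flow is doubled once per descent, the closed form is d = Σ_l 2^l · d_l, which matches the claim's statement that the final flow is the superposition of all layers' residual flow values.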
4. The real-time multi-face detection and tracking method according to claim 3, characterized in that after the position of the tracked target face has been drawn in the current frame image, it is judged whether all feature points have been taken out of the face-location coordinate container; if the container is not empty, taking out continues; if it is empty, the operation below is executed:
If the counted tracked-frame number equals the set tracking-frame-number threshold, face detection is performed on the acquired frame image; if not, the operation below is performed:
Using the face-location center point f_track(x_t_center, y_t_center) obtained by tracking prediction and the face-location center point f_detection(x_d_center, y_d_center) obtained by face detection, the distance l between the two points is calculated;
The distance threshold is set to l_th = 15; when the distance l is greater than l_th, the detected face location and the tracked face location differ considerably, and the face-tracking initialization operation should be re-performed from the detected face location; when l is less than or equal to l_th, the detected and tracked face locations differ little, face-tracking initialization is not re-performed, and the operation below is executed;
Performing the face-detection and multi-face-tracking display;
Judging whether the video stream has ended, and exiting if it has.
5. The real-time multi-face detection and tracking method according to claim 1, characterized in that the face-detection model comprises a first network module and a second network module; the first network module consists of 2 convolutional layers, 2 activation layers, 2 normalization layers and 2 pooling layers, and the second network module consists of three Inception structures.
6. The real-time multi-face detection and tracking method according to claim 5, characterized in that the face-detection model is trained and tested before being used.
7. The real-time multi-face detection and tracking method according to claim 6, characterized in that training the face-detection model comprises:
Obtaining a large number of face sample pictures under natural scenes, annotating the face locations in the obtained pictures, and generating annotation documents in xml format;
Cleaning the annotated face data, directly removing faces whose resolution is less than 20 × 20;
Generating lmdb-format files directly from the cleaned data, for data reading in the deep-learning framework caffe;
Building the lightweight network model;
Starting model training; the face-prediction loss uses the softmax loss function L(θ) = -(1/m) Σ_{i=1}^{m} [y_i · log f(x_i, θ) + (1 - y_i) · log(1 - f(x_i, θ))], where y_i denotes the annotated class of the i-th group of data: if the data are actually a face, y_i = 1, and if they are not a face, y_i = 0; f(x_i, θ) denotes the probability value of being predicted as a face, x_i denotes the input of the i-th group of data, θ denotes the learnable parameters, and m denotes the number of samples;
Performing back-propagation with the stochastic gradient descent algorithm, iterating continuously so that the value of the loss function approaches 0 as closely as possible;
Terminating if the set number of iterations is reached, and continuing training otherwise.
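For the two-class face/non-face case of claim 7, the loss reduces to the binary cross-entropy shown below (a sketch; the epsilon guard against log(0) is an added numerical convenience, not part of the claim):

```python
import math

# Sketch of the claim-7 training loss for the two-class case: y_i is the
# annotated label (1 = face, 0 = not face) and p_i = f(x_i; theta) is the
# predicted face probability. Minimizing this drives p towards y.
def face_loss(y, p, eps=1e-12):
    m = len(y)
    return -sum(yi * math.log(pi + eps) + (1 - yi) * math.log(1 - pi + eps)
                for yi, pi in zip(y, p)) / m
```

A perfectly confident, correct prediction gives a loss near 0, while an uninformative prediction of p = 0.5 on a face gives log 2 ≈ 0.693, which is what stochastic gradient descent then pushes down.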
8. The real-time multi-face detection and tracking method according to claim 7, characterized in that after model training is completed, a face-detection test is carried out, comprising:
Inputting the video stream;
Obtaining video frames;
Converting them into a picture format that the model can recognize;
Inputting the data into the network model;
Outputting the result.
9. The real-time multi-face detection and tracking method according to claim 7, characterized in that obtaining face sample pictures comprises: web crawling, provision by a third-party client, or public sample data sets.
CN201811365995.6A 2018-11-16 2018-11-16 A kind of detection of real time multi-human face and tracking Pending CN109558815A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811365995.6A CN109558815A (en) 2018-11-16 2018-11-16 A kind of detection of real time multi-human face and tracking

Publications (1)

Publication Number Publication Date
CN109558815A true CN109558815A (en) 2019-04-02

Family

ID=65866501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811365995.6A Pending CN109558815A (en) 2018-11-16 2018-11-16 A kind of detection of real time multi-human face and tracking

Country Status (1)

Country Link
CN (1) CN109558815A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093212A (en) * 2013-01-28 2013-05-08 北京信息科技大学 Method and device for clipping facial images based on face detection and face tracking
US8873798B2 (en) * 2010-02-05 2014-10-28 Rochester Institue Of Technology Methods for tracking objects using random projections, distance learning and a hybrid template library and apparatuses thereof
CN106599836A (en) * 2016-12-13 2017-04-26 北京智慧眼科技股份有限公司 Multi-face tracking method and tracking system
CN108564029A (en) * 2018-04-12 2018-09-21 厦门大学 Face character recognition methods based on cascade multi-task learning deep neural network
CN108629299A (en) * 2018-04-24 2018-10-09 武汉幻视智能科技有限公司 A kind of matched long-time multi-object tracking method of combination face and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wang Wenjing: "Research on Facial Expression Recognition Algorithms", China Masters' Theses Full-text Database, Information Science and Technology *
Nie Xiaoyan: "Moving Vehicle Detection and Tracking Based on Hierarchical Optical Flow Field", Experimental Technology and Management *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210319234A1 (en) * 2018-12-29 2021-10-14 Zhejiang Dahua Technology Co., Ltd. Systems and methods for video surveillance
CN110533000A (en) * 2019-09-06 2019-12-03 厦门美图之家科技有限公司 Facial image detection method, device, computer equipment and readable storage medium storing program for executing
CN110991250A (en) * 2019-11-06 2020-04-10 江苏科技大学 Face tracking method and system fusing color interference model and shielding model
CN110991250B (en) * 2019-11-06 2023-04-25 江苏科技大学 Face tracking method and system integrating color interference model and shielding model
CN110991287A (en) * 2019-11-23 2020-04-10 深圳市恩钛控股有限公司 Real-time video stream face detection tracking method and detection tracking system
CN111046752A (en) * 2019-11-26 2020-04-21 上海兴容信息技术有限公司 Indoor positioning method and device, computer equipment and storage medium
CN111160202A (en) * 2019-12-20 2020-05-15 万翼科技有限公司 AR equipment-based identity verification method, AR equipment-based identity verification device, AR equipment-based identity verification equipment and storage medium
CN111160202B (en) * 2019-12-20 2023-09-05 万翼科技有限公司 Identity verification method, device, equipment and storage medium based on AR equipment
CN111047626A (en) * 2019-12-26 2020-04-21 深圳云天励飞技术有限公司 Target tracking method and device, electronic equipment and storage medium
CN111047626B (en) * 2019-12-26 2024-03-22 深圳云天励飞技术有限公司 Target tracking method, device, electronic equipment and storage medium
CN111563490B (en) * 2020-07-14 2020-11-03 北京搜狐新媒体信息技术有限公司 Face key point tracking method and device and electronic equipment
CN111563490A (en) * 2020-07-14 2020-08-21 北京搜狐新媒体信息技术有限公司 Face key point tracking method and device and electronic equipment
CN112446922A (en) * 2020-11-24 2021-03-05 厦门熵基科技有限公司 Pedestrian reverse judgment method and device for channel gate
CN112597901A (en) * 2020-12-23 2021-04-02 艾体威尔电子技术(北京)有限公司 Multi-face scene effective face recognition device and method based on three-dimensional distance measurement
CN112597901B (en) * 2020-12-23 2023-12-29 艾体威尔电子技术(北京)有限公司 Device and method for effectively recognizing human face in multiple human face scenes based on three-dimensional ranging
CN113723375A (en) * 2021-11-02 2021-11-30 杭州魔点科技有限公司 Double-frame face tracking method and system based on feature extraction

Similar Documents

Publication Publication Date Title
CN109558815A (en) A kind of detection of real time multi-human face and tracking
CN111259850B (en) Pedestrian re-identification method integrating random batch mask and multi-scale representation learning
CN108596101B (en) Remote sensing image multi-target detection method based on convolutional neural network
CN106127204B (en) A kind of multi-direction meter reading Region detection algorithms of full convolutional neural networks
CN108492319B (en) Moving target detection method based on deep full convolution neural network
CN105069472B (en) A kind of vehicle checking method adaptive based on convolutional neural networks
CN103164706B (en) Object counting method and device based on video signal analysis
CN109559302A (en) Pipe video defect inspection method based on convolutional neural networks
CN107742099A (en) A kind of crowd density estimation based on full convolutional network, the method for demographics
CN109993095B (en) Frame level feature aggregation method for video target detection
CN108647591A (en) Activity recognition method and system in a kind of video of view-based access control model-semantic feature
CN110287960A (en) The detection recognition method of curve text in natural scene image
CN109584248A (en) Infrared surface object instance dividing method based on Fusion Features and dense connection network
CN109670452A (en) Method for detecting human face, device, electronic equipment and Face datection model
CN108334847A (en) A kind of face identification method based on deep learning under real scene
CN109784386A (en) A method of it is detected with semantic segmentation helpers
CN107016357A (en) A kind of video pedestrian detection method based on time-domain convolutional neural networks
CN109712127B (en) Power transmission line fault detection method for machine inspection video stream
CN108197604A (en) Fast face positioning and tracing method based on embedded device
CN109389599A (en) A kind of defect inspection method and device based on deep learning
CN110163041A (en) Video pedestrian recognition methods, device and storage medium again
CN110472542A (en) A kind of infrared image pedestrian detection method and detection system based on deep learning
CN109145836A (en) Ship target video detection method based on deep learning network and Kalman filtering
CN111507248A (en) Face forehead area detection and positioning method and system of low-resolution thermodynamic diagram
CN112712516B (en) High-speed rail bottom rubber strip fault detection method and system based on YOLOv5

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190402