CN106295511B - Face tracking method and device


Info

Publication number: CN106295511B
Authority: CN (China)
Prior art keywords: face key points, frame image, image, training
Legal status: Active (granted)
Application number: CN201610599541.XA
Other languages: Chinese (zh)
Other versions: CN106295511A
Inventors: 杨松, 罗瑾文, 王百超
Assignee: Beijing Xiaomi Mobile Software Co., Ltd.
Application filed by Beijing Xiaomi Mobile Software Co., Ltd.
Priority to CN201610599541.XA
Publication of CN106295511A (application) and CN106295511B (grant)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; localisation; normalisation
    • G06V40/167: Detection; localisation; normalisation using comparisons between temporally consecutive images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; scene-specific elements
    • G06V20/40: Scenes; scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Abstract

The present disclosure relates to a face tracking method and device, belonging to the field of image recognition. The method includes: obtaining a current frame image containing a face image from video data; when the current frame image is not the first frame image, evaluating the positions of multiple face key points in the previous frame image adjacent to the current frame image in the video data; when the positions of the multiple face key points in the previous frame image are evaluated as correct, determining the positions of multiple face key points in the current frame image based on the positions of the multiple face key points in the previous frame image and a first specified discrimination model; when the positions of the multiple face key points in the previous frame image are evaluated as incorrect, performing face detection on the current frame image to obtain the face location in the current frame image, and determining the positions of multiple face key points in the current frame image based on multiple designated key point positions and a second specified discrimination model, thereby tracking the face in the video data.

Description

Face tracking method and device
Technical field
The present disclosure relates to the field of image recognition, and in particular to a face tracking method and device.
Background technique
With the development of technology, face tracking is used more and more widely. Face tracking refers to tracking the face key points of each face in video data. Tracking the face key points of a face facilitates subsequent processing of the video frame images contained in the video data, for example expression analysis, driver fatigue detection, and intelligent beautification.
In the related art, face key points are located mainly through a discrimination model; this constitutes face tracking. A typical discrimination model is the SDM (Supervised Descent Method) model. First, the mean position of each of multiple face key points is obtained from multiple stored training images, and the obtained mean position of each face key point is taken as its initial position X0 in the video frame image; the initial position X0 of each face key point is the coordinate position of that key point in the video frame image. The HOG (Histogram of Oriented Gradients) feature at the initial position X0 of each face key point is obtained through the SDM model, a feature vector Y0 is determined from these HOG features, and a displacement ΔX0 from the initial position X0 toward the tracked position is computed from Y0 through a specified function. The initial position X0 of each face key point is then added to the displacement ΔX0 to obtain an iteration result, which completes one iteration. The iteration is repeated in the same way until the iteration result converges to a fixed value or its change begins to diminish, at which point the resulting iteration result is taken as the tracked position of each face key point in the video frame image.
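For orientation, the SDM update described above can be written as the cascade x_{k+1} = x_k + R_k φ(x_k) + b_k, where φ(x_k) is the feature vector extracted at the current key-point positions. Below is a minimal Python sketch of that iteration; the stand-in feature function, the stages list, and all names are illustrative assumptions rather than the patent's implementation.

```python
import numpy as np

def features_at(image, x):
    """Stand-in for HOG extraction: a real SDM computes a HOG descriptor
    around every key point; here we just sample pixel intensities so the
    sketch runs end to end."""
    h, w = image.shape
    pts = x.reshape(-1, 2).astype(int)
    pts[:, 0] = pts[:, 0].clip(0, w - 1)
    pts[:, 1] = pts[:, 1].clip(0, h - 1)
    return image[pts[:, 1], pts[:, 0]].astype(float)

def sdm_track(image, x0, stages):
    """Cascade update x_{k+1} = x_k + R_k @ phi(x_k) + b_k."""
    x = x0.copy()                      # (2N,) flattened key-point coordinates
    for R, b in stages:                # one learned (R_k, b_k) per iteration
        y = features_at(image, x)      # feature vector Y_k at current positions
        x = x + R @ y + b              # displacement Delta X_k added in
    return x
```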
Summary of the invention
To overcome the problems in the related art, the present disclosure provides a face tracking method and device.
According to a first aspect of the embodiments of the present disclosure, a face tracking method is provided, the method comprising:
obtaining a current frame image containing a face image from video data;
when the current frame image is not the first frame image, evaluating the positions of multiple face key points in the previous frame image adjacent to the current frame image in the video data, the first frame image being the video frame image in which a face image appears for the first time in the video data;
when the positions of the multiple face key points in the previous frame image are evaluated as correct, determining the positions of multiple face key points in the current frame image based on the positions of the multiple face key points in the previous frame image and a first specified discrimination model;
when the positions of the multiple face key points in the previous frame image are evaluated as incorrect, performing face detection on the current frame image to obtain the face location in the current frame image;
determining the positions of multiple face key points in the current frame image based on multiple designated key point positions and a second specified discrimination model, thereby tracking the face in the video data.
Optionally, the method further includes:
obtaining the annotated positions of the multiple face key points contained in each of multiple training images, the annotated positions of the multiple face key points contained in each training image being the positions determined by annotating the multiple face key points in that training image;
performing Gaussian sampling centered on the annotated positions of the multiple face key points contained in each of the multiple training images, to obtain sampled positions of the multiple face key points contained in each of the multiple training images;
training a discrimination model based on the sampled positions and the annotated positions of the multiple face key points contained in each of the multiple training images, to obtain the first specified discrimination model.
It should be noted that, because the terminal performs model training with a large number of training images, the training effect is improved. Moreover, after the terminal obtains the first specified discrimination model through model training, when the positions of the multiple face key points in the previous frame image are evaluated as correct, the terminal can determine the positions of the multiple face key points in the current frame image based on the positions of the multiple face key points in the previous frame image and the first specified discrimination model, thereby tracking the face in the video data in a targeted manner.
Optionally, the method further includes:
obtaining the annotated positions of the multiple face key points contained in each of multiple training images, the annotated positions of the multiple face key points contained in each training image being the positions determined by annotating the multiple face key points in that training image;
for each of the multiple training images, determining the mean position of the multiple face key points contained in that training image based on their annotated positions;
training a discrimination model based on the mean positions and the annotated positions of the multiple face key points contained in each of the multiple training images, to obtain the second specified discrimination model.
It should be noted that the terminal obtains the second specified discrimination model through model training, so that when the positions of the multiple face key points in the previous frame image are evaluated as incorrect, the terminal can determine the positions of the multiple face key points in the current frame image based on the multiple designated key point positions and the second specified discrimination model, thereby tracking the face in the video data in a targeted manner.
Optionally, evaluating the positions of the multiple face key points in the previous frame image adjacent to the current frame image in the video data comprises:
extracting the Histogram of Oriented Gradients (HOG) feature of each face key point in the previous frame image;
concatenating the HOG features of the face key points into a feature vector;
classifying the feature vector with a specified classifier to obtain a classification result;
when the classification result is a first value, determining that the positions of the multiple face key points in the previous frame image are evaluated as correct;
when the classification result is a second value, determining that the positions of the multiple face key points in the previous frame image are evaluated as incorrect.
It should be noted that the terminal evaluates the positions of the multiple face key points contained in the previous frame image so that it can select the method for determining the positions of the face key points contained in the current frame image; this makes face tracking targeted and thereby improves its efficiency.
Optionally, the method further includes:
performing dimensionality reduction on the feature vector;
correspondingly, classifying the feature vector with the specified classifier to obtain the classification result comprises:
classifying the dimension-reduced feature vector with the specified classifier to obtain the classification result.
It should be noted that after the terminal reduces the dimensionality of the feature vector, the reduced feature vector contains only the principal components of the original feature vector; therefore, classifying the reduced feature vector with the specified classifier lowers the computational complexity and increases the classification speed, thereby improving classification efficiency.
According to a second aspect of the embodiments of the present disclosure, a face tracking device is provided, the device comprising:
a first obtaining module, configured to obtain a current frame image containing a face image from video data;
an evaluation module, configured to evaluate, when the current frame image is not the first frame image, the positions of multiple face key points in the previous frame image adjacent to the current frame image in the video data, the first frame image being the video frame image in which a face image appears for the first time in the video data;
a first determining module, configured to determine, when the positions of the multiple face key points in the previous frame image are evaluated as correct, the positions of multiple face key points in the current frame image based on the positions of the multiple face key points in the previous frame image and a first specified discrimination model;
a face detection module, configured to perform, when the positions of the multiple face key points in the previous frame image are evaluated as incorrect, face detection on the current frame image to obtain the face location in the current frame image;
a second determining module, configured to determine the positions of multiple face key points in the current frame image based on multiple designated key point positions and a second specified discrimination model, thereby tracking the face in the video data.
Optionally, the device further includes:
a second obtaining module, configured to obtain the annotated positions of the multiple face key points contained in each of multiple training images, the annotated positions being the positions determined by annotating the multiple face key points in each training image;
a sampling module, configured to perform Gaussian sampling centered on the annotated positions of the multiple face key points contained in each of the multiple training images, to obtain sampled positions of the multiple face key points contained in each of the multiple training images;
a first training module, configured to train a discrimination model based on the sampled positions and the annotated positions of the multiple face key points contained in each of the multiple training images, to obtain the first specified discrimination model.
Optionally, the device further includes:
a third obtaining module, configured to obtain the annotated positions of the multiple face key points contained in each of multiple training images, the annotated positions being the positions determined by annotating the multiple face key points in each training image;
a third determining module, configured to determine, for each of the multiple training images, the mean position of the multiple face key points contained in that training image based on their annotated positions;
a second training module, configured to train a discrimination model based on the mean positions and the annotated positions of the multiple face key points contained in each of the multiple training images, to obtain the second specified discrimination model.
Optionally, the evaluation module includes:
an extraction unit, configured to extract the Histogram of Oriented Gradients (HOG) feature of each face key point in the previous frame image;
a concatenation unit, configured to concatenate the HOG features of the face key points into a feature vector;
a classification unit, configured to classify the feature vector with a specified classifier to obtain a classification result;
a first determination unit, configured to determine, when the classification result is a first value, that the positions of the multiple face key points in the previous frame image are evaluated as correct;
a second determination unit, configured to determine, when the classification result is a second value, that the positions of the multiple face key points in the previous frame image are evaluated as incorrect.
Optionally, the evaluation module further includes:
a dimensionality reduction unit, configured to perform dimensionality reduction on the feature vector;
correspondingly, the classification unit includes:
a classification subunit, configured to classify the dimension-reduced feature vector with the specified classifier to obtain the classification result.
According to a third aspect of the embodiments of the present disclosure, a face tracking device is provided, the device comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to:
obtain a current frame image containing a face image from video data;
when the current frame image is not the first frame image, evaluate the positions of multiple face key points in the previous frame image adjacent to the current frame image in the video data, the first frame image being the video frame image in which a face image appears for the first time in the video data;
when the positions of the multiple face key points in the previous frame image are evaluated as correct, determine the positions of multiple face key points in the current frame image based on the positions of the multiple face key points in the previous frame image and a first specified discrimination model;
when the positions of the multiple face key points in the previous frame image are evaluated as incorrect, perform face detection on the current frame image to obtain the face location in the current frame image;
determine the positions of multiple face key points in the current frame image based on multiple designated key point positions and a second specified discrimination model, thereby tracking the face in the video data.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects. In the embodiments of the present disclosure, when the current frame image the terminal obtains from the video data is not the first frame image, the positions of the multiple face key points in the previous frame image can be evaluated. When the evaluation result is correct, the positions of the multiple face key points in the current frame image are determined from the positions of the multiple face key points contained in the previous frame image and the first specified model; when the evaluation result is incorrect, face detection can be performed on the current frame image, and the positions of the multiple face key points in the current frame image are determined from the designated key point positions and the second specified model, thereby tracking the face in the video data. Because the conditions under which the first specified model and the second specified model determine face key point positions differ, the two models can cooperate under different conditions, reducing the amount of computation the terminal performs during face tracking and increasing the tracking speed, thereby improving the face tracking effect.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the specification, serve to explain the principles of the invention.
Fig. 1 is a flow chart of a face tracking method according to an exemplary embodiment.
Fig. 2A is a flow chart of another face tracking method according to an exemplary embodiment.
Fig. 2B is a schematic diagram of face key points according to an exemplary embodiment.
Fig. 3A is a structural block diagram of a first face tracking device according to an exemplary embodiment.
Fig. 3B is a structural block diagram of a second face tracking device according to an exemplary embodiment.
Fig. 3C is a structural block diagram of a third face tracking device according to an exemplary embodiment.
Fig. 3D is a structural block diagram of an evaluation module according to an exemplary embodiment.
Fig. 4 is a structural block diagram of a fourth face tracking device according to an exemplary embodiment.
Specific embodiments
Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of devices and methods consistent with some aspects of the invention as detailed in the appended claims.
Before the embodiments of the present disclosure are explained in detail, the application scenario of the embodiments is first introduced. With the development of technology, face tracking is used more and more widely. In the related art, the positions of face key points in video data are determined mainly through an SDM model in order to track faces. However, face tracking through the SDM model requires a large amount of computation, which makes the face tracking process cumbersome and its effect poor. The embodiments of the present disclosure therefore provide a face tracking method that reduces the amount of computation during face tracking and simplifies the tracking process, thereby improving the face tracking effect.
Fig. 1 is a flow chart of a face tracking method according to an exemplary embodiment. As shown in Fig. 1, the method is used in a terminal and includes the following steps.
In step 101, a current frame image containing a face image is obtained from video data.
In step 102, when the current frame image is not the first frame image, the positions of multiple face key points in the previous frame image adjacent to the current frame image in the video data are evaluated; the first frame image is the video frame image in which a face image appears for the first time in the video data.
In step 103, when the positions of the multiple face key points in the previous frame image are evaluated as correct, the positions of multiple face key points in the current frame image are determined based on the positions of the multiple face key points in the previous frame image and a first specified discrimination model.
In step 104, when the positions of the multiple face key points in the previous frame image are evaluated as incorrect, face detection is performed on the current frame image to obtain the face location in the current frame image.
In step 105, the positions of multiple face key points in the current frame image are determined based on multiple designated key point positions and a second specified discrimination model, thereby tracking the face in the video data.
In the embodiments of the present disclosure, when the current frame image the terminal obtains from the video data is not the first frame image, the positions of the multiple face key points in the previous frame image can be evaluated. When the evaluation result is correct, the positions of the multiple face key points in the current frame image are determined from the positions of the multiple face key points contained in the previous frame image and the first specified model; when the evaluation result is incorrect, face detection can be performed on the current frame image, and the positions of the multiple face key points in the current frame image are determined from the designated key point positions and the second specified model, thereby tracking the face in the video data. Because the conditions under which the first specified model and the second specified model determine face key point positions differ, the two models can cooperate under different conditions, reducing the amount of computation the terminal performs during face tracking and increasing the tracking speed, thereby improving the face tracking effect.
Optionally, the method further includes:
obtaining the annotated positions of the multiple face key points contained in each of multiple training images, the annotated positions being the positions determined by annotating the multiple face key points in each training image;
performing Gaussian sampling centered on the annotated positions of the multiple face key points contained in each of the multiple training images, to obtain sampled positions of the multiple face key points contained in each of the multiple training images;
training a discrimination model based on the sampled positions and the annotated positions of the multiple face key points contained in each of the multiple training images, to obtain the first specified discrimination model.
Optionally, the method further includes:
obtaining the annotated positions of the multiple face key points contained in each of multiple training images, the annotated positions being the positions determined by annotating the multiple face key points in each training image;
for each of the multiple training images, determining the mean position of the multiple face key points contained in that training image based on their annotated positions;
training a discrimination model based on the mean positions and the annotated positions of the multiple face key points contained in each of the multiple training images, to obtain the second specified discrimination model.
Optionally, evaluating the positions of the multiple face key points in the previous frame image adjacent to the current frame image in the video data comprises:
extracting the Histogram of Oriented Gradients (HOG) feature of each face key point in the previous frame image;
concatenating the HOG features of the face key points into a feature vector;
classifying the feature vector with a specified classifier to obtain a classification result;
when the classification result is a first value, determining that the positions of the multiple face key points in the previous frame image are evaluated as correct;
when the classification result is a second value, determining that the positions of the multiple face key points in the previous frame image are evaluated as incorrect.
Optionally, the method further includes:
performing dimensionality reduction on the feature vector;
correspondingly, classifying the feature vector with the specified classifier to obtain the classification result comprises:
classifying the dimension-reduced feature vector with the specified classifier to obtain the classification result.
All the above optional technical solutions can be combined in any manner to form optional embodiments of the present disclosure, and they are not described in detail here one by one.
Fig. 2A is a flow chart of a face tracking method according to an exemplary embodiment. Referring to Fig. 2A, the method includes the following steps.
In step 201, the terminal obtains a current frame image containing a face image from video data.
The current frame image is the image on which the terminal is currently performing face tracking, and it can be any frame image in the video data; the embodiments of the present disclosure do not specifically limit this.
It should be noted that the video data can be any video data; for example, it can be video data captured for electronic policing. The embodiments of the present disclosure do not specifically limit this.
It should also be noted that the operation by which the terminal obtains the current frame image containing a face image from the video data can refer to the related art; the embodiments of the present disclosure do not specifically limit this.
In step 202, when the current frame image is not the first frame image, the terminal evaluates the positions of multiple face key points in the previous frame image adjacent to the current frame image in the video data; the first frame image is the video frame image in which a face image appears for the first time in the video data.
The terminal can evaluate the positions of the multiple face key points in the previous frame image as follows: extract the HOG feature of each face key point in the previous frame image; concatenate the HOG features of the face key points into a feature vector; classify the feature vector with a specified classifier to obtain a classification result; when the classification result is a first value, determine that the positions of the multiple face key points in the previous frame image are evaluated as correct; when the classification result is a second value, determine that they are evaluated as incorrect.
It should be noted that the previous frame image is the image in the video data that is adjacent to and precedes the current frame image.
In addition, when performing face tracking, the terminal performs face tracking on every frame image in the video data that contains a face image and obtains the positions of the multiple face key points contained in each frame image. Therefore, when the terminal performs face tracking on the current frame image, it can obtain, from the images on which face tracking has already been performed, the positions of the multiple face key points contained in the previous frame image adjacent to the current frame image.
It should also be noted that the multiple face key points can be points that represent the facial features. For example, as shown in the face key point schematic diagram of Fig. 2B, the multiple face key points can be points representing the eyes, mouth, nose, eyebrows, face contour, and so on; the embodiments of the present disclosure do not specifically limit this.
In addition, in the embodiments of the present disclosure, the number of the multiple face key points can be 95; in practical applications, of course, it may be another number, and the embodiments of the present disclosure do not specifically limit this.
It should also be noted that the specified classifier is used to classify the feature vector and can be any classifier with a classification function, for example an SVM (Support Vector Machine); the embodiments of the present disclosure do not specifically limit this.
In addition, the operations by which the terminal extracts the HOG feature of each face key point in the previous frame image and concatenates the HOG features of the face key points into a feature vector can refer to the related art and are not described in detail here.
It should also be noted that in the embodiments of the present disclosure the first value and the second value are two different values. The first value indicates that the specified classifier classifies the feature vector as correct; it can be set in advance, for example to 1 or 0. Similarly, the second value indicates that the specified classifier classifies the feature vector as incorrect; it can likewise be set in advance, for example to 0 or 1. Since the first value and the second value are two different values, when the first value is 1 the second value is 0, and when the first value is 0 the second value is 1.
Further, after concatenating the HOG features of the face key points into a feature vector, the terminal can also perform dimensionality reduction on the feature vector; correspondingly, the terminal then classifies the dimension-reduced feature vector with the specified classifier to obtain the classification result.
The terminal can perform dimensionality reduction on the feature vector through a PCA (Principal Component Analysis) algorithm, or in other ways; the embodiments of the present disclosure do not specifically limit this. The operation of performing dimensionality reduction on the feature vector through the PCA algorithm can refer to the related art and is not described in detail here.
It should also be noted that after the terminal reduces the dimensionality of the feature vector, the reduced feature vector contains only the principal components of the original feature vector; therefore, classifying the reduced feature vector with the specified classifier lowers the computational complexity and increases the classification speed, thereby improving classification efficiency.
In addition, the terminal evaluates the positions of the multiple face key points contained in the previous frame image so that it can select the method for determining the positions of the face key points contained in the current frame image; this makes face tracking targeted and thereby improves its efficiency.
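To make the evaluation pipeline concrete, the following Python sketch strings together the steps above: per-key-point HOG features, concatenation into one feature vector, PCA dimensionality reduction, and an SVM classification whose output plays the role of the first value or second value. The patch size, the use of OpenCV's HOGDescriptor and scikit-learn's PCA and SVC (both assumed already fitted), and all names are illustrative assumptions, not part of the disclosure.

```python
import cv2
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

PATCH = 32  # assumed patch size around each key point
hog = cv2.HOGDescriptor((PATCH, PATCH), (16, 16), (8, 8), (8, 8), 9)

def keypoint_features(gray, points):
    """Concatenate the HOG feature computed around each face key point."""
    feats = []
    for (x, y) in points.astype(int):
        patch = gray[y - PATCH // 2:y + PATCH // 2, x - PATCH // 2:x + PATCH // 2]
        if patch.shape != (PATCH, PATCH):        # key point too close to border
            patch = np.zeros((PATCH, PATCH), np.uint8)
        feats.append(hog.compute(patch).ravel())
    return np.concatenate(feats)

def assess_positions(gray, points, pca: PCA, clf: SVC) -> bool:
    """Return True when the classifier emits the 'correct' label (first value)."""
    vec = keypoint_features(gray, points)
    vec = pca.transform(vec[None, :])            # dimensionality reduction
    return clf.predict(vec)[0] == 1              # 1 = first value, 0 = second
```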
In step 203, when the positions of the multiple face key points in the previous frame image are evaluated as correct, the terminal determines the positions of multiple face key points in the current frame image based on the positions of the multiple face key points in the previous frame image and a first specified discrimination model.
Specifically, the terminal can substitute the position of each of the multiple face key points in the previous frame image into the first specified discrimination model for calculation, and take the obtained results as the positions of the multiple face key points in the current frame image.
It should be noted that the first specified discrimination model is a model obtained by model training in advance, and it is used to calculate the positions of the multiple face key points in the current frame image when the positions of the multiple face key points in the previous frame image are evaluated as correct.
The terminal can obtain the first specified discrimination model through model training as follows: obtain the annotated positions of the multiple face key points contained in each of multiple training images, the annotated positions being the positions determined by annotating the multiple face key points in each training image; perform Gaussian sampling centered on the annotated positions of the multiple face key points contained in each training image, to obtain sampled positions of the multiple face key points contained in each training image; and train a discrimination model based on the sampled positions and the annotated positions of the multiple face key points contained in each training image, to obtain the first specified discrimination model.
It should be noted that a training image is an image used for model training; it can be any image containing a face image, and the number of training images can be arbitrary. For a better training effect, the more training images the better; for example, the number of training images can be 10000. The embodiments of the present disclosure do not specifically limit this.
It should also be noted that the operations of performing Gaussian sampling centered on the annotated positions to obtain the sampled positions, and of training the discrimination model based on the sampled positions and the annotated positions to obtain the first specified discrimination model, can refer to the related art and are not described in detail here.
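As one way to picture this training step, the sketch below perturbs the annotated key-point positions with Gaussian noise to produce sampled start positions, then fits a single ridge-regression stage that maps features at the sampled positions toward the annotations, i.e. one stage of an SDM-style cascade. The noise scale, the single-stage setup, and the least-squares solver are assumptions made for illustration.

```python
import numpy as np

def gaussian_sample(annotated, sigma=5.0, seed=0):
    """Sampled positions: Gaussian perturbations centered on the annotations."""
    rng = np.random.default_rng(seed)
    return annotated + rng.normal(0.0, sigma, size=annotated.shape)

def train_stage(features, sampled, annotated, lam=1e-3):
    """Fit x_annotated - x_sampled ~= R @ phi + b by ridge least squares.

    features : (n_images, d)  feature vectors at the sampled positions
    sampled  : (n_images, 2N) flattened sampled key-point coordinates
    annotated: (n_images, 2N) flattened annotated coordinates
    """
    targets = annotated - sampled                           # desired displacements
    X = np.hstack([features, np.ones((len(features), 1))])  # append bias column
    W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ targets)
    return W[:-1].T, W[-1]                                  # R (2N x d), b (2N,)
```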
In step 204, when the positions of the multiple face key points in the previous frame image are evaluated as incorrect, face detection is performed on the current frame image to obtain the face location in the current frame image.
Specifically, when the positions of the multiple face key points in the previous frame image are evaluated as incorrect, the terminal can perform face detection on the current frame image by means of face recognition to obtain the face location in the current frame image.
It should be noted that the terminal can perform face detection on the current frame image not only by means of face recognition but also in other ways; the embodiments of the present disclosure do not specifically limit this.
It should also be noted that the operation of performing face detection on the current frame image by means of face recognition to obtain the face location can refer to the related art and is not described in detail here.
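The disclosure leaves the detector itself to the related art; one common off-the-shelf choice is a pre-trained Haar cascade, which the sketch below uses as an assumed stand-in for whatever detector a real system would employ.

```python
import cv2

# Assumed detector: OpenCV's bundled pre-trained frontal-face Haar cascade.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face(gray):
    """Return the largest detected face box (x, y, w, h), or None."""
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(boxes) == 0:
        return None
    return max(boxes, key=lambda b: b[2] * b[3])   # largest box: single-face heuristic
```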
In step 205, the terminal determines the positions of multiple face key points in the current frame image based on multiple designated key point positions and a second specified discrimination model, thereby tracking the face in the video data.
Specifically, the terminal can substitute each of the multiple designated key point positions into the second specified discrimination model for calculation, and take the obtained results as the positions of the multiple face key points in the current frame image.
It should be noted that the second specified discrimination model is a model obtained by model training in advance, and it is used to calculate the positions of the multiple face key points in the current frame image when the positions of the multiple face key points in the previous frame image are evaluated as incorrect.
The terminal can obtain the second specified discrimination model through model training as follows: obtain the annotated positions of the multiple face key points contained in each of multiple training images, the annotated positions being the positions determined by annotating the multiple face key points in each training image; for each of the multiple training images, determine the mean position of the multiple face key points contained in that training image based on their annotated positions; and train a discrimination model based on the mean positions and the annotated positions of the multiple face key points contained in each training image, to obtain the second specified discrimination model.
It should be noted that the multiple designated key point positions are positions of multiple face key points set in advance; for example, they can be the mean positions of the multiple face key points contained in the training images. Of course, the multiple designated key point positions may also be other positions, and the embodiments of the present disclosure do not specifically limit this.
In addition, for each of the multiple training images, the terminal can determine the mean positions of the multiple face key points contained in the training images based on their annotated positions as follows: number the multiple face key points in each training image; obtain, from the multiple training images, the annotated positions of the face key points that share the same number; add up the obtained annotated positions with the same number; and divide the sum by the number of training images to obtain the mean position of that face key point.
For example, when the number of training images is 5, the terminal numbers the multiple face key points contained in each of the 5 training images, for example from 0 to 94 for 95 key points. It obtains from the 5 training images the annotated positions of the face key point numbered 0 in each training image, which are (10, 15), (9, 10), (10, 10), (11, 15), and (10, 10) respectively; it adds up the 5 obtained annotated positions of the key point numbered 0 to obtain (50, 60); and it divides the sum (50, 60) by the number of training images, 5, to obtain (10, 12) as the mean position of the face key point numbered 0 contained in the training images. Following the same operation as for the key point numbered 0, the terminal determines the mean positions of the remaining numbered face key points, thereby determining the mean positions of the multiple face key points contained in the training images.
It should be noted that the numbers involved in the embodiments of the present disclosure are merely examples for illustration and do not limit the embodiments of the present disclosure.
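Here is a short sketch of the averaging just described, assuming the annotations are stored as an (n_images, N, 2) array; the resulting mean positions then serve as the designated key point positions from which the second specified discrimination model starts after detection. The normalization of the mean shape to the detected face box and the model's refine interface are assumptions, not the patent's literal API.

```python
import numpy as np

def mean_keypoints(annotations):
    """annotations: (n_images, N, 2) annotated (x, y) per key point.

    Averaging over images reproduces the worked example: five annotations
    (10,15), (9,10), (10,10), (11,15), (10,10) of key point 0 average to (10,12).
    """
    return annotations.mean(axis=0)                        # (N, 2) mean positions

def track_from_detection(gray, face_box, mean_pts, model):
    """Place the mean key points inside the detected face box, then let the
    second specified discrimination model refine them (interface assumed)."""
    x, y, w, h = face_box
    init = mean_pts * np.array([w, h]) + np.array([x, y])  # assumes mean_pts are
    return model.refine(gray, init)                        # normalized to [0, 1]
```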
Further, when the current frame image obtained by the terminal is the first frame image, the terminal can perform face detection on the current frame image to obtain the face location in the current frame image, and then determine the positions of the multiple face key points in the current frame image based on the multiple designated key point positions and the second specified discrimination model.
It should also be noted that when the terminal cannot obtain a frame image from the video data, it stops tracking the face image.
In the embodiments of the present disclosure, when the current frame image the terminal obtains from the video data is not the first frame image, the positions of the multiple face key points in the previous frame image can be evaluated. When the evaluation result is correct, the positions of the multiple face key points in the current frame image are determined from the positions of the multiple face key points contained in the previous frame image and the first specified model. When the evaluation result is incorrect, or when the current frame image obtained from the video data is the first frame image, face detection can be performed on the current frame image, and the positions of the multiple face key points in the current frame image are determined from the multiple designated key point positions and the second specified model, thereby tracking the face in the video data. Because the training conditions of the first specified discrimination model and the second specified discrimination model differ, the conditions under which they determine face key point positions also differ, so the two models can cooperate under different conditions, avoiding heavy computation during face tracking and increasing the tracking speed, thereby improving the face tracking effect.
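Putting the steps of Fig. 2A together, the per-frame control flow reduces to the loop sketched below. The frame source, the two model objects with an assumed refine method, and the helper functions reuse the illustrative definitions from the earlier sketches; none of these names comes from the patent itself.

```python
def track_video(frames, pca, clf, model1, model2, mean_pts):
    """Per-frame tracking loop: steps 201-205 of the method."""
    prev_pts = None
    for gray in frames:                              # step 201: next frame
        if prev_pts is not None and assess_positions(gray, prev_pts, pca, clf):
            pts = model1.refine(gray, prev_pts)      # steps 202-203: track on
        else:                                        # first frame or bad positions
            box = detect_face(gray)                  # step 204: re-detect
            if box is None:
                prev_pts = None                      # no face: stop tracking
                continue
            pts = track_from_detection(gray, box, mean_pts, model2)  # step 205
        prev_pts = pts
        yield pts
```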
Fig. 3A is a structural block diagram of a face tracking device according to an exemplary embodiment. Referring to Fig. 3A, the device includes a first obtaining module 301, an evaluation module 302, a first determining module 303, a face detection module 304, and a second determining module 305.
The first obtaining module 301 is configured to obtain a current frame image containing a face image from video data.
The evaluation module 302 is configured to evaluate, when the current frame image is not the first frame image, the positions of multiple face key points in the previous frame image adjacent to the current frame image in the video data, the first frame image being the video frame image in which a face image appears for the first time in the video data.
The first determining module 303 is configured to determine, when the positions of the multiple face key points in the previous frame image are evaluated as correct, the positions of multiple face key points in the current frame image based on the positions of the multiple face key points in the previous frame image and a first specified discrimination model.
The face detection module 304 is configured to perform, when the positions of the multiple face key points in the previous frame image are evaluated as incorrect, face detection on the current frame image to obtain the face location in the current frame image.
The second determining module 305 is configured to determine the positions of multiple face key points in the current frame image based on multiple designated key point positions and a second specified discrimination model, thereby tracking the face in the video data.
Optionally, referring to Fig. 3B, the device further includes:
a second obtaining module 306, configured to obtain the annotated positions of the multiple face key points contained in each of multiple training images, the annotated positions being the positions determined by annotating the multiple face key points in each training image;
a sampling module 307, configured to perform Gaussian sampling centered on the annotated positions of the multiple face key points contained in each of the multiple training images, to obtain sampled positions of the multiple face key points contained in each of the multiple training images;
a first training module 308, configured to train a discrimination model based on the sampled positions and the annotated positions of the multiple face key points contained in each of the multiple training images, to obtain the first specified discrimination model.
Optionally, referring to Fig. 3C, the device further includes:
a third obtaining module 309, configured to obtain the annotated positions of the multiple face key points contained in each of multiple training images, the annotated positions being the positions determined by annotating the multiple face key points in each training image;
a third determining module 310, configured to determine, for each of the multiple training images, the mean position of the multiple face key points contained in that training image based on their annotated positions;
a second training module 311, configured to train a discrimination model based on the mean positions and the annotated positions of the multiple face key points contained in each of the multiple training images, to obtain the second specified discrimination model.
Optionally, referring to Fig. 3D, the evaluation module 302 includes:
an extraction unit 3021, configured to extract the Histogram of Oriented Gradients (HOG) feature of each face key point in the previous frame image;
a concatenation unit 3022, configured to concatenate the HOG features of the face key points into a feature vector;
a classification unit 3023, configured to classify the feature vector with a specified classifier to obtain a classification result;
a first determination unit 3024, configured to determine, when the classification result is a first value, that the positions of the multiple face key points in the previous frame image are evaluated as correct;
a second determination unit 3025, configured to determine, when the classification result is a second value, that the positions of the multiple face key points in the previous frame image are evaluated as incorrect.
Optionally, the evaluation module 302 further includes:
a dimensionality reduction unit, configured to perform dimensionality reduction on the feature vector;
correspondingly, the classification unit 3023 includes:
a classification subunit, configured to classify the dimension-reduced feature vector with the specified classifier to obtain the classification result.
In the embodiments of the present disclosure, when the current frame image the terminal obtains from the video data is not the first frame image, the positions of the multiple face key points in the previous frame image can be evaluated. When the evaluation result is correct, the positions of the multiple face key points in the current frame image are determined from the positions of the multiple face key points contained in the previous frame image and the first specified model. When the evaluation result is incorrect, or when the current frame image obtained from the video data is the first frame image, face detection can be performed on the current frame image, and the positions of the multiple face key points in the current frame image are determined from the multiple designated key point positions and the second specified model, thereby tracking the face in the video data. Because the training conditions of the first specified discrimination model and the second specified discrimination model differ, the conditions under which they determine face key point positions also differ, so the two models can cooperate under different conditions, avoiding heavy computation during face tracking and increasing the tracking speed, thereby improving the face tracking effect.
With regard to the devices in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the related method and will not be elaborated here.
Fig. 4 is a kind of block diagram of device 400 for face tracking shown according to an exemplary embodiment.For example, dress Setting 400 can be mobile phone, computer, digital broadcasting terminal, messaging device, game console, tablet device, medical treatment Equipment, body-building equipment, personal digital assistant etc..
Referring to Fig. 4, device 400 may include following one or more components: processing component 402, memory 404, power supply Component 406, multimedia component 408, audio component 410, the interface 412 of input/output (I/O), sensor module 414, and Communication component 416.
The integrated operation of the usual control device 400 of processing component 402, such as with display, telephone call, data communication, phase Machine operation and record operate associated operation.Processing component 402 may include that one or more processors 420 refer to execute It enables, to perform all or part of the steps of the methods described above.In addition, processing component 402 may include one or more modules, just Interaction between processing component 402 and other assemblies.For example, processing component 402 may include multi-media module, it is more to facilitate Interaction between media component 408 and processing component 402.
Memory 404 is configured as storing various types of data to support the operation in device 400.These data are shown Example includes the instruction of any application or method for operating on device 400, contact data, and telephone book data disappears Breath, picture, video etc..Memory 404 can be by any kind of volatibility or non-volatile memory device or their group It closes and realizes, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM) is erasable to compile Journey read-only memory (EPROM), programmable read only memory (PROM), read-only memory (ROM), magnetic memory, flash Device, disk or CD.
Power supply module 406 provides power supply for the various assemblies of device 400.Power supply module 406 may include power management system System, one or more power supplys and other with for device 400 generate, manage, and distribute the associated component of power supply.
Multimedia component 408 includes the screen of one output interface of offer between described device 400 and user.One In a little embodiments, screen may include liquid crystal display (LCD) and touch panel (TP).If screen includes touch panel, screen Curtain may be implemented as touch screen, to receive input signal from the user.Touch panel includes one or more touch sensings Device is to sense the gesture on touch, slide, and touch panel.The touch sensor can not only sense touch or sliding action Boundary, but also detect duration and pressure associated with the touch or slide operation.In some embodiments, more matchmakers Body component 408 includes a front camera and/or rear camera.When device 400 is in operation mode, such as screening-mode or When video mode, front camera and/or rear camera can receive external multi-medium data.Each front camera and Rear camera can be a fixed optical lens system or have focusing and optical zoom capabilities.
Audio component 410 is configured as output and/or input audio signal.For example, audio component 410 includes a Mike Wind (MIC), when device 400 is in operation mode, when such as call mode, recording mode, and voice recognition mode, microphone is matched It is set to reception external audio signal.The received audio signal can be further stored in memory 404 or via communication set Part 416 is sent.In some embodiments, audio component 410 further includes a loudspeaker, is used for output audio signal.
I/O interface 412 provides interface between processing component 402 and peripheral interface module, and above-mentioned peripheral interface module can To be keyboard, click wheel, button etc..These buttons may include, but are not limited to: home button, volume button, start button and lock Determine button.
The sensor component 414 includes one or more sensors for providing status assessments of various aspects of the device 400. For example, the sensor component 414 may detect the open/closed status of the device 400 and the relative positioning of components (e.g., the display and keypad of the device 400), and may also detect a change in position of the device 400 or of a component of the device 400, the presence or absence of user contact with the device 400, the orientation or acceleration/deceleration of the device 400, and a change in temperature of the device 400. The sensor component 414 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 414 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 414 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 416 is configured to facilitate wired or wireless communication between the device 400 and other devices. The device 400 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 416 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 416 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 400 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above methods.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 404 including instructions, which may be executed by the processor 420 of the device 400 to perform the above methods. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer-readable storage medium is provided, wherein instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a face tracking method, the method comprising:
Obtaining a current frame image that includes a face image from video data.
When the current frame image is not a first frame image, evaluating the positions of multiple face key points in a previous frame image adjacent to the current frame image in the video data, the first frame image being the video frame image in which a face image appears for the first time in the video data.
When the positions of the multiple face key points in the previous frame image are evaluated as correct, determining the positions of multiple face key points in the current frame image based on the positions of the multiple face key points in the previous frame image and a first specified discrimination model.
When the positions of the multiple face key points in the previous frame image are evaluated as incorrect, performing face detection on the current frame image to obtain a face location in the current frame image.
Determining the positions of the multiple face key points in the current frame image based on multiple specified key point positions and a second specified discrimination model, so as to track the face in the video data.
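By way of illustration only, the per-frame decision flow above can be written down compactly. In the following Python sketch, evaluate_keypoints, refine_with_model, detect_face, and mean_shape are hypothetical helper names standing in for the evaluation step, the two specified discrimination models, and the face detector; none of these names come from the method itself.

    def track_frame(frame, prev_keypoints, model_1, model_2):
        # Returns the face key point positions for the current frame.
        if prev_keypoints is None:
            # First frame containing a face: no previous positions to reuse.
            face_box = detect_face(frame)
            init = mean_shape(face_box)  # the specified key point positions
            return refine_with_model(frame, init, model_2)
        if evaluate_keypoints(frame, prev_keypoints):
            # Previous positions judged correct: refine them directly.
            return refine_with_model(frame, prev_keypoints, model_1)
        # Previous positions judged incorrect: fall back to face detection.
        face_box = detect_face(frame)
        init = mean_shape(face_box)
        return refine_with_model(frame, init, model_2)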
Optionally, the method further includes:
Obtaining the labeled positions of multiple face key points included in each of multiple training images, the labeled positions of the multiple face key points included in each training image being positions determined by annotating the multiple face key points in that training image;
Performing Gaussian sampling centered on the labeled positions of the multiple face key points included in each of the multiple training images, to obtain sampled positions of the multiple face key points included in each of the multiple training images;
Training a discrimination model based on the sampled positions and the labeled positions of the multiple face key points included in each of the multiple training images, to obtain the first specified discrimination model.
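By way of illustration, the Gaussian sampling step might be realized as follows in Python with NumPy; this is a minimal sketch, and the noise scale of 5 pixels is an assumed value rather than one specified here.

    import numpy as np

    def gaussian_sample_keypoints(labeled_positions, sigma=5.0, seed=None):
        # labeled_positions: (num_keypoints, 2) array of annotated (x, y)
        # positions for one training image.
        rng = np.random.default_rng(seed)
        noise = rng.normal(loc=0.0, scale=sigma, size=labeled_positions.shape)
        return labeled_positions + noise

Each (sampled, labeled) pair then forms one training example: the discrimination model learns to map the perturbed positions back to the annotated ones.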
Optionally, the method further includes:
Obtaining the labeled positions of multiple face key points included in each of multiple training images, the labeled positions of the multiple face key points included in each training image being positions determined by annotating the multiple face key points in that training image;
For each of the multiple training images, determining the mean position of the multiple face key points included in that training image based on the labeled positions of the multiple face key points included in that training image;
Training a discrimination model based on the mean positions and the labeled positions of the multiple face key points included in each of the multiple training images, to obtain the second specified discrimination model.
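A minimal sketch of the mean position computation, assuming the labeled positions of all training images are stacked into a single NumPy array; any alignment or normalization of the shapes before averaging is not described here and is omitted.

    import numpy as np

    def mean_positions(all_labeled_positions):
        # all_labeled_positions: (num_images, num_keypoints, 2) array.
        # Averaging over the image axis yields one mean (x, y) per key point.
        return np.mean(all_labeled_positions, axis=0)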
Optionally, the evaluating the positions of the multiple face key points in the previous frame image adjacent to the current frame image in the video data includes:
Extracting a histogram of oriented gradients (HOG) feature of each face key point in the previous frame image;
Concatenating the HOG features of the face key points into a feature vector;
Classifying the feature vector with a specified classifier to obtain a classification result;
When the classification result is a first value, determining that the positions of the multiple face key points in the previous frame image are evaluated as correct;
When the classification result is a second value, determining that the positions of the multiple face key points in the previous frame image are evaluated as incorrect.
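These steps could be realized, for instance, with scikit-image's HOG descriptor and a linear SVM from scikit-learn. The 32-pixel patch size and the HOG parameters below are assumed values for illustration; they are not specified here.

    import numpy as np
    from skimage.feature import hog
    from sklearn.svm import LinearSVC

    PATCH = 32  # assumed side length of the patch around each key point

    def keypoint_feature_vector(gray_image, keypoints):
        # Concatenate one HOG descriptor per key point into a single vector.
        half = PATCH // 2
        padded = np.pad(gray_image, half, mode="edge")
        feats = []
        for x, y in keypoints.astype(int):
            patch = padded[y:y + PATCH, x:x + PATCH]  # centered on (x, y)
            feats.append(hog(patch, orientations=9, pixels_per_cell=(8, 8),
                             cells_per_block=(2, 2)))
        return np.concatenate(feats)

Given feature vectors from correct (label 1) and incorrect (label 0) key point sets, a linear SVM could serve as the specified classifier, e.g. clf = LinearSVC().fit(X_train, y_train), with clf.predict returning the first or second value.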
Optionally, the method further includes:
Performing dimensionality reduction on the feature vector;
Correspondingly, the classifying the feature vector with the specified classifier to obtain the classification result includes:
Classifying the dimension-reduced feature vector with the specified classifier to obtain the classification result.
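One way to realize the reduce-then-classify variant is to fit a PCA step together with the classifier; in this sketch the 128 components are an illustrative choice, not a value taken from the method.

    from sklearn.decomposition import PCA
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # PCA reduces each concatenated HOG vector before the SVM classifies it.
    assessor = make_pipeline(PCA(n_components=128), LinearSVC())
    # assessor.fit(X_train, y_train)       # train on labeled feature vectors
    # result = assessor.predict([vec])[0]  # first value: correct; second: not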
In the embodiments of the present disclosure, when the current frame image that the terminal obtains from the video data is not the first frame image, the positions of the multiple face key points in the previous frame image can be evaluated. When the evaluation result is correct, the positions of the multiple face key points in the current frame image are determined from the positions of the multiple face key points included in the previous frame image and the first specified discrimination model; when the evaluation result is incorrect, face detection can be performed on the current frame image, and the positions of the multiple face key points in the current frame image are determined from the specified key point positions and the second specified discrimination model, thereby tracking the face in the video data. Because the first specified discrimination model and the second specified discrimination model determine the face key point positions under different conditions, the two models cooperate with each other, which reduces the amount of computation the terminal performs during face tracking and increases the tracking speed, thereby improving the face tracking effect.
Those skilled in the art, after considering the specification and practicing the invention disclosed herein, will readily conceive of other embodiments of the present invention. This application is intended to cover any variations, uses, or adaptations of the present invention that follow the general principles of the present invention and include common knowledge or customary technical means in the art not disclosed by the present disclosure. The specification and examples are to be considered exemplary only, with the true scope and spirit of the present invention being indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.

Claims (10)

1. A face tracking method, characterized in that the method comprises:
obtaining a current frame image that includes a face image from video data;
when the current frame image is not a first frame image, evaluating the positions of multiple face key points in a previous frame image adjacent to the current frame image in the video data, the first frame image being the video frame image in which a face image appears for the first time in the video data;
when the positions of the multiple face key points in the previous frame image are evaluated as correct, determining the positions of multiple face key points in the current frame image based on the positions of the multiple face key points in the previous frame image and a first specified discrimination model;
when the positions of the multiple face key points in the previous frame image are evaluated as incorrect, performing face detection on the current frame image to obtain a face location in the current frame image;
determining the positions of the multiple face key points in the current frame image based on multiple specified key point positions and a second specified discrimination model, so as to track the face in the video data.
2. The method according to claim 1, characterized in that the method further comprises:
obtaining the labeled positions of multiple face key points included in each of multiple training images, the labeled positions of the multiple face key points included in each training image being positions determined by annotating the multiple face key points in that training image;
performing Gaussian sampling centered on the labeled positions of the multiple face key points included in each of the multiple training images, to obtain sampled positions of the multiple face key points included in each of the multiple training images;
training a discrimination model based on the sampled positions and the labeled positions of the multiple face key points included in each of the multiple training images, to obtain the first specified discrimination model.
3. The method according to claim 1, characterized in that the method further comprises:
obtaining the labeled positions of multiple face key points included in each of multiple training images, the labeled positions of the multiple face key points included in each training image being positions determined by annotating the multiple face key points in that training image;
for each of the multiple training images, determining the mean position of the multiple face key points included in that training image based on the labeled positions of the multiple face key points included in that training image;
training a discrimination model based on the mean positions and the labeled positions of the multiple face key points included in each of the multiple training images, to obtain the second specified discrimination model.
4. The method according to claim 1, characterized in that the evaluating the positions of the multiple face key points in the previous frame image adjacent to the current frame image in the video data comprises:
extracting a histogram of oriented gradients (HOG) feature of each face key point in the previous frame image;
concatenating the HOG features of the face key points into a feature vector;
classifying the feature vector with a specified classifier to obtain a classification result;
when the classification result is a first value, determining that the positions of the multiple face key points in the previous frame image are evaluated as correct;
when the classification result is a second value, determining that the positions of the multiple face key points in the previous frame image are evaluated as incorrect.
5. The method according to claim 4, characterized in that the method further comprises:
performing dimensionality reduction on the feature vector;
correspondingly, the classifying the feature vector with the specified classifier to obtain the classification result comprises:
classifying the dimension-reduced feature vector with the specified classifier to obtain the classification result.
6. A face tracking device, characterized in that the device comprises:
a first obtaining module, configured to obtain a current frame image that includes a face image from video data;
an evaluation module, configured to, when the current frame image is not a first frame image, evaluate the positions of multiple face key points in a previous frame image adjacent to the current frame image in the video data, the first frame image being the video frame image in which a face image appears for the first time in the video data;
a first determining module, configured to, when the positions of the multiple face key points in the previous frame image are evaluated as correct, determine the positions of multiple face key points in the current frame image based on the positions of the multiple face key points in the previous frame image and a first specified discrimination model;
a face detection module, configured to, when the positions of the multiple face key points in the previous frame image are evaluated as incorrect, perform face detection on the current frame image to obtain a face location in the current frame image;
a second determining module, configured to determine the positions of the multiple face key points in the current frame image based on multiple specified key point positions and a second specified discrimination model, so as to track the face in the video data.
7. The device according to claim 6, characterized in that the device further comprises:
a second obtaining module, configured to obtain the labeled positions of multiple face key points included in each of multiple training images, the labeled positions of the multiple face key points included in each training image being positions determined by annotating the multiple face key points in that training image;
a sampling module, configured to perform Gaussian sampling centered on the labeled positions of the multiple face key points included in each of the multiple training images, to obtain sampled positions of the multiple face key points included in each of the multiple training images;
a first training module, configured to train a discrimination model based on the sampled positions and the labeled positions of the multiple face key points included in each of the multiple training images, to obtain the first specified discrimination model.
8. The device according to claim 6, characterized in that the device further comprises:
a third obtaining module, configured to obtain the labeled positions of multiple face key points included in each of multiple training images, the labeled positions of the multiple face key points included in each training image being positions determined by annotating the multiple face key points in that training image;
a third determining module, configured to, for each of the multiple training images, determine the mean position of the multiple face key points included in that training image based on the labeled positions of the multiple face key points included in that training image;
a second training module, configured to train a discrimination model based on the mean positions and the labeled positions of the multiple face key points included in each of the multiple training images, to obtain the second specified discrimination model.
9. The device according to claim 6, characterized in that the evaluation module comprises:
an extraction unit, configured to extract a histogram of oriented gradients (HOG) feature of each face key point in the previous frame image;
a concatenation unit, configured to concatenate the HOG features of the face key points into a feature vector;
a classification unit, configured to classify the feature vector with a specified classifier to obtain a classification result;
a first determination unit, configured to, when the classification result is a first value, determine that the positions of the multiple face key points in the previous frame image are evaluated as correct;
a second determination unit, configured to, when the classification result is a second value, determine that the positions of the multiple face key points in the previous frame image are evaluated as incorrect.
10. The device according to claim 9, characterized in that the evaluation module further comprises:
a dimensionality reduction unit, configured to perform dimensionality reduction on the feature vector;
correspondingly, the classification unit comprises:
a classification subunit, configured to classify the dimension-reduced feature vector with the specified classifier to obtain the classification result.
CN201610599541.XA 2016-07-26 2016-07-26 Face tracking method and device Active CN106295511B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610599541.XA CN106295511B (en) 2016-07-26 2016-07-26 Face tracking method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610599541.XA CN106295511B (en) 2016-07-26 2016-07-26 Face tracking method and device

Publications (2)

Publication Number Publication Date
CN106295511A CN106295511A (en) 2017-01-04
CN106295511B true CN106295511B (en) 2019-05-21

Family

ID=57652784

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610599541.XA Active CN106295511B (en) 2016-07-26 2016-07-26 Face tracking method and device

Country Status (1)

Country Link
CN (1) CN106295511B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845377A * 2017-01-10 2017-06-13 北京小米移动软件有限公司 Face key point positioning method and device
CN106934381B * 2017-03-20 2021-09-14 成都通甲优博科技有限责任公司 Face recognition tracking method
CN107145864A * 2017-05-08 2017-09-08 湖南科乐坊教育科技股份有限公司 Concentration assessment method and system
CN107145863A * 2017-05-08 2017-09-08 湖南科乐坊教育科技股份有限公司 Face detection method and system
CN107403145B * 2017-07-14 2021-03-09 北京小米移动软件有限公司 Image feature point positioning method and device
CN108038837B * 2017-12-08 2020-09-29 苏州科达科技股份有限公司 Method and system for detecting a target in video
CN109241921A * 2018-09-17 2019-01-18 北京字节跳动网络技术有限公司 Method and apparatus for detecting face key points
CN111199165B * 2018-10-31 2024-02-06 浙江宇视科技有限公司 Image processing method and device
CN109543604A * 2018-11-21 2019-03-29 泰康保险集团股份有限公司 Video processing method, device, medium and electronic equipment
CN111027495A * 2019-12-12 2020-04-17 京东数字科技控股有限公司 Method and device for detecting key points of a human body
CN111126272B * 2019-12-24 2020-11-10 腾讯科技(深圳)有限公司 Posture acquisition method, and training method and device for a key point coordinate positioning model
CN111368792B * 2020-03-18 2024-05-14 北京奇艺世纪科技有限公司 Feature point labeling model training method and device, electronic equipment and storage medium
CN112101109B * 2020-08-11 2024-04-30 深圳数联天下智能科技有限公司 Training method and device for a face key point detection model, electronic equipment and medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102314589A (en) * 2010-06-29 2012-01-11 比亚迪股份有限公司 Fast human-eye positioning method and device
CN102402691A (en) * 2010-09-08 2012-04-04 中国科学院自动化研究所 Method for tracking gestures and actions of human face
CN103632672A (en) * 2012-08-28 2014-03-12 腾讯科技(深圳)有限公司 Voice-changing system, voice-changing method, man-machine interaction system and man-machine interaction method
CN104715227A (en) * 2013-12-13 2015-06-17 北京三星通信技术研究有限公司 Method and device for locating key points of human face
CN105469056A (en) * 2015-11-26 2016-04-06 小米科技有限责任公司 Face image processing method and device
CN105760826A * 2016-02-03 2016-07-13 歌尔声学股份有限公司 Face tracking method and device and intelligent terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Face tracking and pose estimation based on a skin color model and an elliptical ring template; Ma Shuyan et al.; Computer Engineering and Applications (《计算机工程与应用》); 2004-06-11; pp. 50-55, 105

Also Published As

Publication number Publication date
CN106295511A (en) 2017-01-04

Similar Documents

Publication Publication Date Title
CN106295511B (en) Face tracking method and device
CN106295566B (en) Facial expression recognition method and device
CN104850828B (en) Character recognition method and device
CN106339680B (en) Face key point positioning method and device
CN105631403B (en) Face identification method and device
CN105528607B (en) Region extraction method, model training method and device
CN105488957B (en) Fatigue driving detection method and device
CN106572299A (en) Camera switching-on method and device
CN104408402B (en) Face identification method and device
CN106295515B (en) Method and device for determining a face region in an image
CN106503617A (en) Model training method and device
CN105528078B (en) Method and device for controlling electronic devices
CN106204435A (en) Image processing method and device
CN105224924A (en) Living body face recognition method and device
CN107832741A (en) Facial modeling method, apparatus and computer-readable recording medium
CN104408404B (en) Face identification method and device
CN106971164A (en) Face shape matching method and device
CN108010060A (en) Object detection method and device
CN106980840A (en) Face shape matching method, device and storage medium
CN107480665A (en) Character detection method, device and computer-readable recording medium
CN104077597B (en) Image classification method and device
CN105335714B (en) Photo processing method, device and equipment
CN109360197A (en) Image processing method, device, electronic equipment and storage medium
CN108830219A (en) Target tracking method, device and storage medium based on human-computer interaction
CN107463903A (en) Face key point positioning method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant