Embodiment one:
Referring to Fig. 1, the face liveness detection method provided by Embodiment One of the present invention includes the following steps. It should be noted that, provided substantially the same result is obtained, the face liveness detection method of the present invention is not limited to the step order shown in Fig. 1.
S101: Detect a face in an image.
In Embodiment One of the present invention, S101 may specifically include the following steps:
S1011: Acquire an image.
For example, acquire one frame of the video captured by a camera. The camera may be built into the face recognition device or may be an external camera connected to it; the face recognition device may be a mobile terminal (such as a mobile phone or tablet computer) or a desktop computer, among others.
S1012: Detect a face in the image.
In Embodiment One, face detection may be performed with the Viola-Jones (VJ) algorithm (see "Rapid Object Detection using a Boosted Cascade of Simple Features", P. Viola and M. Jones, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001). The algorithm combines Haar features with an AdaBoost classifier for face detection, accelerates feature extraction with integral images, and cascades the strong classifiers constructed by AdaBoost. It substantially speeds up detection while improving detection performance, offers high operating efficiency and low resource occupancy, and is therefore well suited to real-time face detection on embedded terminals such as mobile phones and tablet computers.
S1013: Determine whether the size of the detected face matches the preset window; if so, execute S102.
To improve the accuracy of face liveness detection, a preset window is provided, and the user is required to place his or her face within the preset window when performing the instructed actions. The preset window may be a circle, an ellipse, a rectangle, or the like.
Denote the top-left vertex of the bounding rectangle of the preset window as (X1, Y1) and its bottom-right vertex as (X2, Y2). In Embodiment One, X1 = 69, Y1 = 169, X2 = 254, Y2 = 408; other values may of course be used. Suppose the top-left vertex of the rectangle bounding the detected face region is (x1, y1) and its bottom-right vertex is (x2, y2). Then the overlap area A between the detected face region and the preset window can be expressed as:
A = (min(X2, x2) − max(X1, x1) + 1) × (min(Y2, y2) − max(Y1, y1) + 1),
where max and min denote the maximum and minimum operations, respectively.
The coincidence degree I between the detected face region and the preset window can be expressed as:
If I < a preset value, the size of the detected face is deemed not to match the preset window and the method returns to S1011; otherwise, S102 is executed. The preset value may be set to 0.4, or to another empirical value.
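The window check above can be sketched as follows. The overlap area A follows the formula given in the text; the formula for the coincidence degree I is not reproduced in the text, so dividing A by the face-box area is a hypothetical choice made here for illustration only.

```python
def overlap_area(win, face):
    """Overlap area A of two axis-aligned boxes given as (X1, Y1, X2, Y2),
    per the formula above (clamped to 0 when the boxes do not overlap)."""
    X1, Y1, X2, Y2 = win
    x1, y1, x2, y2 = face
    w = min(X2, x2) - max(X1, x1) + 1
    h = min(Y2, y2) - max(Y1, y1) + 1
    return max(w, 0) * max(h, 0)

def face_fits_window(win, face, preset_value=0.4):
    """Return True when the coincidence degree I reaches the preset value.
    Defining I as A over the face-box area is an assumption; the text
    does not reproduce the exact formula."""
    x1, y1, x2, y2 = face
    face_area = (x2 - x1 + 1) * (y2 - y1 + 1)
    return overlap_area(win, face) / face_area >= preset_value
```

With the example window (69, 169, 254, 408), a face box lying entirely inside the window gives I = 1 and passes the check, while a box outside it gives I = 0 and fails.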
S102: Generate a random action instruction related to the face.
In Embodiment One, the random action instruction related to the face may include one of shaking the head, nodding, blinking, and opening the mouth, or any combination thereof.
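A minimal sketch of the instruction generator; the action names are illustrative, and only the single-action case is shown (the text also allows combinations):

```python
import random

# k = 1..4 in the numbering used later in the text
ACTIONS = ("shake_head", "nod", "blink", "open_mouth")

def random_action_instruction(rng=random):
    """Draw one face-related action instruction at random."""
    return rng.choice(ACTIONS)
```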
S103: Acquire an image sequence.
The image sequence contains the images of the user performing the action in response to the random action instruction. Considering the approximate time needed to wait for and complete an action, Embodiment One acquires an image sequence 100 frames long; other empirical values may of course be used.
S104: Detect the user's face action from the acquired image sequence: compute, from the positions of the face key points, the statistics relevant to each random action instruction; compute the relative variation of all statistics relevant to each random action instruction; and judge from the relative variation whether the user's face action is consistent with the random action instruction. If they are consistent, the current face is determined to be a living body.
In Embodiment One, S104 may specifically include the following steps:
S1041: Read one frame from the acquired image sequence, then execute S1042; if all images in the image sequence have been read, execute S1045 directly.
S1042: Locate the face key points.
At present, face key point localization typically locates 68 key points, yet most of them are meaningless for face liveness detection; locating these redundant key points increases resource occupancy and reduces operating efficiency. Other schemes locate only 12 key points, but these are insufficient to describe the eye and mouth regions: for example, the upper and lower eyelids and the upper and lower lips each receive only one key point, and since these parts have no distinctive localization features, localization errors easily cause the eye and mouth states to be misjudged, which in turn degrades the accuracy of face liveness detection.
Embodiment One locates 19 face key points for the four actions of shaking the head, nodding, blinking, and opening the mouth: 6 key points each for the left eye, the right eye, and the mouth, and 1 key point for the nose. Specifically, the upper and lower eyelids of the left eye each have 2 key points and the two corners of the left eye each have 1; the upper and lower eyelids of the right eye each have 2 key points and the two corners of the right eye each have 1; the upper and lower lips each have 2 key points and the two corners of the mouth each have 1. The eyes and mouth receive more key points mainly because the amplitudes of blinking and mouth opening are relatively small, so higher localization accuracy is required for the eyes and mouth. Giving the eyelids and lips 2 key points each not only avoids eye- or mouth-state misjudgments caused by a localization error at a single key point, but also avoids misjudgments caused by differences in eye and mouth shape between users. The change in the relative position of the nose with respect to the eyes and mouth is used to judge the shaking and nodding actions. Alternatively, 25 face key points may be located, i.e., 8 each for the left eye, the right eye, and the mouth, and 1 for the nose: the upper and lower eyelids of each eye each have 3 key points and each eye corner has 1; the upper and lower lips each have 3 key points and each mouth corner has 1. It is of course also possible to locate 18 or 24 face key points, i.e., to omit the nose key point, or any other number of face key points, as long as the left eye, the right eye, and the mouth each have at least 6 key points.
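The 19-point layout above, together with the 1-based index convention used by the statistics later in the text (x1..x19, y1..y19), can be summarised as a table; the ordering of the two eyelid and lip points within each part is an assumption, since the text does not fix it:

```python
# 1-based key point indices, as used by the statistics U1..U4 in the text.
KEYPOINTS = {
    1: "right eye, right corner",
    2: "right eye, upper eyelid",
    3: "right eye, upper eyelid",
    4: "right eye, left corner",
    5: "right eye, lower eyelid",
    6: "right eye, lower eyelid",
    7: "left eye, right corner",
    8: "left eye, upper eyelid",
    9: "left eye, upper eyelid",
    10: "left eye, left corner",
    11: "left eye, lower eyelid",
    12: "left eye, lower eyelid",
    13: "nose",
    14: "mouth, right corner",
    15: "upper lip",
    16: "upper lip",
    17: "mouth, left corner",
    18: "lower lip",
    19: "lower lip",
}
```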
The face key points of Embodiment One are located with the Active Shape Model (ASM) method. Embodiment One uses the key point positions of 500 manually labeled face images as the training data set.
Since the face actions take place mainly within the preset window, in order to reduce computation, locating the face key points specifically means locating the face key points within the preset window.
S1043: Compute, from the positions of the face key points, the statistics relevant to the currently generated random action instruction.
The present invention computes only the statistics relevant to the random action instruction generated each time; there is no need to compute statistics relevant to the other random action instructions, let alone to track different statistics with a state machine. This greatly reduces computation and improves operating efficiency. For example, if the random action instruction generated in S102 is to shake the head, then in S1043 only the statistic relevant to shaking the head is computed.
For the shaking and nodding actions, the common approach is to compute the three Euler angles of the face pose (pitch, yaw, roll), which involves complicated angle calculations and matrix operations and has high computational complexity. Since the present invention only needs to judge whether the user's face action is consistent with the random action instruction, without identifying the specific current action, it computes the statistics solely from the face positions during shaking or nodding, and judges from the relative variation whether the user's face action is consistent with the random action instruction.
Specifically, when the currently generated random action instruction is to shake the head, the statistic U1 relevant to shaking is computed from the positions of the face key points as follows:
where x1 is the x-coordinate of the right corner of the right eye, x10 is the x-coordinate of the left corner of the left eye, and x13 is the x-coordinate of the nose. When the face pose is near-frontal, U1 is close to 1; when the head shakes, U1 moves away from 1.
When the currently generated random action instruction is to nod, the statistic U2 relevant to nodding is computed from the positions of the face key points as follows:
where y13 is the y-coordinate of the nose, y4 is the y-coordinate of the left corner of the right eye, y7 is the y-coordinate of the right corner of the left eye, y14 is the y-coordinate of the right corner of the mouth, and y17 is the y-coordinate of the left corner of the mouth. When the face pose is near-frontal, U2 is close to 1; when the head nods, U2 moves away from 1.
When the currently generated random action instruction is to blink, the statistic U3 relevant to blinking is computed from the positions of the face key points as follows:
where (x2, y2) and (x3, y3) are the coordinates of the 2 key points on the upper eyelid of the right eye, (x5, y5) and (x6, y6) are the coordinates of the 2 key points on the lower eyelid of the right eye, (x4, y4) is the coordinate of the left corner of the right eye, (x1, y1) is the coordinate of the right corner of the right eye, (x8, y8) and (x9, y9) are the coordinates of the 2 key points on the upper eyelid of the left eye, (x11, y11) and (x12, y12) are the coordinates of the 2 key points on the lower eyelid of the left eye, (x10, y10) is the coordinate of the left corner of the left eye, and (x7, y7) is the coordinate of the right corner of the left eye. The larger the value of U3, the wider the eyes are open; conversely, the smaller the value, the less the eyes are open.
When the currently generated random action instruction is to open the mouth, the statistic U4 relevant to opening the mouth is computed from the positions of the face key points as follows:
where (x15, y15) and (x16, y16) are the coordinates of the 2 key points of the upper lip, (x18, y18) and (x19, y19) are the coordinates of the 2 key points of the lower lip, (x14, y14) is the coordinate of the right corner of the mouth, and (x17, y17) is the coordinate of the left corner of the mouth. The larger the value of U4, the wider the mouth is open; conversely, the smaller the value, the less the mouth is open.
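The formula images for U3 and U4 are likewise not reproduced in the text; one plausible form, assumed here, normalises the mean vertical eyelid (or lip) gap by the corner-to-corner width, so that the value grows with the opening degree as described above.

```python
def _openness(upper, lower, left_corner, right_corner):
    """Mean vertical gap between paired upper/lower key points, normalised
    by the corner-to-corner width (a hypothetical, EAR-style measure)."""
    gap = sum(abs(u[1] - l[1]) for u, l in zip(upper, lower)) / len(upper)
    width = abs(right_corner[0] - left_corner[0])
    return gap / width

def u3_blink(p):
    """Eye-openness statistic U3: mean openness of the two eyes.
    p maps the 1-based key point indices from the text to (x, y) tuples."""
    right = _openness([p[2], p[3]], [p[5], p[6]], p[4], p[1])
    left = _openness([p[8], p[9]], [p[11], p[12]], p[10], p[7])
    return (right + left) / 2.0

def u4_mouth(p):
    """Mouth-openness statistic U4, computed the same way from the lip points."""
    return _openness([p[15], p[16]], [p[18], p[19]], p[17], p[14])
```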
It can be seen that, in Embodiment One, computing the statistics relevant to the currently generated random action instruction has low computational complexity and a small computational load, and is therefore easy to solve rapidly on embedded terminals such as mobile phones.
S1044: Cache the statistic relevant to the currently generated random action instruction.
S1045: After the image sequence has been read, read all cached statistics relevant to the random action instruction and compute the relative variation of all statistics relevant to each random action instruction.
Suppose the random action instruction is k (k = 1 denotes shaking the head, k = 2 nodding, k = 3 blinking, and k = 4 opening the mouth), the corresponding statistic is Uk, and the number of statistics in the cache is N (since the image sequence acquired in S103 is 100 frames long and key point localization in S1042 may fail for some frames, N ≤ 100). All the statistics for a given random action instruction can be joined into a curve, which records how the statistic changes during the face action. The crests and troughs of the curve reflect the extreme states of the action. For example: U1 is smallest, reaching a trough, when the shaking head moves to the far left, and largest, reaching a crest, when it moves to the far right; U2 is smallest, reaching a trough, when the nodding head moves to the top, and largest, reaching a crest, when it moves to the bottom; U3 reaches a crest when the eyes are open widest and a trough when the eyes are closed; U4 reaches a crest when the mouth is open widest and a trough when the mouth is closed.
To reduce computational error, Embodiment One first filters the statistics, with a filter window of 3 (other empirical values, such as 4 or 5, may of course be used) and the mean filtering method (other filtering methods may of course be used); the filtered statistics are
where N is the number of statistics in the cache.
Considering the continuity of a face action, for simplicity, Embodiment One uses the maximum and the minimum of the filtered statistics in place of the crest and the trough.
Considering that, under a near-frontal pose, different users differ in how wide their eyes open, how tightly their mouths close, and in the relative positions of their facial features, Embodiment One takes the mean of all statistics relevant to each random action instruction as the user's reference value.
The relative variation ΔUk of all statistics relevant to each random action instruction is then computed as follows:
The physical meaning of this relative variation is that the more significant the face action, the greater the difference of the maximum and the minimum from the reference value, and the larger ΔUk. ΔUk can therefore serve as the basis for the action judgment.
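S1045 can be sketched end to end as follows. The ΔUk formula image is not reproduced in the text, so normalising the max-min spread of the filtered statistics by their mean (the user's reference value) is an assumption, chosen to be consistent with the description above.

```python
def mean_filter(values, window=3):
    """Moving-average filter over the cached statistics (window 3 in the
    text); assumes len(values) >= window."""
    out = []
    for i in range(len(values) - window + 1):
        out.append(sum(values[i:i + window]) / window)
    return out

def relative_variation(values, window=3):
    """Relative variation ΔU of one statistic sequence: the spread between
    the maximum and minimum of the filtered sequence, normalised by its
    mean (hypothetical form of the un-reproduced formula)."""
    filtered = mean_filter(values, window)
    baseline = sum(filtered) / len(filtered)
    return (max(filtered) - min(filtered)) / baseline
```

A flat sequence gives ΔU = 0, while any significant action raises ΔU, as the text describes.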
S1046: When the relative variation of all statistics relevant to the random action instruction is greater than or equal to the preset threshold of the action corresponding to the random action instruction, determine that the user's face action is consistent with the random action instruction and that the current face is a living body; otherwise, determine that the user's face action is inconsistent with the random action instruction.
Embodiment One uses a simple threshold decision scheme. Let Tk be the preset threshold of the k-th action (k = 1 denotes shaking the head, k = 2 nodding, k = 3 blinking, and k = 4 opening the mouth). When the relative variation ΔUk is greater than or equal to the preset threshold Tk, the user's face action is determined to be consistent with the random action instruction; otherwise, it is determined to be inconsistent. Tk takes empirical values; in Embodiment One, T1 = T2 = 0.6, T3 = 0.3, T4 = 0.9. Since the present invention judges from the relative variation whether the user's face action is consistent with the random action instruction, complicated angle and matrix operations are avoided and operating efficiency is improved. Since Embodiment One only needs to compute the mean, maximum, and minimum of each statistic and realizes the action judgment with a simple threshold decision scheme, efficiency is greatly improved compared with machine learning methods such as classifiers, no training data is needed, and resource occupancy is very small.
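The threshold decision of S1046, with the empirical thresholds T1..T4 given above (the action keys are illustrative names, not from the text):

```python
# Preset thresholds from the text: shake/nod 0.6, blink 0.3, open mouth 0.9
THRESHOLDS = {"shake_head": 0.6, "nod": 0.6, "blink": 0.3, "open_mouth": 0.9}

def is_live(action, delta_u):
    """Judge the face live when the relative variation for the instructed
    action reaches that action's preset threshold."""
    return delta_u >= THRESHOLDS[action]
```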