Detailed Description
In order to make the objects, technical solutions and advantageous effects of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In order to illustrate the technical scheme of the invention, the following description is made by specific examples.
Embodiment one:
Referring to fig. 1, a face living body detection method provided by the first embodiment of the invention includes the following steps. It should be noted that, if substantially the same results are obtained, the face living body detection method of the present invention is not limited to the flow sequence shown in fig. 1.
S101, detecting faces in the images.
In the first embodiment of the present invention, S101 may specifically include the following steps:
S1011, acquiring an image;
for example, acquiring a frame of image captured by a camera. The camera may be a built-in camera of the face recognition device or an external camera connected to it; the face recognition device may be a mobile terminal (such as a mobile phone or a tablet computer) or a desktop computer, etc.
S1012, detecting a human face in the image;
in the first embodiment of the invention, the VJ algorithm (please refer to Rapid object detection using a boosted cascade of simple features (P.Viola, M.Jones, proceedings of the 2001IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2001)) can be adopted to perform face detection, the algorithm combines Haar features and Adaboost classifiers to perform face detection, and an integral diagram is adopted to accelerate feature extraction, and the strong classifiers constructed by Adaboost are cascaded, so that the detection speed can be greatly accelerated while the face detection performance is improved, and the method has the advantages of high operation efficiency and low resource occupancy rate, and is suitable for performing real-time face detection on embedded terminals such as mobile phones and tablet computers.
S1013, judging whether the size of the detected face is matched with a preset window, and if so, executing S102.
In order to improve the accuracy of the face living body detection, a preset window is provided, and the user is required to place the face in the preset window while performing the instructed actions. The preset window may be circular, oval, rectangular, etc.
The coordinates of the vertex at the upper left corner of the circumscribed rectangular frame of the preset window are denoted (X1, Y1), and the coordinates of the vertex at the lower right corner are denoted (X2, Y2). In the first embodiment of the present invention, X1 = 69, Y1 = 169, X2 = 254 and Y2 = 408, but other values may be set. Assume that the coordinates of the vertex at the upper left corner of the rectangular frame of the detected face region are denoted (x1, y1), and the coordinates of the vertex at the lower right corner are denoted (x2, y2). Then, the overlapping area A of the detected face region and the preset window may be expressed as:
A = (min(X2, x2) − max(X1, x1) + 1) × (min(Y2, y2) − max(Y1, y1) + 1), where max and min represent the maximum and minimum operations, respectively.
The coincidence degree I of the detected face region and the preset window is then calculated from the overlap area A.
If I is less than a preset value, the detected face size does not match the preset window and the flow returns to S1011; otherwise, S102 is executed. The preset value may be set to 0.4, but other empirical values may also be used.
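The matching test of S1013 can be sketched as follows. Normalising the overlap area A by the face area to obtain I is an assumption made for illustration; the function name and the example coordinates are likewise illustrative:

```python
def window_match(face_box, win_box, threshold=0.4):
    """Decide whether a detected face box fits the preset window.

    face_box, win_box: (left, top, right, bottom) corner coordinates.
    Returns the coincidence degree I and the match decision.
    Assumption: I = overlap area / face area.
    """
    x1, y1, x2, y2 = face_box
    X1, Y1, X2, Y2 = win_box
    w = min(X2, x2) - max(X1, x1) + 1
    h = min(Y2, y2) - max(Y1, y1) + 1
    overlap = max(w, 0) * max(h, 0)           # overlap area A (0 if disjoint)
    face_area = (x2 - x1 + 1) * (y2 - y1 + 1)
    coincidence = overlap / face_area          # coincidence degree I
    return coincidence, coincidence >= threshold
```

For example, with the embodiment's window (69, 169, 254, 408), a face box lying entirely inside the window gives I = 1 and a match; a box outside it gives I = 0 and no match.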
S102, generating a random action instruction related to the human face.
In the first embodiment of the present invention, the random action command related to the face may include one or any combination of head shaking, nodding, blinking, and mouth opening.
S103, acquiring an image sequence.
The image sequence includes the images captured while the user performs the action indicated by the random action instruction. Considering the waiting and completion time of the action, the length of the image sequence acquired in the first embodiment of the invention is 100 frames; other empirical values may also be used.
S104, detecting the face action of the user according to the acquired image sequence: statistics related to each random action instruction are calculated according to the positions of the face key points, the relative variation of all statistics related to each random action instruction is calculated, and whether the face action of the user is consistent with the random action instruction is judged according to the relative variation; if so, the current face is judged to be a living body.
In the first embodiment of the present invention, S104 may specifically include the following steps:
S1041, reading a frame of image from the acquired image sequence and then executing S1042; if all the images in the image sequence have been read, directly executing S1045.
S1042, locating key points of the face.
Currently, locating face key points generally requires locating 68 key points; however, most of these key points are of no use for face living body detection, and locating the redundant key points increases the resource occupancy and reduces the operation efficiency. Some schemes locate only 12 key points, but these key points do not express the eye and mouth regions sufficiently: for example, only one key point is located on each of the upper eyelid, the lower eyelid, the upper lip and the lower lip, and these parts have no distinctive local feature, so a positioning error easily causes the states of the eyes and the mouth to be misjudged. In this way, the accuracy of the face living body detection is liable to be lowered.
The embodiment of the invention locates 19 face key points for the four actions of head shaking, nodding, blinking and mouth opening: 6 key points each for the left eye, the right eye and the mouth, and 1 key point at the nose tip. Specifically: the upper eyelid and the lower eyelid of the left eye each have 2 key points, and the two corners of the left eye each have 1 key point; the upper eyelid and the lower eyelid of the right eye each have 2 key points, and the two corners of the right eye each have 1 key point; the upper lip and the lower lip each have 2 key points, and the two corners of the mouth each have 1 key point. The eyes and the mouth have more key points mainly because the motion amplitude of blinking and mouth opening is relatively small, so the positioning precision required for the eye and mouth key points is higher. Providing 2 key points on each of the upper eyelid, the lower eyelid, the upper lip and the lower lip not only avoids misjudging the state of the eyes or the mouth because of the positioning error of a single key point, but also helps distinguish the shape differences of the eyes and mouths of different users. The relative position changes of the nose tip, the eyes and the mouth can be used to judge the head shaking and nodding actions. Alternatively, 25 face key points may be located, namely 8 key points each for the left eye, the right eye and the mouth, and 1 key point at the nose tip.
Specifically: the upper eyelid and the lower eyelid of the left eye each have 3 key points, and the two corners of the left eye each have 1 key point; the upper eyelid and the lower eyelid of the right eye each have 3 key points, and the two corners of the right eye each have 1 key point; the upper lip and the lower lip each have 3 key points, and the two corners of the mouth each have 1 key point. Of course, 18 or 24 face key points may also be located, i.e., omitting the key point at the nose tip. Other numbers of face key points are also possible, as long as the left eye, the right eye and the mouth each have at least 6 key points.
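One possible indexing of the 19 key points, consistent with the coordinate names used in the statistics below (the numbering itself is an assumption for illustration, not fixed by the patent):

```python
# Hypothetical index layout: point i has coordinates (xi, yi).
KEYPOINTS_19 = {
    "right_eye": {"right_corner": 1, "upper_lid": (2, 3), "left_corner": 4, "lower_lid": (5, 6)},
    "left_eye":  {"right_corner": 7, "upper_lid": (8, 9), "left_corner": 10, "lower_lid": (11, 12)},
    "nose_tip":  13,
    "mouth":     {"right_corner": 14, "upper_lip": (15, 16), "left_corner": 17, "lower_lip": (18, 19)},
}
```

This gives 6 key points per eye, 6 for the mouth and 1 for the nose tip, 19 in total.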
The face key points are located using an active shape model (ASM) method. In the embodiment of the invention, the key point positions of 500 face images are marked manually and used as the training data set.
Because the face actions are mainly performed within the preset window, in order to reduce the operation amount, the face key points are located only within the preset window.
S1043, calculating statistics related to the random action instruction generated at present according to the positions of the key points of the human face.
The invention calculates only the statistics related to the currently generated random action instruction; statistics related to other random action instructions need not be calculated, and there is no need to describe different statistics by means of a state machine or the like. Thus, the operation amount can be greatly reduced and the operation efficiency improved. For example, if the random action instruction generated in S102 is head shaking, then only the statistics related to head shaking are calculated in S1043.
For the head shaking and nodding actions, a common method is to calculate the three Euler angles (pitch, yaw, roll) of the face pose; calculating these three values involves complex angle calculations and matrix operations, so the computational complexity is high. Because the invention only needs to judge whether the face action of the user is consistent with the random action instruction, it does not need to estimate the specific pose of the current face. Therefore, the invention calculates statistics directly from the face key point positions during head shaking or nodding, and judges whether the face action of the user is consistent with the random action instruction according to the relative variation.
Specifically, when the currently generated random action instruction is head shaking, the statistic U1 related to head shaking is calculated according to the positions of the face key points, where x1 is the x-axis coordinate value of the right corner of the right eye, x10 is the x-axis coordinate value of the left corner of the left eye, and x13 is the x-axis coordinate value of the nose tip. When the face pose is quasi-frontal, U1 is near 1; when the head shakes, U1 moves away from 1.
When the currently generated random action instruction is nodding, the statistic U2 related to nodding is calculated according to the positions of the face key points, where y13 is the y-axis coordinate of the nose tip, y4 is the y-axis coordinate of the left corner of the right eye, y7 is the y-axis coordinate of the right corner of the left eye, y14 is the y-axis coordinate of the right corner of the mouth, and y17 is the y-axis coordinate of the left corner of the mouth. When the face pose is quasi-frontal, U2 is near 1; when the head nods, U2 moves away from 1.
When the currently generated random action instruction is blinking, the statistic U3 related to blinking is calculated according to the positions of the face key points, where (x2, y2) and (x3, y3) are the coordinates of the 2 key points of the upper eyelid of the right eye, (x5, y5) and (x6, y6) are the coordinates of the 2 key points of the lower eyelid of the right eye, (x4, y4) are the coordinates of the left corner of the right eye, (x1, y1) are the coordinates of the right corner of the right eye, (x8, y8) and (x9, y9) are the coordinates of the 2 key points of the upper eyelid of the left eye, (x11, y11) and (x12, y12) are the coordinates of the 2 key points of the lower eyelid of the left eye, (x10, y10) are the coordinates of the left corner of the left eye, and (x7, y7) are the coordinates of the right corner of the left eye. The larger the value of U3, the greater the opening degree of the eyes; conversely, the smaller the opening degree of the eyes.
When the currently generated random action instruction is mouth opening, the statistic U4 related to mouth opening is calculated according to the positions of the face key points, where (x15, y15) and (x16, y16) are the coordinates of the 2 key points of the upper lip, (x18, y18) and (x19, y19) are the coordinates of the 2 key points of the lower lip, (x14, y14) are the coordinates of the right corner of the mouth, and (x17, y17) are the coordinates of the left corner of the mouth. The larger the value of U4, the greater the opening degree of the mouth; conversely, the smaller the opening degree of the mouth.
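The four statistics can be sketched as below. The patent's exact formulas are not reproduced here; these ratio-style definitions are assumptions built only from the coordinate roles listed above (U1 and U2 compare the nose tip's position against eye and mouth landmarks; U3 and U4 are eyelid/lip separation normalised by eye/mouth width), chosen so that U1 and U2 sit near 1 for a quasi-frontal face:

```python
from math import hypot

def dist(p, q):
    """Euclidean distance between two key points."""
    return hypot(p[0] - q[0], p[1] - q[1])

def shake_stat(pts):
    """U1 (assumed form): ratio of the nose tip's horizontal distances
    to the outer corners of the two eyes; ~1 when the face is frontal."""
    return (pts[13][0] - pts[1][0]) / (pts[10][0] - pts[13][0])

def nod_stat(pts):
    """U2 (assumed form): vertical distance eyes->nose over nose->mouth."""
    eyes_y = (pts[4][1] + pts[7][1]) / 2
    mouth_y = (pts[14][1] + pts[17][1]) / 2
    return (pts[13][1] - eyes_y) / (mouth_y - pts[13][1])

def blink_stat(pts):
    """U3 (assumed form): mean eyelid separation over eye width,
    averaged across both eyes; larger when the eyes are open wider."""
    right = (dist(pts[2], pts[6]) + dist(pts[3], pts[5])) / (2 * dist(pts[1], pts[4]))
    left = (dist(pts[8], pts[12]) + dist(pts[9], pts[11])) / (2 * dist(pts[7], pts[10]))
    return (right + left) / 2

def mouth_stat(pts):
    """U4 (assumed form): lip separation over mouth width."""
    return (dist(pts[15], pts[19]) + dist(pts[16], pts[18])) / (2 * dist(pts[14], pts[17]))
```

Here `pts` maps key-point indices (1 to 19, following the coordinate names in the text) to (x, y) coordinates. Each statistic needs only a handful of additions and divisions, which is what keeps the per-frame cost low.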
Therefore, in the first embodiment of the invention, calculating the statistics related to the currently generated random action instruction has low computational complexity and a small operation amount, and can be performed quickly on embedded terminals such as mobile phones.
S1044, caching the statistics related to the currently generated random action instruction.
S1045, after the image sequence is read, reading all statistics related to the random action instructions of the cache, and respectively calculating the relative variation of all statistics related to each random action instruction.
Assume that the random action instruction is of type k (k = 1 represents head shaking, k = 2 represents nodding, k = 3 represents blinking, and k = 4 represents mouth opening), the corresponding statistic is Uk, and the number of statistics in the cache space is N (N ≤ 100, because the length of the image sequence acquired in S103 is 100 frames in the present invention and key point positioning may fail in S1042). All statistics for each random action instruction can be connected into a curve that records how the statistic changes during the face action. The peaks and troughs of the curve reflect the extreme states of the face action, for example: when the head shaking motion reaches the leftmost position, U1 is minimum, reaching the trough; when it reaches the rightmost position, U1 is maximum, reaching the peak. When the nodding motion reaches the uppermost position, U2 is minimum, reaching the trough; when it reaches the lowest position, U2 is maximum, reaching the peak. When the eye opening degree is maximum, U3 reaches the peak; when the eyes are closed, U3 reaches the trough. When the mouth opening degree is maximum, U4 reaches the peak; when the mouth is closed, U4 reaches the trough.
In order to reduce the data calculation error, the embodiment of the invention first filters the statistics, with a filtering window of 3 (other empirical values such as 4 or 5 may also be adopted) and a mean filtering method (other filtering methods may also be adopted): each cached statistic is replaced by the mean of the statistics within the filtering window centered on it, giving the filtered statistics U′k(i), i = 1, ..., N, where N is the number of statistics in the cache space.
considering the continuity of the face motion, the maximum value and the minimum value are used to replace the wave crest and the wave trough for the sake of simplicity, and are respectively recorded asAnd->
Considering that the opening degree of the eyes, the closing degree of the mouth and the relative positions of the facial features differ among users even in the quasi-frontal pose, the mean value of all statistics related to each random action instruction is taken as the user's reference value Uk,ref.
The relative variation ΔUk of all statistics related to each random action instruction is then calculated from Uk,max, Uk,min and the reference value Uk,ref. The physical meaning of the relative variation is that the more pronounced the face action, the larger the difference between Uk,max and Uk,min, and the larger ΔUk. Thus, the relative variation ΔUk can be used as the basis for the action decision.
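A sketch of S1045, assuming the relative variation is the max-min spread of the filtered curve normalised by the user's baseline (the text fixes the ingredients — mean filter, maximum, minimum and mean reference value — but not how they are combined, so this combination is an assumption):

```python
def relative_change(stats, window=3):
    """Mean-filter the cached statistics, then measure the spread of
    the filtered curve relative to the user's own baseline."""
    n = len(stats)
    filtered = []
    for i in range(n):
        lo = max(0, i - window // 2)       # window shrinks at the edges
        hi = min(n, i + window // 2 + 1)
        filtered.append(sum(stats[lo:hi]) / (hi - lo))
    ref = sum(filtered) / n                # user-specific reference value
    return (max(filtered) - min(filtered)) / ref
```

A flat curve (no face action) gives a relative change of 0; the stronger the action, the larger the spread and hence the larger the returned value.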
S1046, when the relative variation of all statistics related to the random action instruction is greater than or equal to a preset threshold value of the action corresponding to the random action instruction, judging that the face action of the user is consistent with the random action instruction, and judging that the current face is a living body, otherwise, judging that the face action of the user is inconsistent with the random action instruction.
The first embodiment of the invention adopts a simple threshold decision strategy. Let Tk be the preset threshold of the k-th type of action (k = 1 for head shaking, k = 2 for nodding, k = 3 for blinking, and k = 4 for mouth opening). When the relative variation ΔUk is greater than or equal to the preset threshold Tk, the face action of the user is judged to be consistent with the random action instruction; otherwise, it is judged to be inconsistent. Tk is an empirical value; in the first embodiment of the invention, T1 = T2 = 0.6, T3 = 0.3 and T4 = 0.9. Because the invention judges whether the face action of the user is consistent with the random action instruction according to the relative variation, complex angle and matrix operations are avoided and the operation efficiency is improved. Since the first embodiment of the invention only calculates the mean, maximum and minimum of each statistic and uses a simple threshold decision strategy, the efficiency is greatly improved compared with machine learning methods such as classifiers, no training data is needed, and the resource occupancy is very small.
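The threshold decision of S1046 can be written directly with the empirical thresholds given above (the function and table names are illustrative):

```python
# k = 1 shake, 2 nod, 3 blink, 4 open mouth; Tk values from embodiment one.
THRESHOLDS = {1: 0.6, 2: 0.6, 3: 0.3, 4: 0.9}

def action_matches(k, delta_u):
    """True when the relative variation reaches the action's threshold,
    i.e. the user's face action is judged consistent with instruction k."""
    return delta_u >= THRESHOLDS[k]
```

For example, a blink (k = 3) with a relative variation of 0.35 passes, while a head shake (k = 1) with 0.5 does not.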
Embodiment two:
The second embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the face living body detection method according to the first embodiment of the present invention.
Embodiment III:
Fig. 2 shows a specific block diagram of a face recognition device according to a third embodiment of the present invention. A face recognition device 100 includes: one or more processors 101, a memory 102, and one or more computer programs, wherein the processors 101 and the memory 102 are connected through a bus, and the one or more computer programs are stored in the memory 102 and configured to be executed by the one or more processors 101; when the processor 101 executes the computer programs, the steps of the face living body detection method provided in the first embodiment of the present invention are implemented.
In the third embodiment of the present invention, the face recognition device may be a mobile terminal (e.g., a mobile phone, a tablet computer, etc.) or a desktop computer, etc.
In the invention, statistics related to each random action instruction are calculated according to the positions of the face key points, the relative variation of all statistics related to each random action instruction is calculated, whether the face action of the user is consistent with the random action instruction is judged according to the relative variation, and if so, the current face is judged to be a living body. Therefore, the face living body detection method does not need to describe different statistics by means of a state machine or the like, can greatly reduce the operation amount, reduce the complexity of the algorithm and improve its operation efficiency, and can realize efficient face living body detection on resource-constrained embedded terminals such as mobile phones and tablets.
Those of ordinary skill in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the related hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include: a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, and the like.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.