WO2023071190A1 - Liveness detection method and apparatus, computer device, and storage medium - Google Patents


Info

Publication number
WO2023071190A1
Authority
WO
WIPO (PCT)
Prior art keywords
detected
detection
image
threshold
images
Prior art date
Application number
PCT/CN2022/096444
Other languages
French (fr)
Chinese (zh)
Inventor
胡宇轩
于志鹏
石华峰
吴一超
梁鼎
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023071190A1 publication Critical patent/WO2023071190A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Definitions

  • The present disclosure relates to the technical field of face recognition, and in particular to a liveness detection method and apparatus, a computer device, and a storage medium.
  • Liveness detection technology is widely used in various smart devices to detect whether the "user" currently performing face recognition is a real person.
  • Embodiments of the present disclosure provide at least a liveness detection method, an apparatus, a computer device, and a storage medium.
  • an embodiment of the present disclosure provides a living body detection method, including:
  • In response to a liveness detection request, multiple frames of images to be detected corresponding to a target action are acquired, where the target action is an action the user is instructed to perform during liveness detection;
  • based on the position information of the feature points matching the target action in each frame of the image to be detected, detection values indicating the degree of completion of the target action are determined for each frame. In this way, the completion of the target action in each frame can be quantified, which facilitates subsequent quantitative analysis of the detection values; the detection values are then evaluated under a detection scheme matching the target action to obtain the liveness detection result. Evaluating the detection values of multiple frames of images to be detected reduces the influence of any single frame on the detection result, making the accuracy of liveness detection higher.
  • In a possible implementation, obtaining a liveness detection result based on the detection scheme matching the target action and the detection values includes: detecting the detection values of the multiple frames of images to be detected based on a detection threshold corresponding to the target action, to obtain the liveness detection result. Because the detection threshold is set per target action, it better reflects the actual characteristics of that action; integrating the detection values of multiple frames against the threshold makes the determination of the liveness detection result more accurate.
  • In a possible implementation, when the target action is nodding or shaking the head, the detection value includes a head deviation angle, and the detection threshold includes a positive deviation threshold, a negative deviation threshold, and an image frame number threshold;
  • the detection values of the multiple frames of images to be detected are detected to obtain a living body detection result, including:
  • In a possible implementation, the feature points matching the target action include mouth feature points. Determining, based on the position information of the feature points matching the target action in each frame of the image to be detected, the detection values indicating the completion of the target action includes: determining the detection value based on a first mouth distance and a second mouth distance. A detection value determined this way better represents the state of the mouth, and the detection and calculation process saves computing resources.
  • the detection threshold includes a mouth opening threshold, a mouth closing threshold, and a mouth opening frame number threshold;
  • the detection values of the multiple frames of images to be detected are detected to obtain a living body detection result, including:
  • the first preset condition includes:
  • the number of third images to be detected whose corresponding detection value satisfies the mouth opening threshold is a first preset value;
  • the number of second images to be detected between two adjacent frames of the third images to be detected is greater than the mouth opening frame number threshold.
  • the first preset condition further includes:
  • the difference between the detection values of any two frames of the second images to be detected is smaller than a second preset value.
  • determining the second images to be detected between every two adjacent frames of the first images to be detected among the multiple frames of first images to be detected includes:
  • the second preset condition includes:
  • the number of images to be detected between adjacent first images to be detected is greater than the mouth opening frame number threshold; and the maximum detection value among the images to be detected between adjacent first images to be detected satisfies the filtering condition corresponding to the mouth opening threshold.
  • In a possible implementation, determining, based on the position information of the feature points matching the target action in each frame of the image to be detected, the detection values indicating the completion of the target action includes:
  • the image to be detected is rectified;
  • the rectified image to be detected is input to a pre-trained neural network to determine a detection value corresponding to the image to be detected.
  • In this way, the obtained detection values are more accurate, which effectively improves the accuracy of liveness detection.
  • the detection value includes a first detection value used to describe the situation of eye occlusion, and a second detection value used to describe the completion of eye opening and closing;
  • the detection threshold includes an eye opening threshold, an eye closing threshold, an eye opening frame number threshold, and an eye occlusion threshold;
  • the detection values of the multiple frames of images to be detected are detected to obtain a living body detection result, including:
  • the embodiment of the present disclosure also provides a living body detection device, including:
  • an acquisition module configured to, in response to a liveness detection request, acquire multiple frames of images to be detected corresponding to a target action, where the target action is an action the user is instructed to perform during liveness detection;
  • a determining module configured to determine detection values corresponding to each frame of the image to be detected and used to indicate the completion of the target action based on the feature point position information matching the target action in each frame of the image to be detected;
  • the detection module is configured to obtain a living body detection result based on the detection scheme matched with the target action and the detection value.
  • the detection module is configured to detect the detection values of the multiple frames of images to be detected based on a detection threshold corresponding to the target action and a detection scheme matching the target action, to obtain the liveness detection result.
  • When the target action is nodding or shaking the head, the detection value includes a head deviation angle, and the detection threshold includes a positive deviation threshold, a negative deviation threshold, and an image frame number threshold;
  • when the detection module detects the detection values of the multiple frames of images to be detected based on the detection threshold corresponding to the target action and the detection scheme matching the target action to obtain the liveness detection result, it is configured to:
  • the feature points matching the target action include mouth feature points;
  • when the determining module determines, based on the position information of the feature points matching the target action in each frame of the image to be detected, the detection values indicating the completion of the target action, it is configured to:
  • the detection value is determined based on the first mouth distance and the second mouth distance.
  • the detection threshold includes a mouth opening threshold, a mouth closing threshold, and a mouth opening frame number threshold;
  • when the detection module detects the detection values of the multiple frames of images to be detected based on the detection threshold corresponding to the target action and the detection scheme matching the target action to obtain the liveness detection result, it is configured to:
  • the first preset condition includes:
  • the number of third images to be detected whose corresponding detection value satisfies the mouth opening threshold is a first preset value;
  • the number of second images to be detected between two adjacent frames of the third images to be detected is greater than the mouth opening frame number threshold.
  • the first preset condition further includes:
  • the difference between the detection values of any two frames of the second images to be detected is smaller than a second preset value.
  • when the detection module detects the detection values of the multiple frames of images to be detected based on the detection threshold corresponding to the target action and the detection scheme matching the target action to obtain the liveness detection result, it is configured to:
  • the second preset condition includes:
  • the number of images to be detected between adjacent first images to be detected is greater than the mouth opening frame number threshold; and the maximum detection value among the images to be detected between adjacent first images to be detected satisfies the filtering condition corresponding to the mouth opening threshold.
  • when the determining module determines the detection values corresponding to each frame of the image to be detected, used to represent the completion of the target action, it is configured to:
  • the image to be detected is rectified;
  • the rectified image to be detected is input to a pre-trained neural network to determine a detection value corresponding to the image to be detected.
  • the detection value includes a first detection value used to describe the situation of eye occlusion, and a second detection value used to describe the completion of eye opening and closing;
  • the detection threshold includes an eye opening threshold, an eye closing threshold, an eye opening frame number threshold, and an eye occlusion threshold;
  • when the detection module detects the detection values of the multiple frames of images to be detected based on the detection threshold corresponding to the target action and the detection scheme matching the target action to obtain the liveness detection result, it is configured to:
  • In another aspect, an embodiment of the present disclosure further provides a computer device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor; when the computer device runs, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the above first aspect, or of any possible implementation of the first aspect, are executed.
  • Embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the above first aspect, or of any possible implementation of the first aspect, are executed.
  • FIG. 1 shows a flow chart of a living body detection method provided by an embodiment of the present disclosure
  • FIG. 2 shows a schematic diagram of determining the first mouth distance and the second mouth distance in the living body detection method provided by the embodiment of the present disclosure
  • FIG. 3 shows a flow chart of a specific method for determining a living body detection result in the living body detection method provided by an embodiment of the present disclosure
  • FIG. 4 shows a flow chart of another specific method for determining a living body detection result in the living body detection method provided by an embodiment of the present disclosure
  • FIG. 5 shows a schematic diagram of determining the first image to be detected in the living body detection method provided by the embodiment of the present disclosure
  • FIG. 6 shows a flow chart of another specific method for determining a living body detection result in the living body detection method provided by an embodiment of the present disclosure
  • FIG. 7 shows a schematic structural diagram of a living body detection device provided by an embodiment of the present disclosure
  • FIG. 8 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
  • The present disclosure provides a liveness detection method and apparatus, a computer device, and a storage medium. In response to a liveness detection request, multiple frames of images to be detected corresponding to a target action are acquired, where the target action is an action the user is instructed to perform during liveness detection. Based on the position information of the feature points matching the target action in each frame, detection values indicating the completion of the target action are determined for each frame; in this way, the completion of the target action can be quantified per frame, which facilitates subsequent quantitative analysis of the detection values. The detection values are then evaluated under the detection scheme matching the target action to obtain the liveness detection result. Evaluating the detection values of multiple frames reduces the influence of any single frame on the result, making the accuracy of liveness detection higher.
  • the execution subject of the living body detection method provided in the embodiment of the present disclosure is generally a computer device with a certain computing power.
  • The computer device includes, for example, a terminal device, a server, or other processing device. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc.
  • the living body detection method may be implemented by a processor invoking computer-readable instructions stored in a memory.
  • FIG. 1 is a flowchart of a living body detection method provided by an embodiment of the present disclosure, the method includes steps S101 to S103, wherein:
  • S101: In response to a liveness detection request, acquire multiple frames of images to be detected corresponding to a target action, where the target action is an action the user is instructed to perform during liveness detection.
  • The target action may be nodding, shaking the head, opening the mouth, closing the mouth, opening the eyes, closing the eyes, etc.
  • The target action used in liveness detection may be preset; for example, passing a liveness detection may require completing the two actions of nodding and shaking the head in sequence. Alternatively, the target action may be selected by the user, for example, the user selects opening and closing the eyes as the action to be performed. The target action may also be determined from the current face recognition result: if it is detected that the user is wearing a mask (the mouth is covered, so mouth feature points cannot be recognized), the target action may be set to opening and closing the eyes; if the user is wearing sunglasses (the eyes are covered, so eye feature points cannot be recognized), the target action may be set to opening and closing the mouth.
  • In some embodiments, the image acquisition device of the terminal device may be controlled to capture a video corresponding to the target action, and the multiple frames of images to be detected may be obtained by sampling that video.
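  • The sampling step above can be sketched as follows. This is a minimal illustration only; the patent does not specify a sampling strategy, so uniform sampling over the frame indices is assumed here.

```python
def sample_frames(video_frames, num_samples):
    """Uniformly sample num_samples frames from the captured video.

    Assumption: the sampling strategy is not specified in the text;
    uniform sampling over the frame indices is used for illustration.
    """
    if num_samples >= len(video_frames):
        return list(video_frames)
    step = len(video_frames) / num_samples  # fractional stride between samples
    return [video_frames[int(i * step)] for i in range(num_samples)]
```

For a 100-frame video and 5 samples, this picks frames 0, 20, 40, 60, and 80.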
  • S102: Based on the position information of the feature points matching the target action in each frame of the image to be detected, determine detection values for each frame that indicate the completion of the target action.
  • The position information of the feature points may be determined by applying a face feature point detection algorithm to the image to be detected.
  • Depending on the target action, the matched feature points may differ, and so may the determined detection values.
  • Case 1: when the target action is nodding (or shaking the head), the detection value indicating the completion of the target action may be the nodding (or shaking) angle; correspondingly, the matched feature points are those able to characterize the nodding (or shaking) angle, and the specific feature points used may differ depending on the detection algorithm used to determine that angle.
  • For example, the feature points corresponding to shaking the head may be the outer corner of the left eye, the outer corner of the right eye, and the tip of the nose. When the user faces the camera, the horizontal distance from the outer corner of the left eye to the tip of the nose is similar to that from the outer corner of the right eye to the tip of the nose. When the user shakes the head to the right, the left distance decreases more slowly than the right distance, so the ratio of the left distance to the right distance gradually increases, and this ratio can be used to determine how far the user has shaken the head to the right.
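  • As a sketch, the ratio described above can be computed from the three feature points as follows. The exact formula is not given in the text, so a plain ratio of horizontal distances is assumed.

```python
def shake_ratio(left_eye_corner, right_eye_corner, nose_tip):
    """Ratio of the horizontal distance from the outer corner of the
    left eye to the nose tip over the same distance for the right eye.
    Points are (x, y) pixel coordinates. The ratio is close to 1 when
    the user faces the camera and increases as the head turns right.
    """
    left_dist = abs(left_eye_corner[0] - nose_tip[0])
    right_dist = abs(right_eye_corner[0] - nose_tip[0])
    return left_dist / max(right_dist, 1e-6)  # guard against division by zero
```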
  • Case 2: opening and closing the mouth means the user first opens and then closes the mouth; once the user has done so, it can be confirmed that the user has completed the action.
  • The detection value indicating the completion of the target action may be a mouth state score representing the mouth opening range; the feature points matching the target action are then mouth feature points.
  • Based on the position information of the mouth feature points, a first mouth distance representing the opening width at the center of the mouth and a second mouth distance representing the opening width near a mouth corner may be determined; the detection value is then determined based on the first mouth distance and the second mouth distance.
  • A schematic diagram of determining the first mouth distance and the second mouth distance is shown in Figure 2.
  • As shown in Figure 2, when determining the first mouth distance, the first mouth feature point at the center of the upper lip (point A in Figure 2) and the corresponding second mouth feature point at the center of the lower lip (point B in Figure 2) may be determined first, where the line connecting the first and second mouth feature points is aligned with the opening direction of the mouth; the first mouth distance is then determined from the position information corresponding to these two feature points.
  • When determining the second mouth distance, for at least one mouth corner, the third mouth feature point on the upper lip (point C in Figure 2) and the fourth mouth feature point on the lower lip (point D in Figure 2) may be determined first, where the line connecting the third and fourth mouth feature points is aligned with the opening direction of the mouth; the second mouth distance is then determined from the position information corresponding to these two feature points.
  • The second mouth distance may be calculated separately for each of the two mouth corners, with the average of the two taken as the second mouth distance corresponding to the image to be detected; alternatively, the second mouth distance of a single corner may be used, which is not limited in this embodiment of the present disclosure.
  • When the user's mouth is closed, the ratio of the second mouth distance to the first mouth distance is approximately 1; when the user opens the mouth, the first mouth distance increases more than the second mouth distance, so the ratio becomes less than 1 and gradually decreases as the mouth opens wider. The ratio of the second mouth distance to the first mouth distance can therefore be used as the mouth state score representing the mouth opening range.
  • The two points used to determine the first mouth distance and the two points used to determine the second mouth distance differ in their distance from the vertical midline of the lips, so that when the mouth opens there is a clear difference between the first mouth distance and the second mouth distance.
  • The points used for determining the first mouth distance and the second mouth distance may include but are not limited to the cases listed above.
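  • A minimal sketch of the mouth state score described above, assuming points A/B are the upper/lower lip centers and C/D lie on the lips near a mouth corner (the coordinates in the usage below are illustrative):

```python
import math

def mouth_state_score(a, b, c, d):
    """Ratio of the second mouth distance (C-D, near a mouth corner)
    to the first mouth distance (A-B, at the lip centers). The score
    is close to 1 when the mouth is shut and decreases as it opens,
    because the center opening grows faster than the corner opening.
    Points are (x, y) coordinates.
    """
    first = math.hypot(a[0] - b[0], a[1] - b[1])   # first mouth distance
    second = math.hypot(c[0] - d[0], c[1] - d[1])  # second mouth distance
    return second / max(first, 1e-6)
```

With a closed mouth the score evaluates to about 1; as A and B move apart faster than C and D, the score drops below 1.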
  • Case 3: opening and closing the eyes means the user first closes and then reopens the eyes; once the user has done so, it can be confirmed that the user has completed the action.
  • The detection value indicating the completion of the target action may include an eye state score (the second detection value) representing the degree of eye opening.
  • the matched feature points may be eye feature points.
  • the method for determining the eye state score may be similar to the method for determining the mouth state score in Case 2, and details are not repeated here.
  • The feature points matched with the target action may also be feature points that characterize the deflection angle of the face, such as the feature points representing the rotation angle of the face in Case 1, which will not be described again here.
  • Alternatively, the eye image in the image to be detected may be input into a pre-trained neural network, and the eye state score corresponding to the image to be detected is obtained from the network's output.
  • When training the neural network, a sample image can be input into the network to be trained to obtain the predicted score it outputs; based on the predicted score, the opening/closing state of the eyes of the sample object (eyes open or eyes closed) is determined; the loss value for this training step is then determined from the predicted state and the pre-labeled data representing the eye state of the sample image, and the network parameters are adjusted based on the loss value.
  • In some embodiments, the image to be detected is first rectified, and the rectified image is then input to the pre-trained neural network to determine the detection value corresponding to the image to be detected.
  • the eye state score output by the neural network can be obtained.
  • The detection value may also include an eye occlusion score (the first detection value), which can be determined from the number of eye feature points successfully recognized relative to the standard number of eye feature points. For example, if the standard number is 10 and 8 are successfully recognized, 2 eye feature points have not been recognized, and the corresponding eye occlusion score is 0.2. Alternatively, the occlusion score can be obtained from the output of the neural network; for example, inputting an eye image in which the eyes are occluded may yield an output eye occlusion score of 0.1.
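  • The feature-point-based eye occlusion score above can be sketched as follows; the function name is hypothetical, and the default expected count of 10 follows the worked example in the text.

```python
def eye_occlusion_score(num_recognized, num_expected=10):
    """First detection value: fraction of the expected eye feature
    points that were not successfully recognized. With 10 expected
    points and 8 recognized, 2 points are missing and the score is
    0.2, matching the example above.
    """
    missed = num_expected - num_recognized
    return missed / num_expected
```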
  • When the detection values are detected based on a detection scheme matching the target action to obtain a liveness detection result, the detection values of the multiple frames of images to be detected may be detected based on the detection threshold corresponding to the target action and the detection scheme matching the target action, to obtain the liveness detection result.
  • The detection threshold is the threshold set for the target action; different target actions correspond to different detection thresholds.
  • When the target action is nodding or shaking the head, the detection value includes a head deviation angle, and the detection threshold includes a positive deviation threshold, a negative deviation threshold, and an image frame number threshold. The liveness detection result may be determined through the following steps:
  • S301: Determine first target detection images whose head deviation angle is greater than the positive deviation threshold, and second target detection images whose head deviation angle is smaller than the negative deviation threshold.
  • Taking the rightward deviation of the head as the positive direction, suppose the positive deviation threshold is 15° and the deviation angles corresponding to images 1 to 5 to be detected are 12°, 16°, 18°, 16°, and 12° to the right; then images 2, 3, and 4 may be determined as first target detection images.
  • Still taking the rightward deviation as positive, suppose the negative deviation threshold is -15° and the deviation angles corresponding to images 6 to 10 to be detected are -12° (a -12° rightward deviation means a 12° leftward deviation, the same below), -16°, -18°, -16°, and -12°; then images 7, 8, and 9 may be determined as second target detection images.
  • The first image frame number threshold and the second image frame number threshold may be the same or different.
  • If the first and second image frame number thresholds are the same (e.g., 3), then when both the number of first target detection images and the number of second target detection images are detected to be greater than 3, it can be determined that the liveness detection has passed.
  • If the two thresholds differ, for example the first image frame number threshold for the first target detection images is set to 3 and the second image frame number threshold for the second target detection images is set to 4, then the liveness detection is determined to have passed when the number of first target detection images exceeds 3 and the number of second target detection images exceeds 4.
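  • The frame-counting scheme above can be sketched as follows. The threshold values match the worked example; treating strictly-greater counts as a pass is taken directly from the text, while the function name and signature are assumptions.

```python
def head_shake_passed(deviation_angles, pos_threshold=15.0, neg_threshold=-15.0,
                      first_frame_threshold=3, second_frame_threshold=4):
    """Count first target detection images (deviation angle above the
    positive deviation threshold) and second target detection images
    (angle below the negative deviation threshold); the liveness
    detection passes when both counts exceed their image frame number
    thresholds. Rightward deviation is positive, as in the example.
    """
    first = sum(1 for a in deviation_angles if a > pos_threshold)
    second = sum(1 for a in deviation_angles if a < neg_threshold)
    return first > first_frame_threshold and second > second_frame_threshold
```

With the example angles for images 1 to 10 (three frames past each threshold), the counts do not exceed 3 and 4, so the check fails; a longer, wider shake passes.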
  • the detection threshold includes a mouth opening threshold, a mouth closing threshold, and a mouth opening frame number threshold.
  • the living body detection result can also be determined through the following steps:
  • S401: Determine multiple frames of first images to be detected whose detection values satisfy the mouth closing threshold.
  • the obtained mouth state scores of multiple frames of images to be detected can be shown in Table 1 below:
  • Table 1:

        Frame  Score    Frame  Score    Frame  Score    Frame  Score
        1      0.5      9      0.95     17     0.5      25     0.8
        2      0.55     10     0.98     18     0.6      26     0.96
        3      0.61     11     0.95     19     0.7
        4      0.68     12     0.8      20     0.8
  • In Table 1, the odd-numbered columns give the frame number of the image to be detected in the video, and the even-numbered columns give the corresponding mouth state score.
  • A schematic diagram of determining the first image to be detected is shown in FIG. 5.
  • In FIG. 5, the first images to be detected correspond to frame 6 (point O), frame 12 (point A), frame 20 (point C), frame 23 (point D), and frame 25 (point F).
  • If the abscissa of an intersection point falls between two frames, the video frame closest to the intersection point may be used as the first image to be detected.
  • S402: Determine the second images to be detected between every two adjacent frames of the first images to be detected among the multiple frames of first images to be detected.
  • the second image to be detected when determining the second image to be detected, it may be determined that among the multiple frames of the first image to be detected, the number of adjacent frames of the first image to be detected that satisfies the second preset condition is The second image to be detected between;
  • the second preset condition includes: the number of images to be detected between the adjacent first images to be detected is greater than the mouth opening frame number threshold; and the extremum of the detection values of the images to be detected between the adjacent first images to be detected satisfies the filter condition corresponding to the mouth opening threshold.
  • Here, two adjacent frames of first images to be detected means first images to be detected that are sequentially adjacent among the determined multiple frames of first images to be detected.
  • For example, the first image to be detected determined first (point O) and the first image to be detected determined second (point A) are adjacent first images to be detected. As for the requirement that the extremum of the detection values satisfy the filter condition corresponding to the mouth opening threshold, the filter condition may be that the minimum mouth state score between the adjacent first images to be detected is less than the mouth opening threshold.
  • The first preset condition may include at least one of the following conditions:
  • Condition 1: Among the second images to be detected, the number of third images to be detected whose corresponding detection value equals the mouth opening threshold is a first preset value.
  • For example, when the first preset value is 1, the number of such third images to be detected is also 1.
  • In FIG. 5, A to B represents the mouth opening process, and B to C represents the mouth closing process.
  • Condition 2: Among the multiple frames of third images to be detected, the number of second images to be detected between two adjacent frames of third images to be detected is greater than the mouth opening frame number threshold.
  • For example, the second images to be detected between two adjacent frames of third images to be detected (between A and C) are the 15th, 16th, and 17th frames; their number is 3, which is greater than the mouth opening frame number threshold, so condition 2 is met.
  • Condition 3: Among the multiple frames of second images to be detected between two adjacent frames of third images to be detected, the difference between the detection values of any two frames of second images to be detected is smaller than a second preset value.
  • For example, when the second preset value is 0.15, the difference between the mouth state scores of the 15th and 16th frames is 0.1, and the difference between those of the 16th and 17th frames is also 0.1; both are smaller than the second preset value 0.15, so condition 3 is met.
  • In practical applications, it may be required that condition 1 and condition 2 be satisfied at the same time, or that condition 1, condition 2, and condition 3 be satisfied at the same time.
  • Different solutions may also be adopted in different application scenarios, which is not limited in this embodiment of the present disclosure.
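Conditions 2 and 3 above can be sketched as a check on the mouth state scores of the second images to be detected lying between two adjacent third images to be detected. The sketch reads "any two frames" literally as all pairs; the function name, arguments, and example values are illustrative assumptions, not the patent's implementation.

```python
def check_between_frames(scores_between, open_frames_threshold, second_preset):
    """Check conditions 2 and 3 of the first preset condition.

    scores_between: mouth state scores of the second images to be detected
    that lie between two adjacent third images to be detected
    (e.g. the 15th-17th frames between points A and C).
    """
    # Condition 2: their number must exceed the mouth opening frame number threshold.
    cond2 = len(scores_between) > open_frames_threshold
    # Condition 3: the detection values of any two of these frames must differ
    # by less than the second preset value.
    cond3 = all(abs(a - b) < second_preset
                for i, a in enumerate(scores_between)
                for b in scores_between[i + 1:])
    return cond2 and cond3
```

Condition 3 effectively requires the scores near the peak of the mouth opening to stay on a plateau, which a single spliced frame or a photo flashed in front of the camera would not produce.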
  • the detection threshold includes an eye opening threshold, an eye closing threshold, an eye opening frame number threshold, and an eye occlusion threshold.
  • Liveness detection may be performed separately for the left eye and the right eye, or target eye state scores for liveness detection may be determined from the eye state scores corresponding to the left eye and the right eye.
  • The eye state score is similar to the mouth state score. Taking the case where a larger eye opening corresponds to a smaller eye state score as an example, when determining the target eye state score for any frame of the image to be detected, the larger of the eye state scores corresponding to the two eyes of that frame may be determined as the target eye state score of the frame; that is, the eye state score corresponding to the eye with the smaller opening is used for liveness detection.
  • the living body detection result may also be determined through the following steps:
  • S601: Based on the eye opening threshold, the eye closing threshold, and the second detection values of the multiple frames of images to be detected, determine fourth images to be detected that satisfy a third preset condition.
  • the third preset condition may be at least one of the following conditions:
  • Condition 1: The number of fourth images to be detected whose corresponding second detection value equals the eye opening threshold is a third preset value.
  • Condition 2: Among the multiple frames of fifth images to be detected, the number of fourth images to be detected between two adjacent frames of fifth images to be detected is greater than the eye opening frame number threshold.
  • Condition 3: Among the fourth images to be detected between two adjacent frames of fifth images to be detected, the difference between the second detection values of any two frames of fourth images to be detected is smaller than a fourth preset value.
  • S602: Determine a target number of fourth images to be detected whose corresponding first detection value is smaller than the eye occlusion threshold.
  • The target number of fourth images to be detected may be determined after the fourth images to be detected satisfying the third preset condition have been determined; alternatively, after the eye occlusion scores are obtained, the images to be detected whose eye occlusion scores do not satisfy the eye occlusion threshold may be directly deleted, so that subsequent judgments need not consider these deleted images.
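Step S602 and the subsequent pass decision (the target number exceeding the eye opening frame number threshold) can be sketched as below, together with the target eye state score selection described earlier. The data layout, function names, and the per-frame tuple representation are illustrative assumptions, not the patent's implementation.

```python
def target_eye_score(left_score, right_score):
    """Under the convention that a larger opening gives a smaller score, use
    the larger of the two eyes' scores (the less-open eye) for detection."""
    return max(left_score, right_score)

def eye_liveness(candidate_frames, occlusion_threshold, open_frames_threshold):
    """Sketch of S602 and the pass decision.

    candidate_frames: (occlusion_score, eye_state_score) tuples for the fourth
    images to be detected that already satisfy the third preset condition.
    """
    # Target frames: first detection value (occlusion score) below the
    # eye occlusion threshold, i.e. the eye is sufficiently visible.
    targets = [f for f in candidate_frames if f[0] < occlusion_threshold]
    # Pass when the target number exceeds the eye opening frame number threshold.
    return len(targets) > open_frames_threshold
```

Filtering on the occlusion score first prevents frames where the eyes are covered (e.g. by hair or glasses glare) from counting toward a blink.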
  • In addition, if the liveness detection is not completed within a preset time period, it may be determined that the corresponding liveness detection result is a failure.
  • Taking the preset duration as 10 seconds and the target action as opening and closing the mouth as an example, if the user fails to complete the liveness detection within 10 seconds, it may be that the user has not performed the target action (opening and closing the mouth), or that the corresponding detection values do not meet the corresponding detection thresholds; in either case it can be determined that the current liveness detection result is a failure.
  • In addition, a prompt message may also be sent to the user, indicating the reason for failing the liveness detection, such as no human face detected, irregular movements, and the like.
  • The liveness detection method responds to a liveness detection request and acquires multiple frames of images to be detected corresponding to a target action, where the target action is an action the user is instructed to perform during liveness detection. Based on the feature point position information matching the target action in each frame of the image to be detected, detection values indicating the completion of the target action are determined for each frame; determining the detection values corresponding to the target action quantifies the completion of the target action in each frame, which facilitates subsequent detection and quantitative analysis of the detection values. The detection values are then checked using the detection scheme matching the target action to obtain the liveness detection result; checking the detection values of multiple frames of images to be detected reduces the influence of any single frame on the detection result, making liveness detection more accurate.
  • It should be understood that the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • An embodiment of the present disclosure also provides a living body detection device corresponding to the living body detection method. Since the problem-solving principle of the device is similar to that of the above liveness detection method of the embodiments of the present disclosure, the implementation of the device can refer to the implementation of the method, and repeated descriptions are omitted.
  • As shown in FIG. 7, which is a schematic structural diagram of a living body detection device provided by an embodiment of the present disclosure, the device includes: an acquisition module 701, a determination module 702, and a detection module 703; wherein,
  • the acquiring module 701 is configured to respond to a living body detection request, and acquire multiple frames of images to be detected corresponding to a target action, wherein the target action is an action instructed by a user during live body detection;
  • the determining module 702 is configured to determine, based on the feature point position information matching the target action in each frame of the image to be detected, the detection value corresponding to each frame of the image to be detected and used to indicate the completion of the target action;
  • the detection module 703 is configured to obtain a living body detection result based on the detection scheme matched with the target action and the detection value.
  • the detection module 703 is configured to:
  • the detection values of the multiple frames of images to be detected are detected to obtain a living body detection result.
  • In a possible implementation, when the target action is nodding or shaking the head, the detection value includes a head offset angle, and the detection threshold includes a positive offset threshold, a negative offset threshold, and an image frame number threshold;
  • the detection module 703, when detecting the detection values of the multiple frames of images to be detected based on the detection threshold corresponding to the target action and the detection scheme matching the target action to obtain a living body detection result, is configured to:
  • the feature points matching the target action include mouth feature points
  • the determining module 702, when determining, based on the feature point position information matching the target action in each frame of the image to be detected, the detection values corresponding to each frame and used to indicate the completion of the target action, is configured to:
  • the detection value is determined based on the first mouth distance and the second mouth distance.
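How the first and second mouth distances combine into a detection value is not spelled out here; the following sketch shows one plausible way to compute it from landmark coordinates. The landmark pairing and the normalised-average formula are illustrative assumptions, not the patent's actual feature-point scheme.

```python
import math

def mouth_detection_value(center_pair, corner_pair, mouth_width):
    """center_pair: (upper, lower) landmark coordinates at the mouth centre;
    corner_pair: (upper, lower) landmark coordinates near a mouth corner;
    mouth_width: distance between the two mouth corners, for normalisation."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    first_mouth_distance = dist(*center_pair)    # opening amplitude at the centre
    second_mouth_distance = dist(*corner_pair)   # opening amplitude at the corner
    # One plausible combination: the average of both opening distances,
    # divided by the mouth width so the value is scale-invariant.
    return (first_mouth_distance + second_mouth_distance) / (2.0 * mouth_width)
```

Normalising by the mouth width keeps the detection value comparable across frames even when the face moves closer to or farther from the camera.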
  • the detection threshold includes a mouth opening threshold, a mouth closing threshold, and a mouth opening frame number threshold;
  • the detection module 703, when detecting the detection values of the multiple frames of images to be detected based on the detection threshold corresponding to the target action and the detection scheme matching the target action to obtain a living body detection result, is configured to:
  • the first preset condition includes:
  • the number of the third images to be detected whose corresponding detection value is the mouth opening threshold is the first preset value
  • the number of the second images to be detected between two adjacent frames of the third images to be detected is greater than the threshold of the number of mouth opening frames.
  • the first preset condition further includes:
  • the difference between the detected values of any two frames of the second images to be detected is smaller than a second preset value.
  • the detection module 703, when determining, among the multiple frames of first images to be detected, the second image to be detected between every two adjacent frames of first images to be detected, is configured to:
  • the second preset condition includes:
  • the number of images to be detected between the adjacent first images to be detected is greater than the mouth opening frame number threshold; and the extremum of the detection values of the images to be detected between the adjacent first images to be detected satisfies the filter condition corresponding to the mouth opening threshold.
  • the determining module 702, when determining, based on the feature point position information matching the target action in each frame of the image to be detected, the detection values corresponding to each frame and used to indicate the completion of the target action, is configured to:
  • for each frame of the multiple frames of images to be detected, perform rectification processing on the image to be detected based on the feature point position information of the image to be detected;
  • the rectified image to be detected is input to a pre-trained neural network to determine a detection value corresponding to the image to be detected.
  • the detection value includes a first detection value used to describe the situation of eye occlusion, and a second detection value used to describe the completion of eye opening and closing;
  • the detection threshold includes an eye opening threshold, an eye closing threshold, an eye opening frame number threshold, and an eye occlusion threshold;
  • the detection module 703, when detecting the detection values of the multiple frames of images to be detected based on the detection threshold corresponding to the target action and the detection scheme matching the target action to obtain a living body detection result, is configured to:
  • The liveness detection device responds to a liveness detection request and acquires multiple frames of images to be detected corresponding to a target action, where the target action is an action the user is instructed to perform during liveness detection. Based on the feature point position information matching the target action in each frame of the image to be detected, detection values indicating the completion of the target action are determined for each frame; determining the detection values corresponding to the target action quantifies the completion of the target action in each frame, which facilitates subsequent detection and quantitative analysis of the detection values. The detection values are then checked using the detection scheme matching the target action to obtain the liveness detection result; checking the detection values of multiple frames of images to be detected reduces the influence of any single frame on the detection result, making liveness detection more accurate.
  • FIG. 8 it is a schematic structural diagram of a computer device 800 provided by an embodiment of the present disclosure, including a processor 801 , a memory 802 , and a bus 803 .
  • the memory 802 is used to store execution instructions and includes an internal memory 8021 and an external memory 8022; the internal memory 8021 is used to temporarily store operation data in the processor 801 and data exchanged with the external memory 8022, such as a hard disk.
  • the processor 801 exchanges data with the external memory 8022 through the memory 8021.
  • the processor 801 communicates with the memory 802 through the bus 803, so that the processor 801 executes the following instructions:
  • the detection value is detected based on a detection scheme matched with the target action to obtain a living body detection result.
  • Embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored. When the computer program is run by a processor, the steps of the living body detection method described in the foregoing method embodiments are executed.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • An embodiment of the present disclosure also provides a computer program product carrying program code; the instructions included in the program code can be used to execute the steps of the living body detection method described in the above method embodiments. For details, refer to the above method embodiments, which will not be repeated here.
  • the above-mentioned computer program product may be specifically implemented by means of hardware, software or a combination thereof.
  • In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), and so on.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the functions are realized in the form of software function units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • the technical solution of the present disclosure is essentially or the part that contributes to the prior art or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including Several instructions are used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in various embodiments of the present disclosure.
  • the aforementioned storage media include: U disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk, optical disc, and other media that can store program codes.

Abstract

A liveness detection method and apparatus, a computer device, and a storage medium. The method comprises: in response to a liveness detection request, obtaining a plurality of frames of images to be detected corresponding to a target action, the target action being an action instructed to be performed by a user when liveness detection is performed (S101); on the basis of feature point position information matching the target action in each frame of image to be detected, determining a detection value respectively corresponding to each frame of image to be detected and used for representing a completion condition of the target action (S102); and detecting the detection value on the basis of a detection scheme matching the target action to obtain a liveness detection result (S103).

Description

A living body detection method, device, computer equipment and storage medium
This disclosure claims priority to the Chinese patent application No. 202111265552.1, filed with the China Patent Office on October 28, 2021 and entitled "A living body detection method, device, computer equipment and storage medium", the entire contents of which are incorporated in this disclosure by reference.
Technical Field
The present disclosure relates to the technical field of face recognition, and in particular to a living body detection method, device, computer equipment and storage medium.
Background
At present, liveness detection technology is widely used in various smart devices to detect whether the "user" currently undergoing face recognition is a real user.
In related technologies, when performing liveness detection, a user's face images need to be acquired in real time through an image acquisition device, and it is then checked whether any of the acquired face images satisfies a preset condition, such as a face image with an open mouth; if so, it can be determined that the liveness detection is passed. However, during liveness detection, an illegal registrant can deceive the image acquisition device by forging face images, so the security of identity verification based on face recognition is low.
Summary
Embodiments of the present disclosure provide at least a living body detection method, device, computer equipment, and storage medium.
In a first aspect, an embodiment of the present disclosure provides a living body detection method, including:
responding to a liveness detection request, acquiring multiple frames of images to be detected corresponding to a target action, where the target action is an action the user is instructed to perform during liveness detection;
based on the feature point position information matching the target action in each frame of the image to be detected, determining detection values corresponding to each frame of the image to be detected and used to indicate the completion of the target action;
obtaining a living body detection result based on the detection scheme matched with the target action and the detection values.
In this way, multiple frames of images to be detected corresponding to the target action are acquired in response to the liveness detection request; determining the detection values corresponding to the target action quantifies the completion of the target action in each frame, which facilitates subsequent detection and quantitative analysis of the detection values; checking the detection values with the detection scheme matching the target action yields the liveness detection result, and checking the detection values of multiple frames reduces the influence of any single frame on the detection result, making liveness detection more accurate.
In a possible implementation, obtaining the living body detection result based on the detection scheme matched with the target action and the detection values includes:
detecting the detection values of the multiple frames of images to be detected based on the detection threshold corresponding to the target action and the detection scheme matching the target action, to obtain the living body detection result.
In this way, setting a dedicated detection threshold for the target action makes the threshold better fit the actual situation of the target action, and integrating the detection values of multiple frames of images to be detected through the detection threshold makes the determined liveness detection result more accurate.
In a possible implementation, when the target action is nodding or shaking the head, the detection value includes a head offset angle, and the detection threshold includes a positive offset threshold, a negative offset threshold, and an image frame number threshold;
detecting the detection values of the multiple frames of images to be detected based on the detection threshold corresponding to the target action and the matching detection scheme to obtain the living body detection result includes:
determining first target detection images whose head offset angle is greater than the positive offset threshold, and second target detection images whose head offset angle is smaller than the negative offset threshold;
determining that the liveness detection is passed when the number of first target detection images exceeds a first image frame number threshold and the number of second target detection images exceeds a second image frame number threshold.
In this way, setting the positive offset threshold, the negative offset threshold, and the image frame number thresholds makes it possible to effectively detect whether the user has completed the target action of nodding or shaking the head, thereby effectively improving the accuracy of liveness detection.
In a possible implementation, when the target action is opening and closing the mouth, the feature points matching the target action include mouth feature points;
determining, based on the feature point position information matching the target action in each frame of the image to be detected, the detection values corresponding to each frame and used to indicate the completion of the target action includes:
determining, based on the mouth feature point position information, a first mouth distance representing the opening amplitude at the centre of the mouth and a second mouth distance representing the opening amplitude at the mouth corners;
determining the detection value based on the first mouth distance and the second mouth distance.
In this way, detecting and computing the mouth feature points allows the determined detection value to better characterise the state of the mouth, and the detection and computation processes also consume fewer computing resources.
In a possible implementation, the detection threshold includes a mouth opening threshold, a mouth closing threshold, and a mouth opening frame number threshold;
detecting the detection values of the multiple frames of images to be detected based on the detection threshold corresponding to the target action and the matching detection scheme to obtain the living body detection result includes:
determining multiple frames of first images to be detected whose detection values equal the mouth closing threshold;
determining, among the multiple frames of first images to be detected, a second image to be detected between every two adjacent frames of first images to be detected;
determining that the liveness detection is passed when it is detected that the second images to be detected satisfy a first preset condition.
In this way, setting the mouth opening threshold, the mouth closing threshold, and the mouth opening frame number threshold makes it possible to effectively detect whether the user has completed the target action of opening and closing the mouth, thereby effectively improving the accuracy of liveness detection.
In a possible implementation, the first preset condition includes:
among the second images to be detected, the number of third images to be detected whose corresponding detection value equals the mouth opening threshold is a first preset value;
among the multiple frames of third images to be detected, the number of second images to be detected between two adjacent frames of third images to be detected is greater than the mouth opening frame number threshold.
In this way, setting multiple first preset conditions better simulates the real situation of opening and closing the mouth, thereby effectively improving the accuracy of liveness detection.
In a possible implementation, the first preset condition further includes:
among the multiple frames of second images to be detected between two adjacent frames of third images to be detected, the difference between the detection values of any two frames of second images to be detected is smaller than a second preset value.
In a possible implementation, determining, among the multiple frames of first images to be detected, the second image to be detected between every two adjacent frames of first images to be detected includes:
determining, among the multiple frames of first images to be detected, the second image to be detected between two adjacent frames of first images to be detected that satisfy a second preset condition;
where the second preset condition includes:
the number of images to be detected between the adjacent first images to be detected is greater than the mouth opening frame number threshold; and the extremum of the detection values of the images to be detected between the adjacent first images to be detected satisfies the filter condition corresponding to the mouth opening threshold.
In this way, setting the second preset condition performs a round of screening before the final judgment, which makes the subsequent, more precise judgment faster and saves computing resources.
In a possible implementation, when the target action is opening and closing the eyes, determining, based on the feature point position information matching the target action in each frame of the image to be detected, the detection values corresponding to each frame and used to indicate the completion of the target action includes:
for each frame of the multiple frames of images to be detected, performing rectification processing on the image to be detected based on the feature point position information of the image to be detected;
inputting the rectified image to be detected into a pre-trained neural network to determine the detection value corresponding to the image to be detected.
In this way, rectifying the images to be detected makes the obtained detection values more precise, thereby effectively improving the accuracy of liveness detection.
一种可能的实施方式中,所述检测值包括用于描述眼部遮挡情况的第一检测值,以及用于描述睁闭眼完成情况的第二检测值;In a possible implementation manner, the detection value includes a first detection value used to describe the situation of eye occlusion, and a second detection value used to describe the completion of eye opening and closing;
所述检测阈值包括睁眼阈值、闭眼阈值、睁眼帧数阈值、眼部遮挡阈值;The detection threshold includes an eye opening threshold, an eye closing threshold, an eye opening frame number threshold, and an eye occlusion threshold;
所述基于所述目标动作对应的检测阈值和与所述目标动作匹配的检测方案,对所述多帧待检测图像的检测值进行检测,得到活体检测结果,包括:Based on the detection threshold corresponding to the target action and the detection scheme matching the target action, the detection values of the multiple frames of images to be detected are detected to obtain a living body detection result, including:
基于所述睁眼阈值、闭眼阈值、以及所述多帧待检测图像的第二检测值,确定满足第三预设条件的第四待检测图像;determining a fourth image to be detected that satisfies a third preset condition based on the eye-opening threshold, the eye-closing threshold, and the second detection values of the multiple frames of images to be detected;
确定对应的第一检测值小于所述眼部遮挡阈值的第四待检测图像的目标数量;determining the target quantity of the fourth image to be detected whose corresponding first detection value is less than the eye occlusion threshold;
在所述目标数量超过所述睁眼帧数阈值的情况下,确定通过活体检测。When the number of targets exceeds the eye-opening frame number threshold, it is determined that the living body detection is passed.
这样，通过设置睁眼阈值、闭眼阈值、睁眼帧数阈值、眼部遮挡阈值这四个阈值，可以有效的检测用户是否完成了睁闭眼这一目标动作，从而可以有效的提高进行活体检测时的准确率。In this way, by setting the four thresholds of the eye opening threshold, the eye closing threshold, the eye opening frame number threshold, and the eye occlusion threshold, whether the user has completed the target action of opening and closing the eyes can be effectively detected, thereby effectively improving the accuracy of liveness detection.
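The four-threshold eye check above can be sketched as follows. The "third preset condition" is not spelled out in this passage, so this sketch assumes it selects open-eye frames that appear after at least one closed-eye frame (the user closed, then reopened, the eyes); every name is illustrative.

```python
# Hedged sketch of the eye open/close liveness check with the four thresholds.

def eye_liveness_passed(occlusion_scores, eye_scores,
                        open_thr, close_thr, open_frames_thr, occlusion_thr):
    seen_closed = False
    fourth_images = []  # indices assumed to satisfy the third preset condition
    for i, s in enumerate(eye_scores):
        if s <= close_thr:
            seen_closed = True
        elif s >= open_thr and seen_closed:
            fourth_images.append(i)
    # keep only frames whose first detection value shows no eye occlusion
    target = [i for i in fourth_images if occlusion_scores[i] < occlusion_thr]
    return len(target) > open_frames_thr
```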
第二方面,本公开实施例还提供一种活体检测装置,包括:In the second aspect, the embodiment of the present disclosure also provides a living body detection device, including:
获取模块,用于响应活体检测请求,获取与目标动作对应的多帧待检测图像,其中,所述目标动作为进行活体检测时指示用户做出的动作;An acquisition module, configured to respond to a liveness detection request, and acquire multiple frames of images to be detected corresponding to a target action, wherein the target action is an action instructed by a user during liveness detection;
确定模块,用于基于所述各帧待检测图像中与所述目标动作匹配的特征点位置信息,确定各帧待检测图像分别对应的用于表示所述目标动作完成情况的检测值;A determining module, configured to determine detection values corresponding to each frame of the image to be detected and used to indicate the completion of the target action based on the feature point position information matching the target action in each frame of the image to be detected;
检测模块,用于基于与所述目标动作匹配的检测方案和所述检测值,得到活体检测结果。The detection module is configured to obtain a living body detection result based on the detection scheme matched with the target action and the detection value.
一种可能的实施方式中，所述检测模块，用于基于所述目标动作对应的检测阈值和与所述目标动作匹配的检测方案，对所述多帧待检测图像的检测值进行检测，得到活体检测结果。In a possible implementation manner, the detection module is configured to detect the detection values of the multiple frames of images to be detected based on the detection threshold corresponding to the target action and the detection scheme matching the target action, to obtain a liveness detection result.
一种可能的实施方式中,在所述目标动作为点头或摇头的情况下,所述检测值包括头部偏移角度;所述检测阈值包括正向偏移阈值、负向偏移阈值、图像帧数阈值;In a possible implementation manner, when the target action is nodding or shaking the head, the detection value includes a head deviation angle; the detection threshold includes a positive deviation threshold, a negative deviation threshold, an image frame number threshold;
所述检测模块，在基于所述目标动作对应的检测阈值和与所述目标动作匹配的检测方案，对所述多帧待检测图像的检测值进行检测，得到活体检测结果时，用于：The detection module, when detecting the detection values of the multiple frames of images to be detected based on the detection threshold corresponding to the target action and the detection scheme matching the target action to obtain a liveness detection result, is configured to:
确定头部偏移角度大于所述正向偏移阈值的第一目标检测图像,以及小于所述负向偏移阈值的第二目标检测图像;determining a first object detection image with a head offset angle greater than the positive offset threshold, and a second object detection image with a head offset angle smaller than the negative offset threshold;
在所述第一目标检测图像的数量超过第一图像帧数阈值，且所述第二目标检测图像的数量超过第二图像帧数阈值的情况下，确定活体检测通过。If the number of the first target detection images exceeds a first image frame number threshold, and the number of the second target detection images exceeds a second image frame number threshold, it is determined that the liveness detection is passed.
一种可能的实施方式中,在所述目标动作为张闭嘴的情况下,与所述目标动作匹配的特征点包括嘴部特征点;In a possible implementation manner, when the target action is opening and closing the mouth, the feature points matching the target action include mouth feature points;
所述确定模块，在基于所述各帧待检测图像中与所述目标动作匹配的特征点位置信息，确定各帧待检测图像分别对应的用于表示所述目标动作完成情况的检测值时，用于：The determining module, when determining, based on the feature point position information matching the target action in each frame of the image to be detected, the detection values respectively corresponding to each frame of the image to be detected and used to represent the completion of the target action, is configured to:
基于嘴部特征点位置信息,确定表征嘴部中央位置处张开幅度的第一嘴部距离和表征嘴角位置处张开幅度的第二嘴部距离;Based on the mouth feature point position information, determine a first mouth distance representing the opening amplitude at the central position of the mouth and a second mouth distance representing the opening amplitude at the corner position of the mouth;
基于所述第一嘴部距离和第二嘴部距离,确定所述检测值。The detection value is determined based on the first mouth distance and the second mouth distance.
一种可能的实施方式中,所述检测阈值包括张嘴阈值、闭嘴阈值、张嘴帧数阈值;In a possible implementation manner, the detection threshold includes a mouth opening threshold, a mouth closing threshold, and a mouth opening frame number threshold;
所述检测模块，在基于所述目标动作对应的检测阈值和与所述目标动作匹配的检测方案，对所述多帧待检测图像的检测值进行检测，得到活体检测结果时，用于：The detection module, when detecting the detection values of the multiple frames of images to be detected based on the detection threshold corresponding to the target action and the detection scheme matching the target action to obtain a liveness detection result, is configured to:
确定检测值为所述闭嘴阈值的多帧第一待检测图像；Determine multiple frames of first images to be detected whose detection value is the mouth closing threshold;
确定所述多帧第一待检测图像中,每两帧相邻的第一待检测图像之间的第二待检测图像;Determining a second image to be detected between every two adjacent frames of the first image to be detected among the multiple frames of the first image to be detected;
在检测到所述第二待检测图像满足第一预设条件的情况下,确定通过活体检测。If it is detected that the second image to be detected satisfies the first preset condition, it is determined that the living body detection is passed.
一种可能的实施方式中,所述第一预设条件包括:In a possible implementation manner, the first preset condition includes:
所述第二待检测图像中,对应的检测值为所述张嘴阈值的第三待检测图像的数量为第一预设值;Among the second images to be detected, the number of the third images to be detected whose corresponding detection value is the mouth opening threshold is a first preset value;
所述多帧第三待检测图像中,相邻两帧第三待检测图像之间的第二待检测图像的数量大于所述张嘴帧数阈值。In the plurality of frames of the third images to be detected, the number of the second images to be detected between two adjacent frames of the third images to be detected is greater than the threshold of the number of mouth opening frames.
一种可能的实施方式中,所述第一预设条件还包括:In a possible implementation manner, the first preset condition further includes:
相邻两帧第三待检测图像之间的多帧第二待检测图像中,任意两帧第二待检测图像的检测值之间的差值小于第二预设值。Among the multiple frames of the second images to be detected between two adjacent frames of the third images to be detected, the difference between the detected values of any two frames of the second images to be detected is smaller than a second preset value.
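The first preset condition spelled out above can be sketched as one function. This is a hedged illustration: "detection value is the mouth opening threshold" is interpreted here as the ratio reaching (falling to or below) that threshold, since, as the description explains later, a lower ratio corresponds to a wider opening; all names are assumptions.

```python
# Hypothetical check of the "first preset condition" over the detection values
# of the second images to be detected (the frames between two adjacent
# closed-mouth frames).

def first_condition_met(between, open_thr, first_preset,
                        open_frames_thr, second_preset):
    # third images: frames whose value reaches the mouth-opening threshold
    third_idx = [i for i, v in enumerate(between) if v <= open_thr]
    if len(third_idx) != first_preset:
        return False
    for a, b in zip(third_idx, third_idx[1:]):
        seg = between[a + 1:b]
        if len(seg) <= open_frames_thr:
            return False  # too few frames between adjacent open-mouth frames
        # values between adjacent open-mouth frames must stay close together
        if max(seg) - min(seg) >= second_preset:
            return False
    return True
```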
一种可能的实施方式中，所述检测模块，在基于所述目标动作对应的检测阈值和与所述目标动作匹配的检测方案，对所述多帧待检测图像的检测值进行检测，得到活体检测结果时，用于：In a possible implementation manner, the detection module, when detecting the detection values of the multiple frames of images to be detected based on the detection threshold corresponding to the target action and the detection scheme matching the target action to obtain a liveness detection result, is configured to:
确定所述多帧第一待检测图像中,满足第二预设条件的两帧相邻的第一待检测图像之间的第二待检测图像;Determining a second image to be detected between two frames of adjacent first images to be detected that meet a second preset condition among the multiple frames of the first image to be detected;
其中,所述第二预设条件包括:Wherein, the second preset condition includes:
相邻的第一待检测图像之间的待检测图像的数量大于所述张嘴帧数阈值；相邻的第一待检测图像之间的待检测图像的检测值最值满足所述张嘴阈值对应的筛选条件。The number of images to be detected between adjacent first images to be detected is greater than the mouth opening frame number threshold; and the extreme detection value of the images to be detected between adjacent first images to be detected satisfies the filter condition corresponding to the mouth opening threshold.
一种可能的实施方式中，在所述目标动作为睁闭眼的情况下，所述确定模块，在基于所述各帧待检测图像中与所述目标动作匹配的特征点位置信息，确定各帧待检测图像分别对应的用于表示所述目标动作完成情况的检测值时，用于：In a possible implementation manner, when the target action is opening and closing the eyes, the determining module, when determining, based on the feature point position information matching the target action in each frame of the image to be detected, the detection values respectively corresponding to each frame of the image to be detected and used to represent the completion of the target action, is configured to:
针对多帧待检测图像中的每帧待检测图像,基于所述待检测图像的特征点位置信息,对所述待检测图像进行矫正处理;For each frame of the image to be detected in the multiple frames of the image to be detected, based on the feature point position information of the image to be detected, the image to be detected is corrected;
将矫正处理后的所述待检测图像输入至预先训练好的神经网络,确定所述待检测图像对应的检测值。The rectified image to be detected is input to a pre-trained neural network to determine a detection value corresponding to the image to be detected.
一种可能的实施方式中,所述检测值包括用于描述眼部遮挡情况的第一检测值,以及用于描述睁闭眼完成情况的第二检测值;In a possible implementation manner, the detection value includes a first detection value used to describe the situation of eye occlusion, and a second detection value used to describe the completion of eye opening and closing;
所述检测阈值包括睁眼阈值、闭眼阈值、睁眼帧数阈值、眼部遮挡阈值;The detection threshold includes an eye opening threshold, an eye closing threshold, an eye opening frame number threshold, and an eye occlusion threshold;
所述检测模块，在基于所述目标动作对应的检测阈值和与所述目标动作匹配的检测方案，对所述多帧待检测图像的检测值进行检测，得到活体检测结果时，用于：The detection module, when detecting the detection values of the multiple frames of images to be detected based on the detection threshold corresponding to the target action and the detection scheme matching the target action to obtain a liveness detection result, is configured to:
基于所述睁眼阈值、闭眼阈值、以及所述多帧待检测图像的第二检测值,确定满足第三预设条件的第四待检测图像;determining a fourth image to be detected that satisfies a third preset condition based on the eye-opening threshold, the eye-closing threshold, and the second detection values of the multiple frames of images to be detected;
确定对应的第一检测值小于所述眼部遮挡阈值的第四待检测图像的目标数量;determining the target quantity of the fourth image to be detected whose corresponding first detection value is less than the eye occlusion threshold;
在所述目标数量超过所述睁眼帧数阈值的情况下,确定通过活体检测。When the number of targets exceeds the eye-opening frame number threshold, it is determined that the living body detection is passed.
第三方面，本公开实施例还提供一种计算机设备，包括：处理器、存储器和总线，所述存储器存储有所述处理器可执行的机器可读指令，当计算机设备运行时，所述处理器与所述存储器之间通过总线通信，所述机器可读指令被所述处理器执行时执行上述第一方面，或第一方面中任一种可能的实施方式中的步骤。In a third aspect, an embodiment of the present disclosure further provides a computer device, including a processor, a memory, and a bus, where the memory stores machine-readable instructions executable by the processor; when the computer device runs, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps in the above first aspect, or in any possible implementation manner of the first aspect, are performed.
第四方面，本公开实施例还提供一种计算机可读存储介质，该计算机可读存储介质上存储有计算机程序，该计算机程序被处理器运行时执行上述第一方面，或第一方面中任一种可能的实施方式中的步骤。In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored; when the computer program is run by a processor, the steps in the above first aspect, or in any possible implementation manner of the first aspect, are performed.
关于上述活体检测装置、计算机设备及存储介质的效果描述参见上述活体检测方法的说明,这里不再赘述。For the effect description of the above-mentioned living body detection device, computer equipment and storage medium, please refer to the description of the above-mentioned living body detection method, which will not be repeated here.
为使本公开的上述目的、特征和优点能更明显易懂,下文特举较佳实施例,并配合所附附图,作详细说明如下。In order to make the above-mentioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments will be described in detail below together with the accompanying drawings.
附图说明Description of drawings
为了更清楚地说明本公开实施例的技术方案，下面将对实施例中所需要使用的附图作简单地介绍，此处的附图被并入说明书中并构成本说明书中的一部分，这些附图示出了符合本公开的实施例，并与说明书一起用于说明本公开的技术方案。应当理解，以下附图仅示出了本公开的某些实施例，因此不应被看作是对范围的限定，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他相关的附图。In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly introduced below. The drawings here are incorporated into and constitute a part of the specification; they show embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings only show some embodiments of the present disclosure and therefore should not be regarded as limiting the scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
图1示出了本公开实施例所提供的一种活体检测方法的流程图;FIG. 1 shows a flow chart of a living body detection method provided by an embodiment of the present disclosure;
图2示出了本公开实施例所提供的活体检测方法中,确定第一嘴部距离和第二嘴部距离的示意图;Fig. 2 shows a schematic diagram of determining the first mouth distance and the second mouth distance in the living body detection method provided by the embodiment of the present disclosure;
图3示出了本公开实施例所提供的活体检测方法中,一种确定活体检测结果的具体方法的流程图;FIG. 3 shows a flow chart of a specific method for determining a living body detection result in the living body detection method provided by an embodiment of the present disclosure;
图4示出了本公开实施例所提供的活体检测方法中,另一种确定活体检测结果的具体方法的流程图;FIG. 4 shows a flow chart of another specific method for determining a living body detection result in the living body detection method provided by an embodiment of the present disclosure;
图5示出了本公开实施例所提供的活体检测方法中,确定第一待检测图像的示意图;Fig. 5 shows a schematic diagram of determining the first image to be detected in the living body detection method provided by the embodiment of the present disclosure;
图6示出了本公开实施例所提供的活体检测方法中,另一种确定活体检测结果的具体方法的流程图;Fig. 6 shows a flow chart of another specific method for determining a living body detection result in the living body detection method provided by an embodiment of the present disclosure;
图7示出了本公开实施例所提供的一种活体检测装置的架构示意图;FIG. 7 shows a schematic structural diagram of a living body detection device provided by an embodiment of the present disclosure;
图8示出了本公开实施例所提供的一种计算机设备的结构示意图。FIG. 8 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
具体实施方式Detailed Description of Embodiments
为使本公开实施例的目的、技术方案和优点更加清楚，下面将结合本公开实施例中附图，对本公开实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本公开一部分实施例，而不是全部的实施例。通常在此处附图中描述和示出的本公开实施例的组件可以以各种不同的配置来布置和设计。因此，以下对在附图中提供的本公开的实施例的详细描述并非旨在限制要求保护的本公开的范围，而是仅仅表示本公开的选定实施例。基于本公开的实施例，本领域技术人员在没有做出创造性劳动的前提下所获得的所有其他实施例，都属于本公开保护的范围。In order to make the purpose, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. The components of the embodiments of the present disclosure, as generally described and illustrated in the figures herein, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present disclosure.
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步定义和解释。It should be noted that like numerals and letters denote similar items in the following figures, therefore, once an item is defined in one figure, it does not require further definition and explanation in subsequent figures.
本文中术语“和/或”，仅仅是描述一种关联关系，表示可以存在三种关系，例如，A和/或B，可以表示：单独存在A，同时存在A和B，单独存在B这三种情况。另外，本文中术语“至少一种”表示多种中的任意一种或多种中的至少两种的任意组合，例如，包括A、B、C中的至少一种，可以表示包括从A、B和C构成的集合中选择的任意一个或多个元素。The term "and/or" herein merely describes an association relationship, indicating that three relationships may exist; for example, A and/or B may indicate three cases: A alone, both A and B, or B alone. In addition, the term "at least one" herein indicates any one of multiple items or any combination of at least two of multiple items; for example, including at least one of A, B, and C may indicate including any one or more elements selected from the set consisting of A, B, and C.
经研究发现，在进行活体检测时，需要通过图像采集设备实时获取用户的人脸图像，然后检测实时获取的人脸图像中，是否有满足预设条件的人脸图像，比如张嘴的人脸图像，若有，则可以确定活体检测通过。但是，在进行活体检测的过程中，非法登录者可以通过伪造人脸图像来欺骗图像采集设备，使得基于人脸识别的身份验证的安全性较低。It has been found through research that, when performing liveness detection, the user's face images need to be acquired in real time through an image acquisition device, and it is then detected whether the face images acquired in real time contain a face image that meets a preset condition, such as a face image with an open mouth; if so, it can be determined that the liveness detection is passed. However, during liveness detection, an illegitimate user can deceive the image acquisition device with a forged face image, making identity verification based on face recognition less secure.
基于上述研究，本公开提供了一种活体检测方法、装置、计算机设备及存储介质，响应活体检测请求，获取与目标动作对应的多帧待检测图像，其中，所述目标动作为进行活体检测时指示用户做出的动作；基于所述各帧待检测图像中与所述目标动作匹配的特征点位置信息，确定各帧待检测图像分别对应的用于表示所述目标动作完成情况的检测值；这样，通过确定与所述目标动作对应的检测值，可以对各帧待检测图像中目标动作的完成情况进行量化，从而便于后续对检测值进行检测及量化分析；基于与所述目标动作匹配的检测方案对所述检测值进行检测，得到活体检测结果，这样，通过对多帧待检测图像的检测值进行检测，可以减少单帧图像对于检测结果的影响，使得活体检测的准确率更高。Based on the above research, the present disclosure provides a liveness detection method, apparatus, computer device, and storage medium. In response to a liveness detection request, multiple frames of images to be detected corresponding to a target action are acquired, where the target action is an action the user is instructed to perform during liveness detection. Based on the feature point position information matching the target action in each frame of the image to be detected, detection values respectively corresponding to each frame and used to represent the completion of the target action are determined; in this way, by determining the detection values corresponding to the target action, the completion of the target action in each frame can be quantified, which facilitates subsequent detection and quantitative analysis of the detection values. The detection values are then detected based on the detection scheme matching the target action to obtain a liveness detection result; by detecting the detection values of multiple frames, the influence of any single frame on the detection result is reduced, making liveness detection more accurate.
为便于对本实施例进行理解，首先对本公开实施例所公开的一种活体检测方法进行详细介绍，本公开实施例所提供的活体检测方法的执行主体一般为具有一定计算能力的计算机设备，该计算机设备例如包括：终端设备或服务器或其它处理设备，终端设备可以为用户设备（User Equipment，UE）、移动设备、用户终端、终端、蜂窝电话、无绳电话、个人数字助理（Personal Digital Assistant，PDA）、手持设备、计算设备、车载设备、可穿戴设备等。在一些可能的实现方式中，该活体检测方法可以通过处理器调用存储器中存储的计算机可读指令的方式来实现。To facilitate understanding of this embodiment, a liveness detection method disclosed in the embodiments of the present disclosure is first introduced in detail. The execution subject of the liveness detection method provided in the embodiments of the present disclosure is generally a computer device with certain computing capability. The computer device includes, for example, a terminal device, a server, or other processing device; the terminal device may be user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc. In some possible implementations, the liveness detection method may be implemented by a processor invoking computer-readable instructions stored in a memory.
参见图1所示,为本公开实施例提供的活体检测方法的流程图,所述方法包括步骤S101至S103,其中:Referring to FIG. 1 , which is a flowchart of a living body detection method provided by an embodiment of the present disclosure, the method includes steps S101 to S103, wherein:
S101:响应活体检测请求,获取与目标动作对应的多帧待检测图像,其中,所述目标动作为进行活体检测时指示用户做出的动作。S101: Responding to a live body detection request, acquire multiple frames of images to be detected corresponding to a target action, wherein the target action is an action instructed to a user when performing live body detection.
S102:基于所述各帧待检测图像中与所述目标动作匹配的特征点位置信息,确定各帧待检测图像分别对应的用于表示所述目标动作完成情况的检测值。S102: Based on the feature point position information matching the target action in each frame of the image to be detected, determine detection values corresponding to each frame of the image to be detected and used to indicate the completion of the target action.
S103:基于与所述目标动作匹配的检测方案和所述检测值,得到活体检测结果。S103: Obtain a living body detection result based on the detection scheme matched with the target action and the detection value.
以下是对上述步骤的详细介绍。The following is a detailed description of the above steps.
针对S101，所述目标动作可以是点头、摇头、张嘴、闭嘴、睁眼以及闭眼等，其中，在进行活体检测时所使用的目标动作可以是预先设定的，比如完成一次活体检测需要依次完成点头和摇头这两个动作，才能确定活体检测通过；或者，所述目标动作也可以是用户自行选择的，比如用户选择想要执行的目标动作为睁眼和闭眼；又或者，可以根据当前用户的脸部识别结果进行确定的，比如检测到用户正在戴口罩（即嘴部被遮挡，嘴部特征点无法识别），则可以将目标动作确定为睁眼和闭眼，检测到用户正在戴墨镜（即眼部被遮挡，眼部特征点无法识别），则可以将目标动作确定为张嘴和闭嘴。For S101, the target action may be nodding, shaking the head, opening the mouth, closing the mouth, opening the eyes, closing the eyes, etc. The target action used in liveness detection may be preset; for example, completing one liveness detection may require performing the two actions of nodding and shaking the head in sequence before the liveness detection is determined to be passed. Alternatively, the target action may be selected by the user; for example, the user selects opening and closing the eyes as the target actions to perform. Alternatively, it may be determined according to the face recognition result of the current user; for example, if it is detected that the user is wearing a mask (i.e., the mouth is occluded and the mouth feature points cannot be recognized), the target action may be determined as opening and closing the eyes; if it is detected that the user is wearing sunglasses (i.e., the eyes are occluded and the eye feature points cannot be recognized), the target action may be determined as opening and closing the mouth.
具体的，在获取与目标动作对应的多帧待检测图像时，可以是在响应活体检测请求后，通过控制终端设备的图像采集装置进行采集，得到与目标动作对应的待检测视频，通过对所述待检测视频进行采样，即可得到所述多帧待检测图像。Specifically, when acquiring the multiple frames of images to be detected corresponding to the target action, after responding to the liveness detection request, the image acquisition apparatus of the terminal device may be controlled to capture a video to be detected corresponding to the target action, and the multiple frames of images to be detected can then be obtained by sampling the video to be detected.
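The sampling step above is not detailed in the text; a simple, hypothetical choice is uniform sampling of frame indices from the captured video:

```python
# Illustrative uniform sampling of frame indices from the video to be detected.

def sample_frame_indices(total_frames, num_samples):
    """Return num_samples evenly spaced frame indices in [0, total_frames)."""
    if num_samples >= total_frames:
        return list(range(total_frames))
    step = total_frames / num_samples
    return [int(i * step) for i in range(num_samples)]
```

The chosen indices would then be used to read the corresponding frames, e.g. with a video-capture API, to form the multiple frames of images to be detected.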
S102:基于所述各帧待检测图像中与所述目标动作匹配的特征点位置信息,确定各帧待检测图像分别对应的用于表示所述目标动作完成情况的检测值。S102: Based on the feature point position information matching the target action in each frame of the image to be detected, determine detection values corresponding to each frame of the image to be detected and used to indicate the completion of the target action.
这里,通过人脸特征点检测算法对所述待检测图像进行检测,即可确定出特征点位置信息。其中,根据目标动作的不同,与所述目标动作匹配的特征点也可以不同,确定的检测值也可以不同。Here, the position information of the feature points can be determined by detecting the image to be detected through a face feature point detection algorithm. Wherein, according to different target actions, the feature points matched with the target action may also be different, and the determined detection values may also be different.
具体的,根据目标动作的不同,可以分为以下几种情况:Specifically, according to different target actions, it can be divided into the following situations:
情况1、目标动作为点头或摇头。 Case 1. The target action is nodding or shaking the head.
在这种情况下，确定的所述用于表示目标动作完成情况的检测值可以是点头（或摇头）角度，相应的，此时与目标动作匹配的特征点即为能够表征点头（或摇头）角度的特征点，根据确定点头（或摇头）角度时的检测算法的不同，所使用的特征点也可以不同。In this case, the determined detection value used to represent the completion of the target action may be the nodding (or head-shaking) angle. Correspondingly, the feature points matching the target action are the feature points capable of characterizing the nodding (or head-shaking) angle; depending on the detection algorithm used to determine the nodding (or head-shaking) angle, the feature points used may also differ.
示例性的，与摇头对应的特征点可以是左眼外眼角、右眼外眼角、鼻尖分别对应的特征点，当人脸正对图像采集装置时，采集得到的人脸图像中从左眼外眼角到鼻尖的水平距离，与从右眼外眼角到鼻尖的水平距离是相近的，当用户向右摇头（即将左侧面部朝向图像采集装置）时，由于左眼外眼角到鼻尖的水平距离的减少速度比右眼外眼角到鼻尖的水平距离的减少速度慢，因此左眼外眼角到鼻尖的水平距离与右眼外眼角到鼻尖的水平距离的比值是逐渐增加的，从而可以进一步的通过该比值确定出用户向右摇头的角度。Exemplarily, the feature points corresponding to shaking the head may be the feature points respectively corresponding to the outer corner of the left eye, the outer corner of the right eye, and the tip of the nose. When the face directly faces the image acquisition apparatus, in the captured face image the horizontal distance from the outer corner of the left eye to the tip of the nose is close to the horizontal distance from the outer corner of the right eye to the tip of the nose. When the user shakes the head to the right (i.e., turns the left side of the face toward the image acquisition apparatus), since the horizontal distance from the outer corner of the left eye to the tip of the nose decreases more slowly than the horizontal distance from the outer corner of the right eye to the tip of the nose, the ratio of the former to the latter gradually increases, and the angle by which the user shakes the head to the right can be further determined from this ratio.
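The ratio described in the example above can be sketched directly from the three landmark x-coordinates (function and parameter names are illustrative):

```python
# Sketch of the head-shake ratio from the outer eye corners and the nose tip.

def shake_ratio(left_eye_corner, right_eye_corner, nose_tip):
    """Ratio of the left-eye-to-nose horizontal distance to the
    right-eye-to-nose horizontal distance. Roughly 1 for a frontal face;
    it grows as the user shakes the head to the right."""
    left = abs(nose_tip[0] - left_eye_corner[0])
    right = abs(right_eye_corner[0] - nose_tip[0])
    return left / right if right else float("inf")
```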
情况2、目标动作为张闭嘴。Case 2. The target action is to open and close the mouth.
这里，所述张闭嘴表示张嘴和闭嘴，当用户先张嘴然后再闭嘴时，即可确认用户完成了张闭嘴。Here, opening and closing the mouth means opening the mouth and closing the mouth; when the user first opens the mouth and then closes it, it can be confirmed that the user has completed opening and closing the mouth.
在这种情况下，确定的所述用于表示目标动作完成情况的检测值可以是表示嘴部张开幅度的嘴部状态分数，相应的，此时与目标动作匹配的特征点即为嘴部特征点。In this case, the determined detection value used to represent the completion of the target action may be a mouth state score representing the mouth opening amplitude; correspondingly, the feature points matching the target action are the mouth feature points.
一种可能的实施方式中，在确定所述嘴部状态分数时，可以先基于嘴部特征点位置信息，确定表征嘴部中央位置处张开幅度的第一嘴部距离和表征嘴角位置处张开幅度的第二嘴部距离；基于所述第一嘴部距离和第二嘴部距离，确定所述检测值。In a possible implementation manner, when determining the mouth state score, a first mouth distance representing the opening amplitude at the center of the mouth and a second mouth distance representing the opening amplitude at the corner of the mouth may first be determined based on the mouth feature point position information; the detection value is then determined based on the first mouth distance and the second mouth distance.
示例性的，确定所述第一嘴部距离和所述第二嘴部距离的示意图可以如图2所示，图2中，在确定所述第一嘴部距离时，可以先确定出位于上嘴唇中央位置处的第一嘴部特征点（如图2中A点所示），以及与所述第一嘴部特征点对应的位于下嘴唇中央位置处的第二嘴部特征点（如图2中B点所示），其中，所述第一嘴部特征点与所述第二嘴部特征点之间的连线与嘴部张开方向相同；基于所述第一嘴部特征点和所述第二嘴部特征点分别对应的位置信息，确定出所述第一嘴部距离。Exemplarily, a schematic diagram of determining the first mouth distance and the second mouth distance may be as shown in Figure 2. In Figure 2, when determining the first mouth distance, a first mouth feature point at the center of the upper lip (shown as point A in Figure 2) and a corresponding second mouth feature point at the center of the lower lip (shown as point B in Figure 2) may first be determined, where the line connecting the first mouth feature point and the second mouth feature point is in the same direction as the mouth opening; the first mouth distance is then determined based on the position information respectively corresponding to the first mouth feature point and the second mouth feature point.
此外，在确定所述第二嘴部距离时，可以先确定出至少一个嘴角中，位于上嘴唇的第三嘴部特征点（如图2中C点所示），以及位于下嘴唇的第四嘴部特征点（如图2中D点所示），其中，所述第三嘴部特征点与所述第四嘴部特征点之间的连线与嘴部张开方向相同；基于所述第三嘴部特征点和所述第四嘴部特征点分别对应的位置信息，确定出所述第二嘴部距离。In addition, when determining the second mouth distance, a third mouth feature point on the upper lip (shown as point C in Figure 2) and a fourth mouth feature point on the lower lip (shown as point D in Figure 2) at at least one mouth corner may first be determined, where the line connecting the third mouth feature point and the fourth mouth feature point is in the same direction as the mouth opening; the second mouth distance is then determined based on the position information respectively corresponding to the third mouth feature point and the fourth mouth feature point.
需要说明的是，在确定所述第二嘴部距离时，为了提高确定的所述第二嘴部距离的准确性，可以分别求出各嘴角分别对应的第二嘴部距离，并将两个嘴角分别对应的第二嘴部距离的平均值作为待检测图像对应的第二嘴部距离；另一方面，为了节约计算资源，也可以选取左右嘴角中的任一个嘴角进行计算，并将计算得到的第二嘴部距离作为待检测图像对应的第二嘴部距离，本公开实施例对此不做限定。It should be noted that, when determining the second mouth distance, in order to improve its accuracy, the second mouth distance corresponding to each mouth corner may be calculated separately, and the average of the second mouth distances respectively corresponding to the two mouth corners may be taken as the second mouth distance corresponding to the image to be detected. On the other hand, in order to save computing resources, either one of the left and right mouth corners may be selected for calculation, and the calculated second mouth distance may be taken as the second mouth distance corresponding to the image to be detected; the embodiments of the present disclosure do not limit this.
具体的，当用户闭嘴时，此时的第一嘴部距离与第二嘴部距离之间相差不大，第二嘴部距离与第一嘴部距离之间的比值可以近似为1；当用户张嘴时，第一嘴部距离的增大幅度是比第二嘴部距离的增大幅度要更大的，此时第二嘴部距离与第一嘴部距离的比值小于1，且随着嘴部张开幅度的增大该比值是逐渐减小的，因此可以使用第二嘴部距离与第一嘴部距离的比值表示嘴部张开幅度的嘴部状态分数。Specifically, when the user's mouth is closed, there is little difference between the first mouth distance and the second mouth distance, and the ratio of the second mouth distance to the first mouth distance is approximately 1. When the user opens the mouth, the first mouth distance increases more than the second mouth distance; at this time the ratio of the second mouth distance to the first mouth distance is less than 1, and the ratio gradually decreases as the mouth opens wider. Therefore, the ratio of the second mouth distance to the first mouth distance can be used as the mouth state score representing the mouth opening amplitude.
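The score described above can be sketched from the four labeled points A, B, C, D (the function names are illustrative; single-corner variant, per the note on saving computation):

```python
import math

# Sketch of the mouth state score: ratio of the second mouth distance
# (corner opening, C-D) to the first mouth distance (center opening, A-B).

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def mouth_state_score(a, b, c, d):
    """~1 when the mouth is closed; decreases as the mouth opens, because the
    center distance A-B grows faster than the corner distance C-D."""
    first = dist(a, b)   # upper-lip center to lower-lip center
    second = dist(c, d)  # upper lip to lower lip near a mouth corner
    return second / first if first else 1.0
```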
需要说明的是，在确定第一嘴部距离和第二嘴部距离的过程中，用于确定第一嘴部距离的两个点与用于确定第二嘴部距离的两个点，相较于唇部纵向中线的距离存在差异，这样可以确保嘴部张开时，第一嘴部距离与第二嘴部距离存在明显的差异。在实际应用过程中，用于第一嘴部距离确定和用于第二嘴部距离确定的几个点，可以包括但不限于上述例举的情况。It should be noted that, in determining the first mouth distance and the second mouth distance, the two points used to determine the first mouth distance and the two points used to determine the second mouth distance differ in their distance from the longitudinal midline of the lips, which ensures that there is an obvious difference between the first mouth distance and the second mouth distance when the mouth is open. In practical applications, the points used to determine the first mouth distance and the second mouth distance may include, but are not limited to, the cases listed above.
Case 3: the target action is opening and closing the eyes.
Here, opening and closing the eyes means eye opening and eye closing; when the user first closes the eyes and then opens them again, it can be confirmed that the user has completed the action.
In this case, the detection value determined to represent the completion of the target action may include an eye state score (a second detection value) representing the degree of eye opening; correspondingly, the feature points matching the target action may be eye feature points. The method for determining the eye state score is similar to the method for determining the mouth state score in Case 2, and is not repeated here.
In addition, the feature points matching the target action may also be feature points capable of characterizing the deflection angle of the face, such as the feature points representing the face rotation angle in Case 1, which are likewise not repeated here.
In a possible implementation, when determining the eye state score, the eye image in the image to be detected may also be input into a pre-trained neural network to obtain the eye state score corresponding to the image to be detected as output by the neural network.
When training the neural network, a sample image may be input into the neural network to be trained to obtain a sample prediction score output by the network; based on the sample prediction score, the eye state of the sample object (eyes open or eyes closed) is determined; the loss value of this round of training is then determined from the determined eye state and pre-labeled annotation data characterizing whether the eyes in the sample image are open or closed, and the network parameters of the neural network are adjusted based on the loss value.
In practical applications, to improve the detection accuracy of the neural network, for each frame of the multiple frames of images to be detected, correction processing is performed on the image to be detected based on the feature point position information of that image; the corrected image to be detected is then input into the pre-trained neural network to determine the detection value corresponding to the image to be detected.
Specifically, since the eye region occupies a small area of the face image and the distances between eye feature points are also small, the eye region is strongly affected by face rotation. To improve the precision of the final eye state score, before the input to the neural network, the image to be detected may be corrected using the corresponding facial feature points and a face alignment algorithm, and the eye image in the corrected image then input into the neural network to obtain the eye state score output by the network.
Further, since the eyes are easily occluded by objects such as hair, when the target action is opening and closing the eyes, the detection value may also include an eye occlusion score (a first detection value) describing the degree of eye occlusion. This score may be determined from the number of successfully recognized eye feature points and the standard number of eye feature points: for example, if the standard number is 10 and 8 points are successfully recognized, it can be determined that 2 eye feature points were not successfully recognized, and the corresponding eye occlusion score is 0.2. Alternatively, the score may be obtained as an output of the neural network: for example, inputting an eye image in which the eye is occluded into the neural network may yield an output eye occlusion score of 0.1.
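The landmark-counting variant of the occlusion score can be sketched as follows (the standard count of 10 comes from the example above; the function name is illustrative):

```python
def eye_occlusion_score(recognized_count, standard_count=10):
    """Fraction of eye feature points that failed to be recognized, used as a
    proxy for how occluded the eye is (0.0 = fully visible, 1.0 = fully hidden)."""
    if not 0 <= recognized_count <= standard_count:
        raise ValueError("recognized_count must be within [0, standard_count]")
    missed = standard_count - recognized_count
    return missed / standard_count

score = eye_occlusion_score(8)  # 2 of 10 points missed -> 0.2
```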
S103: Obtain a liveness detection result based on the detection values and the detection scheme matching the target action.
In a possible implementation, when the detection values are checked based on the detection scheme matching the target action to obtain the liveness detection result, the detection values of the multiple frames of images to be detected may be checked based on the detection threshold corresponding to the target action together with the detection scheme matching the target action, to obtain the liveness detection result.
Here, the detection threshold is a threshold set for the target action; different target actions correspond to different detection thresholds.
Specifically, when the detection values of the multiple frames of images to be detected are checked based on the detection threshold corresponding to the target action and the detection scheme matching the target action, the following cases can be distinguished according to the target action:
Case 1: the target action is nodding or shaking the head.
In this case, the detection value includes a head offset angle, and the detection threshold includes a positive offset threshold, a negative offset threshold, and an image frame number threshold.
In a possible implementation, as shown in FIG. 3, the liveness detection result may be determined through the following steps:
S301: Determine first target detection images whose head offset angle is greater than the positive offset threshold, and second target detection images whose head offset angle is smaller than the negative offset threshold.
For example, take a rightward head offset as the positive direction and a positive offset threshold of 15°. If the offset angles corresponding to images to be detected 1 through 5 are 12°, 16°, 18°, 16°, and 12° to the right, respectively, then images 2, 3, and 4 are determined to be the first target detection images.
Continuing the example, still taking a rightward offset as positive and a negative offset threshold of −15°, if the offset angles corresponding to images to be detected 6 through 10 are −12° (a rightward offset of −12° is a leftward offset of 12°, and likewise below), −16°, −18°, −16°, and −12°, respectively, then images 7, 8, and 9 are determined to be the second target detection images.
S302: When the number of first target detection images exceeds a first image frame number threshold and the number of second target detection images exceeds a second image frame number threshold, determine that the liveness detection is passed.
In the embodiments of the present disclosure, the first image frame number threshold and the second image frame number threshold may be the same, or they may be different.
For example, if the first and second image frame number thresholds are the same (say, 3), the liveness detection is determined to be passed when both the number of first target detection images and the number of second target detection images are detected to be greater than 3.
As another example, the two thresholds may differ: the first image frame number threshold for the number of first target detection images may be set to 3, and the second image frame number threshold for the number of second target detection images to 4. In that case, the liveness detection is determined to be passed when the number of first target detection images exceeds 3 and the number of second target detection images exceeds 4.
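Steps S301–S302 can be sketched as follows, using the example angles above (signed degrees, rightward positive; the function signature is illustrative):

```python
def head_shake_passed(offset_angles, pos_threshold=15.0, neg_threshold=-15.0,
                      first_frame_threshold=2, second_frame_threshold=2):
    """S301/S302 sketch: count frames beyond each offset threshold and
    require both counts to exceed their image frame number thresholds."""
    first_targets = [a for a in offset_angles if a > pos_threshold]   # S301, positive side
    second_targets = [a for a in offset_angles if a < neg_threshold]  # S301, negative side
    return (len(first_targets) > first_frame_threshold
            and len(second_targets) > second_frame_threshold)         # S302

# Images 1-10 from the example: three frames beyond +15° and three beyond -15°
angles = [12, 16, 18, 16, 12, -12, -16, -18, -16, -12]
result = head_shake_passed(angles)  # True with frame thresholds of 2
```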
Case 2: the target action is opening and closing the mouth.
In this case, the detection threshold includes a mouth-opening threshold, a mouth-closing threshold, and a mouth-opening frame number threshold.
In a possible implementation, as shown in FIG. 4, the liveness detection result may also be determined through the following steps:
S401: Determine multiple frames of first images to be detected whose detection value equals the mouth-closing threshold.
Here, taking the detection value as the mouth state score, the mouth state scores obtained for the multiple frames of images to be detected may be as shown in Table 1 below:
Table 1

Frame  Score    Frame  Score    Frame  Score    Frame  Score
1      0.5      9      0.95     17     0.5      25     0.8
2      0.55     10     0.98     18     0.6      26     0.96
3      0.61     11     0.95     19     0.7
4      0.68     12     0.8      20     0.8
5      0.75     13     0.7      21     0.95
6      0.8      14     0.6      22     0.98
7      0.9      15     0.5      23     0.8
8      0.94     16     0.4      24     0.7
In Table 1, columns 1, 3, 5, and 7 give the frame number of each image to be detected within the video to be detected, and columns 2, 4, 6, and 8 give the mouth state scores corresponding to columns 1, 3, 5, and 7, respectively.
For example, taking the mouth-closing threshold as 0.8, the determination of the first images to be detected may be illustrated as in FIG. 5: the frames whose detection score equals the mouth-closing threshold are, in order, frame 6 (point O in FIG. 5), frame 12 (point A), frame 20 (point C), frame 23 (point D), and frame 25 (point F).
Specifically, determining the multiple frames of first images to be detected amounts to finding the video frames corresponding to the intersections of the polyline in FIG. 5 (which plots the correspondence between the images to be detected and the mouth state scores) with the straight line Y = 0.8 (the mouth-closing threshold). If the abscissa of an intersection falls between two frames, the video frame closest to the intersection may be taken as the first image to be detected.
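Locating these intersection frames can be sketched as follows; the nearest-frame rounding for between-frame crossings follows the description above, and the function and variable names are illustrative:

```python
def first_detection_frames(scores, close_threshold):
    """Return the 1-indexed frame numbers where the score polyline meets the
    horizontal line Y = close_threshold. For a crossing strictly between two
    frames, the nearer frame is chosen via linear interpolation."""
    frames = []
    for i, s in enumerate(scores):
        if s == close_threshold:
            frames.append(i + 1)
        elif i > 0 and (scores[i - 1] - close_threshold) * (s - close_threshold) < 0:
            # Crossing lies between frame i and frame i+1; pick the nearer one
            t = (close_threshold - scores[i - 1]) / (s - scores[i - 1])
            frames.append(i if t < 0.5 else i + 1)
    return frames

# Mouth state scores from Table 1 (frames 1-26)
scores = [0.5, 0.55, 0.61, 0.68, 0.75, 0.8, 0.9, 0.94, 0.95, 0.98, 0.95,
          0.8, 0.7, 0.6, 0.5, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95, 0.98,
          0.8, 0.7, 0.8, 0.96]
crossings = first_detection_frames(scores, 0.8)  # frames 6, 12, 20, 23, 25
```

With the Table 1 data, all five crossings land exactly on a frame, matching points O, A, C, D, and F in FIG. 5.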
S402: Among the multiple frames of first images to be detected, determine the second images to be detected located between every two adjacent first images to be detected.
In a possible implementation, when determining the second images to be detected, it may be determined, among the multiple frames of first images to be detected, which second images to be detected lie between two adjacent first images to be detected satisfying a second preset condition;
where the second preset condition includes: the number of images to be detected between the adjacent first images to be detected is greater than the mouth-opening frame number threshold; and the extremum of the detection values of the images to be detected between the adjacent first images to be detected satisfies the filtering condition corresponding to the mouth-opening threshold.
Here, two adjacent first images to be detected are first images that are consecutive in the determined sequence of first images to be detected. Taking the first images determined from left to right as an example, point O (determined first) and point A (determined second) in FIG. 5 are adjacent first images to be detected. The extremum satisfying the filtering condition corresponding to the mouth-opening threshold may mean that the minimum of the mouth state scores is less than the mouth-opening threshold.
Continuing the example, in FIG. 5 there are 5 frames of images to be detected between O and A, 7 frames between A and C, 2 frames between C and D, and 1 frame between D and F. If the mouth-opening frame number threshold is 3 frames, then C to D and D to F do not satisfy the second preset condition. If the mouth-opening threshold is 0.6, since the mouth state score at the minimum B between A and C is 0.4, which is less than the mouth-opening threshold of 0.6, the 7 frames of images to be detected between A and C can be determined to be the second images to be detected.
S403: When it is detected that the second images to be detected satisfy a first preset condition, determine that the liveness detection is passed.
The first preset condition may be:
Condition 1: among the second images to be detected, the number of third images to be detected whose corresponding detection value equals the mouth-opening threshold is a first preset value.
Here, the third images to be detected are determined in a way similar to the first images to be detected above, that is, by finding the abscissas of the intersections of the second images to be detected in FIG. 5 with the straight line Y = 0.6 (the mouth-opening threshold). The first preset value may be 2, reflecting that when a person normally opens and closes the mouth, the number of third images to be detected is 1 during the mouth-opening phase and 1 during the mouth-closing phase.
Continuing the example, within A to C, A to B represents the mouth-opening phase and B to C the mouth-closing phase; each phase intersects the line Y = 0.6 exactly once, so Condition 1 is satisfied.
Condition 2: among the multiple frames of third images to be detected, the number of second images to be detected between two adjacent third images to be detected is greater than the mouth-opening frame number threshold.
For example, taking the mouth-opening frame number threshold as 2 frames, the second images to be detected between the two adjacent third images within A to C are frames 15, 16, and 17, i.e., 3 frames, which is greater than the mouth-opening frame number threshold, so Condition 2 is satisfied.
Condition 3: among the multiple frames of second images to be detected between two adjacent third images to be detected, the difference between the detection values of any two frames of second images to be detected is less than a second preset value.
Here, since the speed at which a user can open and close the mouth is limited, the opening and closing actions cannot be completed in an extremely short time. In the images to be detected, this means that the difference between the detection values of adjacent frames will not exceed a certain maximum. This judgment condition exploits that characteristic and can thereby improve the security of the liveness detection.
Continuing the example, if the second preset value is 0.15, since the difference between the mouth state scores of frames 15 and 16 is 0.1, and the difference between frames 16 and 17 is also 0.1, both less than the second preset value of 0.15, Condition 3 is satisfied.
It should be noted that, when judging whether the multiple frames of second images to be detected satisfy the first preset condition, any one or several of the above conditions may be used; for example, Conditions 1 and 2 may both be required, or Conditions 1, 2, and 3 all at once. Different schemes may also be adopted in different application scenarios, which is not limited in the embodiments of the present disclosure.
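A sketch of checking Conditions 1–3 on one segment between adjacent first images (frames 13–19 from the example, with the mouth-opening threshold 0.6, frame threshold 2, and second preset value 0.15; all names are illustrative):

```python
def open_close_segment_passed(scores, open_threshold=0.6,
                              open_frame_threshold=2, max_step=0.15,
                              expected_crossings=2):
    """Check Conditions 1-3 on the scores strictly between two adjacent
    first images to be detected (a candidate mouth open-close segment)."""
    # Condition 1: exactly `expected_crossings` frames sit on the opening threshold
    third_idx = [i for i, s in enumerate(scores) if s == open_threshold]
    if len(third_idx) != expected_crossings:
        return False
    # Condition 2: enough frames between the two threshold frames
    between = scores[third_idx[0] + 1:third_idx[-1]]
    if len(between) <= open_frame_threshold:
        return False
    # Condition 3: no two frames in the middle span differ by max_step or more
    if max(between) - min(between) >= max_step:
        return False
    return True

# Frames 13-19, between points A (frame 12) and C (frame 20) in the example
segment = [0.7, 0.6, 0.5, 0.4, 0.5, 0.6, 0.7]
result = open_close_segment_passed(segment)  # True: threshold frames 14 and 18
```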
Case 3: the target action is opening and closing the eyes.
In this case, the detection threshold includes an eye-opening threshold, an eye-closing threshold, an eye-opening frame number threshold, and an eye occlusion threshold.
In practical applications, since a person has two eyes, the same image to be detected has two corresponding eye state scores and two corresponding eye occlusion scores. In this case, liveness detection may be performed separately for the left eye and the right eye, or a target eye state score for liveness detection may be determined from the eye state scores corresponding to the left and right eyes.
For example, suppose the eye state score behaves like the mouth state score, i.e., the wider the eye is open, the smaller the corresponding eye state score. When determining the target eye state score for any frame of the image to be detected, the larger of the eye state scores corresponding to the two eyes may be taken as the target eye state score for that frame; that is, the eye state score of the eye that is open to the smaller degree is used for liveness detection.
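Under the convention just stated (wider opening means a smaller score), selecting the target eye state score per frame reduces to a per-frame maximum; a minimal sketch with illustrative names:

```python
def target_eye_state_scores(left_scores, right_scores):
    """Per frame, take the larger of the two eyes' state scores, i.e. the
    score of the less-open eye (a smaller score means a wider opening here)."""
    if len(left_scores) != len(right_scores):
        raise ValueError("both eyes must have one score per frame")
    return [max(l, r) for l, r in zip(left_scores, right_scores)]

targets = target_eye_state_scores([0.3, 0.8, 0.9], [0.4, 0.7, 0.95])
# each frame keeps the less-open eye's score: [0.4, 0.8, 0.95]
```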
In a possible implementation, as shown in FIG. 6, the liveness detection result may also be determined through the following steps:
S601: Based on the eye-opening threshold, the eye-closing threshold, and the second detection values of the multiple frames of images to be detected, determine fourth images to be detected satisfying a third preset condition.
Here, similarly to steps S401 and S402 above, multiple frames of fifth images to be detected whose second detection value equals the eye-closing threshold may first be determined, and then, among those fifth images, the fourth images to be detected between every two adjacent fifth images satisfying the third preset condition are determined.
The third preset condition may be at least one of the following conditions:
Condition ①: among the fifth images to be detected, the number of fourth images to be detected whose corresponding detection value equals the eye-opening threshold is a third preset value.
Condition ②: among the multiple frames of fifth images to be detected, the number of fourth images to be detected between two adjacent fifth images to be detected is greater than the eye-opening frame number threshold.
Condition ③: among the fourth images to be detected between two adjacent fifth images to be detected, the difference between the second detection values of any two frames of fourth images to be detected is less than a fourth preset value.
For descriptions of Conditions ① to ③, refer to the details of Conditions 1 to 3 in S403 above, which are not repeated here.
S602: Determine the target number of fourth images to be detected whose corresponding first detection value is less than the eye occlusion threshold.
S603: When the target number exceeds the eye-opening frame number threshold, determine that the liveness detection is passed.
Specifically, the target number of fourth images to be detected may be determined after the fourth images satisfying the third preset condition have been determined; alternatively, once the eye occlusion scores are obtained, the images to be detected whose eye occlusion score does not satisfy the eye occlusion threshold may be deleted directly, so that these deleted images need not be considered in subsequent judgments.
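Steps S602–S603 reduce to a filtered count. A minimal sketch, assuming the inputs are the first detection values (eye occlusion scores) of the fourth images to be detected and that the threshold values shown are illustrative:

```python
def eye_blink_passed(occlusion_scores, occlusion_threshold=0.3,
                     open_frame_threshold=3):
    """S602/S603 sketch: count the fourth images whose eye occlusion score is
    below the occlusion threshold (S602), and pass the liveness detection if
    that target number exceeds the eye-opening frame number threshold (S603)."""
    target_count = sum(1 for s in occlusion_scores if s < occlusion_threshold)
    return target_count > open_frame_threshold

# Five candidate frames, one heavily occluded (0.6) -> four usable frames
result = eye_blink_passed([0.1, 0.2, 0.6, 0.0, 0.1])  # True: 4 > 3
```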
In a possible implementation, if the liveness detection is not completed within a preset duration, the corresponding liveness detection result may be determined to be a failure.
For example, taking the preset duration as 10 seconds and the target action as opening and closing the mouth, the user failing to complete the liveness detection within 10 seconds may mean that the user did not perform the target action (did not open and close the mouth), or that the corresponding detection values did not satisfy the corresponding detection thresholds; in either case, it can be determined that the current liveness detection result is a failure.
Further, when the liveness detection result is determined to be a failure, prompt information may also be sent to the user indicating the reason for the failure, for example, that no face was detected or that the action was not performed properly.
In the liveness detection method provided by the embodiments of the present disclosure, multiple frames of images to be detected corresponding to a target action are acquired in response to a liveness detection request, where the target action is the action the user is instructed to perform during liveness detection. Based on the position information of the feature points matching the target action in each frame of the images to be detected, a detection value representing the completion of the target action is determined for each frame. By determining detection values corresponding to the target action, the completion of the target action in each frame can be quantified, which facilitates the subsequent checking and quantitative analysis of the detection values. The detection values are then checked based on the detection scheme matching the target action to obtain a liveness detection result. Checking the detection values of multiple frames reduces the influence of any single frame on the detection result, making the liveness detection more accurate.
Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or impose any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, the embodiments of the present disclosure further provide a liveness detection apparatus corresponding to the liveness detection method. Since the principle by which the apparatus in the embodiments of the present disclosure solves the problem is similar to that of the above liveness detection method, the implementation of the apparatus may refer to the implementation of the method, and repeated descriptions are omitted.
Referring to FIG. 7, which is a schematic structural diagram of a liveness detection apparatus provided by an embodiment of the present disclosure, the apparatus includes an acquisition module 701, a determination module 702, and a detection module 703, where:
the acquisition module 701 is configured to acquire, in response to a liveness detection request, multiple frames of images to be detected corresponding to a target action, where the target action is the action the user is instructed to perform during liveness detection;
the determination module 702 is configured to determine, based on the position information of the feature points matching the target action in each frame of the images to be detected, a detection value corresponding to each frame and representing the completion of the target action; and
the detection module 703 is configured to obtain a liveness detection result based on the detection scheme matching the target action and the detection values.
In a possible implementation, when checking the detection values based on the detection scheme matching the target action to obtain the liveness detection result, the detection module 703 is configured to:
check the detection values of the multiple frames of images to be detected based on the detection threshold corresponding to the target action and the detection scheme matching the target action, to obtain the liveness detection result.
In a possible implementation, when the target action is nodding or shaking the head, the detection value includes a head offset angle, and the detection threshold includes a positive offset threshold, a negative offset threshold, and an image frame number threshold;
when checking the detection values of the multiple frames of images to be detected based on the detection threshold corresponding to the target action and the detection scheme matching the target action to obtain the liveness detection result, the detection module 703 is configured to:
determine first target detection images whose head offset angle is greater than the positive offset threshold, and second target detection images whose head offset angle is smaller than the negative offset threshold; and
when the number of first target detection images exceeds a first image frame number threshold and the number of second target detection images exceeds a second image frame number threshold, determine that the liveness detection is passed.
一种可能的实施方式中,在所述目标动作为张闭嘴的情况下,与所述目标动作匹配的特征点包括嘴部特征点;In a possible implementation manner, when the target action is opening and closing the mouth, the feature points matching the target action include mouth feature points;
所述确定模块702,在基于所述各帧待检测图像中与所述目标动作匹配的特征点位置信息,确定各帧待检测图像分别对应的用于表示所述目标动作完成情况的检测值时,用于:The determining module 702, when determining the detection values corresponding to each frame of the image to be detected and used to indicate the completion of the target action based on the feature point position information matching the target action in each frame of the image to be detected , for:
基于嘴部特征点位置信息,确定表征嘴部中央位置处张开幅度的第一嘴部距离和表征嘴角位置处张开幅度的第二嘴部距离;Based on the mouth feature point position information, determine a first mouth distance representing the opening amplitude at the central position of the mouth and a second mouth distance representing the opening amplitude at the corner position of the mouth;
基于所述第一嘴部距离和第二嘴部距离,确定所述检测值。The detection value is determined based on the first mouth distance and the second mouth distance.
一种可能的实施方式中,所述检测阈值包括张嘴阈值、闭嘴阈值、张嘴帧数阈值;In a possible implementation manner, the detection threshold includes a mouth opening threshold, a mouth closing threshold, and a mouth opening frame number threshold;
所述检测模块703,在基于所述目标动作对应的检测阈值和与所述目标动作匹配的检测方案,对所述多帧待检测图像的检测值进行检测,得到活体检测结果时,用于:The detection module 703, when detecting the detection values of the multiple frames of images to be detected based on the detection threshold corresponding to the target action and the detection scheme matching the target action, and obtaining a living body detection result, is used to:
确定检测值为所述闭嘴阈值的多帧第一待检测图像；determining multiple frames of first images to be detected whose detection values are the mouth-closing threshold;
确定所述多帧第一待检测图像中,每两帧相邻的第一待检测图像之间的第二待检测图像;Determining a second image to be detected between every two adjacent frames of the first image to be detected among the multiple frames of the first image to be detected;
在检测到所述第二待检测图像满足第一预设条件的情况下,确定通过活体检测。If it is detected that the second image to be detected satisfies the first preset condition, it is determined that the living body detection is passed.
一种可能的实施方式中,所述第一预设条件包括:In a possible implementation manner, the first preset condition includes:
所述第二待检测图像中，对应的检测值为所述张嘴阈值的第三待检测图像的数量为第一预设值；among the second images to be detected, the number of third images to be detected whose corresponding detection value is the mouth-opening threshold is a first preset value;
所述多帧第三待检测图像中,相邻两帧第三待检测图像之间的第二待检测图像的数量大于所述张嘴帧数阈值。In the plurality of frames of the third images to be detected, the number of the second images to be detected between two adjacent frames of the third images to be detected is greater than the threshold of the number of mouth opening frames.
一种可能的实施方式中,所述第一预设条件还包括:In a possible implementation manner, the first preset condition further includes:
相邻两帧第三待检测图像之间的多帧第二待检测图像中,任意两帧第二待检测图像的检测值之间的差值小于第二预设值。Among the multiple frames of the second images to be detected between two adjacent frames of the third images to be detected, the difference between the detected values of any two frames of the second images to be detected is smaller than a second preset value.
一种可能的实施方式中，所述检测模块703，在确定所述多帧第一待检测图像中，每两帧相邻的第一待检测图像之间的第二待检测图像时，用于：In a possible implementation manner, the detection module 703, when determining the second image to be detected between every two adjacent frames of the first images to be detected among the multiple frames of the first images to be detected, is configured to:
确定所述多帧第一待检测图像中,满足第二预设条件的两帧相邻的第一待检测图像之间的第二待检测图像;Determining a second image to be detected between two frames of adjacent first images to be detected that meet a second preset condition among the multiple frames of the first image to be detected;
其中,所述第二预设条件包括:Wherein, the second preset condition includes:
相邻的第一待检测图像之间的待检测图像的数量大于所述张嘴帧数阈值；相邻的第一待检测图像之间的待检测图像的检测值最值满足所述张嘴阈值对应的筛选条件。The number of images to be detected between the adjacent first images to be detected is greater than the mouth-opening frame number threshold; and the extreme value of the detection values of the images to be detected between the adjacent first images to be detected satisfies the filtering condition corresponding to the mouth-opening threshold.
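The closed-open-closed mouth sequence check described above can be sketched as follows. The exact comparisons against the thresholds are assumptions made for the illustration; the disclosure leaves them abstract.

```python
def mouth_sequence_passed(values, open_th, close_th, min_open_frames):
    """Check for a closed -> open -> closed mouth sequence.

    `values` holds one mouth-opening detection value per frame.
    Illustrative sketch; threshold comparisons are assumptions.
    """
    # Indices of first images to be detected (mouth classified as closed)
    closed = [i for i, v in enumerate(values) if v <= close_th]
    for a, b in zip(closed, closed[1:]):
        # Second images to be detected: frames between two closed frames
        between = values[a + 1:b]
        # Second preset condition: enough frames in between, and a peak
        # value that reaches the mouth-opening threshold
        if len(between) > min_open_frames and max(between, default=0.0) >= open_th:
            return True
    return False

print(mouth_sequence_passed([0.05, 0.2, 0.5, 0.6, 0.4, 0.06], 0.5, 0.1, 2))  # True
```

Anchoring the check on two closed frames around an opening peak rejects replay attacks that show only a static open or static closed mouth.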
一种可能的实施方式中，在所述目标动作为睁闭眼的情况下，所述确定模块702，在基于所述各帧待检测图像中与所述目标动作匹配的特征点位置信息，确定各帧待检测图像分别对应的用于表示所述目标动作完成情况的检测值时，用于：In a possible implementation manner, when the target action is opening and closing the eyes, the determining module 702, when determining, based on the feature point position information matching the target action in each frame of the images to be detected, the detection values corresponding to each frame of the images to be detected for indicating the completion of the target action, is configured to:
针对多帧待检测图像中的每帧待检测图像,基于所述待检测图像的特征点位置信息,对所述待检测图像进行矫正处理;For each frame of the image to be detected in the multiple frames of the image to be detected, based on the feature point position information of the image to be detected, the image to be detected is corrected;
将矫正处理后的所述待检测图像输入至预先训练好的神经网络,确定所述待检测图像对应的检测值。The rectified image to be detected is input to a pre-trained neural network to determine a detection value corresponding to the image to be detected.
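One common form of the rectification step above is rotating the face so the eye line is horizontal before the crop is fed to the network. The sketch below only computes the rotation angle; the landmark convention is an assumption, and the actual correction in the disclosure is not specified beyond using feature point positions.

```python
import math

def align_by_eyes(left_eye, right_eye):
    """Rotation angle (degrees) that would make the eye line horizontal.

    `left_eye` and `right_eye` are (x, y) landmark positions; the image
    would then be rotated by the negative of this angle before being
    passed to the pre-trained network. Illustrative sketch only.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

print(align_by_eyes((0, 0), (1, 1)))  # 45.0
```

Normalising the head roll this way lets the network see eye crops in a canonical orientation, which typically makes the open/closed classification more stable.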
一种可能的实施方式中,所述检测值包括用于描述眼部遮挡情况的第一检测值,以及用于描述睁闭眼完成情况的第二检测值;In a possible implementation manner, the detection value includes a first detection value used to describe the situation of eye occlusion, and a second detection value used to describe the completion of eye opening and closing;
所述检测阈值包括睁眼阈值、闭眼阈值、睁眼帧数阈值、眼部遮挡阈值;The detection threshold includes an eye opening threshold, an eye closing threshold, an eye opening frame number threshold, and an eye occlusion threshold;
所述检测模块703,在基于所述目标动作对应的检测阈值和与所述目标动作匹配的检测方案,对所述多帧待检测图像的检测值进行检测,得到活体检测结果时,用于:The detection module 703, when detecting the detection values of the multiple frames of images to be detected based on the detection threshold corresponding to the target action and the detection scheme matching the target action, and obtaining a living body detection result, is used to:
基于所述睁眼阈值、闭眼阈值、以及所述多帧待检测图像的第二检测值,确定满足第三预设条件的第四待检测图像;determining a fourth image to be detected that satisfies a third preset condition based on the eye-opening threshold, the eye-closing threshold, and the second detection values of the multiple frames of images to be detected;
确定对应的第一检测值小于所述眼部遮挡阈值的第四待检测图像的目标数量;determining the target quantity of the fourth image to be detected whose corresponding first detection value is less than the eye occlusion threshold;
在所述目标数量超过所述睁眼帧数阈值的情况下,确定通过活体检测。When the number of targets exceeds the eye-opening frame number threshold, it is determined that the living body detection is passed.
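The eye-scheme pass check above can be sketched as follows. Treating "satisfies the third preset condition" as "the openness value reaches the eye-opening threshold" is an assumption made for the illustration; the disclosure leaves that condition abstract.

```python
def eye_action_passed(frames, open_th, occlusion_th, min_frames):
    """Eye open/close pass check over per-frame (occlusion, openness) pairs.

    Each frame is a (first_value, second_value) tuple: an occlusion score
    and an eye-openness score. Illustrative sketch; the third preset
    condition is simplified to an openness-threshold comparison.
    """
    # Fourth images to be detected: frames meeting the openness condition
    candidates = [f for f in frames if f[1] >= open_th]
    # Keep only frames where the eyes are not occluded
    visible = [f for f in candidates if f[0] < occlusion_th]
    # Pass when the target number exceeds the eye-opening frame threshold
    return len(visible) > min_frames

frames = [(0.1, 0.9), (0.05, 0.8), (0.6, 0.95), (0.1, 0.2)]
print(eye_action_passed(frames, 0.7, 0.5, 1))  # True
```

Filtering on the occlusion score first prevents frames where the eyes are covered (e.g. by glasses glare or a hand) from counting toward the action.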
本公开实施例提供的活体检测装置，响应活体检测请求，获取与目标动作对应的多帧待检测图像，其中，所述目标动作为进行活体检测时指示用户做出的动作；基于所述各帧待检测图像中与所述目标动作匹配的特征点位置信息，确定各帧待检测图像分别对应的用于表示所述目标动作完成情况的检测值；这样，通过确定与所述目标动作对应的检测值，可以对各帧待检测图像中目标动作的完成情况进行量化，从而便于后续对检测值进行检测及量化分析；基于与所述目标动作匹配的检测方案对所述检测值进行检测，得到活体检测结果，这样，通过对多帧待检测图像的检测值进行检测，可以减少单帧图像对于检测结果的影响，使得活体检测的准确率更高。The living body detection apparatus provided by the embodiments of the present disclosure, in response to a liveness detection request, acquires multiple frames of images to be detected corresponding to a target action, where the target action is an action the user is instructed to perform during liveness detection; based on the feature point position information matching the target action in each frame of the images to be detected, detection values corresponding to each frame and representing the completion of the target action are determined; in this way, by determining the detection values corresponding to the target action, the completion of the target action in each frame can be quantified, which facilitates subsequent detection and quantitative analysis of the detection values; the detection values are then detected based on the detection scheme matching the target action to obtain a liveness detection result; by detecting the detection values of multiple frames of images to be detected, the influence of any single frame on the result is reduced, making liveness detection more accurate.
关于装置中的各模块的处理流程、以及各模块之间的交互流程的描述可以参照上述方法实施例中的相关说明,这里不再详述。For the description of the processing flow of each module in the device and the interaction flow between the modules, reference may be made to the relevant description in the above method embodiment, and details will not be described here.
基于同一技术构思，本公开实施例还提供了一种计算机设备。参照图8所示，为本公开实施例提供的计算机设备800的结构示意图，包括处理器801、存储器802、和总线803。其中，存储器802用于存储执行指令，包括内存8021和外部存储器8022；这里的内存8021也称内存储器，用于暂时存放处理器801中的运算数据，以及与硬盘等外部存储器8022交换的数据，处理器801通过内存8021与外部存储器8022进行数据交换，当计算机设备800运行时，处理器801与存储器802之间通过总线803通信，使得处理器801在执行以下指令：Based on the same technical idea, an embodiment of the present disclosure also provides a computer device. Referring to FIG. 8, which is a schematic structural diagram of a computer device 800 provided by an embodiment of the present disclosure, the device includes a processor 801, a memory 802, and a bus 803. Here, the memory 802 is used to store execution instructions and includes an internal memory 8021 and an external memory 8022; the internal memory 8021 is used to temporarily store operation data in the processor 801 and data exchanged with the external memory 8022 such as a hard disk, and the processor 801 exchanges data with the external memory 8022 through the internal memory 8021. When the computer device 800 is running, the processor 801 communicates with the memory 802 through the bus 803, so that the processor 801 executes the following instructions:
响应活体检测请求,获取与目标动作对应的多帧待检测图像,其中,所述目标动作为进行活体检测时指示用户做出的动作;Responding to the liveness detection request, acquiring multiple frames of images to be detected corresponding to the target action, wherein the target action is an action instructed by the user during liveness detection;
基于所述各帧待检测图像中与所述目标动作匹配的特征点位置信息,确定各帧待检测图像分别对应的用于表示所述目标动作完成情况的检测值;Based on the feature point position information matching the target action in each frame of the image to be detected, determine detection values corresponding to each frame of the image to be detected for indicating the completion of the target action;
基于与所述目标动作匹配的检测方案对所述检测值进行检测,得到活体检测结果。The detection value is detected based on a detection scheme matched with the target action to obtain a living body detection result.
本公开实施例还提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器运行时执行上述方法实施例中所述的活体检测方法的步骤。其中,该存储介质可以是易失性或非易失的计算机可读取存储介质。Embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored. When the computer program is run by a processor, the steps of the living body detection method described in the foregoing method embodiments are executed. Wherein, the storage medium may be a volatile or non-volatile computer-readable storage medium.
本公开实施例还提供一种计算机程序产品，该计算机程序产品承载有程序代码，所述程序代码包括的指令可用于执行上述方法实施例中所述的活体检测方法的步骤，具体可参见上述方法实施例，在此不再赘述。The embodiments of the present disclosure also provide a computer program product, where the computer program product carries program code, and the instructions included in the program code can be used to execute the steps of the living body detection method described in the above method embodiments; for details, refer to the above method embodiments, which will not be repeated here.
其中，上述计算机程序产品可以具体通过硬件、软件或其结合的方式实现。在一个可选实施例中，所述计算机程序产品具体体现为计算机存储介质，在另一个可选实施例中，计算机程序产品具体体现为软件产品，例如软件开发包（Software Development Kit，SDK）等等。The above computer program product may be specifically implemented by means of hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK) and so on.
所属领域的技术人员可以清楚地了解到，为描述的方便和简洁，上述描述的系统和装置的具体工作过程，可以参考前述方法实施例中的对应过程，在此不再赘述。在本公开所提供的几个实施例中，应该理解到，所揭露的系统、装置和方法，可以通过其它的方式实现。以上所描述的装置实施例仅仅是示意性的，例如，所述单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，又例如，多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些通信接口，装置或单元的间接耦合或通信连接，可以是电性，机械或其它的形式。Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working process of the systems and apparatuses described above, reference may be made to the corresponding processes in the foregoing method embodiments, which will not be repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other division manners in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
另外,在本公开各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。In addition, each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个处理器可执行的非易失的计算机可读取存储介质中。基于这样的理解，本公开的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备（可以是个人计算机，服务器，或者网络设备等）执行本公开各个实施例所述方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器（Read-Only Memory，ROM）、随机存取存储器（Random Access Memory，RAM）、磁碟或者光盘等各种可以存储程序代码的介质。If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
最后应说明的是：以上所述实施例，仅为本公开的具体实施方式，用以说明本公开的技术方案，而非对其限制，本公开的保护范围并不局限于此，尽管参照前述实施例对本公开进行了详细的说明，本领域的普通技术人员应当理解：任何熟悉本技术领域的技术人员在本公开揭露的技术范围内，其依然可以对前述实施例所记载的技术方案进行修改或可轻易想到变化，或者对其中部分技术特征进行等同替换；而这些修改、变化或者替换，并不使相应技术方案的本质脱离本公开实施例技术方案的精神和范围，都应涵盖在本公开的保护范围之内。因此，本公开的保护范围应所述以权利要求的保护范围为准。Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, used to illustrate the technical solutions of the present disclosure rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person familiar with the technical field may still, within the technical scope disclosed in the present disclosure, modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features; such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all be covered within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (13)

  1. 一种活体检测方法,其特征在于,包括:A living body detection method, characterized in that, comprising:
    响应活体检测请求,获取与目标动作对应的多帧待检测图像,其中,所述目标动作为进行活体检测时指示用户做出的动作;Responding to the liveness detection request, acquiring multiple frames of images to be detected corresponding to the target action, wherein the target action is an action instructed by the user during liveness detection;
    基于所述各帧待检测图像中与所述目标动作匹配的特征点位置信息,确定各帧待检测图像分别对应的用于表示所述目标动作完成情况的检测值;Based on the feature point position information matching the target action in each frame of the image to be detected, determine detection values corresponding to each frame of the image to be detected for indicating the completion of the target action;
    基于与所述目标动作匹配的检测方案和所述检测值,得到活体检测结果。Based on the detection scheme matched with the target action and the detection value, a living body detection result is obtained.
  2. 根据权利要求1所述的方法,其特征在于,所述基于与所述目标动作匹配的检测方案和所述检测值,得到活体检测结果,包括:The method according to claim 1, characterized in that the obtaining the living body detection result based on the detection scheme matched with the target action and the detection value comprises:
    基于所述目标动作对应的检测阈值和与所述目标动作匹配的检测方案,对所述多帧待检测图像的检测值进行检测,得到活体检测结果。Based on the detection threshold corresponding to the target action and the detection scheme matching the target action, the detection values of the multiple frames of images to be detected are detected to obtain a living body detection result.
  3. 根据权利要求2所述的方法,其特征在于,在所述目标动作为点头或摇头的情况下,所述检测值包括头部偏移角度;所述检测阈值包括正向偏移阈值、负向偏移阈值、图像帧数阈值;The method according to claim 2, wherein when the target action is nodding or shaking the head, the detection value includes a head deviation angle; the detection threshold includes a positive deviation threshold, a negative deviation Offset threshold, image frame number threshold;
    所述基于所述目标动作对应的检测阈值和与所述目标动作匹配的检测方案,对所述多帧待检测图像的检测值进行检测,得到活体检测结果,包括:Based on the detection threshold corresponding to the target action and the detection scheme matching the target action, the detection values of the multiple frames of images to be detected are detected to obtain a living body detection result, including:
    确定头部偏移角度大于所述正向偏移阈值的第一目标检测图像，以及小于所述负向偏移阈值的第二目标检测图像；determining a first target detection image with a head offset angle greater than the positive offset threshold, and a second target detection image with a head offset angle smaller than the negative offset threshold;
    在所述第一目标检测图像的数量超过第一图像帧数阈值，且所述第二目标检测图像的数量超过第二图像帧数阈值的情况下，确定活体检测通过。If the number of the first target detection images exceeds the first image frame number threshold and the number of the second target detection images exceeds the second image frame number threshold, it is determined that the living body detection is passed.
  4. 根据权利要求1至3任一所述的方法,其特征在于,在所述目标动作为张闭嘴的情况下,与所述目标动作匹配的特征点包括嘴部特征点;The method according to any one of claims 1 to 3, wherein when the target action is opening and closing the mouth, the feature points matching the target action include mouth feature points;
    所述基于所述各帧待检测图像中与所述目标动作匹配的特征点位置信息,确定各帧待检测图像分别对应的用于表示所述目标动作完成情况的检测值,包括:The determining the detection values corresponding to each frame of the image to be detected and used to indicate the completion of the target action based on the feature point position information matching the target action in each frame of the image to be detected includes:
    基于嘴部特征点位置信息,确定表征嘴部中央位置处张开幅度的第一嘴部距离和表征嘴角位置处张开幅度的第二嘴部距离;Based on the mouth feature point position information, determine a first mouth distance representing the opening amplitude at the central position of the mouth and a second mouth distance representing the opening amplitude at the corner position of the mouth;
    基于所述第一嘴部距离和第二嘴部距离,确定所述检测值。The detection value is determined based on the first mouth distance and the second mouth distance.
  5. 根据权利要求2或3所述的方法,其特征在于,所述检测阈值包括张嘴阈值、闭嘴阈值、张嘴帧数阈值;The method according to claim 2 or 3, wherein the detection threshold comprises a mouth opening threshold, a mouth closing threshold, and a mouth opening frame number threshold;
    所述基于所述目标动作对应的检测阈值和与所述目标动作匹配的检测方案,对所述多帧待检测图像的检测值进行检测,得到活体检测结果,包括:Based on the detection threshold corresponding to the target action and the detection scheme matching the target action, the detection values of the multiple frames of images to be detected are detected to obtain a living body detection result, including:
    确定检测值为所述闭嘴阈值的多帧第一待检测图像；determining multiple frames of first images to be detected whose detection values are the mouth-closing threshold;
    确定所述多帧第一待检测图像中,每两帧相邻的第一待检测图像之间的第二待检测图像;Determining a second image to be detected between every two adjacent frames of the first image to be detected among the multiple frames of the first image to be detected;
    在检测到所述第二待检测图像满足第一预设条件的情况下,确定通过活体检测。If it is detected that the second image to be detected satisfies the first preset condition, it is determined that the living body detection is passed.
  6. 根据权利要求5所述的方法,其特征在于,所述第一预设条件包括:The method according to claim 5, wherein the first preset condition comprises:
    所述第二待检测图像中,对应的检测值为所述张嘴阈值的第三待检测图像的数量为第一预设值;Among the second images to be detected, the number of the third images to be detected whose corresponding detection value is the mouth opening threshold is a first preset value;
    所述多帧第三待检测图像中,相邻两帧第三待检测图像之间的第二待检测图像的数量大于所述张嘴帧数阈值。In the plurality of frames of the third images to be detected, the number of the second images to be detected between two adjacent frames of the third images to be detected is greater than the threshold of the number of mouth opening frames.
  7. 根据权利要求5或6所述的方法,其特征在于,所述第一预设条件还包括:The method according to claim 5 or 6, wherein the first preset condition further comprises:
    相邻两帧第三待检测图像之间的多帧第二待检测图像中,任意两帧第二待检测图像的检测值之间的差值小于第二预设值。Among the multiple frames of the second images to be detected between two adjacent frames of the third images to be detected, the difference between the detected values of any two frames of the second images to be detected is smaller than a second preset value.
  8. 根据权利要求5至7任一所述的方法，其特征在于，所述确定所述多帧第一待检测图像中，每两帧相邻的第一待检测图像之间的第二待检测图像，包括：The method according to any one of claims 5 to 7, wherein the determining of the second image to be detected between every two adjacent frames of the first images to be detected among the multiple frames of the first images to be detected includes:
    确定所述多帧第一待检测图像中,满足第二预设条件的两帧相邻的第一待检测图像之间的第二待检测图像;Determining a second image to be detected between two frames of adjacent first images to be detected that meet a second preset condition among the multiple frames of the first image to be detected;
    其中,所述第二预设条件包括:Wherein, the second preset condition includes:
    相邻的第一待检测图像之间的待检测图像的数量大于所述张嘴帧数阈值；相邻的第一待检测图像之间的待检测图像的检测值最值满足所述张嘴阈值对应的筛选条件。The number of images to be detected between the adjacent first images to be detected is greater than the mouth-opening frame number threshold; and the extreme value of the detection values of the images to be detected between the adjacent first images to be detected satisfies the filtering condition corresponding to the mouth-opening threshold.
  9. 根据权利要求1至8任一所述的方法，其特征在于，在所述目标动作为睁闭眼的情况下，所述基于所述各帧待检测图像中与所述目标动作匹配的特征点位置信息，确定各帧待检测图像分别对应的用于表示所述目标动作完成情况的检测值，包括：The method according to any one of claims 1 to 8, wherein, when the target action is opening and closing the eyes, the determining, based on the feature point position information matching the target action in each frame of the images to be detected, the detection values corresponding to each frame of the images to be detected for indicating the completion of the target action includes:
    针对多帧待检测图像中的每帧待检测图像,基于所述待检测图像的特征点位置信息,对所述待检测图像进行矫正处理;For each frame of the image to be detected in the multiple frames of the image to be detected, based on the feature point position information of the image to be detected, the image to be detected is corrected;
    将矫正处理后的所述待检测图像输入至预先训练好的神经网络,确定所述待检测图像对应的检测值。The rectified image to be detected is input to a pre-trained neural network to determine a detection value corresponding to the image to be detected.
  10. 根据权利要求2、3、5至8任一所述的方法，其特征在于，所述检测值包括用于描述眼部遮挡情况的第一检测值，以及用于描述睁闭眼完成情况的第二检测值；The method according to any one of claims 2, 3, and 5 to 8, wherein the detection values include a first detection value used to describe the eye occlusion situation and a second detection value used to describe the completion of eye opening and closing;
    所述检测阈值包括睁眼阈值、闭眼阈值、睁眼帧数阈值、眼部遮挡阈值;The detection threshold includes an eye opening threshold, an eye closing threshold, an eye opening frame number threshold, and an eye occlusion threshold;
    所述基于所述目标动作对应的检测阈值和与所述目标动作匹配的检测方案,对所述多帧待检测图像的检测值进行检测,得到活体检测结果,包括:Based on the detection threshold corresponding to the target action and the detection scheme matching the target action, the detection values of the multiple frames of images to be detected are detected to obtain a living body detection result, including:
    基于所述睁眼阈值、闭眼阈值、以及所述多帧待检测图像的第二检测值,确定满足第三预设条件的第四待检测图像;determining a fourth image to be detected that satisfies a third preset condition based on the eye-opening threshold, the eye-closing threshold, and the second detection values of the multiple frames of images to be detected;
    确定对应的第一检测值小于所述眼部遮挡阈值的第四待检测图像的目标数量;determining the target quantity of the fourth image to be detected whose corresponding first detection value is less than the eye occlusion threshold;
    在所述目标数量超过所述睁眼帧数阈值的情况下,确定通过活体检测。When the number of targets exceeds the eye-opening frame number threshold, it is determined that the living body detection is passed.
  11. 一种活体检测装置,其特征在于,包括:A living body detection device, characterized in that it comprises:
    获取模块,用于响应活体检测请求,获取与目标动作对应的多帧待检测图像,其中,所述目标动作为进行活体检测时指示用户做出的动作;An acquisition module, configured to respond to a liveness detection request, and acquire multiple frames of images to be detected corresponding to a target action, wherein the target action is an action instructed by a user during liveness detection;
    确定模块,用于基于所述各帧待检测图像中与所述目标动作匹配的特征点位置信息,确定各帧待检测图像分别对应的用于表示所述目标动作完成情况的检测值;A determining module, configured to determine detection values corresponding to each frame of the image to be detected and used to indicate the completion of the target action based on the feature point position information matching the target action in each frame of the image to be detected;
    检测模块,用于基于与所述目标动作匹配的检测方案和所述检测值,得到活体检测结果。The detection module is configured to obtain a living body detection result based on the detection scheme matched with the target action and the detection value.
  12. 一种计算机设备，其特征在于，包括：处理器、存储器和总线，所述存储器存储有所述处理器可执行的机器可读指令，当计算机设备运行时，所述处理器与所述存储器之间通过总线通信，所述机器可读指令被所述处理器执行时执行如权利要求1至10任一所述的活体检测方法的步骤。A computer device, comprising: a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the computer device is running, the processor communicates with the memory through the bus; and when the machine-readable instructions are executed by the processor, the steps of the living body detection method according to any one of claims 1 to 10 are performed.
  13. 一种计算机可读存储介质，其特征在于，该计算机可读存储介质上存储有计算机程序，该计算机程序被处理器运行时执行如权利要求1至10任一项所述的活体检测方法的步骤。A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is run by a processor, the steps of the living body detection method according to any one of claims 1 to 10 are executed.
PCT/CN2022/096444 2021-10-28 2022-05-31 Liveness detection method and apparatus, computer device, and storage medium WO2023071190A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111265552.1A CN113971841A (en) 2021-10-28 2021-10-28 Living body detection method and device, computer equipment and storage medium
CN202111265552.1 2021-10-28

Publications (1)

Publication Number Publication Date
WO2023071190A1 true WO2023071190A1 (en) 2023-05-04

Family

ID=79588905

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/096444 WO2023071190A1 (en) 2021-10-28 2022-05-31 Liveness detection method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN113971841A (en)
WO (1) WO2023071190A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113971841A (en) * 2021-10-28 2022-01-25 北京市商汤科技开发有限公司 Living body detection method and device, computer equipment and storage medium
CN114612986A (en) * 2022-03-17 2022-06-10 北京市商汤科技开发有限公司 Detection method, detection device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160373437A1 (en) * 2015-02-15 2016-12-22 Beijing Kuangshi Technology Co., Ltd. Method and system for authenticating liveness face, and computer program product thereof
CN109684974A (en) * 2018-12-18 2019-04-26 北京字节跳动网络技术有限公司 Biopsy method, device, electronic equipment and storage medium
CN111242090A (en) * 2020-01-22 2020-06-05 腾讯科技(深圳)有限公司 Human face recognition method, device, equipment and medium based on artificial intelligence
CN111767760A (en) * 2019-04-01 2020-10-13 北京市商汤科技开发有限公司 Living body detection method and apparatus, electronic device, and storage medium
CN113420667A (en) * 2021-06-23 2021-09-21 工银科技有限公司 Face living body detection method, device, equipment and medium
CN113971841A (en) * 2021-10-28 2022-01-25 北京市商汤科技开发有限公司 Living body detection method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN113971841A (en) 2022-01-25

Similar Documents

Publication Publication Date Title
WO2023071190A1 (en) Liveness detection method and apparatus, computer device, and storage medium
WO2019232866A1 (en) Human eye model training method, human eye recognition method, apparatus, device and medium
US10049262B2 (en) Method and system for extracting characteristic of three-dimensional face image
TWI687879B (en) Server, client, user verification method and system
WO2020024400A1 (en) Class monitoring method and apparatus, computer device, and storage medium
WO2018028546A1 (en) Key point positioning method, terminal, and computer storage medium
CN100592322C (en) An automatic computer authentication method for photographic faces and living faces
WO2019232862A1 (en) Mouth model training method and apparatus, mouth recognition method and apparatus, device, and medium
CN108182409B (en) Living body detection method, living body detection device, living body detection equipment and storage medium
CN105612533B (en) Living body detection method, living body detection system, and computer program product
TW202004637A (en) Risk prediction method and apparatus, storage medium, and server
CN107679506A (en) Awakening method, intelligent artifact and the computer-readable recording medium of intelligent artifact
WO2018218839A1 (en) Living body recognition method and system
WO2022174699A1 (en) Image updating method and apparatus, and electronic device and computer-readable medium
CN111598038B (en) Facial feature point detection method, device, equipment and storage medium
CN105335719A (en) Living body detection method and device
WO2022105118A1 (en) Image-based health status identification method and apparatus, device and storage medium
WO2020124993A1 (en) Liveness detection method and apparatus, electronic device, and storage medium
CN112101124B (en) Sitting posture detection method and device
US11062126B1 (en) Human face detection method
WO2018103416A1 (en) Method and device for detecting facial image
CN110612530A (en) Method for selecting a frame for use in face processing
JP2020184331A (en) Liveness detection method and apparatus, face authentication method and apparatus
US20170344811A1 (en) Image processing method and apparatus
WO2021179719A1 (en) Face detection method, apparatus, medium, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22885100

Country of ref document: EP

Kind code of ref document: A1