WO2020248780A1 - Living body testing method and apparatus, electronic device and readable storage medium - Google Patents


Info

Publication number
WO2020248780A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
living body
probability
frame
sample
Prior art date
Application number
PCT/CN2020/091047
Other languages
French (fr)
Chinese (zh)
Inventor
王鹏
姚聪
卢江虎
李念
Original Assignee
北京迈格威科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京迈格威科技有限公司 filed Critical 北京迈格威科技有限公司
Publication of WO2020248780A1 publication Critical patent/WO2020248780A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components, by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters, with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V40/45 Detection of the body part being alive

Definitions

  • The embodiments of the present application relate to the field of data processing technology, and in particular to a living body detection method and apparatus, an electronic device, and a readable storage medium.
  • Identification technology in the field of data processing is widely used in security, finance, and other fields, for example in access control unlocking, mobile phone unlocking, remote payment, and remote account opening based on face recognition, palmprint recognition, or fingerprint recognition.
  • The security of identification technology is receiving more and more attention. For example, when a recognition object is recognized through a device, it must be determined that the recognition object comes from a real person. For this reason, related technologies have proposed living body detection methods.
  • The detection method proposed by related technologies is as follows: the object to be detected is first required to complete specified facial actions, such as opening the mouth and blinking, in front of the camera; the camera captures face images of these actions, and based on the face images the processor determines whether the object to be detected is a living body.
  • However, facial actions such as opening the mouth and blinking affect the accuracy of face recognition and reduce the user experience.
  • In addition, this living body detection is performed on a single image, so its accuracy is low.
  • The embodiments of the present application provide a living body detection method and apparatus, an electronic device, and a readable storage medium, aiming to improve the accuracy of living body detection.
  • A first aspect of the embodiments of the present application provides a living body detection method, the method including: extracting multiple frames of video images from a video collected for an object to be detected; for each frame of video image in the multiple frames of video images, determining, according to the characteristics of the frame of video image, a first probability that the frame of video image represents whether the object to be detected is a living body; and determining, according to the first probability corresponding to each of the multiple frames of video images, whether the object to be detected is a living body.
  • Optionally, the method further includes: determining, according to the inter-frame correlation of the multiple frames of video images, a second probability that the inter-frame correlation represents whether the object to be detected is a living body.
  • Correspondingly, determining whether the object to be detected is a living body includes: determining whether the object to be detected is a living body according to the second probability and the first probability corresponding to each of the multiple frames of video images.
  • Optionally, the method further includes: splicing the respective characteristics of the multiple frames of video images to obtain the characteristics of the video, where the video characteristics are used to characterize the inter-frame correlation.
  • Optionally, determining whether the object to be detected is a living body according to the second probability and the first probability corresponding to each of the multiple frames of video images includes: assigning weights to the second probability and the first probabilities corresponding to the multiple frames of video images, where the weight corresponding to the second probability is greater than the weight corresponding to each of the first probabilities; and determining whether the object to be detected is a living body according to the second probability and its corresponding weight, and the first probability and its corresponding weight for each of the multiple frames of video images.
  • Optionally, the method further includes: obtaining a sample video set, where the sample video set includes a plurality of sample videos carrying tags, and the tag carried by a sample video represents whether the sample video is a video collected for a living body; for each tagged sample video included in the sample video set, extracting multiple frames of sample video images from the tagged sample video; inputting each frame of sample video image into the convolutional layer of a model to be trained to obtain the characteristics of the frame of sample video image; inputting the characteristics of the frame of sample video image into the first fully connected layer of the model to be trained to obtain a third probability corresponding to the frame of sample video image, where the third probability characterizes whether the frame of sample video image is derived from a video collected for a living body; inputting the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain an estimated probability of whether the sample video is a video collected for a living body; and establishing a loss function according to the estimated probability and the third probability corresponding to each of the multiple frames of sample video images, so as to update the model to be trained and obtain the living body detection model.
  • Optionally, determining, according to the characteristics of the frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body includes: inputting the characteristics of the frame of video image into the first fully connected layer of the living body detection model to determine the first probability that the frame of video image represents whether the object to be detected is a living body.
  • Optionally, the method further includes: splicing the respective characteristics of the multiple frames of sample video images to obtain the characteristics of the sample video; and inputting the characteristics of the sample video into the third fully connected layer of the model to be trained to obtain a fourth probability of whether the sample video is a video collected for a living body.
  • Correspondingly, inputting the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video is a video collected for a living body includes: inputting the fourth probability and the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video is a video collected for a living body.
  • Optionally, determining whether the object to be detected is a living body according to the first probability corresponding to each of the multiple frames of video images includes: inputting the first probability corresponding to each of the multiple frames of video images into the second fully connected layer of the living body detection model to determine whether the object to be detected is a living body.
  • Optionally, the method further includes: obtaining the video collected by a video collection device when the object to be detected is in a silent state.
  • a second aspect of the embodiments of the present application provides a living body detection device, which includes:
  • the first extraction module is configured to extract multiple frames of video images from the video collected for the object to be detected
  • the first determining module is configured to, for each frame of video image in the multiple frames of video images, determine the first probability that the frame of video image represents whether the object to be detected is a living body according to the characteristics of the frame of video image;
  • the second determining module is configured to determine whether the object to be detected is a living body according to the first probability corresponding to each of the multiple frames of video images.
  • the device further includes:
  • a third determining module configured to determine, according to the inter-frame correlation of the multiple frames of video images, the second probability that the inter-frame correlation characterizes whether the object to be detected is a living body
  • the second determining module includes:
  • the first determination submodule is configured to determine whether the object to be detected is a living body according to the second probability and the first probability corresponding to each of the multiple frames of video images.
  • the device further includes:
  • the first splicing module is used to splice the respective characteristics of the multiple frames of video images to obtain the characteristics of the video, and the video characteristics are used to characterize the inter-frame correlation.
  • the first determining submodule includes:
  • an allocation subunit, configured to assign weights to the second probability and the first probabilities corresponding to the multiple frames of video images, where the weight corresponding to the second probability is greater than the weight corresponding to each of the first probabilities;
  • a determining subunit, configured to determine whether the object to be detected is a living body according to the second probability and its corresponding weight, and the first probability and its corresponding weight for each of the multiple frames of video images.
  • the device further includes:
  • the first obtaining module is configured to obtain a sample video set, where the sample video set includes a plurality of sample videos carrying tags, and the tag carried by a sample video indicates whether the sample video is a video collected for a living body;
  • the second extraction module is configured to extract a multi-frame sample video image from the sample video with a mark for each sample video with a mark included in the sample video set;
  • the first input module is configured to input each frame of the sample video image in the multi-frame sample video image into the convolutional layer of the model to be trained to obtain the characteristics of the frame sample video image;
  • the second input module is used to input the characteristics of the frame of sample video image into the first fully connected layer of the model to be trained to obtain a third probability corresponding to the frame of sample video image, where the third probability represents whether the frame of sample video image is derived from a video collected for a living body;
  • the third input module is configured to input the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain an estimated probability of whether the sample video is a video collected for a living body;
  • the second obtaining module is configured to establish a loss function according to the estimated probability and the third probability corresponding to each of the multi-frame sample video images, so as to update the model to be trained and obtain a live detection model;
  • the first determining module includes:
  • the first input submodule is configured to input each frame of the video image in the multi-frame video image into the convolutional layer of the living body detection model to obtain the characteristics of the frame of video image;
  • the second input submodule is configured to input the characteristics of the frame of video image into the first fully connected layer of the living body detection model to determine the first probability that the frame of video image represents whether the object to be detected is a living body.
  • the device further includes:
  • the second splicing module is used to splice the respective characteristics of the multiple frames of sample video images to obtain the characteristics of the sample video;
  • the fourth input module is configured to input the characteristics of the sample video into the third fully connected layer of the model to be trained to obtain the fourth probability of whether the sample video is a video collected for a living body;
  • the third input module includes:
  • the third input sub-module is used to input the fourth probability and the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video is a video collected for a living body.
  • the third determining module includes:
  • the fourth input sub-module is configured to input the first probability corresponding to each of the multiple frames of video images into the second fully connected layer of the living body detection model to determine whether the object to be detected is a living body.
  • the device further includes:
  • the third obtaining module is used to obtain the video captured by the video capturing device when the object to be detected is in a silent state.
  • A third aspect of the embodiments of the present application provides a readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the method described in the first aspect of the present application are implemented.
  • A fourth aspect of the embodiments of the present application provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the steps of the method described in the first aspect of the present application are implemented.
  • In the living body detection method provided in this application, multiple frames of video images are extracted from the video collected for the object to be detected; for each frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body is determined according to the characteristics of the frame of video image; finally, whether the object to be detected is a living body is comprehensively determined according to the determined multiple first probabilities.
  • The living body detection method provided in this application performs living body detection based on a video collected for the object to be detected. Specifically, multiple frames of video images are extracted from the video and used to characterize the video; then, for each frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body is determined according to the characteristics of the frame of video image; finally, whether the object to be detected is a living body is comprehensively determined according to the determined multiple first probabilities. Compared with the prior art, which performs living body detection on a single image, this application performs living body detection on the basis of video, and the detection result is more accurate.
  • In addition, because the living body detection method provided by the present application extracts multiple frames of video images from the video collected for the object to be detected, the redundant information of the video can be reduced, thereby reducing the amount of calculation and improving detection efficiency.
  • Moreover, the living body detection method provided by the present application does not require the object to be detected to complete specified facial actions such as opening the mouth and blinking in front of the camera. This not only avoids the impact of facial actions on the accuracy of face recognition, but also enables users to complete living body detection without making specified facial actions, thereby improving the user experience.
  • FIG. 1 is a schematic diagram of a training process of a living body detection model in an embodiment of the present application.
  • FIG. 2 is a flowchart of a living body detection method proposed in an embodiment of the present application.
  • FIG. 3 is another flowchart of the living body detection method proposed by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a living body detection device provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • In these application scenarios, the device needs to collect the user's fingerprint or palm print, or needs to photograph identification objects such as the user's face. Taking the face or palm print of the photographed object as an example, an attacker might show another person's face photo or palm print photo to the camera in order to pass verification without that person's permission and gain unauthorized access to that person's account. It is therefore necessary to perform living body detection on the face or palm print in the image captured by the camera, that is, to determine whether it comes from a living body.
  • A living body judgment method provided by related technologies works as follows: first, the object to be detected is required to complete specified facial actions, such as opening the mouth and blinking, in front of the lens; the lens collects face images of the specified facial actions, and based on these face images the processor judges whether the object to be detected is a living body.
  • However, facial actions such as opening the mouth and blinking affect the accuracy of face recognition and reduce the user experience.
  • In addition, this living body detection is performed on a single image, so its accuracy is low.
  • For this reason, the applicant proposes performing living body detection based on a video collected for the object to be detected.
  • Specifically, this application extracts multiple frames of video images from the video; then, for each frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body is determined according to the characteristics of the frame of video image; finally, according to the determined multiple first probabilities, whether the object to be detected is a living body is comprehensively determined.
  • Compared with performing living body detection on a single image, this application performs living body detection on the basis of video, and the detection result is more accurate.
  • In order to implement the above method, the applicant first constructed a model to be trained and trained it based on a sample video set to obtain a living body detection model (for example, the first living body detection model or the second living body detection model described below); the applicant then uses the living body detection model to perform some or all of the steps of the above method.
  • FIG. 1 is a schematic diagram of a training process of a living body detection model in an embodiment of the present application.
  • the living body detection model includes: a convolutional layer, a first fully connected layer, and a second fully connected layer.
  • the convolutional layer can specifically adopt a convolutional neural network.
  • Correspondingly, the model to be trained also includes a convolutional layer, a first fully connected layer, and a second fully connected layer; after training, the model parameters of the model to be trained are updated and adjusted, and the living body detection model is finally obtained.
  • In the following, each step is introduced taking a sample video set about human faces as an example. It should be understood that the type of sample video set is not limited to human faces; for example, it can also be a sample video set about palm prints. If the model to be trained is trained based on a sample video set about palm prints, the finally obtained living body detection model can be used for living body detection on palmprint videos.
  • S110: Obtain a sample video set, where the sample video set includes a plurality of sample videos carrying tags, and the tag carried by a sample video indicates whether the sample video is a video collected for a living body.
  • part or all of the sample videos in the sample video set may be videos collected by the video collection device when the training participant is in a silent state.
  • When collecting videos of the training participants, the training participants only need to look at the video collection device; they are not required to complete designated facial actions such as opening their mouths, blinking, or reading aloud in front of the camera.
  • For example, a silent video can be taken of the face of each of multiple training participants (real people). The duration of each video can be controlled within 1 to 3 seconds, and each such video taken of a real person is labeled so that it carries a tag indicating that it is a video collected from a living body.
  • Likewise, a video can be shot for each of multiple non-living bodies such as printed photos, screen photos, and masks; the length of each video can again be controlled within 1 to 3 seconds. Each such video shot of a non-living body is marked so that it carries a tag indicating that it is not a video collected from a living body.
  • S120: For each tagged sample video included in the sample video set, extract multiple frames of sample video images from the tagged sample video.
  • Specifically, each tagged sample video can first be divided into N sub-segments, and one frame of RGB video image is then extracted from each sub-segment as a sample video image; in total, N frames of sample video images are thus extracted from each tagged sample video.
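  • As a concrete illustration of this sub-segment sampling, the sketch below decodes a video with OpenCV and takes the middle RGB frame of each of N equal sub-segments. The use of OpenCV, the function name, and the choice of the middle frame are assumptions made for illustration; the embodiment only requires one frame per sub-segment.

```python
import cv2  # OpenCV is an assumption; the embodiment does not name a decoding library

def sample_frames(video_path, n_segments=8):
    """Split a video into N equal sub-segments and take the middle RGB frame
    of each (a sketch of step S120, not the patent's exact procedure)."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    seg_len = total // n_segments
    frames = []
    for i in range(n_segments):
        # middle frame index of the i-th sub-segment
        idx = i * seg_len + seg_len // 2
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame_bgr = cap.read()
        if ok:
            frames.append(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames  # N RGB frames, one per sub-segment
```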
  • S130: Input each frame of sample video image in the multiple frames of sample video images into the convolutional layer of the model to be trained to obtain the characteristics of the frame of sample video image.
  • Specifically, the N frames of sample video images can be sequentially input into the convolutional neural network of the model to be trained, and the convolutional neural network outputs a three-dimensional convolution feature for each frame of sample video image, that is, the feature of the frame of sample video image.
  • It should be noted that the multiple frames of sample video images can share one convolutional neural network, or each frame of sample video image can correspond to its own convolutional neural network; therefore, the number of convolutional neural networks included in the model to be trained may be one or more.
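  • The following minimal PyTorch sketch illustrates one shared convolutional network producing a three-dimensional convolution feature per frame. The backbone architecture, input resolution, and channel count are assumptions chosen to match the 36*36*25 feature size used in the example of step S142 below; the embodiment does not fix a particular architecture.

```python
import torch
import torch.nn as nn

# A minimal stand-in for the convolutional layer; the real backbone is not specified.
backbone = nn.Sequential(
    nn.Conv2d(3, 25, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d((36, 36)),  # yields a 25x36x36 feature per frame
)

frames = torch.randn(8, 3, 144, 144)  # N=8 sample video images (dummy data)
features = backbone(frames)           # one shared network applied to all frames
print(features.shape)                 # torch.Size([8, 25, 36, 36])
```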
  • S140: Input the features of the frame of sample video image into the first fully connected layer of the model to be trained to obtain a third probability corresponding to the frame of sample video image, where the third probability represents whether the frame of sample video image is derived from a video collected for a living body.
  • Specifically, the features of the N frames of sample video images can be sequentially input into the first fully connected layer, and for the features of each frame of sample video image the first fully connected layer outputs a probability vector of the form (x, y), namely the third probability, where x represents the probability that the frame of sample video image is derived from a video collected for a living body, and y represents the probability that it is derived from a video collected for a non-living body.
  • S150: Input the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video is a video collected for a living body.
  • Specifically, the N third probabilities corresponding to the N frames of sample video images can be input into the second fully connected layer, and the second fully connected layer outputs a probability vector of the form (X, Y), namely the estimated probability, where X represents the probability that the sample video is a video collected for a living body, and Y represents the probability that it is a video collected for a non-living body.
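  • Steps S130 to S150 can be summarized by the following sketch of the forward pass: the first fully connected layer maps each per-frame feature to a two-element vector (x, y), and the second fully connected layer maps the N concatenated third probabilities to the video-level vector (X, Y). The layer sizes and the use of raw linear outputs rather than softmax-normalized probabilities are assumptions.

```python
import torch
import torch.nn as nn

N = 8                       # frames extracted per sample video
feat_dim = 25 * 36 * 36     # flattened per-frame convolution feature (assumed size)

fc1 = nn.Linear(feat_dim, 2)   # first fully connected layer: per-frame (x, y)
fc2 = nn.Linear(N * 2, 2)      # second fully connected layer: video-level (X, Y)

features = torch.randn(N, feat_dim)        # per-frame features from the convolutional layer
third_probs = fc1(features)                # N third probabilities, shape (N, 2)
estimated = fc2(third_probs.flatten())     # estimated probability (X, Y) for the whole video
print(third_probs.shape, estimated.shape)  # torch.Size([8, 2]) torch.Size([2])
```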
  • S160: Establish a loss function according to the estimated probability and the third probability corresponding to each of the multiple frames of sample video images, so as to update the model to be trained and obtain the living body detection model.
  • Specifically, the loss function is established based on the estimated probability, such as the probability vector (X, Y), and the third probabilities corresponding to the multiple frames of sample video images, such as the probability vectors (x, y). Using the gradient descent method, the parameters of the model to be trained are updated, and the updated model is put into the next round of training; after multiple rounds of training, the living body detection model is obtained. For example, training may end after a fixed number M of rounds, such as 1000 rounds. For another example, training may end when the loss function over multiple consecutive rounds reflects that the model to be trained can accurately predict whether a sample video is collected for a living body.
  • One way of establishing the loss function is as follows: the N third probabilities are each compared with the tag carried by the sample video, where each third probability is a prediction result and the tag represents the ground truth, yielding N first comparison results; the N first comparison results represent the accuracy of the model's per-frame predictions in this round of training. Similarly, the estimated probability is compared with the tag to obtain a second comparison result, which represents the accuracy of the model's video-level prediction in this round of training.
  • Based on the loss function, the parameters of the model to be trained are adjusted to update the model to be trained; the updated model is put into the next round of training, and after multiple rounds of training the living body detection model is obtained.
  • In this way, on the one hand, the convergence speed of the model to be trained can be accelerated; on the other hand, the parameters of the model to be trained are updated based not only on the model's prediction accuracy for the sample video but also on its prediction accuracy for each frame of sample video image, so that the final living body detection model can output more accurate prediction results.
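  • A sketch of the loss described in step S160, under the assumption that both the video-level estimated probability and the N per-frame third probabilities are penalized with cross entropy against the sample video's tag and summed with equal weight; the embodiment does not specify the exact loss form or weighting.

```python
import torch
import torch.nn.functional as F

def combined_loss(estimated, third_probs, label):
    """Loss over the video-level estimated probability and the N per-frame
    third probabilities (the equal weighting is an assumption)."""
    video_loss = F.cross_entropy(estimated.unsqueeze(0), label)
    # every frame of a tagged-live video is treated as live, and vice versa
    frame_labels = label.repeat(third_probs.size(0))
    frame_loss = F.cross_entropy(third_probs, frame_labels)
    return video_loss + frame_loss

estimated = torch.randn(2, requires_grad=True)        # video-level logits (dummy)
third_probs = torch.randn(8, 2, requires_grad=True)   # per-frame logits (dummy)
label = torch.tensor([0])  # 0 = collected for a living body (assumed encoding)

loss = combined_loss(estimated, third_probs, label)
loss.backward()  # a gradient descent update (e.g. torch.optim.SGD) would follow
```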
  • After the above training process, the first living body detection model is obtained.
  • Using the first living body detection model, some or all of the following steps can be performed: extract multiple frames of video images from the video collected for the object to be detected; then, for each frame of video image, determine, according to the characteristics of the frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body; finally, according to the determined multiple first probabilities, comprehensively determine whether the object to be detected is a living body.
  • In addition, the applicant found that besides the multiple frames of video images extracted from a video, the inter-frame correlation of those video images can also characterize the video. If the video is characterized by both the multiple frames of video images and their inter-frame correlation, the inter-frame correlation can additionally be introduced when performing living body detection, which can further improve the accuracy of living body detection.
  • For this reason, the applicant further proposes introducing the inter-frame correlation into the living body detection method: first, the second probability that the inter-frame correlation represents whether the object to be detected is a living body is determined; then, whether the object to be detected is a living body is determined according to the second probability and the first probability corresponding to each of the multiple frames of video images, thereby further improving the accuracy of living body detection.
  • To implement this improved method, the applicant likewise constructed a model to be trained and trained it based on a sample video set to obtain a living body detection model, which is then used to perform some or all of the steps of the method further proposed above.
  • In addition to the structure described above, the living body detection model may also include a feature combination module and a third fully connected layer. It should be understood that the model structure of the model to be trained constructed in advance by the applicant is the same as the model structure of the living body detection model shown in FIG. 1; in other words, the model to be trained may also include a feature combination module and a third fully connected layer, and after training, the model parameters of the model to be trained are updated and adjusted to finally obtain the living body detection model.
  • To train this model, an embodiment of the present application further proposes steps S142, S144, and S150' on the basis of steps S110, S120, S130, S140, and S160. Among them, steps S130, S140, S142, S144, S150', and S160 constitute each round of the multi-round training:
  • S142: Splice the respective features of the multiple frames of sample video images to obtain the feature of the sample video.
  • Specifically, the feature combination module can be used to stack the N three-dimensional convolution features to obtain a new three-dimensional convolution feature as the feature of the sample video.
  • The feature of the sample video can characterize the inter-frame correlation of the multiple frames of sample video images. For example, after step S130, eight 36*36*25 convolution features are obtained; after these eight 36*36*25 convolution features are stacked, a 36*36*200 convolution feature is obtained, and this 36*36*200 convolution feature is used as the feature of the sample video.
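  • The stacking in step S142 amounts to concatenating the per-frame features along the channel axis, as in the sketch below (the channels-first layout is an assumption):

```python
import torch

# eight per-frame convolution features of size 25x36x36 (channels first)
frame_feats = [torch.randn(25, 36, 36) for _ in range(8)]

# stacking along the channel axis yields one 200x36x36 video feature,
# matching the 36*36*200 example above
video_feat = torch.cat(frame_feats, dim=0)
print(video_feat.shape)  # torch.Size([200, 36, 36])
```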
  • S144: Input the feature of the sample video into the third fully connected layer of the model to be trained to obtain a fourth probability of whether the sample video is a video collected for a living body.
  • Specifically, the feature of the sample video can be input into the third fully connected layer, and the third fully connected layer outputs a probability vector of the form (x', y') for the feature of the sample video, namely the fourth probability, where x' represents the probability that the sample video is a video collected for a living body, and y' represents the probability that it is a video collected for a non-living body.
  • S150': Input the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video is a video collected for a living body.
  • This step specifically includes: inputting the fourth probability and the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video is a video collected for a living body.
  • During specific implementation, the fourth probability corresponding to the sample video and the N third probabilities corresponding to the N frames of sample video images can be input into the second fully connected layer, and the second fully connected layer outputs a probability vector of the form (X, Y), namely the estimated probability, where X represents the probability that the sample video is a video collected for a living body, and Y represents the probability that it is a video collected for a non-living body.
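  • Compared with the first model, the second fully connected layer now also receives the fourth probability. A minimal sketch of this extended forward pass follows, with layer sizes assumed to match the running example:

```python
import torch
import torch.nn as nn

N = 8
fc3 = nn.Linear(200 * 36 * 36, 2)  # third fully connected layer (input size assumed)
fc2 = nn.Linear(N * 2 + 2, 2)      # second fully connected layer, now fed the fourth probability too

video_feat = torch.randn(200 * 36 * 36)  # flattened spliced video feature
third_probs = torch.randn(N, 2)          # per-frame third probabilities
fourth_prob = fc3(video_feat)            # (x', y') from the inter-frame correlation
estimated = fc2(torch.cat([fourth_prob, third_probs.flatten()]))  # video-level (X, Y)
print(estimated.shape)  # torch.Size([2])
```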
  • After the above training process, the second living body detection model is obtained.
  • Using the second living body detection model, some or all of the following steps can be executed: extract multiple frames of video images from the video collected for the object to be detected; for each frame of video image, determine, according to the characteristics of the frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body; determine, according to the inter-frame correlation of the multiple frames of video images, the second probability that the inter-frame correlation represents whether the object to be detected is a living body; and finally determine, according to the second probability and the first probability corresponding to each of the multiple frames of video images, whether the object to be detected is a living body.
  • the foregoing embodiments of the present application mainly propose two training processes for the model to be trained based on a sample video set, and finally obtain the first living body detection model and the second living body detection model respectively.
  • In the following, this application focuses on the living body detection method itself and schematically introduces how to apply the first living body detection model or the second living body detection model to it.
  • FIG. 2 is a flowchart of a living body detection method proposed in an embodiment of the present application. As shown in Figure 2, the method includes the following steps:
  • S22: Extract multiple frames of video images from the video collected for the object to be detected.
  • The object to be detected refers to an object that needs to be detected as to whether it is a living body.
  • In some embodiments, the object to be detected is not limited to a face to be detected; for example, it may also be a palm print or fingerprint to be detected. If the object to be detected is a palm print, the video collected for the object to be detected is a video shot of the palm print to be detected.
  • the method further includes: obtaining a video collected by the video collection device when the object to be detected is in a silent state.
  • the video collected for the object to be detected may be a silent video collected for the object to be detected.
  • During specific implementation, a video is collected for the object to be detected, for example a short video of 1 to 3 seconds.
  • In other words, when collecting a video from a user, the user only needs to look at the video capture device and is not required to complete specified facial actions such as opening the mouth, blinking, or reading aloud in front of the camera. This not only avoids the impact of facial actions on the accuracy of face recognition, but also enables users to complete living body detection without making specified facial actions, thereby improving the user experience.
  • Optionally, when multiple frames of video images are extracted from a video, they may be extracted at equal inter-frame intervals, and the extracted video images may be RGB images. For example, one frame of video image is extracted at an interval of every 5 frames. Taking a video including 48 frames of video images as an example, the extracted video images are: frame 6, frame 12, frame 18, frame 24, frame 30, frame 36, frame 42, and frame 48.
  • the video when extracting multiple frames of video images from a video, the video may be divided into multiple sub-segments first, and then one frame of video image is extracted from each sub-segment.
  • the video is equally divided into N sub-segments, and for each sub-segment, one frame of video image is randomly extracted therefrom, or one frame of video image is extracted from the middle of the sub-segment.
  • In this way, the multiple frames of video images are extracted at equal inter-frame intervals, or by dividing the video into multiple sub-segments and extracting one frame of video image from each sub-segment, so that the extracted frames are evenly distributed among the video images of the video. The multiple frames of video images can thus more accurately characterize the content of the video, thereby further improving the accuracy of living body detection.
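  • Both extraction schemes reduce to choosing frame indices, as the sketch below shows; the step size, segment count, and the choice of the middle frame of each sub-segment are assumptions (the embodiment also allows random selection within a sub-segment).

```python
def equal_interval_indices(total_frames, step=6):
    """Indices for equal-interval extraction; with 48 frames and a step of 6
    this yields frames 6, 12, ..., 48 as in the example (1-based)."""
    return list(range(step, total_frames + 1, step))

def sub_segment_indices(total_frames, n_segments=8):
    """One middle frame per equal sub-segment (the alternative scheme)."""
    seg_len = total_frames // n_segments
    return [i * seg_len + seg_len // 2 + 1 for i in range(n_segments)]

print(equal_interval_indices(48))  # [6, 12, 18, 24, 30, 36, 42, 48]
print(sub_segment_indices(48))     # [4, 10, 16, 22, 28, 34, 40, 46]
```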
  • In this way, the multiple frames of video images extracted from the video collected for the object to be detected are used to characterize the video, so that the living body detection method proposed in this application performs detection on the basis of the video.
  • Compared with the prior art, which performs living body detection on a single image, this application performs living body detection on the basis of video, and the detection result is more accurate.
  • In addition, because this application extracts multiple frames of video images from the video collected for the object to be detected, the redundant information of the video can be reduced, thereby reducing the amount of calculation and improving detection efficiency.
  • S24: For each frame of video image in the multiple frames of video images, determine, according to the characteristics of the frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body.
  • the feature of the video image may be a convolution feature.
  • During specific implementation, the first living body detection model obtained through training may be used. Specifically, each frame of video image in the multiple frames of video images is input into the convolutional layer of the living body detection model to obtain the characteristics of the frame of video image; the characteristics of the frame of video image are then input into the first fully connected layer of the living body detection model to determine the first probability that the frame of video image represents whether the object to be detected is a living body.
  • In some embodiments, the first probability corresponding to each frame of video image may be a probability vector of the form (x, y), where x represents the probability that the object to be detected is a living body, and y represents the probability that the object to be detected is a non-living body.
  • It should be noted that the feature of each frame of video image can be obtained through a convolutional neural network, or extracted using other image feature extraction methods. The feature of each frame of video image is then input into the first fully connected layer of the first living body detection model to determine the first probability that the frame of video image represents whether the object to be detected is a living body.
  • S26: Determine whether the object to be detected is a living body according to the first probability corresponding to each of the multiple frames of video images.
  • the first living body detection model obtained through training may be used to determine whether the object to be detected is a living body.
  • the first probability corresponding to each of the multiple frames of video images is input into the second fully connected layer of the living body detection model to determine whether the object to be detected is a living body.
  • Specifically, each frame of video image in the multiple frames of video images is input into the convolutional layer of the living body detection model, and the convolutional layer outputs the characteristics of each frame of video image; the characteristics of each frame of video image are then input into the first fully connected layer of the living body detection model, which outputs the first probability corresponding to each frame of video image; the first probabilities corresponding to the frames of video images are then input into the second fully connected layer of the living body detection model, which outputs an estimated probability, a comprehensive probability that characterizes whether the object to be detected is a living body.
  • the estimated probability may be a probability vector in the form of (X, Y), where X represents the comprehensive probability that the object to be detected is a living body, and Y represents the comprehensive probability that the object to be detected is a non-living body.
  • Alternatively, the average value of the multiple first probabilities may be calculated to determine whether the object to be detected is a living body.
  • For example, the first probability corresponding to each frame of video image is a probability vector of the form (x, y), where x represents the probability that the object to be detected is a living body, and y represents the probability that the object to be detected is a non-living body. Assuming the probability vectors corresponding to the 8 frames of video images extracted from the video are (35.9, 13.0), (43.2, 5.6), (34.7, 14.3), (44.6, 5.4), (58.6, 2.1), (41.8, 6.7), (29.2, 17.8), and (21.4, 22.8), the averaged probability vector computed from these 8 probability vectors is (38.7, 11.0); since the probability that the object to be detected is a living body is greater than the probability that it is a non-living body, the object to be detected is determined to be a living body.
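  • The averaging in this example can be reproduced directly (the values are taken from the text above):

```python
# the eight per-frame probability vectors from the example
first_probs = [(35.9, 13.0), (43.2, 5.6), (34.7, 14.3), (44.6, 5.4),
               (58.6, 2.1), (41.8, 6.7), (29.2, 17.8), (21.4, 22.8)]

avg_live = sum(p[0] for p in first_probs) / len(first_probs)     # 38.7
avg_nonlive = sum(p[1] for p in first_probs) / len(first_probs)  # 11.0

# the object is judged a living body because 38.7 > 11.0
is_live = avg_live > avg_nonlive
print(round(avg_live, 1), round(avg_nonlive, 1), is_live)  # 38.7 11.0 True
```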
  • In the living body detection method provided by this embodiment, living body detection is performed based on a video collected for the object to be detected. Specifically, multiple frames of video images are extracted from the video and used to characterize the video; then, for each frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body is determined according to the characteristics of the frame of video image; finally, whether the object to be detected is a living body is comprehensively determined according to the determined multiple first probabilities. Compared with the prior art, which performs living body detection on a single image, this application performs living body detection on the basis of video, and the detection result is more accurate.
  • In addition, the inter-frame correlation of the multiple frames of video images can also be used to characterize the video. If the video is characterized by both the multiple frames of video images and their inter-frame correlation, further introducing the inter-frame correlation when performing living body detection can further improve the accuracy of living body detection.
  • FIG. 3 is another flowchart of the living body detection method proposed in an embodiment of the present application. As shown in Figure 3, the method includes the following steps:
  • S24: For each frame of video image in the multiple frames of video images, determine, according to the characteristics of the frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body.
  • The inter-frame correlation refers to information between the frames of the multiple frames of video images. Specifically, for each frame of video image in the multiple frames of video images, the feature of the frame of video image can be extracted, and the respective features of the multiple frames of video images can be spliced to obtain the feature of the video; the video feature is used to characterize the inter-frame correlation.
  • During specific implementation, the multiple frames of video images can be input into the convolutional layer of the second living body detection model obtained through training, and the convolutional layer outputs the three-dimensional convolution feature of each frame of video image, that is, the feature of the video image. Multiple three-dimensional convolution features are then stacked to obtain a new three-dimensional convolution feature as the feature of the video, and the video feature is used to characterize the inter-frame correlation.
  • For example, the convolutional layer of the living body detection model outputs eight 36*36*25 convolution features; the feature combination module of the living body detection model stacks these eight 36*36*25 convolution features to obtain a 36*36*200 convolution feature, and this 36*36*200 convolution feature is used as the video feature.
  • During specific implementation, the video feature characterizing the inter-frame correlation may be input into the third fully connected layer of the second living body detection model obtained through training, and the third fully connected layer outputs, according to the video feature, a second probability that characterizes whether the object to be detected is a living body.
  • In some embodiments, the second probability may be a probability vector of the form (x', y'), where x' represents the probability that the object to be detected is a living body, and y' represents the probability that the object to be detected is a non-living body.
  • S26': Determine whether the object to be detected is a living body according to the second probability and the first probability corresponding to each of the multiple frames of video images.
  • During specific implementation, the second living body detection model obtained through training can be used to determine whether the object to be detected is a living body.
  • Specifically, the second probability that the inter-frame correlation characterizes whether the object to be detected is a living body, and the first probability corresponding to each of the multiple frames of video images, may be input into the second fully connected layer of the second living body detection model; the second fully connected layer outputs an estimated probability, which is a comprehensive probability that characterizes whether the object to be detected is a living body.
  • the estimated probability may be a probability vector in the form of (X, Y), where X represents the comprehensive probability that the object to be detected is a living body, and Y represents the comprehensive probability that the object to be detected is a non-living body.
  • determining whether the object to be detected is a living body according to the second probability and the first probability corresponding to each of the multiple frames of video images may specifically include:
  • S26'-1: Assign weights to the second probability and the first probabilities corresponding to the multiple frames of video images, where the weight corresponding to the second probability is greater than the weight corresponding to each of the first probabilities.
  • S26'-2: Determine whether the object to be detected is a living body according to the second probability and its corresponding weight, and the first probability and its corresponding weight for each of the multiple frames of video images.
  • For example, after 8 frames of video images are input into the living body detection model, they pass through the convolutional layer and the first fully connected layer of the living body detection model, which output the first probabilities corresponding to the 8 frames of video images. Assume the 8 first probabilities are: (35.9, 13.0), (43.2, 5.6), (34.7, 14.3), (44.6, 5.4), (58.6, 2.1), (41.8, 6.7), (29.2, 17.8), and (21.4, 22.8).
  • Meanwhile, the convolutional layer outputs a 36*36*25 convolution feature for each frame of video image; the feature combination module stacks these eight 36*36*25 convolution features to obtain a 36*36*200 convolution feature, and this 36*36*200 convolution feature is used as the feature of the video, representing the inter-frame correlation of the multiple frames of video images. After the video feature passes through the third fully connected layer, the second probability is output; assume the second probability is (50.1, 3.5).
  • Weights are then assigned to the second probability and the first probabilities corresponding to the multiple frames of video images; for example, the weight assigned to the second probability is 1/2, and the weight assigned to each first probability is 1/16.
  • Finally, the weighted average probability is calculated according to the second probability and its corresponding weight, and the first probability and its corresponding weight for each of the multiple frames of video images, and whether the object to be detected is a living body is determined according to the weighted average probability. In this example, the weighted average probability is (44.4, 7.3); since the probability that the object to be detected is a living body is greater than the probability that it is a non-living body, the object to be detected is determined to be a living body.
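  • Reproducing this weighted combination with the values from the example:

```python
second_prob = (50.1, 3.5)  # from the inter-frame correlation
first_probs = [(35.9, 13.0), (43.2, 5.6), (34.7, 14.3), (44.6, 5.4),
               (58.6, 2.1), (41.8, 6.7), (29.2, 17.8), (21.4, 22.8)]

w_second, w_first = 1 / 2, 1 / 16  # weights from the example
weighted_live = w_second * second_prob[0] + w_first * sum(p[0] for p in first_probs)
weighted_nonlive = w_second * second_prob[1] + w_first * sum(p[1] for p in first_probs)

# 44.4 7.2 (the text rounds the second component to 7.3)
print(round(weighted_live, 1), round(weighted_nonlive, 1))
# 44.4 > 7.2, so the object to be detected is judged to be a living body
```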
  • By performing steps S26'-1 and S26'-2 to assign a larger weight to the second probability, the contribution of the inter-frame correlation of the multiple frames of video images, both in characterizing the video and in improving detection accuracy during living body detection, can be emphasized, thereby further improving the accuracy of living body detection.
  • FIG. 4 is a schematic diagram of a living body detection device provided by an embodiment of the present application. As shown in Figure 4, the device includes:
  • the first extraction module 41 is configured to extract multiple frames of video images from the video collected for the object to be detected;
  • the first determining module 42 is configured to, for each frame of video image in the multi-frame video image, determine the first probability that the frame of video image represents whether the object to be detected is a living body according to the characteristics of the frame of video image;
  • the second determining module 43 is configured to determine whether the object to be detected is a living body according to the first probability corresponding to each of the multiple frames of video images.
  • the device further includes:
  • a third determining module configured to determine, according to the inter-frame correlation of the multiple frames of video images, the second probability that the inter-frame correlation characterizes whether the object to be detected is a living body
  • the second determining module includes:
  • the first determination submodule is configured to determine whether the object to be detected is a living body according to the second probability and the first probability corresponding to each of the multiple frames of video images.
  • the device further includes:
  • the first splicing module is used to splice the respective characteristics of the multiple frames of video images to obtain the characteristics of the video, and the video characteristics are used to characterize the inter-frame correlation.
  • the first determining submodule includes:
  • an allocation subunit, configured to assign weights to the second probability and the first probabilities corresponding to the multiple frames of video images, where the weight corresponding to the second probability is greater than the weight corresponding to each of the first probabilities;
  • a determining subunit, configured to determine whether the object to be detected is a living body according to the second probability and its corresponding weight, and the first probability and its corresponding weight for each of the multiple frames of video images.
  • the device further includes:
  • the first obtaining module is configured to obtain a sample video set, where the sample video set includes a plurality of sample videos carrying tags, and the tag carried by a sample video indicates whether the sample video is a video collected for a living body;
  • the second extraction module is configured to extract a multi-frame sample video image from the sample video with a mark for each sample video with a mark included in the sample video set;
  • the first input module is configured to input each frame of the sample video image in the multi-frame sample video image into the convolutional layer of the model to be trained to obtain the characteristics of the frame sample video image;
  • the second input module is used to input the characteristics of the frame of sample video image into the first fully connected layer of the model to be trained to obtain a third probability corresponding to the frame of sample video image, where the third probability represents whether the frame of sample video image is derived from a video collected for a living body;
  • the third input module is configured to input the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain an estimated probability of whether the sample video is a video collected for a living body;
  • the second obtaining module is configured to establish a loss function according to the estimated probability and the third probability corresponding to each of the multi-frame sample video images, so as to update the model to be trained and obtain a live detection model;
  • the first determining module includes:
  • the first input submodule is configured to input each frame of the video image in the multi-frame video image into the convolutional layer of the living body detection model to obtain the characteristics of the frame of video image;
  • the second input submodule is configured to input the characteristics of the frame of video image into the first fully connected layer of the living body detection model to determine the first probability that the frame of video image represents whether the object to be detected is a living body.
  • the device further includes:
  • the second splicing module is used to splice the respective characteristics of the multiple frames of sample video images to obtain the characteristics of the sample video;
  • the fourth input module is configured to input the characteristics of the sample video into the third fully connected layer of the model to be trained to obtain the fourth probability of whether the sample video is a video collected for a living body;
  • the third input module includes:
  • the third input sub-module is used to input the fourth probability and the third probability corresponding to each of the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video is a video collected for a living body.
  • the third determining module includes:
  • the fourth input sub-module is configured to input the first probability corresponding to each of the multiple frames of video images into the second fully connected layer of the living body detection model to determine whether the object to be detected is a living body.
  • the device further includes:
  • the third obtaining module is used to obtain the video captured by the video capturing device when the object to be detected is in a silent state.
  • an embodiment of the present application also provides an electronic device.
  • A schematic structural diagram of the electronic device is shown in FIG. 5.
  • The electronic device 7000 includes at least one processor 7001, a memory 7002, and a bus 7003; the processor 7001 is electrically connected to the memory 7002. The memory 7002 is configured to store at least one computer-executable instruction, and the processor 7001 is configured to execute the at least one computer-executable instruction, so as to perform the steps of any living body detection method provided by any embodiment or any optional implementation of the present application.
  • the processor 7001 may be an FPGA (Field-Programmable Gate Array) or other devices with logic processing capabilities, such as MCU (Microcontroller Unit), CPU (Central Process Unit, Central Processing Unit) ).
  • the living body detection method provided by the present application performs living body detection on the basis of a video collected for the object to be detected. Specifically, multiple frames of video images are extracted from the video and used to characterize the video; then, for each frame of video image, a first probability that the frame of video image represents whether the object to be detected is a living body is determined according to the features of that frame; finally, whether the object to be detected is a living body is determined comprehensively according to the determined first probabilities. Compared with the prior art, which performs living body detection on a single image, the present application performs living body detection on the basis of a video, so the detection result is more accurate.
  • because the living body detection method provided by the present application extracts multiple frames of video images from the video collected for the object to be detected, the redundant information of the video can be reduced, thereby reducing the amount of computation and improving detection efficiency.
  • the living body detection method provided by the present application does not require the object to be detected to complete specified facial actions such as opening the mouth or blinking in front of the camera, which not only avoids the impact of facial actions on the accuracy of face recognition, but also allows users to complete living body detection without making specified facial actions, thereby improving the user experience.
  • the embodiment of the present application also provides a computer-readable storage medium, such as the memory 7002 in FIG. 5, in which a computer program 7002a is stored; when executed by a processor, the computer program implements the steps of any living body detection method provided by any embodiment of the present application.
  • the computer-readable storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, a readable storage medium includes any medium that stores or transmits information in a form readable by a device (for example, a computer).

Abstract

The embodiment of the present application relates to the technical field of data processing. Provided are a living body testing method and apparatus, an electronic device, and a readable storage medium. The living body testing method comprises: extracting multiple frames of video images from a video collected for an object to be tested; for each frame of video image among the multiple frames of video images, determining, according to features of that frame of video image, a first probability that the frame of video image represents whether the object to be tested is a living body; and determining whether the object to be tested is a living body according to the first probabilities respectively corresponding to the multiple frames of video images. The living body testing method provided in the present application can improve the accuracy of living body testing.

Description

Living body detection method, device, electronic device and readable storage medium
This application claims priority to the Chinese patent application with application number 201910512041.1, entitled "Living body detection method, device, electronic device and readable storage medium", filed with the Chinese Patent Office on June 13, 2019, the entire content of which is incorporated into this application by reference.
Technical field
The embodiments of the present application relate to the field of data processing technology, and in particular to a living body detection method, device, electronic device, and readable storage medium.
Background
As identification technologies in the field of data processing are widely applied in security, finance, and other fields, for example access control unlocking, mobile phone unlocking, remote payment, and remote account opening based on face recognition, palmprint recognition, or fingerprint recognition, the security of identification technologies has drawn increasing attention. For example, people are concerned with how to determine, when a recognition object is recognized by a device, that the recognition object comes from a real person. To this end, the related art has proposed living body detection methods.
Taking face recognition technology as an example, when performing living body detection on face images, the detection method proposed by the related art is as follows: the object to be detected is first required to complete specified facial actions such as opening the mouth and blinking in front of the camera; the camera collects one face image of the specified facial action, and the processor determines, based on that face image, whether the object to be detected in the face image is a living body. However, facial actions such as opening the mouth and blinking affect the accuracy of face recognition and also degrade the user experience. Moreover, whether for face recognition or palmprint recognition, living body detection is performed on the basis of a single image, so the accuracy of living body detection is low.
Summary of the invention
The embodiments of the present application provide a living body detection method, device, electronic device, and readable storage medium, aiming to improve the accuracy of living body detection.
A first aspect of the embodiments of the present application provides a living body detection method, the method including:
extracting multiple frames of video images from a video collected for an object to be detected;
for each frame of video image among the multiple frames of video images, determining, according to the features of that frame of video image, a first probability that the frame of video image represents whether the object to be detected is a living body;
determining whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images.
Optionally, the method further includes:
determining, according to the inter-frame correlation of the multiple frames of video images, a second probability that the inter-frame correlation represents whether the object to be detected is a living body;
determining whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images includes:
determining whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images.
Optionally, the method further includes:
splicing the respective features of the multiple frames of video images to obtain the features of the video, the video features being used to characterize the inter-frame correlation.
Optionally, determining whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images includes:
assigning weights to the second probability and to the first probabilities respectively corresponding to the multiple frames of video images, where the weight corresponding to the second probability is greater than the weight corresponding to each of the first probabilities;
determining whether the object to be detected is a living body according to the second probability and its corresponding weight, and the first probabilities respectively corresponding to the multiple frames of video images and their corresponding weights.
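For illustration only, the following is a minimal sketch of one possible weighted combination of this kind, in Python; the concrete weight values and the threshold decision are assumptions rather than limitations of the present application:

```python
def fuse(first_probs, second_prob, w_second=0.5):
    """Weighted combination in which the second probability's weight exceeds
    each first probability's weight (the weight values are assumptions)."""
    # first_probs: N per-frame probabilities that the object is a living body
    w_first = (1.0 - w_second) / len(first_probs)   # each < w_second for N >= 2
    score = w_second * second_prob + w_first * sum(first_probs)
    return score  # compared against a threshold to decide living / non-living
```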
Optionally, the method further includes:
obtaining a sample video set, the sample video set including a plurality of sample videos carrying labels, where the label carried by a sample video represents whether the sample video is a video collected from a living body;
for each label-carrying sample video included in the sample video set, performing the following steps:
extracting multiple frames of sample video images from the label-carrying sample video;
inputting each frame of sample video image among the multiple frames of sample video images into the convolutional layer of a model to be trained, to obtain the features of that frame of sample video image;
inputting the features of that frame of sample video image into the first fully connected layer of the model to be trained, to obtain a third probability corresponding to that frame of sample video image, the third probability representing whether that frame of sample video image originates from a video collected from a living body;
inputting the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained, to obtain an estimated probability of whether the sample video is a video collected from a living body;
establishing a loss function according to the estimated probability and the third probabilities respectively corresponding to the multiple frames of sample video images, so as to update the model to be trained and obtain a living body detection model;
for each frame of video image among the multiple frames of video images, determining, according to the features of that frame of video image, the first probability that the frame of video image represents whether the object to be detected is a living body includes:
inputting each frame of video image among the multiple frames of video images into the convolutional layer of the living body detection model, to obtain the features of that frame of video image;
inputting the features of that frame of video image into the first fully connected layer of the living body detection model, to determine the first probability that the frame of video image represents whether the object to be detected is a living body.
Optionally, the method further includes:
splicing the respective features of the multiple frames of sample video images to obtain the features of the sample video;
inputting the features of the sample video into a third fully connected layer of the model to be trained, to obtain a fourth probability of whether the sample video is a video collected from a living body;
inputting the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained, to obtain the estimated probability of whether the sample video is a video collected from a living body includes:
inputting the fourth probability and the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained, to obtain the estimated probability of whether the sample video is a video collected from a living body.
Optionally, determining whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images includes:
inputting the first probabilities respectively corresponding to the multiple frames of video images into the second fully connected layer of the living body detection model, to determine whether the object to be detected is a living body.
Optionally, the method further includes:
obtaining a video collected by a video collection device while the object to be detected is in a silent state.
A second aspect of the embodiments of the present application provides a living body detection device, the device including:
a first extraction module, configured to extract multiple frames of video images from a video collected for an object to be detected;
a first determining module, configured to, for each frame of video image among the multiple frames of video images, determine, according to the features of that frame of video image, a first probability that the frame of video image represents whether the object to be detected is a living body;
a second determining module, configured to determine whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images.
Optionally, the device further includes:
a third determining module, configured to determine, according to the inter-frame correlation of the multiple frames of video images, a second probability that the inter-frame correlation represents whether the object to be detected is a living body;
the second determining module includes:
a first determining submodule, configured to determine whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images.
Optionally, the device further includes:
a first splicing module, configured to splice the respective features of the multiple frames of video images to obtain the features of the video, the video features being used to characterize the inter-frame correlation.
Optionally, the first determining submodule includes:
an assignment subunit, configured to assign weights to the second probability and to the first probabilities respectively corresponding to the multiple frames of video images, where the weight corresponding to the second probability is greater than the weight corresponding to each of the first probabilities;
a determining subunit, configured to determine whether the object to be detected is a living body according to the second probability and its corresponding weight, and the first probabilities respectively corresponding to the multiple frames of video images and their corresponding weights.
Optionally, the device further includes:
a first obtaining module, configured to obtain a sample video set, the sample video set including a plurality of sample videos carrying labels, where the label carried by a sample video represents whether the sample video is a video collected from a living body;
a second extraction module, configured to, for each label-carrying sample video included in the sample video set, extract multiple frames of sample video images from that label-carrying sample video;
a first input module, configured to input each frame of sample video image among the multiple frames of sample video images into the convolutional layer of a model to be trained, to obtain the features of that frame of sample video image;
a second input module, configured to input the features of that frame of sample video image into the first fully connected layer of the model to be trained, to obtain a third probability corresponding to that frame of sample video image, the third probability representing whether that frame of sample video image originates from a video collected from a living body;
a third input module, configured to input the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained, to obtain an estimated probability of whether the sample video is a video collected from a living body;
a second obtaining module, configured to establish a loss function according to the estimated probability and the third probabilities respectively corresponding to the multiple frames of sample video images, so as to update the model to be trained and obtain a living body detection model;
the first determining module includes:
a first input submodule, configured to input each frame of video image among the multiple frames of video images into the convolutional layer of the living body detection model, to obtain the features of that frame of video image;
a second input submodule, configured to input the features of that frame of video image into the first fully connected layer of the living body detection model, to determine the first probability that the frame of video image represents whether the object to be detected is a living body.
Optionally, the device further includes:
a second splicing module, configured to splice the respective features of the multiple frames of sample video images to obtain the features of the sample video;
a fourth input module, configured to input the features of the sample video into a third fully connected layer of the model to be trained, to obtain a fourth probability of whether the sample video is a video collected from a living body;
the third input module includes:
a third input submodule, configured to input the fourth probability and the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained, to obtain the estimated probability of whether the sample video is a video collected from a living body.
Optionally, the third determining module includes:
a fourth input submodule, configured to input the first probabilities respectively corresponding to the multiple frames of video images into the second fully connected layer of the living body detection model, to determine whether the object to be detected is a living body.
Optionally, the device further includes:
a third obtaining module, configured to obtain the video collected by a video collection device while the object to be detected is in a silent state.
A third aspect of the embodiments of the present application provides a readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the method described in the first aspect of the present application are implemented.
A fourth aspect of the embodiments of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and runnable on the processor; when the processor executes the program, the steps of the method described in the first aspect of the present application are implemented.
With the living body detection method provided by the present application, multiple frames of video images are extracted from the video collected for the object to be detected; for each frame of video image, a first probability that the frame of video image represents whether the object to be detected is a living body is determined according to the features of that frame; and finally, whether the object to be detected is a living body is determined comprehensively according to the determined first probabilities.
In one aspect, in the living body detection method provided by the present application, living body detection is performed on the basis of a video collected for the object to be detected. Specifically, multiple frames of video images are extracted from the video and used to characterize the video; then, for each frame of video image, a first probability that the frame of video image represents whether the object to be detected is a living body is determined according to the features of that frame; finally, whether the object to be detected is a living body is determined comprehensively according to the determined first probabilities. Compared with the prior art, which performs living body detection on a single image, the present application performs living body detection on the basis of a video, so the detection result is more accurate.
In another aspect, because the living body detection method provided by the present application extracts multiple frames of video images from the video collected for the object to be detected, the redundant information of the video can be reduced, thereby reducing the amount of computation and improving detection efficiency.
In yet another aspect, the living body detection method provided by the present application does not require the object to be detected to complete specified facial actions such as opening the mouth or blinking in front of the camera, which not only avoids the impact of facial actions on the accuracy of face recognition, but also allows users to complete living body detection without making specified facial actions, thereby improving the user experience.
The above description is merely an overview of the technical solution of the present invention. In order to understand the technical means of the present invention more clearly so that it can be implemented in accordance with the content of the description, and to make the above and other objectives, features, and advantages of the present invention more apparent and comprehensible, specific embodiments of the present invention are set forth below.
Description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative work.
FIG. 1 is a schematic diagram of the training process of a living body detection model in an embodiment of the present application;
FIG. 2 is a flowchart of the living body detection method proposed by an embodiment of the present application;
FIG. 3 is another flowchart of the living body detection method proposed by an embodiment of the present application;
FIG. 4 is a schematic diagram of the living body detection device provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of the electronic device provided by an embodiment of the present application.
Specific embodiments
In order to make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments are a part of the embodiments of the present invention, rather than all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present invention.
In application scenarios based on identification technology, such as access control unlocking, mobile phone unlocking, remote payment, and remote account opening, the device needs to collect the user's fingerprint or palmprint, or needs to photograph a recognition object such as the user's face or palmprint. Taking a photographed face or palmprint as an example of the recognition object: to prevent an attacker from showing a photo of another person's face or palmprint to the photographing device, thereby passing verification without that person's permission and accessing another person's account without authorization, it is necessary to perform living body detection on the face or palmprint in the photo taken by the photographing device, to determine whether it comes from a real person, that is, whether it comes from a living body.
A living body determination method provided by the related art works as follows: the object to be detected is first required to complete specified facial actions such as opening the mouth and blinking in front of the camera; the camera collects one face image of the specified facial action, and the processor determines, based on that face image, whether the object to be detected in the face image is a living body. However, facial actions such as opening the mouth and blinking affect the accuracy of face recognition and also degrade the user experience. Moreover, whether for face recognition or palmprint recognition, living body detection is performed on the basis of a single image, so the accuracy of living body detection is low.
In order to improve the accuracy of living body detection, the applicant proposes performing living body detection on the basis of a video collected for the object to be detected. In order to characterize the video, the present application extracts multiple frames of video images from the video, and then, for each frame of video image, determines, according to the features of that frame, a first probability that the frame of video image represents whether the object to be detected is a living body; finally, whether the object to be detected is a living body is determined comprehensively according to the determined first probabilities. Compared with the prior art, which performs living body detection on a single image, the present application performs living body detection on the basis of a video, so the detection result is more accurate.
In order to implement the above method proposed by the applicant more intelligently and broaden its range of application, the applicant first constructed a model to be trained, and trained the model to be trained based on a sample video set to obtain a living body detection model (for example, the first living body detection model or the second living body detection model described below); the applicant uses the living body detection model to perform some or all of the steps in the above method.
Referring to FIG. 1, FIG. 1 is a schematic diagram of the training process of the living body detection model in an embodiment of the present application. In FIG. 1, the living body detection model includes a convolutional layer, a first fully connected layer, and a second fully connected layer, where the convolutional layer may specifically adopt a convolutional neural network. It should be understood that the model structure of the model to be trained pre-built by the applicant is the same as that of the living body detection model shown in FIG. 1; the model to be trained also includes a convolutional layer, a first fully connected layer, and a second fully connected layer, and after training, the model parameters of the model to be trained are updated and adjusted to finally obtain the living body detection model.
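For illustration only, the following is a minimal sketch of a model structure consistent with FIG. 1, assuming PyTorch; the backbone layers, feature sizes, and frame count N = 8 are illustrative assumptions rather than the structure prescribed by the present application:

```python
import torch
import torch.nn as nn

class LivenessModel(nn.Module):
    """Convolutional layer + first fully connected layer + second fully
    connected layer, as in FIG. 1; all sizes are illustrative."""

    def __init__(self, num_frames: int = 8):
        super().__init__()
        self.num_frames = num_frames
        # "Convolutional layer": a per-frame CNN backbone shared by all frames
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 25, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((36, 36)),
        )
        # First fully connected layer: per-frame feature -> (x, y) scores
        self.fc1 = nn.Linear(25 * 36 * 36, 2)
        # Second fully connected layer: N per-frame scores -> video-level scores
        self.fc2 = nn.Linear(num_frames * 2, 2)

    def forward(self, frames: torch.Tensor):
        # frames: (batch, N, 3, H, W), the N extracted RGB video images
        b, n = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1))           # (b*n, 25, 36, 36)
        per_frame = self.fc1(feats.flatten(1)).view(b, n, 2)  # frame-level logits
        video = self.fc2(per_frame.flatten(1))                # (b, 2) video-level logits
        return per_frame, video
```

A softmax over each output would yield the (x, y) and (X, Y) probability vectors described in the steps below.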
In order to train the model to be trained to obtain the living body detection model, an embodiment of the present application proposes the following steps. It should be noted in advance that, in the following steps, the sample video set is described by taking a sample video set of human faces as an example. It should be understood that the type of the sample video set is not limited to human faces; it may also be, for example, a sample video set of palmprints. If the model to be trained is trained based on a sample video set of palmprints, the resulting living body detection model can be used to perform living body detection on palmprint videos.
S110: obtain a sample video set, the sample video set including a plurality of sample videos carrying labels, where the label carried by a sample video represents whether the sample video is a video collected from a living body.
In this embodiment, some or all of the sample videos in the sample video set may be videos collected by a video collection device while the training participant is in a silent state. When a video is collected from a training participant, the training participant only needs to look at the video collection device and is not required to complete specified facial actions such as opening the mouth, blinking, or reading aloud in front of the camera.
For example, a silent video may be shot of the face of each of a plurality of training participants (real people), with the duration of each video controlled within 1 to 3 seconds; such videos shot of real people are annotated so that they carry a label indicating that the video is collected from a living body. A video may also be shot of each of a number of non-living bodies, such as printed photos, photos displayed on screens, and masks, with the duration of each video controlled within 1 to 3 seconds; such videos shot of non-living bodies are annotated so that they carry a label indicating that the video is not collected from a living body.
S120: for each label-carrying sample video included in the sample video set, extract multiple frames of sample video images from that label-carrying sample video.
For example, each label-carrying sample video may first be divided into N sub-segments, and then one frame of RGB video image is extracted from each sub-segment as a sample video image, so that a total of N frames of sample video images are ultimately extracted from each label-carrying sample video.
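For illustration only, a minimal sketch of this sub-segment sampling, assuming OpenCV (cv2); the function name and segment count are illustrative:

```python
import random
import cv2  # OpenCV is assumed

def sample_frames(video_path: str, n_segments: int = 8):
    """Divide the video into N equal sub-segments and draw one random RGB
    frame from each (the video is assumed longer than N frames)."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    seg_len = total // n_segments
    frames = []
    for i in range(n_segments):
        idx = i * seg_len + random.randrange(seg_len)  # index within sub-segment i
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, bgr = cap.read()
        if ok:
            frames.append(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames  # N frames of sample video images representing the video
```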
S130: input each frame of sample video image among the multiple frames of sample video images into the convolutional layer of the model to be trained, to obtain the features of that frame of sample video image.
For example, the N frames of sample video images may be sequentially input into the convolutional neural network of the model to be trained, and for each frame of sample video image, the convolutional neural network outputs a three-dimensional convolutional feature, that is, the features of that frame of sample video image. It should be understood that the multiple frames of sample video images may share one convolutional neural network, or each frame of sample video image may correspond to its own convolutional neural network; therefore, the model to be trained may include one or more convolutional neural networks.
S140: input the features of that frame of sample video image into the first fully connected layer of the model to be trained, to obtain a third probability corresponding to that frame of sample video image, the third probability representing whether that frame of sample video image originates from a video collected from a living body.
For example, the respective features of the N frames of sample video images may be sequentially input into the first fully connected layer, and for the features of each frame of sample video image, the first fully connected layer outputs a probability vector of the form (x, y), that is, the third probability, where x represents the probability that the frame of sample video image originates from a video collected from a living body, and y represents the probability that it originates from a video collected from a non-living body.
S150: input the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained, to obtain an estimated probability of whether the sample video is a video collected from a living body.
For example, the N third probabilities corresponding to the N frames of sample video images may be input into the second fully connected layer, and for the N third probabilities, the second fully connected layer outputs a probability vector of the form (X, Y), that is, the estimated probability, where X represents the probability that the sample video is a video collected from a living body, and Y represents the probability that it is a video collected from a non-living body.
S160: establish a loss function according to the estimated probability and the third probabilities respectively corresponding to the multiple frames of sample video images, so as to update the model to be trained and obtain the living body detection model.
For example, a loss function is established according to the estimated probability, for example a probability vector of the form (X, Y), and according to the third probabilities respectively corresponding to the multiple frames of sample video images, for example probability vectors of the form (x, y); the parameters of the model to be trained are updated using gradient descent, and the updated model to be trained is put into the next round of training. After multiple rounds of training, the living body detection model is obtained. For example, the training may end after a fixed M rounds, such as 1000 rounds, to obtain the living body detection model. For another example, the training may end when the loss functions of multiple consecutive rounds reflect that the model to be trained can already accurately predict whether the sample video is from a living body, to obtain the living body detection model.
For example, one implementation of establishing the loss function may be as follows: the N third probabilities are each compared with the label carried by the sample video, where each third probability is a prediction result and the label carried by the sample video represents the ground truth, yielding N first comparison results; the N first comparison results can represent the accuracy of the model to be trained in predicting the sample video in this round of training. The estimated probability is then compared with the label carried by the sample video, where the estimated probability is a prediction result and the label represents the ground truth, yielding a second comparison result, which can likewise represent the accuracy of the model to be trained in predicting the sample video in this round of training.
Finally, the parameters of the model to be trained are adjusted according to the second comparison result and the N first comparison results, so as to update the model to be trained. The updated model to be trained is put into the next round of training, and after multiple rounds of training, the living body detection model is obtained.
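For illustration only, one possible form of such a loss, continuing the PyTorch sketch above; summing a video-level cross-entropy term with per-frame cross-entropy terms is an assumption, as the present application does not fix a concrete loss formula:

```python
import torch
import torch.nn.functional as F

def liveness_loss(per_frame_logits: torch.Tensor,
                  video_logits: torch.Tensor,
                  label: torch.Tensor) -> torch.Tensor:
    """per_frame_logits: (batch, N, 2); video_logits: (batch, 2);
    label: (batch,) with 1 = collected from a living body, 0 = not."""
    b, n, _ = per_frame_logits.shape
    # N first comparison results: every frame prediction vs. the video's label
    frame_loss = F.cross_entropy(per_frame_logits.reshape(b * n, 2),
                                 label.repeat_interleave(n))
    # second comparison result: the video-level prediction vs. the label
    video_loss = F.cross_entropy(video_logits, label)
    return video_loss + frame_loss
```

A gradient-descent step (for example with torch.optim.SGD) on this loss would then update the parameters in each round of training.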
By simultaneously considering the estimated probability corresponding to the sample video and the third probability corresponding to each frame of sample video image when establishing the loss function, on the one hand the convergence speed of the model to be trained can be accelerated; on the other hand, during training the parameters of the model to be trained are updated based not only on the accuracy of the model's prediction for the sample video but also on the accuracy of its prediction for each frame of sample video image, so that the resulting living body detection model can output more accurate prediction results.
By performing steps S110 to S160, the first living body detection model is obtained. During application, the first living body detection model can perform some or all of the following steps: extracting multiple frames of video images from the video collected for the object to be detected; then, for each frame of video image, determining, according to the features of that frame, a first probability that the frame of video image represents whether the object to be detected is a living body; and finally, determining comprehensively whether the object to be detected is a living body according to the determined first probabilities.
In order to further improve the accuracy of living body detection, the applicant of the present application found that, in addition to the multiple frames of video images extracted from a video being able to characterize that video, the inter-frame correlation of the multiple frames of video images can also be used to characterize it. If a video is characterized by both the multiple frames of video images and their inter-frame correlation, then further introducing the inter-frame correlation into living body detection can further improve its accuracy.
Based on the above findings, the applicant further proposes introducing the inter-frame correlation into the living body detection method: first determining a second probability that the inter-frame correlation represents whether the object to be detected is a living body, and then determining whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images, thereby further improving the accuracy of living body detection.
In order to implement the method further proposed by the applicant more intelligently and broaden its range of application, the applicant first constructed a model to be trained and trained it based on a sample video set to obtain a living body detection model; the applicant uses that living body detection model to perform some or all of the steps in the method further proposed above.
Please continue to refer to FIG. 1. In FIG. 1, the living body detection model may further include a feature combination module and a third fully connected layer. It should be understood that the model structure of the model to be trained pre-built by the applicant is the same as that of the living body detection model shown in FIG. 1; the model to be trained may likewise further include a feature combination module and a third fully connected layer, and after training, the model parameters of the model to be trained are updated and adjusted to finally obtain the living body detection model.
In order to train the model to be trained to obtain the living body detection model, an embodiment of the present application further proposes steps S142, S144, and S150' on the basis of steps S110, S120, S130, S140, and S160. It should be noted in advance that steps S130, S140, S142, S144, S150', and S160 are the steps of each round in multiple rounds of training:
S142: splice the respective features of the multiple frames of sample video images to obtain the features of the sample video.
For example, after the N three-dimensional convolutional features are obtained in step S130, the feature combination module may stack these N three-dimensional convolutional features into a new three-dimensional convolutional feature, which serves as the features of the sample video; the features of the sample video can characterize the inter-frame correlation of the multiple frames of sample video images. For example, if eight 36*36*25 convolutional features are obtained in step S130, stacking these eight 36*36*25 convolutional features yields a 36*36*200 convolutional feature, which serves as the features of the sample video.
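For illustration only, the stacking can be sketched as a concatenation along the channel axis, assuming PyTorch tensors in the 36*36*25 layout of the example:

```python
import torch

# Eight per-frame 36*36*25 convolutional features (as in the example above)
frame_feats = [torch.randn(36, 36, 25) for _ in range(8)]
# Stacking along the channel axis yields one 36*36*200 video feature
video_feat = torch.cat(frame_feats, dim=2)   # shape: (36, 36, 200)
```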
S144: input the features of the sample video into the third fully connected layer of the model to be trained, to obtain a fourth probability of whether the sample video is a video collected from a living body.
For example, the features of the sample video may be input into the third fully connected layer, and for the features of the sample video, the third fully connected layer outputs a probability vector of the form (x', y'), that is, the fourth probability, where x' represents the probability that the sample video is a video collected from a living body, and y' represents the probability that it is a video collected from a non-living body.
S150': input the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained, to obtain the estimated probability of whether the sample video is a video collected from a living body. This step specifically includes: inputting the fourth probability and the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained, to obtain the estimated probability of whether the sample video is a video collected from a living body.
For example, the fourth probability corresponding to the sample video and the N third probabilities corresponding to the N frames of sample video images may be input into the second fully connected layer, and for the N+1 probabilities, the second fully connected layer outputs a probability vector of the form (X, Y), that is, the estimated probability, where X represents the probability that the sample video is a video collected from a living body, and Y represents the probability that it is a video collected from a non-living body.
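For illustration only, a sketch extending the earlier LivenessModel with the feature combination module and the third fully connected layer; the sizes remain illustrative assumptions:

```python
import torch
import torch.nn as nn

class LivenessModelV2(LivenessModel):
    """Adds the feature combination module (stacking) and the third fully
    connected layer; the second fully connected layer now receives N third
    probabilities plus one fourth probability."""

    def __init__(self, num_frames: int = 8):
        super().__init__(num_frames)
        # Third fully connected layer: stacked video feature -> fourth probability
        self.fc3 = nn.Linear(num_frames * 25 * 36 * 36, 2)
        # Second fully connected layer: (N + 1) probability vectors -> estimate
        self.fc2 = nn.Linear((num_frames + 1) * 2, 2)

    def forward(self, frames: torch.Tensor):
        b, n = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(b, n, -1)
        per_frame = self.fc1(feats)              # (b, n, 2) third probabilities
        fourth = self.fc3(feats.flatten(1))      # (b, 2) from the stacked feature
        combined = torch.cat([per_frame.flatten(1), fourth], dim=1)  # N+1 vectors
        video = self.fc2(combined)               # (b, 2) estimated probability
        return per_frame, video
```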
By performing steps S110, S120, S130, S140, S142, S144, S150', and S160, the second living body detection model is obtained. During application, the second living body detection model can perform some or all of the following steps: extracting multiple frames of video images from the video collected for the object to be detected; for each frame of video image, determining, according to the features of that frame, a first probability that the frame of video image represents whether the object to be detected is a living body; determining, for the inter-frame correlation of the multiple frames of video images, a second probability that the inter-frame correlation represents whether the object to be detected is a living body; and finally determining whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images.
The above embodiments of the present application mainly propose two training processes for the model to be trained based on a sample video set, which ultimately yield the first living body detection model and the second living body detection model, respectively. Below, the present application focuses on the living body detection method and schematically describes how to apply the first living body detection model or the second living body detection model in the living body detection method.
Referring to FIG. 2, FIG. 2 is a flowchart of the living body detection method proposed by an embodiment of the present application. As shown in FIG. 2, the method includes the following steps:
S22: extract multiple frames of video images from a video collected for an object to be detected.
In this embodiment, the object to be detected refers to an object for which it needs to be detected whether it is a living body. For example, the object to be detected is not limited to a face to be detected; it may also be, for example, a palmprint or fingerprint to be detected. If the object to be detected is a palmprint, the video collected for the object to be detected is a video shot of the palmprint to be detected.
In this embodiment, the method further includes: obtaining a video collected by a video collection device while the object to be detected is in a silent state.
In other words, the video collected for the object to be detected may be a silent video collected for that object. For example, while the object to be detected is in a silent state, a video is collected for it, for example a short video of 1 to 3 seconds. In this embodiment, when collecting a video from a user, the user only needs to look at the video collection device and is not required to complete specified facial actions such as opening the mouth, blinking, or reading aloud in front of the camera; this not only avoids the impact of facial actions on the accuracy of face recognition, but also allows the user to complete living body detection without making specified facial actions, thereby improving the user experience.
In this embodiment, when extracting multiple frames of video images from the video, they may be extracted at equal inter-frame intervals, and the extracted video images may be RGB images. For example, for a video, one frame of video image is extracted every 5 frames. Taking a video including 48 frames of video images as an example, the extracted video images are frame 6, frame 12, frame 18, frame 24, frame 30, frame 36, frame 42, and frame 48.
Alternatively, in this embodiment, when extracting multiple frames of video images from the video, the video may first be divided into multiple sub-segments, and then one frame of video image is extracted from each sub-segment. For example, a video may be divided into N equal sub-segments, and for each sub-segment, one frame of video image is extracted at random, or one frame is extracted from the middle of the sub-segment.
In the above embodiments, by extracting multiple frames of video images at equal inter-frame intervals, or by dividing the video into multiple sub-segments and extracting one frame of video image from each sub-segment, the extracted multiple frames of video images are evenly distributed over the video, so the multiple frames of video images can characterize the content of the video more accurately, thereby further improving the accuracy of living body detection.
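For illustration only, the equal-interval example above reduces to selecting every sixth frame index:

```python
# Frame indices for the 48-frame example: one frame every 5 frames,
# i.e. every sixth frame (1-based indices 6, 12, ..., 48)
indices = list(range(6, 48 + 1, 6))   # [6, 12, 18, 24, 30, 36, 42, 48]
```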
In this embodiment, multiple frames of video images are extracted from the video captured for the object to be detected, and these frames are used to characterize the video, so that the living body detection method proposed in this application performs detection on the basis of video. Compared with the prior art, which performs living body detection on a single image, detection on the basis of video yields more accurate results. Moreover, because only multiple frames are extracted from the captured video, redundant information in the video is reduced, which lowers the computational cost and improves detection efficiency.

S24: For each frame of the multiple frames of video images, determine, according to the features of that frame, a first probability that the frame indicates whether the object to be detected is a living body.

In this embodiment, the features of a video image may be convolutional features. By way of example, to determine the first probability corresponding to each frame from its features, the first living body detection model obtained through the training described above may be used. Specifically, each frame of the multiple frames of video images is input into the convolutional layer of the living body detection model to obtain the features of that frame; the features of the frame are then input into the first fully connected layer of the model to determine the first probability that the frame indicates whether the object to be detected is a living body.

The first probability corresponding to each frame may be a probability vector of the form (x, y), where x represents the probability that the object to be detected is a living body and y represents the probability that it is not.

In practical applications, the features of each frame may be obtained through a convolutional neural network, or extracted using other image feature extraction methods. The features of each frame are then input into the first fully connected layer of the first living body detection model to determine the first probability that the frame indicates whether the object to be detected is a living body.
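As a rough illustration of this per-frame branch, the following PyTorch sketch pairs a toy convolutional backbone with a first fully connected layer. The input resolution and layer sizes are assumptions of the sketch, not of the embodiments, and the outputs are unnormalized scores, consistent with the example probability vectors given later:

    import torch
    import torch.nn as nn

    class FrameScorer(nn.Module):
        # Per-frame branch: convolutional layers followed by the first fully
        # connected layer, producing the (x, y) first probability vector.
        def __init__(self):
            super().__init__()
            # Toy backbone: maps a 3x144x144 RGB frame to a 25x36x36 feature map.
            self.conv = nn.Sequential(
                nn.Conv2d(3, 25, kernel_size=3, stride=2, padding=1),   # -> 25x72x72
                nn.ReLU(),
                nn.Conv2d(25, 25, kernel_size=3, stride=2, padding=1),  # -> 25x36x36
                nn.ReLU(),
            )
            self.fc1 = nn.Linear(25 * 36 * 36, 2)  # first fully connected layer

        def forward(self, frames):
            feats = self.conv(frames)              # per-frame convolutional features
            scores = self.fc1(feats.flatten(1))    # one (x, y) score row per frame
            return feats, scores

    feats, scores = FrameScorer()(torch.randn(8, 3, 144, 144))  # 8 sampled frames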
S26: Determine whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images.

By way of example, the first living body detection model obtained through training may be used to determine whether the object to be detected is a living body. Specifically, the first probabilities respectively corresponding to the multiple frames of video images are input into the second fully connected layer of the living body detection model to determine whether the object is a living body. For instance, each frame of the multiple frames is input into the convolutional layer of the model, which outputs the features of that frame; the features of each frame are then input into the first fully connected layer, which outputs the first probability corresponding to that frame; the first probabilities corresponding to all frames are in turn input into the second fully connected layer, which outputs an estimated probability, a comprehensive probability indicating whether the object to be detected is a living body.

The estimated probability may be a probability vector of the form (X, Y), where X represents the comprehensive probability that the object to be detected is a living body and Y the comprehensive probability that it is not. By comparing X and Y, the object is determined to be a living body when X is greater than Y.
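One plausible reading of this second fully connected layer is sketched below; the assumption that the per-frame first probabilities are flattened into its input is made here for illustration and is not fixed by the embodiments:

    import torch
    import torch.nn as nn

    T = 8                                  # number of sampled frames, as in the examples
    fc2 = nn.Linear(T * 2, 2)              # second fully connected layer -> (X, Y)

    first_probs = torch.randn(T, 2)        # stand-ins for the 8 first probability vectors
    X, Y = fc2(first_probs.flatten())      # estimated probability vector
    is_live = (X > Y).item()               # living body if X exceeds Y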
Alternatively, after the first probabilities respectively corresponding to the multiple frames are obtained in step S24, the average of these first probabilities may be computed to determine whether the object to be detected is a living body. For example, each first probability is a probability vector of the form (x, y), where x represents the probability that the object is a living body and y the probability that it is not. Suppose the probability vectors corresponding to the 8 frames extracted from the video are: (35.9, 13.0), (43.2, 5.6), (34.7, 14.3), (44.6, 5.4), (58.6, 2.1), (41.8, 6.7), (29.2, 17.8), and (21.4, 22.8). From these 8 vectors the comprehensive average probability vector works out to (38.7, 11.0); since the component indicating that the object is a living body is greater than the component indicating that it is not, the object to be detected is determined to be a living body.
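The averaging alternative can be checked directly against the example numbers (illustration only):

    import numpy as np

    # The eight first probability vectors (living, non-living) from the example above.
    first_probs = np.array([
        [35.9, 13.0], [43.2, 5.6], [34.7, 14.3], [44.6, 5.4],
        [58.6, 2.1],  [41.8, 6.7], [29.2, 17.8], [21.4, 22.8],
    ])
    avg = first_probs.mean(axis=0)   # about (38.7, 11.0) after rounding
    print(avg, avg[0] > avg[1])      # the living component dominates -> living body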
By performing steps S22, S24, and S26, living body detection is carried out on the basis of a video captured for the object to be detected. Specifically, multiple frames of video images are extracted from the video and used to characterize it; for each frame, a first probability that the frame indicates whether the object is a living body is determined from the features of that frame; finally, whether the object is a living body is determined comprehensively from the multiple first probabilities so obtained. Compared with the prior art, which performs living body detection on a single image, detection on the basis of video yields more accurate results.

The applicant of this application found that, in addition to the multiple frames extracted from a video, the inter-frame correlation of those frames can also be used to characterize the video. If both the frames and their inter-frame correlation are used to characterize the video, then introducing inter-frame correlation into living body detection can further improve its accuracy.

To further improve the accuracy of living body detection, refer to FIG. 3, which is another flowchart of the living body detection method proposed in an embodiment of this application. As shown in FIG. 3, the method includes the following steps:
S22: Extract multiple frames of video images from the video captured for the object to be detected.

S24: For each frame of the multiple frames of video images, determine, according to the features of that frame, a first probability that the frame indicates whether the object to be detected is a living body.

S25: According to the inter-frame correlation of the multiple frames of video images, determine a second probability that the inter-frame correlation indicates whether the object to be detected is a living body.

In this embodiment, inter-frame correlation refers to the information between the frames of the multiple frames of video images. Specifically, for each frame of the multiple frames, the features of that frame may be extracted, and the respective features of the frames may then be concatenated to obtain the features of the video; these video features are used to characterize the inter-frame correlation.
By way of example, the multiple frames of video images may be input into the convolutional layer of the second living body detection model obtained through training, which outputs a three-dimensional convolutional feature for each frame; this three-dimensional convolutional feature is the feature of the video image. The multiple three-dimensional convolutional features are then stacked into a new three-dimensional convolutional feature that serves as the video feature characterizing the inter-frame correlation. For example, after 8 frames are input into the living body detection model, its convolutional layer outputs eight 36*36*25 convolutional features, and the feature combination module of the model stacks these eight features into a single 36*36*200 convolutional feature, which serves as the feature of the video.

In this embodiment, to determine the second probability that the inter-frame correlation indicates whether the object to be detected is a living body, the video feature characterizing the inter-frame correlation may be input into the third fully connected layer of the second living body detection model, which outputs, based on the video feature, the second probability. The second probability may be a probability vector of the form (x', y'), where x' represents the probability that the object to be detected is a living body and y' the probability that it is not.
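A minimal sketch of the stacking step and the third fully connected layer, using the example dimensions; the channel-last layout mirrors the 36*36*25 notation above, and the layer itself is an assumption of the sketch:

    import torch
    import torch.nn as nn

    # Eight per-frame convolutional features of 36x36 spatial size and 25 channels.
    frame_feats = [torch.randn(36, 36, 25) for _ in range(8)]

    # Feature combination module: stack along the channel axis into one video feature.
    video_feat = torch.cat(frame_feats, dim=-1)
    assert video_feat.shape == (36, 36, 200)

    # Third fully connected layer mapping the video feature to the second probability.
    fc3 = nn.Linear(36 * 36 * 200, 2)
    second_prob = fc3(video_feat.flatten())  # (x', y')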
S26': Determine whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images.

By way of example, when determining whether the object to be detected is a living body from the second probability and the first probabilities respectively corresponding to the multiple frames, the second living body detection model obtained through training may be used. Specifically, the second probability, by which the inter-frame correlation indicates whether the object is a living body, and the first probabilities respectively corresponding to the multiple frames may be input into the second fully connected layer of the second living body detection model, which outputs an estimated probability, a comprehensive probability indicating whether the object to be detected is a living body.

The estimated probability may be a probability vector of the form (X, Y), where X represents the comprehensive probability that the object to be detected is a living body and Y the comprehensive probability that it is not. By comparing X and Y, the object is determined to be a living body when X is greater than Y.

Or, by way of example, determining whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images may specifically include:

S26'-1: Assign weights to the second probability and to the first probabilities respectively corresponding to the multiple frames of video images, where the weight corresponding to the second probability is greater than the weight corresponding to each first probability;

S26'-2: Determine whether the object to be detected is a living body according to the second probability and its corresponding weight, and the first probabilities respectively corresponding to the multiple frames of video images and their corresponding weights.
For example, after 8 frames of video images are input into the living body detection model, they pass through its convolutional layer and first fully connected layer, which output the first probability corresponding to each of the 8 frames. Suppose the 8 first probabilities are: (35.9, 13.0), (43.2, 5.6), (34.7, 14.3), (44.6, 5.4), (58.6, 2.1), (41.8, 6.7), (29.2, 17.8), and (21.4, 22.8). The convolutional layer outputs a 36*36*25 convolutional feature for each frame; the feature combination module stacks these eight features into one 36*36*200 convolutional feature, which serves as the video feature characterizing the inter-frame correlation of the multiple frames. After passing through the third fully connected layer of the model, the video feature yields the second probability; suppose the second probability is (50.1, 3.5).

Weights are then assigned to the second probability and to the first probabilities respectively corresponding to the multiple frames; for example, the second probability is assigned a weight of 1/2 and each first probability a weight of 1/16. A weighted average probability is computed from the second probability and its corresponding weight and from the first probabilities and their corresponding weights, and whether the object to be detected is a living body is determined from this weighted average. Specifically, the weighted average probability works out to (44.4, 7.3); since the component indicating that the object is a living body is greater than the component indicating that it is not, the object to be detected is determined to be a living body.
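The weighted combination can likewise be checked against the example numbers (illustration only):

    import numpy as np

    first_probs = np.array([
        [35.9, 13.0], [43.2, 5.6], [34.7, 14.3], [44.6, 5.4],
        [58.6, 2.1],  [41.8, 6.7], [29.2, 17.8], [21.4, 22.8],
    ])
    second_prob = np.array([50.1, 3.5])

    # Weight 1/2 on the second probability and 1/16 on each first probability.
    weighted = 0.5 * second_prob + (1.0 / 16.0) * first_probs.sum(axis=0)
    # about (44.4, 7.2), matching the example's (44.4, 7.3) up to rounding;
    # the living component dominates, so the object is judged to be a living body.
    print(weighted, weighted[0] > weighted[1])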
By performing steps S26'-1 and S26'-2 and assigning the larger weight to the second probability, the share of the inter-frame correlation in characterizing the video information, and its contribution to improving detection accuracy, are given prominence, thereby further improving the accuracy of living body detection.

It should be understood that the numerical values listed in the above embodiments of this application, such as the specific values of the first and second probabilities and the dimensions of the convolutional features, are illustrative values used to explain the steps of the embodiments.
Based on the same inventive concept, an embodiment of this application provides a living body detection apparatus. Refer to FIG. 4, which is a schematic diagram of the living body detection apparatus provided by an embodiment of this application. As shown in FIG. 4, the apparatus includes:

a first extraction module 41, configured to extract multiple frames of video images from the video captured for the object to be detected;

a first determination module 42, configured to determine, for each frame of the multiple frames of video images and according to the features of that frame, a first probability that the frame indicates whether the object to be detected is a living body;

a second determination module 43, configured to determine whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images.
Optionally, the apparatus further includes:

a third determination module, configured to determine, according to the inter-frame correlation of the multiple frames of video images, a second probability that the inter-frame correlation indicates whether the object to be detected is a living body.

The second determination module includes:

a first determination sub-module, configured to determine whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images.
Optionally, the apparatus further includes:

a first concatenation module, configured to concatenate the respective features of the multiple frames of video images to obtain the features of the video, the video features being used to characterize the inter-frame correlation.
Optionally, the first determination sub-module includes:

an assignment sub-unit, configured to assign weights to the second probability and to the first probabilities respectively corresponding to the multiple frames of video images, where the weight corresponding to the second probability is greater than the weight corresponding to each first probability;

a determination sub-unit, configured to determine whether the object to be detected is a living body according to the second probability and its corresponding weight, and the first probabilities respectively corresponding to the multiple frames of video images and their corresponding weights.
Optionally, the apparatus further includes:

a first obtaining module, configured to obtain a sample video set, the sample video set including multiple labeled sample videos, where the label carried by a sample video indicates whether that sample video was captured for a living body;

a second extraction module, configured to extract, for each labeled sample video in the sample video set, multiple frames of sample video images from that labeled sample video;

a first input module, configured to input each frame of the multiple frames of sample video images into the convolutional layer of the model to be trained to obtain the features of that sample frame;

a second input module, configured to input the features of the sample frame into the first fully connected layer of the model to be trained to obtain a third probability corresponding to that sample frame, the third probability indicating whether the sample frame comes from a video captured for a living body;

a third input module, configured to input the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain an estimated probability of whether the sample video was captured for a living body;

a second obtaining module, configured to establish a loss function according to the estimated probability and the third probabilities respectively corresponding to the multiple frames of sample video images, so as to update the model to be trained and obtain the living body detection model (one possible form of this loss is sketched after the module list below).
The first determination module includes:

a first input sub-module, configured to input each frame of the multiple frames of video images into the convolutional layer of the living body detection model to obtain the features of that frame;

a second input sub-module, configured to input the features of the frame into the first fully connected layer of the living body detection model to determine the first probability that the frame indicates whether the object to be detected is a living body.
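The embodiments do not spell out the form of the loss established by the second obtaining module; one plausible sketch, assuming a cross-entropy term on both the per-frame third probabilities and the video-level estimated probability, is:

    import torch
    import torch.nn.functional as F

    def training_loss(frame_logits, video_logits, label):
        # frame_logits: (T, 2) third-probability scores for T sample frames.
        # video_logits: (2,) estimated-probability score for the whole sample video.
        # label: class index from the sample video's tag, e.g. 0 = living, 1 = not living.
        frame_targets = torch.full((frame_logits.size(0),), label, dtype=torch.long)
        frame_loss = F.cross_entropy(frame_logits, frame_targets)   # per-frame term
        video_loss = F.cross_entropy(video_logits.unsqueeze(0),
                                     torch.tensor([label]))         # video-level term
        return frame_loss + video_loss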
Optionally, the apparatus further includes:

a second concatenation module, configured to concatenate the respective features of the multiple frames of sample video images to obtain the features of the sample video;

a fourth input module, configured to input the features of the sample video into the third fully connected layer of the model to be trained to obtain a fourth probability of whether the sample video was captured for a living body.

The third input module includes:

a third input sub-module, configured to input the fourth probability and the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video was captured for a living body.
Optionally, the third determination module includes:

a fourth input sub-module, configured to input the first probabilities respectively corresponding to the multiple frames of video images into the second fully connected layer of the living body detection model to determine whether the object to be detected is a living body.
Optionally, the apparatus further includes:

a third obtaining module, configured to obtain a video captured by a video capture device while the object to be detected is in a silent state.
Based on the same inventive concept, an embodiment of this application further provides an electronic device, a schematic structural diagram of which is shown in FIG. 4. The electronic device 7000 includes at least one processor 7001, a memory 7002, and a bus 7003, and the at least one processor 7001 is electrically connected to the memory 7002. The memory 7002 is configured to store at least one computer-executable instruction, and the processor 7001 is configured to execute the at least one computer-executable instruction, so as to perform the steps of any living body detection method provided by any embodiment or any optional implementation of Embodiment 1 of this application.

Further, the processor 7001 may be an FPGA (Field-Programmable Gate Array) or another device with logic processing capability, such as an MCU (Microcontroller Unit) or a CPU (Central Processing Unit).
Applying the embodiments of this application yields at least the following beneficial effects:

In one aspect, the living body detection method provided by this application performs detection on the basis of a video captured for the object to be detected. Specifically, multiple frames of video images are extracted from the video and used to characterize it; for each frame, a first probability that the frame indicates whether the object is a living body is determined from the features of that frame; finally, whether the object is a living body is determined comprehensively from the multiple first probabilities so obtained. Compared with the prior art, which performs living body detection on a single image, detection on the basis of video yields more accurate results. In another aspect, because the method extracts multiple frames from the video captured for the object to be detected, redundant information in the video is reduced, which lowers the computational cost and improves detection efficiency. In yet another aspect, the method does not require the object to be detected to perform specified facial actions such as opening the mouth or blinking in front of the camera, which not only avoids the impact of facial actions on face recognition accuracy but also allows the user to complete living body detection without performing specified facial actions, thereby improving the user experience.
Based on the same inventive concept, an embodiment of this application further provides a computer-readable storage medium, for example the memory 7002 in FIG. 4, in which a computer program 7002a is stored; when executed by a processor, the program implements the steps of any embodiment of Embodiment 1 of this application or of any living body detection method.

The computer-readable storage medium provided by the embodiments of this application includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, a readable storage medium includes any medium that stores or transmits information in a form readable by a device (for example, a computer).
Applying the embodiments of this application yields at least the following beneficial effects:

In one aspect, the living body detection method provided by this application performs detection on the basis of a video captured for the object to be detected. Specifically, multiple frames of video images are extracted from the video and used to characterize it; for each frame, a first probability that the frame indicates whether the object is a living body is determined from the features of that frame; finally, whether the object is a living body is determined comprehensively from the multiple first probabilities so obtained. Compared with the prior art, which performs living body detection on a single image, detection on the basis of video yields more accurate results. In another aspect, because the method extracts multiple frames from the video captured for the object to be detected, redundant information in the video is reduced, which lowers the computational cost and improves detection efficiency. In yet another aspect, the method does not require the object to be detected to perform specified facial actions such as opening the mouth or blinking in front of the camera, which not only avoids the impact of facial actions on face recognition accuracy but also allows the user to complete living body detection without performing specified facial actions, thereby improving the user experience.
Those skilled in the art will understand that computer program instructions can be used to implement each block in these structural diagrams and/or block diagrams and/or flow diagrams, as well as combinations of such blocks. They will also understand that these computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing method for execution, so that the schemes specified in one or more blocks of the structural diagrams and/or block diagrams and/or flow diagrams disclosed in this application are carried out by that processor.

Those skilled in the art will understand that the steps, measures, and schemes in the various operations, methods, and flows already discussed in this application can be alternated, changed, combined, or deleted. Further, other steps, measures, and schemes in the various operations, methods, and flows already discussed in this application can also be alternated, changed, rearranged, decomposed, combined, or deleted. Further, steps, measures, and schemes in the prior art corresponding to the various operations, methods, and flows disclosed in this application can also be alternated, changed, rearranged, decomposed, combined, or deleted.

The above are only some of the implementations of this application. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of this application, and such improvements and refinements should also be regarded as falling within the protection scope of this application.

Claims (12)

  1. A living body detection method, characterized in that the method comprises:
    extracting multiple frames of video images from a video captured for an object to be detected;
    for each frame of the multiple frames of video images, determining, according to features of that frame, a first probability that the frame indicates whether the object to be detected is a living body;
    determining whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images.
  2. The method according to claim 1, characterized in that the method further comprises:
    determining, according to an inter-frame correlation of the multiple frames of video images, a second probability that the inter-frame correlation indicates whether the object to be detected is a living body;
    wherein determining whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images comprises:
    determining whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images.
  3. The method according to claim 2, characterized in that the method further comprises:
    concatenating the respective features of the multiple frames of video images to obtain features of the video, the video features being used to characterize the inter-frame correlation.
  4. The method according to claim 2, characterized in that determining whether the object to be detected is a living body according to the second probability and the first probabilities respectively corresponding to the multiple frames of video images comprises:
    assigning weights to the second probability and to the first probabilities respectively corresponding to the multiple frames of video images, wherein the weight corresponding to the second probability is greater than the weight corresponding to each first probability;
    determining whether the object to be detected is a living body according to the second probability and its corresponding weight, and the first probabilities respectively corresponding to the multiple frames of video images and their corresponding weights.
  5. The method according to claim 1, characterized in that the method further comprises:
    obtaining a sample video set, the sample video set comprising multiple labeled sample videos, wherein the label carried by a sample video indicates whether that sample video was captured for a living body;
    for each labeled sample video in the sample video set, performing the following steps:
    extracting multiple frames of sample video images from the labeled sample video;
    inputting each frame of the multiple frames of sample video images into a convolutional layer of a model to be trained to obtain features of that sample frame;
    inputting the features of the sample frame into a first fully connected layer of the model to be trained to obtain a third probability corresponding to that sample frame, the third probability indicating whether the sample frame comes from a video captured for a living body;
    inputting the third probabilities respectively corresponding to the multiple frames of sample video images into a second fully connected layer of the model to be trained to obtain an estimated probability of whether the sample video was captured for a living body;
    establishing a loss function according to the estimated probability and the third probabilities respectively corresponding to the multiple frames of sample video images, so as to update the model to be trained and obtain a living body detection model;
    wherein, for each frame of the multiple frames of video images, determining, according to the features of that frame, the first probability that the frame indicates whether the object to be detected is a living body comprises:
    inputting each frame of the multiple frames of video images into the convolutional layer of the living body detection model to obtain the features of that frame;
    inputting the features of the frame into the first fully connected layer of the living body detection model to determine the first probability that the frame indicates whether the object to be detected is a living body.
  6. The method according to claim 5, characterized in that the method further comprises:
    concatenating the respective features of the multiple frames of sample video images to obtain features of the sample video;
    inputting the features of the sample video into a third fully connected layer of the model to be trained to obtain a fourth probability of whether the sample video was captured for a living body;
    wherein inputting the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video was captured for a living body comprises:
    inputting the fourth probability and the third probabilities respectively corresponding to the multiple frames of sample video images into the second fully connected layer of the model to be trained to obtain the estimated probability of whether the sample video was captured for a living body.
  7. The method according to claim 5, characterized in that determining whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images comprises:
    inputting the first probabilities respectively corresponding to the multiple frames of video images into the second fully connected layer of the living body detection model to determine whether the object to be detected is a living body.
  8. The method according to any one of claims 1-7, characterized in that the method further comprises:
    obtaining a video captured by a video capture device while the object to be detected is in a silent state.
  9. A living body detection apparatus, characterized in that the apparatus comprises:
    a first extraction module, configured to extract multiple frames of video images from a video captured for an object to be detected;
    a first determination module, configured to determine, for each frame of the multiple frames of video images and according to features of that frame, a first probability that the frame indicates whether the object to be detected is a living body;
    a second determination module, configured to determine whether the object to be detected is a living body according to the first probabilities respectively corresponding to the multiple frames of video images.
  10. A readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the program implements the steps of the method according to any one of claims 1-8.
  11. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method according to any one of claims 1-8.
  12. A computer program comprising computer-readable code which, when run on a computing processing device, causes the computing processing device to execute the living body detection method according to any one of claims 1-8.
PCT/CN2020/091047 2019-06-13 2020-05-19 Living body testing method and apparatus, electronic device and readable storage medium WO2020248780A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910512041.1A CN110378219B (en) 2019-06-13 2019-06-13 Living body detection method, living body detection device, electronic equipment and readable storage medium
CN201910512041.1 2019-06-13

Publications (1)

Publication Number Publication Date
WO2020248780A1 true WO2020248780A1 (en) 2020-12-17

Family

ID=68250296

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/091047 WO2020248780A1 (en) 2019-06-13 2020-05-19 Living body testing method and apparatus, electronic device and readable storage medium

Country Status (2)

Country Link
CN (1) CN110378219B (en)
WO (1) WO2020248780A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113792701A (en) * 2021-09-24 2021-12-14 北京市商汤科技开发有限公司 Living body detection method and device, computer equipment and storage medium

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378219B (en) * 2019-06-13 2021-11-19 北京迈格威科技有限公司 Living body detection method, living body detection device, electronic equipment and readable storage medium
CN111091047B (en) * 2019-10-28 2021-08-27 支付宝(杭州)信息技术有限公司 Living body detection method and device, server and face recognition equipment
CN112749603A (en) * 2019-10-31 2021-05-04 上海商汤智能科技有限公司 Living body detection method, living body detection device, electronic apparatus, and storage medium
CN111209863B (en) * 2020-01-07 2023-12-15 北京旷视科技有限公司 Living model training and human face living body detection method and device and electronic equipment
CN111680624A (en) * 2020-06-08 2020-09-18 上海眼控科技股份有限公司 Behavior detection method, electronic device, and storage medium
CN111814567A (en) * 2020-06-11 2020-10-23 上海果通通信科技股份有限公司 Method, device and equipment for detecting living human face and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030133599A1 (en) * 2002-01-17 2003-07-17 International Business Machines Corporation System method for automatically detecting neutral expressionless faces in digital images
CN105956572A (en) * 2016-05-15 2016-09-21 北京工业大学 In vivo face detection method based on convolutional neural network
CN107480586A (en) * 2017-07-06 2017-12-15 天津科技大学 Bio-identification photo bogus attack detection method based on human face characteristic point displacement
CN108009493A (en) * 2017-11-30 2018-05-08 电子科技大学 Face anti-fraud recognition methods based on action enhancing
CN108805047A (en) * 2018-05-25 2018-11-13 北京旷视科技有限公司 A kind of biopsy method, device, electronic equipment and computer-readable medium
CN109670413A (en) * 2018-11-30 2019-04-23 腾讯科技(深圳)有限公司 Face living body verification method and device
CN110378219A (en) * 2019-06-13 2019-10-25 北京迈格威科技有限公司 Biopsy method, device, electronic equipment and readable storage medium storing program for executing

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915649B (en) * 2015-06-04 2018-12-14 南京理工大学 A kind of biopsy method applied to recognition of face
CN106874857B (en) * 2017-01-19 2020-12-01 腾讯科技(上海)有限公司 Living body distinguishing method and system based on video analysis
CN109389002A (en) * 2017-08-02 2019-02-26 阿里巴巴集团控股有限公司 Biopsy method and device
CN107818313B (en) * 2017-11-20 2019-05-14 腾讯科技(深圳)有限公司 Vivo identification method, device and storage medium
CN107992842B (en) * 2017-12-13 2020-08-11 深圳励飞科技有限公司 Living body detection method, computer device, and computer-readable storage medium
CN108596041B (en) * 2018-03-28 2019-05-14 中科博宏(北京)科技有限公司 A kind of human face in-vivo detection method based on video


Also Published As

Publication number Publication date
CN110378219B (en) 2021-11-19
CN110378219A (en) 2019-10-25

Similar Documents

Publication Publication Date Title
WO2020248780A1 (en) Living body testing method and apparatus, electronic device and readable storage medium
WO2021114931A1 (en) Method and apparatus for training encoding model capable of preventing private data leakage
US10275672B2 (en) Method and apparatus for authenticating liveness face, and computer program product thereof
JP2018508875A (en) Method and apparatus for biological face detection
CN109858371A (en) The method and device of recognition of face
CN112215043A (en) Human face living body detection method
CN111178249A (en) Face comparison method and device, computer equipment and storage medium
CN111611873A (en) Face replacement detection method and device, electronic equipment and computer storage medium
CN108108711B (en) Face control method, electronic device and storage medium
KR102145132B1 (en) Surrogate Interview Prevention Method Using Deep Learning
JP7188446B2 (en) Authentication device, authentication method, authentication program and recording medium
CN107704813A (en) A kind of face vivo identification method and system
CN109359689B (en) Data identification method and device
CN113591603A (en) Certificate verification method and device, electronic equipment and storage medium
CN114241588B (en) Self-adaptive face comparison method and system
CN109886084A (en) Face authentication method, electronic equipment and storage medium based on gyroscope
Ohki et al. Efficient spoofing attack detection against unknown sample using end-to-end anomaly detection
CN114299569A (en) Safe face authentication method based on eyeball motion
JP7353825B2 (en) Image processing device and method, image input device, image processing system, program
CN110414347B (en) Face verification method, device, equipment and storage medium
CN112001285A (en) Method, device, terminal and medium for processing beautifying image
Kaur et al. Improved Facial Biometric Authentication Using MobileNetV2
CN114463799A (en) Living body detection method and device and computer readable storage medium
CN111860563A (en) Vehicle verification method and device, electronic equipment and medium
CN112241674A (en) Face recognition method and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20821770; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20821770; Country of ref document: EP; Kind code of ref document: A1)